[Delphi 7] Code optimisation, comparing 2D matrices of words [SSE?]

**Emil** · 27-08-2012, 10:28 PM

Thanks again for the suggestions.

... make the necessary optimizations for the technique itself so more work can be put on actual CPU with less stress on bandwidth...

I am doing this; it is actually what I have been doing for the past two years. Not full-time, but I do consider myself a bit of an expert on these lucky imaging techniques. I have been doing astrophotography since 2006 and have a fairly good understanding of how all of this works. The underlying algorithms are already optimized. There is a lot of specialised code surrounding the few lines I posted here that makes sure the brute-force pixel comparisons are not done unnecessarily. One way or another, at some point you have to start looking at the actual image data.

I'm not trying to say I know everything there is to know; that is actually one of the reasons I'm here. I want to learn new things, but I do know a lot more than nothing about lucky imaging and the relevant image processing techniques. Just not at that low a level.

I'm never actually working in RGB color space to begin with; all of the time-consuming processing is done on grayscale images. Most input is actually 16-bit or 8-bit grayscale. I'm familiar with some of the color spaces you mention, but they don't apply to this project.

To give you an idea of the processing speeds I currently get on a pretty old system (dual core, 4 GB RAM), here is one test case:
- Processing a 600 MB file containing 8-bit grayscale data of a white-light image of the sun, 1638 frames of 640x480. The entire FOV is full.
- Using 910 alignment areas with a size of 25x25 pixels
- Stacking only the best portions out of 200 frames

Timings in seconds (1 thread / 2 threads):
- 9.4 / 4.9 to align the data (to compensate for global image movements)
- 14.7 / 7.7 to buffer the data to memory and calculate the quality of the sections of the frames
- 3.7 / 2.2 to calculate a reference frame
- 16.8 / 8.8 to align all of the 910 alignment points over the best 200 frames
- 1.7 / 1.4 to stack the best 200 frames for each AP, given the previously retrieved alignment information

This is what a single raw frame in that recording looks like (you can see many distortions in this image, but believe it or not, this is actually a pretty good frame. My software won't try to align every single pixel in this image; some portions are clearly not detailed enough and will simply be ignored. If you look carefully, you'll see that certain parts are actually fairly sharp, but even those are still slightly warped.)
http://www.astrokraai.nl/dump/RawFrame.jpg
and this is the resulting image:
http://www.astrokraai.nl/viewimages.php?id=201&cd=11

GPGPU would be very interesting, but at the moment it is way too complex for me. It will be the future though, so I'll try to learn as much about it as I can.

**LP** · 28-08-2012, 01:52 AM

By the way, I've been thinking... You could also try an unorthodox approach using simple GPU tricks.

Taking this code:

    for x := rect.Left to rect.Right do
      for y := rect.Top to rect.Bottom do
        dLW := dLW + abs( ((currentPixels[y + yo,x + xo] * multLW) shr 14) - referencePixels[y,x] ) shr 6;
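For reference, here is a minimal, self-contained sketch of what that loop appears to compute, so the GPU steps can be compared against a CPU baseline. Several things are assumptions and not from the original post: the type `TWordMatrix`, the function name `WindowDifference`, the explicit window bounds in place of `rect`, and the reading of `multLW` as a brightness factor in 14-bit fixed point (so 16384 would mean 1.0). The loops are also swapped so that the inner loop walks along a row, which tends to be friendlier to the cache for row-major data; the sum is the same either way.

```pascal
program WindowDifferenceSketch;
{$APPTYPE CONSOLE}
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

type
  // Hypothetical type: rows of 16-bit grayscale pixels, indexed [y, x].
  TWordMatrix = array of array of Word;

// Sums the scaled absolute differences between a window of the reference
// image and the same window of the current image shifted by (XO, YO).
function WindowDifference(const CurrentPixels, ReferencePixels: TWordMatrix;
  ALeft, ATop, ARight, ABottom, XO, YO: Integer; MultLW: Cardinal): Int64;
var
  X, Y: Integer;
  Scaled: Int64;
begin
  Result := 0;
  for Y := ATop to ABottom do
    for X := ALeft to ARight do
    begin
      // Assumed meaning: brightness correction in 14-bit fixed point.
      Scaled := (Int64(CurrentPixels[Y + YO, X + XO]) * MultLW) shr 14;
      // Each difference is divided by 64 before summing, as in the post.
      Result := Result + (Abs(Scaled - ReferencePixels[Y, X]) shr 6);
    end;
end;

// Helper for the demo only: allocates a W x H image with one gray value.
procedure FillImage(var M: TWordMatrix; W, H: Integer; Value: Word);
var
  X, Y: Integer;
begin
  SetLength(M, H, W);
  for Y := 0 to H - 1 do
    for X := 0 to W - 1 do
      M[Y, X] := Value;
end;

var
  CurImg, RefImg: TWordMatrix;
begin
  FillImage(CurImg, 4, 4, 1640);
  FillImage(RefImg, 4, 4, 1000);
  // 2x2 window, shifted by (1, 1), multiplier 16384 (assumed to mean 1.0).
  // Per pixel: |1640 - 1000| shr 6 = 10, and there are 4 pixels.
  Writeln(WindowDifference(CurImg, RefImg, 0, 0, 1, 1, 1, 1, 16384)); // prints 40
end.
```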

Use the following approach:
1) Load image_1 and image_2 as A16B16G16R16 textures.
2) Draw image_1 onto an A16B16G16R16 render target.
3) Draw image_2 onto the same render target using the subtract blending operation.
4) Take the render target and generate a full set of mipmaps down to 1x1 on the GPU.
5) The smallest mipmap (1x1, one pixel) contains the resulting average difference, in 16-bit.
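To see why the 1x1 mipmap ends up holding the average, the steps above can be imitated on the CPU. This is only a sketch under stated assumptions: the names `TLevel`, `ClampedSubtract`, `HalveLevel` and `ReduceToOnePixel` are made up for illustration, the images are square with a power-of-two size, and `Double` is used so no rounding happens between levels (a real 16-bit render target would round at each level). One caveat, as far as I know: on an unsigned normalized target, subtract blending clamps negative results to zero, so a single pass gives max(0, a - b) and not the absolute difference. Getting the absolute value would need a second pass with the operands reversed, or a shader.

```pascal
program MipReduceSketch;
{$APPTYPE CONSOLE}
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

type
  TWordMatrix = array of array of Word;  // hypothetical 16-bit image type
  TLevel = array of array of Double;     // one mip level, kept in floating point

// Steps 2-3: per-pixel A - B, clamped at zero like blending on an unsigned target.
function ClampedSubtract(const A, B: TWordMatrix): TLevel;
var
  X, Y: Integer;
begin
  SetLength(Result, Length(A), Length(A[0]));
  for Y := 0 to High(A) do
    for X := 0 to High(A[Y]) do
      if A[Y, X] > B[Y, X] then
        Result[Y, X] := Integer(A[Y, X]) - Integer(B[Y, X])
      else
        Result[Y, X] := 0;
end;

// One mip step: every 2x2 block is averaged into a single pixel.
function HalveLevel(const Src: TLevel): TLevel;
var
  X, Y: Integer;
  Dst: TLevel; // built separately so Src is never touched while reading
begin
  SetLength(Dst, Length(Src) div 2, Length(Src[0]) div 2);
  for Y := 0 to High(Dst) do
    for X := 0 to High(Dst[Y]) do
      Dst[Y, X] := (Src[2 * Y, 2 * X] + Src[2 * Y, 2 * X + 1] +
                    Src[2 * Y + 1, 2 * X] + Src[2 * Y + 1, 2 * X + 1]) / 4;
  Result := Dst;
end;

// Steps 4-5: keep halving until a single pixel is left.
function ReduceToOnePixel(Level: TLevel): Double;
begin
  while Length(Level) > 1 do
    Level := HalveLevel(Level);
  Result := Level[0, 0];
end;

var
  ImgA, ImgB: TWordMatrix;
  X, Y: Integer;
begin
  SetLength(ImgA, 4, 4);
  SetLength(ImgB, 4, 4);
  for Y := 0 to 3 do
    for X := 0 to 3 do
    begin
      ImgA[Y, X] := 1000;
      ImgB[Y, X] := 1000 - 16 * (Y * 4 + X); // differences 0, 16, ..., 240
    end;
  // The mean of 0, 16, ..., 240 is 120.
  Writeln(ReduceToOnePixel(ClampedSubtract(ImgA, ImgB)):0:2); // prints 120.00
end.
```

Because every level averages equally sized blocks, the last pixel equals the plain mean of all the differences, which is the sum from the original loop divided by the pixel count.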

You could even increase the precision to 32-bit, but generating the mipmaps will be more complicated. You will have to make them sequentially, by drawing the image at 50% size onto a render target and repeating the same process until the last render target is 1x1 pixel. This is most likely how the GPU does it anyway, though.

With the above approach you take advantage of the GPU's full parallel processing power and memory bandwidth, but without using complex GPGPU techniques.

P.S. Using shaders, calculating the differences and making the mipmaps can even be combined into one single step to reduce the number of passes (e.g. reduce the image by 4x while calculating the differences for the reduced segment).
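A rough CPU analogue of that combined step might look like the sketch below. It is an illustration, not shader code: the name `DifferenceAndHalve` is hypothetical, and "4x" is read here as each output pixel covering a 2x2 block (four source pixels), which is only one possible reading of the post. Each output pixel reads its four source pixels from both images, takes the absolute differences and averages them, so the full-size difference image never has to be stored.

```pascal
program FusedReduceSketch;
{$APPTYPE CONSOLE}
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

type
  TWordMatrix = array of array of Word;  // hypothetical 16-bit image type
  TLevel = array of array of Double;     // reduced result, kept in floating point

// One fused pass: absolute difference and 2x2 averaging in the same loop.
// Assumes both images have the same even width and height.
function DifferenceAndHalve(const A, B: TWordMatrix): TLevel;
var
  X, Y, DX, DY: Integer;
  Sum: Integer;
begin
  SetLength(Result, Length(A) div 2, Length(A[0]) div 2);
  for Y := 0 to High(Result) do
    for X := 0 to High(Result[Y]) do
    begin
      Sum := 0;
      for DY := 0 to 1 do
        for DX := 0 to 1 do
          Sum := Sum + Abs(Integer(A[2 * Y + DY, 2 * X + DX]) -
                           Integer(B[2 * Y + DY, 2 * X + DX]));
      Result[Y, X] := Sum / 4;
    end;
end;

var
  ImgA, ImgB: TWordMatrix;
  Reduced: TLevel;
  X, Y, I: Integer;
begin
  SetLength(ImgA, 4, 4);
  SetLength(ImgB, 4, 4);
  for Y := 0 to 3 do
    for X := 0 to 3 do
    begin
      I := Y * 4 + X;
      ImgA[Y, X] := 1000;
      // Alternate the sign so the absolute value actually matters.
      if Odd(I) then
        ImgB[Y, X] := 1000 + 16 * I
      else
        ImgB[Y, X] := 1000 - 16 * I;
    end;
  Reduced := DifferenceAndHalve(ImgA, ImgB);
  Writeln(Reduced[0, 0]:0:2, ' ', Reduced[0, 1]:0:2); // prints 40.00 72.00
  Writeln(Reduced[1, 0]:0:2, ' ', Reduced[1, 1]:0:2); // prints 168.00 200.00
end.
```

Since the absolute value is taken before anything is written out, this version does not run into the clamping of subtract blending on unsigned targets.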

**Emil** · 28-08-2012, 02:48 PM

Interesting. I'm completely new to GPU graphics programming, so I'll try to read some more about it.
(Any suggestions for a nice tutorial on how to implement something like this?)
