Hi All,

This is my first post here, so please bear with me. I'm developing a project which compares many portions (say 200 regions of 100x100 pixels) of a large number (> 3000) of medium-resolution (say 640x480) 16-bit images to portions of a single 16-bit reference image. I was wondering whether it is possible to speed things up a bit. The following piece of code compares one portion of an image to another, and it appears to be the biggest bottleneck in my code.


Code:
// globally known variables are:
// referencePixels (2d array of words, contains a reference image)
// currentPixels (2d array of words, contains the current image)
// avgRef (double, mean intensity value of the reference image within a specific rectangle; this value is always > 0)
// rect (the specific rectangle within which the images are compared)
// xo, yo (offset of the rectangle in the current image; GetImgDifference is usually called for xo and yo from -4 to +4)


function GetImgDifference( rect : TRect; xo, yo : integer): single;
var
  x, y : integer;
  avgCur : double;
  intensityCurLW,multLW : LongWord;
  c : integer;
  dLW: LongWord;
begin
  
  // calculate the mean intensity value of a rectangle within an image
  intensityCurLW := 0;
  for x := rect.Left to rect.Right do
    for y := rect.Top to rect.Bottom do
      intensityCurLW := intensityCurLW + (currentPixels[y + yo,x + xo] shr 6); 


  // the size of the rectangle
  c := ((rect.Right - rect.Left)+1) * ((rect.bottom - rect.top)+1);
  
  // if the current intensity is larger than 0
  if (intensityCurLW > 0) then begin
    // store average intensity of current image in avgCur
    avgCur := intensityCurLW / c; 
    
    // calculate factor to correct for mean difference in current and reference image
    // let's store this in a long word (accurate enough, and fast)
    multLW := Round(16384 * avgRef / avgCur); // 2^14


    // calculate the absolute difference between the reference image and the current image
    dLW := 0;
    for x := rect.Left to rect.Right do
      for y := rect.Top to rect.Bottom do
        dLW := dLW + abs(  ((currentPixels[y + yo,x + xo]  * multLW) shr 14) - referencePixels[y,x] ) shr 6;


    // the result will be the average pixel difference
    result := dLW / c;
  end else begin
    result := PrettyLargeSingle; // return a huge single value when the current image had 0 intensity
  end;
end;

Imagine this function being called about 40 million times during a run (4000 frames x 200 rectangles x 49 positions around each rectangle, with xo and yo each running from -3 to +3). I'm afraid I can't go back from 16-bit images to 8-bit images, and I don't see how I can improve this code any further. Would it be possible to speed things up a bit using, for example, SSE or SSE2 instructions, and if so, how?
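
To make the call pattern above concrete, here is a minimal, self-contained sketch of the search around a single rectangle. It is only an illustration under assumptions, not the actual project code: TBox, BoxMean, BoxDifference, the value chosen for PrettyLargeSingle and the random test image are all hypothetical stand-ins, and avgRef is assumed to be computed from the same "shr 6" scaled values as avgCur. The sketch also walks rows in the outer loop and uses Int64 intermediates, which differs from the function above.

```pascal
program OffsetSearchSketch;
{$IFDEF FPC}{$mode objfpc}{$ENDIF}
{$IFNDEF FPC}{$APPTYPE CONSOLE}{$ENDIF}

const
  ImgW = 640;
  ImgH = 480;
  MaxShift = 3;                // xo, yo in -3..+3, i.e. 7 x 7 = 49 positions
  PrettyLargeSingle = 1.0e30;  // assumed sentinel; the real value is not given

type
  TImage = array[0..ImgH - 1, 0..ImgW - 1] of Word;
  TBox = record                // hypothetical stand-in for TRect
    Left, Top, Right, Bottom: Integer;
  end;

var
  referencePixels, currentPixels: TImage;

// Mean of (pixel shr 6) over the box displaced by (xo, yo).
function BoxMean(const img: TImage; const r: TBox; xo, yo: Integer): Double;
var
  x, y: Integer;
  sum: Int64;
begin
  sum := 0;
  for y := r.Top to r.Bottom do      // rows outermost: memory is read in order
    for x := r.Left to r.Right do
      sum := sum + (img[y + yo, x + xo] shr 6);
  Result := sum / ((r.Right - r.Left + 1) * (r.Bottom - r.Top + 1));
end;

// Gain-corrected mean absolute difference, following GetImgDifference,
// but with Int64 intermediates so the subtraction may go negative.
function BoxDifference(const r: TBox; avgRef: Double; xo, yo: Integer): Single;
var
  x, y: Integer;
  avgCur: Double;
  mult, diff, total: Int64;
begin
  avgCur := BoxMean(currentPixels, r, xo, yo);
  if avgCur <= 0 then
  begin
    Result := PrettyLargeSingle;
    Exit;
  end;
  mult := Round(16384 * avgRef / avgCur);  // 2^14 fixed-point gain
  total := 0;
  for y := r.Top to r.Bottom do
    for x := r.Left to r.Right do
    begin
      diff := ((currentPixels[y + yo, x + xo] * mult) shr 14) - referencePixels[y, x];
      total := total + (Abs(diff) shr 6);
    end;
  Result := total / ((r.Right - r.Left + 1) * (r.Bottom - r.Top + 1));
end;

var
  box: TBox;
  x, y, xo, yo, bestXo, bestYo: Integer;
  avgRef: Double;
  d, best: Single;
begin
  // Synthetic test data: a random current image, and a reference that is the
  // same image displaced so that the best match lies at xo = 2, yo = -1.
  RandSeed := 1;
  for y := 0 to ImgH - 1 do
    for x := 0 to ImgW - 1 do
      currentPixels[y, x] := 1000 + Random(60000);
  for y := 1 to ImgH - 1 do
    for x := 0 to ImgW - 3 do
      referencePixels[y, x] := currentPixels[y - 1, x + 2];

  box.Left := 100; box.Top := 100; box.Right := 199; box.Bottom := 199;
  avgRef := BoxMean(referencePixels, box, 0, 0);

  // Try all 49 offsets and keep the one with the smallest difference.
  best := PrettyLargeSingle;
  bestXo := 0;
  bestYo := 0;
  for yo := -MaxShift to MaxShift do
    for xo := -MaxShift to MaxShift do
    begin
      d := BoxDifference(box, avgRef, xo, yo);
      if d < best then
      begin
        best := d; bestXo := xo; bestYo := yo;
      end;
    end;

  WriteLn('best offset: xo=', bestXo, ' yo=', bestYo, ' difference=', best:0:3);
  WriteLn('calls for 4000 frames x 200 rectangles: ',
          Int64(4000) * 200 * (2 * MaxShift + 1) * (2 * MaxShift + 1));
end.
```

With this synthetic data the search should land on xo=2, yo=-1 with a difference of 0.000, and the last line works out to 39200000 calls, which is where the "about 40 million" figure comes from. Each call reads the 100x100 rectangle twice, so the inner loops are where any speed-up would have to come from.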

If you have any questions, please do let me know!

Kind regards,

Emil