View Full Version : assembly help



peterbone
31-10-2004, 01:09 PM
I've never learnt assembly (I'm a self-taught programmer, but that's no excuse) and I'm trying to speed up the inner loop of my Gouraud shading rasterizer. The inner loop scans each line and is only 9 lines long, and I was wondering if one of you assembly experts could tell me the assembly equivalent. Any speed-up in this inner loop will greatly increase the speed of the routine.

for x := ScanStart to ScanEnd do begin
  RGBT.rgbtBlue := Round(RGB.B);
  RGBT.rgbtGreen := Round(RGB.G);
  RGBT.rgbtRed := Round(RGB.R);
  Scan[x] := RGBT;
  RGB.B := RGB.B + RGBdx.B;
  RGB.G := RGB.G + RGBdx.G;
  RGB.R := RGB.R + RGBdx.R;
end;
The rest is here http://www.geocities.com/peter_bone_uk/Gouraud.txt

Thanks for any help.

Peter Bone

Paulius
31-10-2004, 03:18 PM
You can see for yourself what the code gets translated to: Run to cursor, then View->Debug Windows->CPU. But there probably won't be much to gain from assembler in such a loop; instead look at its slowest part: rounding. I suggest you convert it to fixed point: multiply the floats by a power of two and round them outside the loop, then inside the loop, instead of rounding, you'll be able to get away with only using shifts.

peterbone
01-11-2004, 10:29 AM
Thanks, that sounds like a really clever idea. Are you sure that would give the same precision though?
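
On the precision question, a rough way to reason about it: with 8 fractional bits the per-pixel step is rounded to the nearest 1/256, so it can be off by up to 1/512 of a colour level, and that error can accumulate to about N/512 levels across a scanline of N pixels. The sketch below is not taken from the rasterizer in this thread. It assumes a hypothetical 1000-pixel scanline and made-up start and end values, and compares per-pixel Round() against fixed-point stepping with 8 and 16 fractional bits. It also adds half a unit before shifting, because a plain shr truncates rather than rounds.

```pascal
program FixedPointPrecision;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

const
  Pixels     = 1000;   // hypothetical scanline length
  StartValue = 10.0;   // hypothetical colour at the left edge
  EndValue   = 245.0;  // hypothetical colour at the right edge

// Steps one colour channel in fixed point with FracBits fractional bits and
// returns the largest difference from the per-pixel Round() of the float version.
function MaxError(FracBits: Integer): Integer;
var
  x, Fixed, FixedStep, Half, Diff: Integer;
  Value, Step, Scale: Double;
begin
  Result := 0;
  Scale := 1 shl FracBits;
  Step := (EndValue - StartValue) / Pixels;
  Value := StartValue;
  Half := 1 shl (FracBits - 1);               // 0.5 in fixed point
  Fixed := Round(StartValue * Scale) + Half;  // bias so that shr rounds to nearest
  FixedStep := Round(Step * Scale);           // the only place the step is rounded
  for x := 0 to Pixels - 1 do
  begin
    Diff := Abs(Round(Value) - (Fixed shr FracBits));
    if Diff > Result then
      Result := Diff;
    Value := Value + Step;
    Inc(Fixed, FixedStep);
  end;
end;

begin
  WriteLn('Max error with  8 fractional bits: ', MaxError(8));
  WriteLn('Max error with 16 fractional bits: ', MaxError(16));
end.
```

Colour values up to 255 still fit in a 32-bit Integer with 16 fractional bits (255 * 65536 is about 16.7 million), so a larger shift is one option if banding shows up on long scanlines.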

peterbone
01-11-2004, 11:21 AM
I tried your idea and it worked. I replaced the inner loop with this

B := Round(RGB.B * 256);
G := Round(RGB.G * 256);
R := Round(RGB.R * 256);
Bd := Round(RGBdx.B * 256);
Gd := Round(RGBdx.G * 256);
Rd := Round(RGBdx.R * 256);
for x := ScanStart to ScanEnd do begin
  RGBT.rgbtBlue := B shr 8;
  RGBT.rgbtGreen := G shr 8;
  RGBT.rgbtRed := R shr 8;
  Scan[x] := RGBT;
  Inc(B, Bd);
  Inc(G, Gd);
  Inc(R, Rd);
end;

R, G, B, Rd, Gd, Bd are all integers.

I tested it by drawing a large Gouraud triangle 10000 times. The old routine took 17 seconds; the new routine took 14 seconds :D .

Thanks for your help

Peter

peterbone
01-11-2004, 11:38 AM
now it's

B := Round(RGB.B * 256);
G := Round(RGB.G * 256);
R := Round(RGB.R * 256);
Bd := Round(RGBdx.B * 256);
Gd := Round(RGBdx.G * 256);
Rd := Round(RGBdx.R * 256);
for x := ScanStart to ScanEnd do begin
  Scan[x].rgbtBlue := B shr 8;
  Scan[x].rgbtGreen := G shr 8;
  Scan[x].rgbtRed := R shr 8;
  Inc(B, Bd);
  Inc(G, Gd);
  Inc(R, Rd);
end;

and it's down to 9 seconds!

Useless Hacker
01-11-2004, 07:59 PM
You might increase the efficiency further by putting the Scan[x] part in a 'with' block, so that the array access only happens once... although the compiler might optimise that anyway.
B := Round(RGB.B * 256);
G := Round(RGB.G * 256);
R := Round(RGB.R * 256);
Bd := Round(RGBdx.B * 256);
Gd := Round(RGBdx.G * 256);
Rd := Round(RGBdx.R * 256);
for x := ScanStart to ScanEnd do begin
  with Scan[x] do begin
    rgbtBlue := B shr 8;
    rgbtGreen := G shr 8;
    rgbtRed := R shr 8;
  end;
  Inc(B, Bd);
  Inc(G, Gd);
  Inc(R, Rd);
end;

WILL
02-11-2004, 02:05 AM
If RGB.B, RGB.G, RGB.R, RGBdx.B, RGBdx.G and RGBdx.R are all integers then would you not be able to speed it up even further with the following instead of using a '* 256'?

B := Round(RGB.B) shl 8{* 256};
G := Round(RGB.G) shl 8{* 256};
R := Round(RGB.R) shl 8{* 256};
Bd := Round(RGBdx.B) shl 8{* 256};
Gd := Round(RGBdx.G) shl 8{* 256};
Rd := Round(RGBdx.R) shl 8{* 256};

Unless these are reals it should speed up each iteration of the call slightly.

peterbone
02-11-2004, 11:33 AM
The 'with' did speed it up. Now down to 8 seconds (over twice as fast as it was originally)! I thought 'with' was just to make the code tidier - I didn't realize it made a difference to the compiled code.
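
One way to picture what 'with' can buy here: the address of Scan[x] only needs to be worked out once per pixel instead of once per field. The sketch below makes that explicit with a pointer that is advanced by hand. It is only an illustration, not the code from the speed test: the 8-pixel array, the colour values and the local TRGBTriple declaration (a stand-in for the Windows record of the same name) are all assumptions, and whether it beats 'with' depends on the compiler.

```pascal
program PointerStepping;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

type
  // Local stand-in for the Windows TRGBTriple record, so the sketch compiles anywhere.
  TRGBTriple = packed record
    rgbtBlue, rgbtGreen, rgbtRed: Byte;
  end;
  PRGBTriple = ^TRGBTriple;

var
  Scan: array[0..7] of TRGBTriple;  // hypothetical 8-pixel scanline
  P: PRGBTriple;
  x, B, G, R, Bd, Gd, Rd: Integer;
begin
  // Hypothetical start colours and per-pixel steps in 8.8 fixed point.
  B := 0;
  G := 128 shl 8;
  R := 255 shl 8;
  Bd := 32 shl 8;
  Gd := 0;
  Rd := -(32 shl 8);

  P := @Scan[0];                    // address computed once, before the loop
  for x := 0 to 7 do
  begin
    P^.rgbtBlue := B shr 8;
    P^.rgbtGreen := G shr 8;
    P^.rgbtRed := R shr 8;
    Inc(P);                         // advance by SizeOf(TRGBTriple) to the next pixel
    Inc(B, Bd);
    Inc(G, Gd);
    Inc(R, Rd);
  end;

  for x := 0 to 7 do
    WriteLn(x, ': ', Scan[x].rgbtRed, ' ', Scan[x].rgbtGreen, ' ', Scan[x].rgbtBlue);
end.
```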

RGB and RGBdx are floating point WILL.
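
A small sketch of why the order matters when the values are floating point: rounding first throws away the fraction before the shift can preserve it. The step of 0.3 per pixel is a made-up value for illustration.

```pascal
program RoundBeforeShift;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

var
  Delta: Double;
  Scaled, Shifted: Integer;
begin
  Delta := 0.3;                    // hypothetical per-pixel colour step

  Scaled := Round(Delta * 256);    // 76.8 rounds to 77: the fraction survives
  Shifted := Round(Delta) shl 8;   // 0.3 rounds to 0 first, so the result is 0

  WriteLn('Round(Delta * 256) = ', Scaled);   // prints 77
  WriteLn('Round(Delta) shl 8 = ', Shifted);  // prints 0
end.
```

With the second form the colour would never change along the scanline, because the step has already collapsed to zero before the loop starts.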

Thanks for everyone's help. I'll probably make the iteration through the lines fixed point using the same shifting technique now.

Peter

peterbone
02-11-2004, 12:05 PM
I've got a really baffling problem now. The rasterizer breaks the triangle into 2 sections: above and below the middle vertex. So the line rasterizer code that I've been optimizing is in 2 places. If I use the 'with' on just one of them (either one) then it's 1 second faster. If I use the 'with' on both of them then it's 3 seconds slower. How can that be?

If any of you want to play around with my speed test program then I've put it here
http://atlas.walagata.com/w/peterbone/GouraudSpeedTest.zip

WILL
02-11-2004, 01:29 PM
peterbone wrote: "RGB and RGBdx are floating point WILL."
Sorry, I was posting late last night while a bit tired. I meant that as long as it's not a real from 0 to 1. If it's a real number it *will* be a tad faster, simply because you're not mixing types, and because of the shift of course.