assembly help
peterbone
31-10-2004, 01:09 PM
I've never learnt assembly (I'm a self-taught programmer but that's no excuse) and I'm trying to speed up the inner loop of my Gouraud shading rasterizer. The inner loop fills one scanline and is only 9 lines long, and I was wondering if one of you assembly experts could tell me the assembly equivalent. Any speed-up in this inner loop will greatly increase the speed of the routine.
for x := ScanStart to ScanEnd do begin
  RGBT.rgbtBlue := Round(RGB.B);
  RGBT.rgbtGreen := Round(RGB.G);
  RGBT.rgbtRed := Round(RGB.R);
  Scan[x] := RGBT;
  RGB.B := RGB.B + RGBdx.B;
  RGB.G := RGB.G + RGBdx.G;
  RGB.R := RGB.R + RGBdx.R;
end;
The rest is here http://www.geocities.com/peter_bone_uk/Gouraud.txt
Thanks for any help.
Peter Bone
Paulius
31-10-2004, 03:18 PM
You can see for yourself what the code gets translated to: Run to cursor, then View->Debug Windows->CPU. But there probably isn't much to gain from assembler in such a loop; instead, look at its slowest part: rounding. I suggest you convert it to fixed point: multiply the floats by a power of two and round them outside the loop, then inside the loop, instead of rounding, you'll be able to get away with only using shifts.
peterbone
01-11-2004, 10:29 AM
Thanks, that sounds like a really clever idea. Are you sure that would give the same precision though?
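On the precision question, here is a minimal sketch that compares the two paths for one channel. It assumes 8 fractional bits (a scale of 256) and channel values that stay within 0..255; MaxChannelError and its parameters are hypothetical names made up for this illustration. Two effects show up: a plain shift truncates where Round rounds, and the rounded step drifts slightly over a long scanline.

```pascal
program FixedPointPrecision;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

// Hypothetical helper: largest per-pixel difference between the
// float/Round path and the 8.8 fixed-point path over Count pixels.
function MaxChannelError(Start, Delta: Double; Count: Integer;
  AddHalf: Boolean): Integer;
var
  i, Fixed, FixedDelta, Shown, Ideal, Diff: Integer;
  Exact: Double;
begin
  Result := 0;
  Exact := Start;
  Fixed := Round(Start * 256);      // start value with 8 fractional bits
  FixedDelta := Round(Delta * 256); // per-pixel step with 8 fractional bits
  for i := 0 to Count - 1 do
  begin
    if AddHalf then
      Shown := (Fixed + 128) shr 8  // adding 0.5 first rounds to nearest
    else
      Shown := Fixed shr 8;         // plain shift truncates
    Ideal := Round(Exact);          // what the original float loop writes
    Diff := Abs(Ideal - Shown);
    if Diff > Result then
      Result := Diff;
    Exact := Exact + Delta;
    Inc(Fixed, FixedDelta);
  end;
end;

begin
  WriteLn('truncating shift: ', MaxChannelError(10.0, 0.3, 500, False));
  WriteLn('half added first: ', MaxChannelError(10.0, 0.3, 500, True));
end.
```

For a constant channel value of 10.6 the truncating version shows 10 where Round gives 11, so the result can be off by one level. Adding 128 before the shift removes that particular difference; more fractional bits would reduce the drift from the rounded step.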
peterbone
01-11-2004, 11:21 AM
I tried your idea and it worked. I replaced the inner loop with this
B := Round(RGB.B * 256);
G := Round(RGB.G * 256);
R := Round(RGB.R * 256);
Bd := Round(RGBdx.B * 256);
Gd := Round(RGBdx.G * 256);
Rd := Round(RGBdx.R * 256);
for x := ScanStart to ScanEnd do begin
  RGBT.rgbtBlue := B shr 8;
  RGBT.rgbtGreen := G shr 8;
  RGBT.rgbtRed := R shr 8;
  Scan[x] := RGBT;
  Inc(B, Bd);
  Inc(G, Gd);
  Inc(R, Rd);
end;
R, G, B, Rd, Gd, Bd are all integers.
I tested it by drawing a large Gouraud triangle 10000 times. The old routine took 17 seconds; the new routine took 14 seconds :D
Thanks for your help
Peter
peterbone
01-11-2004, 11:38 AM
now it's
B := Round(RGB.B * 256);
G := Round(RGB.G * 256);
R := Round(RGB.R * 256);
Bd := Round(RGBdx.B * 256);
Gd := Round(RGBdx.G * 256);
Rd := Round(RGBdx.R * 256);
for x := ScanStart to ScanEnd do begin
  Scan[x].rgbtBlue := B shr 8;
  Scan[x].rgbtGreen := G shr 8;
  Scan[x].rgbtRed := R shr 8;
  Inc(B, Bd);
  Inc(G, Gd);
  Inc(R, Rd);
end;
and it's down to 9 seconds!
Useless Hacker
01-11-2004, 07:59 PM
You might increase the efficiency further by putting the Scan[x] part in a 'with' block, so that the array access only happens once... although the compiler might optimise that anyway.
B := Round(RGB.B * 256);
G := Round(RGB.G * 256);
R := Round(RGB.R * 256);
Bd := Round(RGBdx.B * 256);
Gd := Round(RGBdx.G * 256);
Rd := Round(RGBdx.R * 256);
for x := ScanStart to ScanEnd do begin
  with Scan[x] do begin
    rgbtBlue := B shr 8;
    rgbtGreen := G shr 8;
    rgbtRed := R shr 8;
  end;
  Inc(B, Bd);
  Inc(G, Gd);
  Inc(R, Rd);
end;
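To make the "array access only happens once" idea concrete, here is a rough sketch that spells it out by hand with a pointer that steps along the scanline. This is an illustration of the idea, not a claim about what the compiler emits for 'with'; that can be checked in the CPU window. TPixel stands in for the Windows TRGBTriple record so the sketch compiles on its own, and FillSpan and its parameters are hypothetical names. The colour values are assumed to already be in 8.8 fixed point and to stay non-negative.

```pascal
program WithAsPointer;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

type
  // Stand-in for TRGBTriple, declared locally so no Windows unit is needed.
  TPixel = packed record
    rgbtBlue, rgbtGreen, rgbtRed: Byte;
  end;
  PPixel = ^TPixel;

// Hypothetical helper: fills Count pixels starting at First.
// B, G, R and their steps Bd, Gd, Rd carry 8 fractional bits.
procedure FillSpan(First: PPixel; Count, B, G, R, Bd, Gd, Rd: Integer);
var
  i: Integer;
begin
  for i := 1 to Count do
  begin
    // The pixel address is held in First, so it is not recomputed per field.
    First^.rgbtBlue := Byte(B shr 8);
    First^.rgbtGreen := Byte(G shr 8);
    First^.rgbtRed := Byte(R shr 8);
    Inc(First); // advance by one TPixel
    Inc(B, Bd);
    Inc(G, Gd);
    Inc(R, Rd);
  end;
end;

var
  Scan: array[0..3] of TPixel;

begin
  // Blue ramps 10, 30, 50, 70; red ramps 255, 215, 175, 135.
  FillSpan(@Scan[0], 4, 10 shl 8, 0, 255 shl 8, 20 shl 8, 0, -(40 shl 8));
  WriteLn(Scan[3].rgbtBlue, ' ', Scan[3].rgbtRed); // prints "70 135"
end.
```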
If RGB.B, RGB.G, RGB.R, RGBdx.B, RGBdx.G and RGBdx.R are all integers then would you not be able to speed it up even further with the following instead of using a '* 256'?
B := Round(RGB.B) shl 8{* 256};
G := Round(RGB.G) shl 8{* 256};
R := Round(RGB.R) shl 8{* 256};
Bd := Round(RGBdx.B) shl 8{* 256};
Gd := Round(RGBdx.G) shl 8{* 256};
Rd := Round(RGBdx.R) shl 8{* 256};
Unless these are reals, it should speed up each iteration of the call slightly.
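A small sketch of why the "unless these are reals" caveat matters: rounding first and then shifting throws away the fractional part before it can be scaled, so the two forms only agree for whole-number values. ScaleThenRound and RoundThenShift are hypothetical helper names used only for this comparison.

```pascal
program ShiftVersusScale;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}

// Scale first, then round: keeps 8 bits of the fraction.
function ScaleThenRound(Value: Double): Integer;
begin
  Result := Round(Value * 256);
end;

// Round first, then shift: the fraction is gone before the shift.
function RoundThenShift(Value: Double): Integer;
begin
  Result := Round(Value) shl 8;
end;

begin
  WriteLn(ScaleThenRound(0.3));   // 77
  WriteLn(RoundThenShift(0.3));   // 0
  WriteLn(ScaleThenRound(100.0)); // 25600
  WriteLn(RoundThenShift(100.0)); // 25600
end.
```

With a per-pixel step of 0.3 the shifted form gives a step of 0, so the colour would never change along the scanline.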
peterbone
02-11-2004, 11:33 AM
The 'with' did speed it up. Now down to 8 seconds (over twice as fast as it was originally)! I thought 'with' was just to make the code tidier - I didn't realize it made a difference to the compiled code.
RGB and RGBdx are floating point, WILL.
Thanks for everyone's help. I'll probably make the iteration through the lines fixed point using the same shifting technique now.
Peter
peterbone
02-11-2004, 12:05 PM
I've got a really baffling problem now. The rasterizer breaks the triangle into 2 sections: above and below the middle vertex. So the line rasterizer code that I've been optimizing is in 2 places. If I use the 'with' on just one of them (either one) then it's 1 second faster. If I use the 'with' on both of them then it's 3 seconds slower. How can that be?
If any of you want to play around with my speed test program then I've put it here
http://atlas.walagata.com/w/peterbone/GouraudSpeedTest.zip
"RGB and RGBdx are floating point, WILL."
Sorry, I was posting late last night while a bit tired. I meant that as long as it's not a real from 0 to 1. If it's a real number it *will* be a tad faster simply because you're not mixing types, and because of the shift of course.