Single-cycle optimizations are a thing of the past tbh.
Let's pass the vectors as const (i.e. by reference) and be done with it.
[pascal]
function Cross(const A, B: TVector4f): TVector4f;
begin
  Cross.x := A.y * B.z - B.y * A.z;
  Cross.y := A.z * B.x - B.z * A.x;
  Cross.z := A.x * B.y - B.x * A.y;
  Cross.w := 0;
end;[/pascal]
I did some profiling on the above, comparing the original function, the const version and an inlined version:
[pascal]for i := 0 to NUM_TESTS do
begin
  C := Cross(A, B);
  C := Cross(A, B);
  C := Cross(A, B);
  C := Cross(A, B);
  C := Cross(A, B);
  C := Cross(A, B);
  C := Cross(A, B);
  C := Cross(A, B);
  C := Cross(A, B);
  C := Cross(A, B);
end;
[/pascal]
Code:
Testing 1000000 iterations, 10 loop unroll
First test, standard: 4,59062534 s
Second test, const parameter: 4,47177979 s
Third test, inlined: 4,53005200 s
Testing 1000000 iterations, 10 loop unroll
First test, standard: 4,58233044 s
Second test, const parameter: 4,46661824 s
Third test, inlined: 4,53930653 s
Testing 1000000 iterations, 10 loop unroll
First test, standard: 4,58337498 s
Second test, const parameter: 4,45365933 s
Third test, inlined: 4,53174216 s
And this is using my somewhat bugged XP3200+.
Interesting to see here is that the inlined version (inlined as in copy + paste) is actually slower than the const version, though still slightly faster than the original.
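As you can see from this test, you can punch out quite a few cross products per frame without affecting the framerate: roughly 10 million calls take about 4.5 s here, which works out to somewhere around 0.45 microseconds per cross product. Are you sure that the cross product is your bottleneck?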
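If anyone wants to repeat the measurement, here is a rough sketch of how one of the three tests could be timed. This is not the exact harness I ran: the TVector4f layout, the input values, timing with Now, and the single call per iteration (instead of the 10x unroll) are all assumptions made for the example, so expect different absolute numbers.
[pascal]
program CrossBench;

{$IFDEF FPC}{$MODE DELPHI}{$ENDIF} // lets Free Pascal accept "Result"

uses
  SysUtils;

type
  // Assumed layout; the record is not shown in the post above.
  TVector4f = record
    x, y, z, w: Single;
  end;

const
  NUM_TESTS = 1000000; // same iteration count as in the output above

// Const-parameter version. "Result" is used here in place of
// assigning through the function name; the effect is the same.
function Cross(const A, B: TVector4f): TVector4f;
begin
  Result.x := A.y * B.z - B.y * A.z;
  Result.y := A.z * B.x - B.z * A.x;
  Result.z := A.x * B.y - B.x * A.y;
  Result.w := 0;
end;

var
  A, B, C: TVector4f;
  i: Integer;
  StartTime: TDateTime;
  Seconds: Double;
begin
  // Hypothetical inputs: the unit x and y axes, so C should come out as (0, 0, 1).
  A.x := 1; A.y := 0; A.z := 0; A.w := 0;
  B.x := 0; B.y := 1; B.z := 0; B.w := 0;

  StartTime := Now;
  for i := 1 to NUM_TESTS do
    C := Cross(A, B);
  // TDateTime counts days, so scale the difference to seconds.
  Seconds := (Now - StartTime) * 86400.0;

  WriteLn('Const parameter: ', Seconds:0:8, ' s');
  // Print the result so the calls have a visible effect.
  WriteLn('C = (', C.x:0:1, ', ', C.y:0:1, ', ', C.z:0:1, ', ', C.w:0:1, ')');
end.
[/pascal]
Now only has a resolution of a few milliseconds at best, so keep the run long enough (or bump NUM_TESTS) for the difference between variants to stand out from the timer noise.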