Benchmarks! FPC vs Delphi vs C++



Chebmaster
26-03-2008, 11:30 PM
Use
http://babelfish.altavista.com/babelfish/tr
to translate this forum page
http://www.gamedev.ru/flame/forum/?id=78283&page=22
from Russian - and you'll find some interesting stuff (you can also download the sources and binaries there).

There's an ongoing benchmark that renders a Mandelbrot fractal image of 3000x4000 pixels.

On my machines the results are (test time in milliseconds; Intel Dual Core and AMD Sempron 2400, both at 1.6 GHz):

                       Intel Dual Core   AMD Sempron 2400
MSVC8 (single, sse)    3100              crashed
turbo delphi-double    7150              8280
turbo delphi-single    5400              5157
fpc-double             12050             8734
fpc-double-sse2        4800              crashed
fpc-single             4970              4625
fpc-single-sse2        4460              4875

All versions are required to save their fractal as a bmp, to avoid mistakes.

At the very least, this shows that:
1) it's unwise to use the Double type in game development.
2) Free Pascal 2.2.0 has the best code optimizer among all things Pascal.

farcodev_
27-03-2008, 01:59 AM
Interesting, I'll use Single only in my project now :shock:

arthurprs
27-03-2008, 02:35 AM
Uhm, floating-point operations on the Double type are not very well optimized :?

Brainer
27-03-2008, 04:52 AM
To be honest, I was using Single type from the very beginning. :)

Mirage
27-03-2008, 06:01 AM
This topic is still alive.:)

I've downloaded some early tests (100x100, double). Results (on P3 celeron):
Cpp - ~1100ms
D7 - ~1700ms
FPC 2.2.0 - same as D7.
On C2 Duo the results are simply faster for all the compilers.

The optimization flag in the options doesn't affect performance directly. Declaring Deep and Scale as constants or as variables makes no difference either. ;) But the order of declaration is important (alignment?).

FPC's performance is the same as Delphi's with the Double type and slightly higher with the Single type. This is good news, because earlier versions of FPC were slower than Delphi.

What I think about it:
This test consists of FPU computations only. No random memory access (you can comment out the write to the array with almost no speedup; everything is in cache), no API calls. Pure computation. Not a real-world case (unless you are writing a physics engine).
Delphi does not optimize FPU code at all (FWAITs, caring about FPU exceptions, use of slow instructions like SAHF, etc.).
On the other hand, this is the easiest case for a code optimizer (where one is present).
Nevertheless, the slowdown is about 50-60%, which is not critical for such an ideal case.

I see the later benchmarks have some means of checking the result. :)
I used a more accurate test:

Digest := 0;
for dy := 0 to height - 1 do
  for dx := 0 to width - 1 do
    Digest := Digest + pix[dy, dx];

Digest should be the same for all versions. I won't be surprised if some heavily optimized versions give wrong results or lose performance because the computation results are actually used. ;)
UPD: Digest is wrong for FPC with the "OG2" or "OG3" options. :(
But that's also good, because this test can serve as a bug report. :)
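
A minimal C++ sketch of the digest check described above, for anyone who wants to compare Single and Double builds on a small image. The templated build_fractal and the digest helper are illustrative names, not code from the benchmark archive, and the flat row-major buffer is an assumption made to keep the example small:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Same escape-time loop as the benchmark, written once for float and double.
// The image is stored row-major in a flat buffer (an assumption of this sketch).
template <typename Float>
std::vector<std::int32_t> build_fractal(int width, int height, Float scale, int deep) {
    assert(width > 0 && height > 0 && deep >= 0);
    std::vector<std::int32_t> pix(static_cast<std::size_t>(width) * height, 0);
    Float cy = (height / 2) * scale;
    for (int dy = height - 1; dy >= 0; --dy) {
        cy -= scale;
        Float cx = (width / 2) * scale;
        for (int dx = width - 1; dx >= 0; --dx) {
            cx -= scale;
            Float zx = cx, zy = cy;
            int color = 0;
            while (zx * zx + zy * zy < 4) {
                const Float zxt = zx * zx - zy * zy + cx;
                zy = 2 * zx * zy + cy;
                zx = zxt;
                if (++color > deep) break;
            }
            pix[static_cast<std::size_t>(dy) * width + dx] = 4 * color;
        }
    }
    return pix;
}

// Sum of all pixel values; a 64-bit accumulator avoids overflow on large images.
std::int64_t digest(const std::vector<std::int32_t>& pix) {
    std::int64_t sum = 0;
    for (std::int32_t p : pix) sum += p;
    return sum;
}
```

Calling digest(build_fractal<double>(w, h, s, d)) on two differently optimized builds should give the same number. Comparing the float digest against the double digest is a separate question, since pixels near the set boundary may legitimately differ between precisions.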

Chebmaster
27-03-2008, 09:40 AM
"I've downloaded some early tests (100x100, double)."
Those were incorrect; use the new ones below (they save their fractal as a BMP in the root of drive C:\).

And don't forget: the {$fputype sse2} directive can do wonders to your code speed!

The full battery of benchmarks, including sources, here (500K):
http://217.70.20.10/_share/_004/fpc_fractal_benchmarks.zip

The benchmark source (fpc):

program fpctest1;

{$apptype console}
{$mode objfpc}
{$asmmode intel}
{$fputype sse2}

uses
  SysUtils, Windows;

const
  height = 4000;
  width = 3000;
  scale = 0.0008;
  deep = 100;

type
  float = double;
  //float = single;

var
  pix: array [0..height-1, 0..width-1] of longint;
  time: longint;
  f: file;
  fh: TBitmapFileHeader;
  bh: TBitmapInfoHeader;


procedure build_fractal(scale: float; deep: longint);
var
  color, dx, dy: Integer;
  cx, cy, zx, zy, zxt: float;
begin
  cy := (height div 2) * scale;
  for dy := height - 1 downto 0 do
  begin
    cy := cy - scale;
    cx := (width div 2) * scale;
    for dx := width - 1 downto 0 do
    begin
      color := 0;
      // Calculate color
      cx := cx - scale;
      zx := cx;
      zy := cy;
      while zx * zx + zy * zy < 4 do
      begin
        zxt := zx * zx - zy * zy + cx;
        zy := 2 * zx * zy + cy;
        zx := zxt;
        inc(color);
        if color > deep then break;
      end;
      pix[dy, dx] := 4 * color;
    end;
  end;
end;

var
  cw: word;

begin
  // x87 control word: all FPU exceptions masked, extended precision, round to nearest
  cw := $033F;
  asm
    fldcw [cw]
  end;
  FillChar(pix, SizeOf(pix), 0);
  time := GetTickCount();

  build_fractal(scale, deep);

  time := GetTickCount() - time;
  WriteLn(time);

  fh.bfType := WORD('B') + WORD('M') shl 8;
  fh.bfReserved1 := 0;
  fh.bfReserved2 := 0;
  fh.bfOffBits := SizeOf(TBitmapFileHeader) + SizeOf(TBitmapInfoHeader);
  // bfSize is the size of the whole file: both headers plus the pixel data
  fh.bfSize := fh.bfOffBits + SizeOf(pix);

  FillChar(bh, SizeOf(TBitmapInfoHeader), 0);
  bh.biSize := SizeOf(TBitmapInfoHeader);
  bh.biWidth := width;
  bh.biHeight := height;
  bh.biPlanes := 1;
  bh.biBitCount := 32;

  Assign(f, 'c:\' + ChangeFileExt(ExtractFileName(ParamStr(0)), '.bmp'));
  Rewrite(f, 1);
  BlockWrite(f, fh, SizeOf(TBitmapFileHeader));
  BlockWrite(f, bh, SizeOf(TBitmapInfoHeader));
  BlockWrite(f, pix, width * height * 4);
  Close(f);
  ReadLn;
end.

Robert Kosek
27-03-2008, 01:09 PM
Speeds with an AMD Athlon 4800 X2 (from your download, Chebs):
Delphi - 3782
Turbo Delphi, Single - 3609
Turbo Delphi, Double - 5422
MSVC Single SSE - 2156
FPC Double - 6047
FPC Double SSE2 - 2922 (!!!)
FPC Single - 3234
FPC Single SSE - 3515 (??)

It looks to me like the best overall speeds go to the SSE2-optimized code, especially the Double code with SSE2. The boost for doubles is kind of surprising, but pleasing. Only about 750ms behind C++. :D

My only question: if you enable single/double optimization via SSE2, how do you guarantee that the program will still run on a system without SSE2? I'm thinking you'd need a whole separate executable compiled without SSE2 optimizations.
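
One common alternative to shipping two executables is runtime dispatch: keep both code paths in one binary and pick one at startup after asking the CPU what it supports. The sketch below shows the idea in C++ rather than Pascal. It assumes GCC or Clang on x86, where the __builtin_cpu_supports built-in exists; escape_count_x87 and escape_count_sse2 are hypothetical stand-ins that share one body here, whereas in a real build each would be compiled separately with different FPU settings. Whether FPC's {$fputype sse2} can be combined with this pattern is not something this thread establishes.

```cpp
#include <cassert>

// Escape-time count for one point, same loop as the benchmark.
inline int escape_count(float cx, float cy, int deep) {
    assert(deep >= 0);
    float zx = cx, zy = cy;
    int color = 0;
    while (zx * zx + zy * zy < 4.0f) {
        const float zxt = zx * zx - zy * zy + cx;
        zy = 2.0f * zx * zy + cy;
        zx = zxt;
        if (++color > deep) break;
    }
    return color;
}

// Hypothetical stand-ins for the two builds of the kernel. Here they share
// one body; the point of the sketch is only how one of them gets selected.
int escape_count_x87(float cx, float cy, int deep) { return escape_count(cx, cy, deep); }
int escape_count_sse2(float cx, float cy, int deep) { return escape_count(cx, cy, deep); }

// Ask the CPU at run time. The built-ins are GCC/Clang extensions for x86;
// on any other compiler or architecture this sketch reports "no SSE2".
bool cpu_has_sse2() {
#if (defined(__GNUC__) || defined(__clang__)) && (defined(__i386__) || defined(__x86_64__))
    __builtin_cpu_init();
    return __builtin_cpu_supports("sse2") != 0;
#else
    return false;  // unknown compiler or CPU: take the safe path
#endif
}

using Kernel = int (*)(float, float, int);

// Pick the kernel once at startup and call through the pointer afterwards.
Kernel select_kernel() {
    return cpu_has_sse2() ? escape_count_sse2 : escape_count_x87;
}
```

The check costs one call at startup; the inner loop then pays only for an indirect call, which can be hoisted out by dispatching a whole row or the whole image instead of a single point.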

JernejL
27-03-2008, 08:07 PM
"To be honest, I was using Single type from the very beginning. :)"

Not when you call functions that pass float parameters as "extended"...

Chebmaster
27-03-2008, 08:39 PM
"FPC Single - 3234
FPC Single SSE - 3515 (??)"
It seems that AMD has a better general FPU than SSE1/2. It was theorized that the Athlon's FPU is part of AMD's own 3DNow! technology while SSE is a child of Intel, and thus an Athlon may run SSE-optimized code slower than code that uses the FPU only.

The same situation held for my Sempron 1.6GHz (in fact just an older Athlon XP).

Robert Kosek
27-03-2008, 08:50 PM
Ah! Good to know, thanks Chebmaster. :)

arthurprs
28-03-2008, 02:08 AM
Does someone have the sources of the C++ code?

Mirage
28-03-2008, 05:49 AM
These tests are good because they helped me isolate and report a bug in FPC's optimizer. :)
If the bug gets fixed, it will make sense to compile releases of my demos with FPC instead of Delphi.

Chebmaster
28-03-2008, 07:39 AM
"someone have the sources of the C++ code?"
It's somewhere on the Russian forum I posted a link to.
Something like this, if I'm not mistaken:


// height, width and pix are assumed to be globals matching the FPC listing above
const int height = 4000;
const int width = 3000;
static int pix[height][width];

void build_fractal( int deep, float scale )
{
    int dx, dy;
    long color;
    float cx, cy, zx, zxt, zy;

    cy = (height / 2) * scale;
    for( dy = height - 1; dy >= 0; dy-- ) {
        cy = cy - scale;
        cx = (width / 2) * scale;
        for( dx = width - 1; dx >= 0; dx-- ) {
            color = 0;
            // Calculate color
            cx = cx - scale;
            zx = cx;
            zy = cy;
            while( zx * zx + zy * zy < 4 ) {
                zxt = zx * zx - zy * zy + cx;
                zy = 2 * zx * zy + cy;
                zx = zxt;
                color++;
                if( color > deep ) break;
            }
            pix[ dy ][ dx ] = 4 * color;
        }
    }
}

wodzu
28-03-2008, 01:36 PM
Hi folks, my results (Intel Dual CPU 2180, 2 GHz):

FPC double: 7844
FPC double SSE2: 4266
FPC single: 3969
FPC single SSE: 3703
CG RAD Studio 2007 single: 4156
CG RAD Studio 2007 double: 5500

The new Delphi is not so bad ;-)

If you want to try the CGRS2007 EXEs: http://www.speedyshare.com/969665624.html