PDA

View Full Version : Benchmarking your functions in a multi-threaded fashion.



Robert Kosek
29-05-2007, 04:23 PM
Cross posted from my website... (http://forums.thewickedflea.com/index.php?topic=107.new#new)

The final result:
http://upload8.postimage.org/570965/screenshot.jpg (http://upload8.postimage.org/570965/photo_hosting.html)

First, we need to start with a default VCL application in Delphi. Because I do not know the ways that Lazarus handles threads I make no promises that it carries over. So many people are concerned about threads, their dangers, and potential problems with exceptions that I figured I would show you how to benchmark faster on multicore computers. Safely so.

First, get a decent UI consisting of a pair of labels, one indicating which function is being tested and the other for the average execution time, a default progress bar, and two buttons for testing 10,000 and 100,000 times. Why just 10,000 and 100,000? Well, doing 1,000,000 can take a while to accomplish. Also, make sure your progressbar has "step" set to 1, not 10;

Now, above your form object insert this blank template code:
TBenchmarkThread = class(TThread)
private
constructor Create(CreateSuspended: boolean = false);
protected
fTimes: Cardinal; // The number of times we execute the same function.
fProg: TProgressBar; // Our Progressbar (multithreaded progressbar calls!).
fS: String;
procedure Execute; override;
end;


First, why these pieces? Well, the function I will be testing is StrToInt and I'll be testing 10000 and -10000 conversions simultaneously; that means half the tests are negative, half are positive. The internal progressbar reference lets us easily indicate our progress, and carry it between applications.

Now, doubleclick the first benchmark button, the one for the 10000, and enter this code into the function that was created:
Benchmark(TButton(Sender).Tag);
This lets us assign this to the other button too, which you should do, and set the tags of each button to the appropriate number. Once this is done define a private benchmark function in your form. Some of these functions and properties aren't around yet, but don't panic. Mine is:procedure Benchmark(Times: Cardinal);

procedure TForm1.Benchmark(Times: Cardinal);
var t1,t2: TBenchmarkThread;
begin
progress.Max := Times;
progress.Position := 0;
resultLbl.Caption := '...';

t1 := TBenchMarkThread.Create( '10000', Times div 2, progress);
t2 := TBenchMarkThread.Create('-10000', Times div 2, progress);

while not t1.Terminated and not t2.Terminated do
begin
Application.ProcessMessages;
sleep(20);
end;

// Calculate the average execution time:
resultLbl.Caption := floattostr((t1.ExecutionTime + t2.ExecutionTime) / Times)+' msec';

t1.Free;
t2.Free;
end;

Notice the section where the application processes its messages and then sleeps for a little? Well, this reduces the load of your main thread monitoring the sub-threads and still keep your application updating ... IE, you avoid that pesky "not responding" label in the task manager. And you can drag the window around, etc. Notice that the threads terminate themselves when the results are complete, and that we free the threads later. Now, there isn't a progress label in this yet, but I'll show you that in a minute. ;)

Okay, revise the constructor to match these parameters:constructor Create(Str: String; Count: Cardinal; ProgressBar: TProgressBar; CreateSuspended: boolean = false);And, add this property to your thread's private area:ExecutionTime: Cardinal;

And now for the nitty gritty, the thread code:
constructor TBenchmarkThread.Create(Str: String; Count: Cardinal; ProgressBar: TProgressBar; CreateSuspended: boolean = false);
begin
fS := Str;
fTimes := Count;
fProg := ProgressBar;

inherited Create(CreateSuspended);
end;

procedure TBenchmarkThread.Execute;
var i: cardinal;
begin
ExecutionTime := GetTickCount;
for i := 1 to fTimes do begin
StrToInt(fS);
Synchronize(fProg.StepIt); // This prevents simultaneous function calls. ESSENTIAL!
end;
ExecutionTime := GetTickCount-ExecutionTime;

Terminate; // Done, and prove it.
end;


You are done. Wow, easy huh? :D Here is our final code:unit Unit1;

interface

uses
Windows, Messages, SysUtils, Variants, Classes, Graphics, Controls, Forms,
Dialogs, StdCtrls, ComCtrls;

type
TBenchmarkThread = class(TThread)
private
ExecutionTime: Cardinal;
constructor Create(Str: String; Count: Cardinal; ProgressBar: TProgressBar; CreateSuspended: boolean = false);
protected
fTimes: Cardinal; // The number of times we execute the same function.
fProg: TProgressBar; // Our Progressbar (multithreaded progressbar calls!).
fS: String;
procedure Execute; override;
end;
TForm1 = class(TForm)
Label1: TLabel;
resultLbl: TLabel;
progress: TProgressBar;
Button1: TButton;
Button2: TButton;
Button3: TButton;
QuitBtn: TButton;
procedure QuitBtnClick(Sender: TObject);
procedure Button1Click(Sender: TObject);
private
{ Private declarations }
procedure Benchmark(Times: Cardinal);
public
{ Public declarations }
end;

var
Form1: TForm1;

implementation

{$R *.dfm}

procedure TForm1.Benchmark(Times: Cardinal);
var t1,t2: TBenchmarkThread;
begin
progress.Max := Times;
progress.Position := 0;
resultLbl.Caption := '...';

t1 := TBenchMarkThread.Create( '10000', Times div 2, progress);
t2 := TBenchMarkThread.Create('-10000', Times div 2, progress);

while not t1.Terminated and not t2.Terminated do
begin
Application.ProcessMessages;
sleep(20);
end;

// Calculate the average execution time:
resultLbl.Caption := floattostr((t1.ExecutionTime + t2.ExecutionTime) / Times)+' msec';

t1.Free;
t2.Free;
end;

procedure TForm1.Button1Click(Sender: TObject);
begin
Benchmark(TButton(Sender).Tag);
end;

procedure TForm1.QuitBtnClick(Sender: TObject);
begin
Close;
end;

{ TBenchmarkThread }

constructor TBenchmarkThread.Create(Str: String; Count: Cardinal; ProgressBar: TProgressBar; CreateSuspended: boolean = false);
begin
fS := Str;
fTimes := Count;
fProg := ProgressBar;

inherited Create(CreateSuspended);
end;

procedure TBenchmarkThread.Execute;
var i: cardinal;
begin
ExecutionTime := GetTickCount;
for i := 1 to fTimes do begin
StrToInt(fS);
Synchronize(fProg.StepIt); // This prevents simultaneous function calls. ESSENTIAL!
end;
ExecutionTime := GetTickCount-ExecutionTime;

Terminate; // Done, and prove it.
end;

end.

I bet you might just be surprised at how quick it is too! Here's the example application (http://www.thewickedflea.com/download.php?f=StrToInt_Benchmarker.zip). Maybe this will encourage more people to venture into multithreaded games and applications for games; it isn't as hard as some people make it out to be. I wrote one just thirty minutes ago to compare StrToInt to my own internal one, for scripting stuff, to see just how efficient they were ... in a faster benchmarking method. You can even test different functions simultaneously in real time, which is just a whole lot faster on a multi cored computer.

AthenaOfDelphi
29-05-2007, 08:33 PM
Nice example Robert.

But I believe there is a minor oversight....

You are using Synchronize... this will of course pass control back to the main VCL thread by way of some funky message handler... it also means that you are including the time this takes to execute in your benchmark results.

You need to take a snapshot of your timer before the synchronize and after, in order to establish how long that call takes, so you can take that into account when working out the final timing data. In the grand scheme of things, it may be not worth worrying about, but if (as in your example) you are benchmarking something which doesn't take too long itself, the overhead of the synchronize could blow your results away.

If I recall correctly, you may also find problems when running this code on multicore/cpu hardware as gettickcount can return bogus values in certain circumstances. This is why queryPerformanceCounter is preferable, but even that isn't guaranteed from what I've read of the API documentation. To make it super robust as far as timing is concerned, you could provide all three methods (datetime, gettickcount and queryPerformanceCounter) and let the user decide if any of them are bogus.

Overall though, nice example for others to follow. I did something similar to this when I wanted to test one of the back office components of our browser game, except I had 8 threads pounding another machine for load testing.

I've just run your test app, and I spotted a problem... you may want to lose the synchronize altogether... or at least do some form of if (count mod 100=0) then doSync; or something like that.... the reason I say this, is that by having both threads synchronize in this way, you are effectively making your program single threaded because they are bottlenecked into the main VCL thread. I watched the processor usage when the app was running... I got 50% with only one core being utilised... this is down to synchronizing.

So, an improvement on the execute method would be something like this:-


var
startTime : cardinal;
syncStartTime : cardinal;
syncTime : cardinal;
i : cardinal;
begin
syncTime:=0;
i:=1;
startTime:=getTickCount;
repeat
strToInt(fS);
syncStartTime:=getTickCount;
if (i mod 100=0) then
begin
synchronize(fProg.stepIt);
end;
inc(i);
syncTime:=syncTime+(getTickCount-syncStartTime);
until (i>fTimes);
executionTime:=getTickCount-startTime-syncTime;
end;


I think (I've not tried it) that this will work a lot better, although you will obviously have to modify your progress bar values to take into account the modulo 100 calling pattern for the 'stepit' routine.

But... this could definitely raise the issue about getTickCount returning bogus values on multicore/cpu systems because now the threads really will be running simultaneously.

Robert Kosek
29-05-2007, 08:45 PM
Considering there is no other way to synchronize threads, and if you call a procedure to something with both threads you get even more funky issues, we're kinda stuck with that. GetTickCount returns the number of ticks since the last boot, IIRC, and the queryPerformanceCounter only returns the number of processor cycles ... which clears once a second. That makes it harder to handle. Though, potentially, if you wanted to get into the nitty gritty just use the performance counter to increment the cardinal before the synchronization call; at the end you would still have to divide by both the cycles a second and the number of times you've done your calculations.

But hey, for 30 minutes of experimentation this morning I got threaded stuff done without a hitch. That isn't nearly so intimidating as many folks make it out to be (which is half my hidden point in there).

I have a dual core system, and unless you know how (which I don't) to set a thread's affinity to a specific core it will show 50% usage. Except I look at the graphs and that isn't quite true. It seems to be closer to 66% percent usage than 50%.

Robert Kosek
29-05-2007, 09:36 PM
I wondered how much a performance difference you could get with the methods, so I dug about and checked. You are correct, the difference is huge. 1.17MS down to 0.0017MS per call. We really need some better thread synchronization techniques in Delphi!

I changed the timer style for greater accuracy as well, though it simply gave additional digits and no real great change except more significant digits. And processor usage with range from an average of 60% all the way up to 88% now for me. Also, the sleep command slows the processing by just a little.

That means that the 1,000,000 button only takes about 5 seconds. I'm uploading the final version now to replace the sample, and here is the final code for perusal. (So folks know my methods w/o downloading.)

unit uMain;

interface

uses
Windows, Messages, SysUtils, Variants, Classes, Graphics, Controls, Forms,
Dialogs, StdCtrls, ComCtrls;

type
TBenchmarkThread = class(TThread)
private
ExecutionTime: Cardinal;
constructor Create(Str: String; Count: Cardinal; ProgressBar: TProgressBar; CreateSuspended: boolean = false);
protected
fTimes: Cardinal; // The number of times we execute the same function.
fProg: TProgressBar; // Our Progressbar (multithreaded progressbar calls!).
fS: String;
procedure Execute; override;
end;
TForm1 = class(TForm)
Label1: TLabel;
resultLbl: TLabel;
progress: TProgressBar;
Button1: TButton;
Button2: TButton;
Button3: TButton;
QuitBtn: TButton;
procedure QuitBtnClick(Sender: TObject);
procedure Button1Click(Sender: TObject);
procedure FormCreate(Sender: TObject);
procedure FormCloseQuery(Sender: TObject; var CanClose: Boolean);
private
working: boolean;
{ Private declarations }
procedure Benchmark(Times: Cardinal);
public
{ Public declarations }
end;

var
Form1: TForm1;
PerfFreq: Int64;

implementation

{$R *.dfm}

procedure TForm1.Benchmark(Times: Cardinal);
var t1,t2: TBenchmarkThread;
begin
working := true;
progress.Max := Times div 100;
progress.Position := 0;
resultLbl.Caption := '...';

t1 := TBenchMarkThread.Create( '10000', Times div 2, progress);
t2 := TBenchMarkThread.Create('-10000', Times div 2, progress);

while not t1.Terminated and not t2.Terminated do
Application.ProcessMessages;

// Calculate the average execution time:
resultLbl.Caption := floattostr((((t1.ExecutionTime + t2.ExecutionTime) / PerfFreq) / Times) * 1000)+' msec';

t1.Free;
t2.Free;
working := false;
end;

procedure TForm1.Button1Click(Sender: TObject);
begin
if not working then
Benchmark(TButton(Sender).Tag);
end;

procedure TForm1.FormCloseQuery(Sender: TObject; var CanClose: Boolean);
begin
CanClose := not working;
end;

procedure TForm1.FormCreate(Sender: TObject);
begin
QueryPerformanceFrequency(PerfFreq);
end;

procedure TForm1.QuitBtnClick(Sender: TObject);
begin
Close;
end;

{ TBenchmarkThread }

constructor TBenchmarkThread.Create(Str: String; Count: Cardinal; ProgressBar: TProgressBar; CreateSuspended: boolean = false);
begin
fS := Str;
fTimes := Count;
fProg := ProgressBar;

inherited Create(CreateSuspended);
end;

procedure TBenchmarkThread.Execute;
var i: cardinal;
q1,q2: Int64;
begin
for i := 1 to fTimes do begin
QueryPerformanceCounter(q1);
StrToInt(fS);
QueryPerformanceCounter(q2);
Inc(ExecutionTime,Abs(q2-q1));
if i mod 100 = 0 then Synchronize(fProg.StepIt); // This prevents simultaneous function calls. ESSENTIAL!
end;

Terminate; // Done, and prove it.
end;

end.

Setharian
30-05-2007, 02:55 PM
I wondered how much a performance difference you could get with the methods, so I dug about and checked. You are correct, the difference is huge. 1.17MS down to 0.0017MS per call. We really need some better thread synchronization techniques in Delphi!
that's normal, .Synchronize method puts the method into a list of the main VCL thread and it gets executed once that thread gets its time slice from the OS...that's the reason of the delay...Windows supports so called APCs (asynchronous procedure calls) which basicly synchronize a method from one thread with the another and does the same thing as Delphi's .Synchronize but on the OS level....drawback of that is it is OS-specific and the thread must be in an alertable wait to process the APC queue which it normally isn't....I don't think you can come up with a better idea of how to execute a method in the context of the target thread....you can force-process the APC queue using undocumented function exported from ntdll.dll - NtAlertThread but I'm not sure if that is acceptable....