Pascal Game Development
 

Pascal Gamedevelopment Library : TTCF2

PgdLib :: Categories :: PageIndex :: RecentChanges :: Login/Register

Tripping The Class Fantastic - Basic Optimisation


Its been a rough day at work and out of the blue, I decide I'm going to write an implementation of Conway's Game of Life. I've written one with practically every other Pascal compiler I've used, but never with Delphi.

For those of you that have never encountered the Game of Life, theres a good article on Wonders Of Math.

Why is any of this relevant? Because over the course of the next couple of hours, I put together what is essentially a very untidy little program, with no glitz (the gui is basic) and no glam (it uses TPaintBox as its output), but, that does provide some useful points about the basics of optimisation.

So what is the Game of Life???

Just in case you don't know what the Game of Life is and haven't read the excellent article at Wonders of Math, it goes something like this... the Game of Life is a simulation of a simple cell based system. It is supposed to use an infinitely sized grid. Cells within the system follow these basic rules:-


So, you activate some cells and then apply these rules to every cell in the grid. Repeat, ad infinitum. This sounds like an easy little program, and it is, a really easy little program, but one that does have some good scope for introducing some basic optimisation.

There is a ZIP file containing the source code and various EXE's that illustate the points discussed in the article. It is availabled for download from here.

The Original Program

From the rules, you can see there are certain values that are important. When I originally started writing it, I couldn't remember these values and I didn't want to google them... I wanted to code, so I added some edit boxes to allow me to tweak them.

I decided I would do a small grid (80 x 80, although this quickly became 282 x 282 - the size used for all the timings) and that I would render this on a 705 x 705 paint box control with a cell size of 8 pixels. First thing to do is write a renderer and hook up events to activate/deactivate cells when the user clicks on the grid.

The renderer is incredibly basic. Using only TCanvas's fillRect method it fills the paint box clipping area with white and then proceeds to render each cell by filling it with the appropriate colour (governed by how long the cell has been alive). Moving onto the editor, the code works out where the mouse was on the click and then calculates a grid reference by div'ing them by the cell size. It then toggles the cell and redraws the display.

Once you have placed your cells, you click 'Start' and off it goes. The cells spring into life and begin their journey. When this happens, all hell break lose. The system uses TTimer to call the code that generates the next generation of cells from the existing grid. It does this repeatedly as long as the system is set to run.

If you look at the source code, you will notice a whole bunch of conditional defines at the top. What I've actually done, is deoptimise the source to take it back to the original version. This can be tried out by running the Life_Original.exe or compiling the code with none of the conditional defines set (stuff a _ in front of them), except 'optimisedvariablerules'.

First Impressions

Editing the grid is a nightmare.. lots of screen flicker and actually running the simulation... painfully slow. So lets tackle the least important of these first... editing the grid.

The Editor

You will have undoubtedly noticed the horrendous screen flicker when you edit the grid. This is caused by the full screen refresh that occurs whenever you toggle a cell. The original mouseUp event handler that forms the editor can be seen below.

procedure TfrmMain.pbMouseUp(Sender: TObject; Button: TMouseButton;
  Shift: TShiftState; X, Y: Integer);
var
   mapX : integer;
   mapY : integer;
begin
     if (fStopped) then
        begin
             mapX:=(x div _cellSize)+1;
             mapY:=(y div _cellsize)+1;

             if (fCurrentMap[mapX,mapY]>0) then
                fCurrentMap[mapX,mapY]:=0
             else
                 fCurrentMap[mapX,mapY]:=1;

             paintMap;

             if (frmOverview.visible) then
                frmOverview.pbo.invalidate;
        end;
end;


We have a couple of options open to us at this point. The first is to batch our updates, so that not every click of the mouse forces a repaint. To enable this optimisation remove the _ from the conditional define editoroptimisation1 or run the exe Life_EditOpt1.exe

This is relatively easy to achieve with the simple addition of a timer (tmrRefresh) to the form. This timer is disabled when the application starts and is only enabled by the mouseUp event handler of the TPaintBox (pb), after it has finished toggling the cell. This event handler also disables the timer when it starts... this means that so long as we click again before the interval of the timer has expired, a repaint won't occur, this means we can click click click click then pause and see our handywork when the timer fires.

tmrRefresh Event Handler


procedure TfrmMain.tmrRefreshTimer(Sender: TObject);
begin
     tmrRefresh.enabled:=false;
     paintMap;
     if frmOverview.visible then
        frmOverview.pbo.invalidate;
end;


mouseUp Event Handler of TPaintBox (pb)


procedure TfrmMain.pbMouseUp(Sender: TObject; Button: TMouseButton; Shift: TShiftState; X, Y: Integer);
var
   mapX : integer;
   mapY : integer;
begin
     if (fStopped) then
        begin
             tmrRefresh.enabled:=false;

             if (fCurrentMap[mapX,mapY]>0) then
                fCurrentMap[mapX,mapY]:=0
             else
                 fCurrentMap[mapX,mapY]:=1;

             tmrRefresh.enabled:=true;
        end;
end;


This does remove some of the screen flicker, but it introduces a new problem. Its no longer easy to see the changes your making, without waiting for an update. You could reduce the delay, but this will only increase screen flicker, clearly, this isn't a very good method for optimising the editor.

The second and most obvious method is to simply update the cell thats changed. You can see the difference by enabling editoroptimisation2 or running Life_EditOpt2.exe.

The revised mouseUp handler looks like this...

procedure TfrmMain.pbMouseUp(Sender: TObject; Button: TMouseButton;
  Shift: TShiftState; X, Y: Integer);
var
   mapX : integer;
   mapY : integer;
begin
     if (fStopped) then
        begin
             mapX:=(x div _cellSize)+1;
             mapY:=(y div _cellsize)+1;

             if (fCurrentMap[mapX,mapY]>0) then
                fCurrentMap[mapX,mapY]:=0
             else
                 fCurrentMap[mapX,mapY]:=1;

             drawCell(mapX,mapY);

             if (frmOverview.visible) then
                frmOverview.pbo.canvas.pixels[mapx-1,mapy-1]:=colors[fCurrentMap[mapx,mapy]];
        end;
end;


Now, we get instant visual feedback about our options and there is practically no screen flicker, so the job is complete as far as the editor is concerned. I suspect many of you choked as soon as I suggested the batch update approach, but this article isn't about getting straight to the optimal method... its about the thought processes that should be going on in your head when you consider how to optimise your code. I opted to try the batch update approach because I've seen some graphics tools use this method with great effect, but for my purposes, it was not the best way as it still resulted in screen flicker and didn't provide instant visual feedback. At that point, it wasn't my highest priority... I wanted to see the simulation run and the editor is a relatively minor part of that, so I decided to leave it as it was.

My options at that point were also restricted by the routines I had coded. Initially, the only routine that updated the canvas displaying the map was 'paintmap'. It wasn't until I added the 'drawcell' routine during my optimisation of the code that prodcues the next generation of cells that I went back and finished off optimising the editor. I could have coded it, but it was late, I was lazy and I could live without it... for a while anyway.

This does illustrate a very valuable point about optimisation. The process of optimising your software should have started even before you write the first line of code. Good design can often highlight potential problems before they occur, the more experience you get, the more potential pitfalls you may be able to avoid during the design phase.

Had I spent any time thinking about it before hand, its highly likely that I would have had a 'drawCell' routine in my original code and as such, probably wouldn't have had to worry about optimising the editor since it would make use of 'drawCell' in the original design.

Flexibility Vs. Speed

The rules of life are simple enough, but as I said earlier, I couldn't remember the exact values for each rule, so I cobbled the program together to use values provided by the user via edit boxes, but during my debugging process I ended up with a set of rules in the code that used hardcoded values instead of the variables.

Now, for some reason, I expected the hardcoded value rules to be slightly quicker than the variable version. This however was not the case. I noticed practically no difference. This isn't strictly related to optimisation, but it does illustrate the point that flexibility doesn't always mean slower code, but that depends on how you achieve the flexibility.

Well Coded Flexibility Vs. Badly Coded Flexibility

I actually went straight to what in this case can be considered well coded flexibility. To illustrate the point, I have deoptimised the code.

When making your programs flexible, you need to consider how the flexible items are used and perhaps most importantly of all, how often. Consider this block of code.

if (n>strToInt(edtCrowded.text)) then
   begin
        fCurrentMap[x,y]:=0;
        if (fOldMap[x,y]>0) then
           inc(deaths);
   end
else
    if (n=strToInt(edtNew.text)) then
       begin
            if (fOldMap[x,y]>0) then
               begin
                    if (fOldMap[x,y]<strToInt(edtOldAge.text)) then
                       begin
                            fCurrentMap[x,y]:=fOldMap[x,y]+1;
                            inc(alive);
                       end
                    else
                        begin
                             fCurrentMap[x,y]:=0;
                             inc(deaths);
                        end;
               end
            else
                begin
                     fCurrentMap[x,y]:=1;
                     inc(newcells);
                end;
       end
    else
        if (n<strToInt(edtLonely.text)) then
           begin
                fCurrentMap[x,y]:=0;
                if (fOldMap[x,y]>0) then
                   inc(deaths);
           end
        else
            if ((n>=strToInt(edtLiveLower.text)) and (n<=strToInt(edtLiveUpper.text))) and (fOldMap[x,y]>0) then
               begin
                    if (fOldMap[x,y]<strToInt(edtOldAge.text)) then
                       begin
                            inc(alive);
                            fCurrentMap[x,y]:=fOldMap[x,y]+1;
                       end
                    else
                        begin
                             inc(deaths);
                             fCurrentMap[x,y]:=0;
                        end;
               end
            else
                begin
                     fCurrentMap[x,y]:=0;
                end;


Ignore the fact that there is no validation on the edit boxes, the data in them was basically validated and then they were locked so that when this code got called, the values were guaranteed to be acceptable.

If this routine occured once, things wouldn't be too bad, but consider that with our basic life implementation with a map size of 282 x 282, it gets called 79524 times for each generation, things look a lot different. The overheads associated with converting the strings contained by the edit boxes are huge by comparison to everything else the routine does. With my original code (and no other optimisations) I could achieve around 3.5-4 generations per second, with the code above, it dropped to a measely 0.8 generations per second.

This is another instance that illustrates the value of good design and thinking ahead. Its all too easy to dive in without really thinking about what we're doing and before you know it, you have code that is far from optimal.

To see for yourself... disable fixedrules and optimisedvariablerules and compile or run Life_UnOptRules.exe, then compare that with the optimised version (enable optimisedvariablerules and compile or run Life_EditOpt2.exe).

The optimised version looks like this...

if (n>fCrowded) then
   begin
        fCurrentMap[x,y]:=0;
        if (fOldMap[x,y]>0) then
           inc(deaths);
   end
else
    if (n=fNew) then
       begin
            if (fOldMap[x,y]>0) then
               begin
                    if (fOldMap[x,y]<fOldAge) then
                       begin
                            fCurrentMap[x,y]:=fOldMap[x,y]+1;
                            inc(alive);
                       end
                    else
                        begin
                             fCurrentMap[x,y]:=0;
                             inc(deaths);
                        end;
               end
            else
                begin
                     fCurrentMap[x,y]:=1;
                     inc(newcells);
                end;
       end
    else
        if (n<fLonely) then
           begin
                fCurrentMap[x,y]:=0;
                if (fOldMap[x,y]>0) then
                   inc(deaths);
           end
        else
            if ((n>=fLiveLower) and (n<=fLiveUpper)) and (fOldMap[x,y]>0) then
               begin
                    if (fOldMap[x,y]<fOldAge) then
                       begin
                            inc(alive);
                            fCurrentMap[x,y]:=fOldMap[x,y]+1;
                       end
                    else
                        begin
                             inc(deaths);
                             fCurrentMap[x,y]:=0;
                        end;
               end
            else
                begin
                     fCurrentMap[x,y]:=0;
                end;


The values of the variables fCrowded, fLonely, fNew, fLiveLower, fLiveUpper and fOldAge are defined when the user clicks the start button. After that, the code is as fast as a hardcoded ruleset. Of course, this doesn't just apply to data you get from edit boxes. Reading an INI file or the registry everytime would impact performance, so the key is to think ahead... how will the code be used? and perhaps most importantly, how often will it be called?

Heres another example...

Consider an object that stores X and Y positional data. We require this object to be able to tell us its distance from the origin (0,0) at any time. During normal operation, we read an write its position very frequently and check its distance from the origin rarely.

type
  TMyPoint = class(TObject)
  private
    fX : integer;
    fY : integer;
    fDistanceToOrigin : real;
  protected
    procedure setX(value:integer);
    procedure setY(value:integer);
  public
    constructor create;

    property distanceToOrigin:real read fDistanceToOrigin;
  published
    property x:integer read fX write setX;
    property y:integer read fY write setY;
  end;

  ....

procedure TMyPoint.setX(value:integer);
begin
  if (value<>fX) then
    begin
      fX:=value;
      fDistanceToOrigin:=sqrt(sqr(fX)+sqr(fY));
    end;
end;

procedure TMyPoint.setY(value:integer);
begin
  if (value<>fY) then
    begin
      fY:=value;
      fDistanceToOrigin:=sqrt(sqr(fX)+sqr(fY));
    end;
end;

constructor TMyPoint.create;
begin
  inherited;

  fX:=0;
  fY:=0;
  fDistanceToOrigin:=0;
end;


For the scenario above where we read and write X and Y frequently but check DistanceToOrigin rarely, this code is far from optimal.

The key issue is the choices we've made about where to use property access routines. Since we are planning on reading and writing the X and Y properties frequently, we've given ourself a large execution time overhead whenever we write new values to X or Y, whilst giving instant access to distanceToOrigin. To be optimal for our scenario, the object should be defined as follows.

type
  TMyPoint = class(TObject)
  private
    fX : integer;
    fY : integer;
  protected
    function getDistanceToOrigin:real;
  public
    constructor create;

    property distanceToOrigin:real read getDistanceToOrigin;
  published
    property x:integer read fX write fX;
    property y:integer read fY write fY;
  end;

  ....

function TMyPoint.getDistanceToOrigin:real;
begin
  result:=sqrt(sqr(fX)+sqr(fY));
end;

constructor TMyPoint.create;
begin
  inherited;

  fX:=0;
  fY:=0;
end;


This version will provide optimal execution speed given our scenario.

It is, I hope, now reverberating in your ears... think ahead, design your software.

Where Are We At?

At this point, we have a nicely optimised editor and we have a flexible ruleset that doesn't impact performance, but it is still slow and suffering badly from screen flicker when the simluation is running.

Screen Update Optimisation

The first problem to tackle is the screen flicker when the simluation is running. This is caused by the fact that the paint box blats the whole screen with white and then goes on to draw each cell in turn.

This is acceptable when painting the whole control, since its likely that the only reason we're redrawing the whole control is because another window has been placed over us and now its gone. So, in this instance, we have no choice but to redraw the whole thing, but when the simulation is running, we end up with major screen flicker.

The routine that calculates the next generation (TfrmMain.nextGen) initially looked like this.

procedure TfrmMain.nextGen;
// variable declarations
begin
     // Variable initialisation
     repeat
             x:=1;
             repeat
                 // Calculate the next generation cells here

                 inc(x);
             until (x>_mapX);
         inc(y);
     until (y>_mapY);

     paintMap;

     if frmOverview.visible then
        frmOverview.pbo.invalidate;

     // Transfer current map to old map here
     // Update labels containing generation stats here
     // Update labels containing timing data here
end;


Earlier I mentioned the fact that I only introduced 'drawcell' when optimising another area. This is it. To stop the flicker, instead of calculating the whole map and then redrawing the whole thing with the paint method, I added the drawcell routine and had each cell updated as it was calculated.

procedure TfrmMain.nextGen;
// variable declarations
begin
     // Variable initialisation
     repeat
             x:=1;
             repeat
                 // Calculate the next generation cells here

                 drawCell(x,y);

                 if frmOverview.Visible then
                    begin
                         col:=fCurrentMap[x,y];
                         if (col>_maxCols) then
                            col:=_maxCols;

                         frmOverview.pbo.canvas.pixels[x-1,y-1]:=colors[col];
                    end;

                 inc(x);
             until (x>_mapX);
         inc(y);
     until (y>_mapY);
     // Transfer current map to old map here
     // Update labels containing generation stats here
     // Update labels containing timing data here
end;


You can check it out for yourself by enabling optimisedupdates1 and compiling or running Life_OptUpd1.exe. You will notice that screen flicker has stopped, but that it is still awfully slow.

One thing you will notice about Life is that a large portion of the screen will become empty and will remain that way... or, you will end up with perpetual structures that never die and never change. And why is this relevant?

At the moment, we are plotting EVERY cell as we calculate it, which would be fine if every cell changed with every generation, but it doesn't. This presents an opportunity to speed things up since we really only need to redraw the cells that have actually changed.

procedure TfrmMain.nextGen;
// variable declarations
begin
     // Variable initialisation
     repeat
             x:=1;
             repeat
                 // Calculate the next generation cells here

                 if (fOldMap[x,y]<>fCurrentMap[x,y]) then
                   begin
                     drawCell(x,y);

                     if frmOverview.Visible then
                       begin
                         col:=fCurrentMap[x,y];
                         if (col>_maxCols) then
                           col:=_maxCols;
                         frmOverview.pbo.canvas.pixels[x-1,y-1]:=colors[col];
                       end;
                   end;

                 inc(x);
             until (x>_mapX);
         inc(y);
     until (y>_mapY);
     // Transfer current map to old map here
     // Update labels containing generation stats here
     // Update labels containing timing data here
end;


To try this one out, enable optimisedupdates2 and compile or run life_optupd2.exe. I think you'll agree, this is significantly faster than the last one, but it still has some nasty overheads.

We already have a means of repainting the whole editor paintbox when we scroll for example, so why then are we redrawing the entire grid within the scrollbox for each generation? Its incredibly wasteful since most of the grid is off screen.

So consider some kind of clipping that stops the program drawing cells that aren't visible. Whilst it adds the overhead of two if...then's, this is miniscule compared with the time it takes to draw a cell and so, its a very good trade off.

procedure TfrmMain.nextGen;
// variable declarations
begin
     // Variable initialisation
     xClipMin:=scrollbox1.horzScrollBar.position div _cellSize;
     xClipMax:=xClipMin+(scrollbox1.width div _cellSize)+1;
     yClipMin:=scrollbox1.vertScrollBar.position div _cellSize;
     yClipMax:=yClipMin+(scrollbox1.height div _cellSize)+1;

     repeat
             x:=1;
             repeat
                 // Calculate the next generation cells here

                 if (fOldMap[x,y]<>fCurrentMap[x,y]) then
                   begin
                     if (x>=xClipMin) and (x<=xClipMax) then
                       begin
                         if (y>=yClipMin) and (y<=yClipMax) then
                           begin
                             drawCell(x,y);
                           end;
                       end;

                     if frmOverview.Visible then
                       begin
                         col:=fCurrentMap[x,y];
                         if (col>_maxCols) then
                           col:=_maxCols;
                         frmOverview.pbo.canvas.pixels[x-1,y-1]:=colors[col];
                       end;
                   end;

                 inc(x);
             until (x>_mapX);
         inc(y);
     until (y>_mapY);
     // Transfer current map to old map here
     // Update labels containing generation stats here
     // Update labels containing timing data here
end;


To check this version out, enable optimisedupdates3 and compile or run life_optupd3.exe. Now, with this final version we are only redrawing the cells that are visible on screen that have changed, and we have managed to step up from approximately 4 generations per second in life_editopt2 to approximately 25 generations per second in life_optupd3.

Considering that the changes we've made are minimal, thats quite a hefty increase. So what can you take from this... if you can, only redraw whats changed (this was achieved with optimisedupdates1 and optimisedupdates2) and only redraw whats visible to the user (this was achieved with the introduction of clipping with optimisedupdates3).

One Final Pointer

In more ways than one :-)

You may by now have noticed an option on the right hand side... 'Wraparound'. As the name suggests, this option turns grid wrap around on and off.

This is essentially a very simple operation that switches between two different versions of the function 'neighbours'. In the code, we have two versions... 'neighboursNoWrap' and 'neighboursWrap'. The most obvious way to use this checkbox is to do something like this...

 // In procedure TfrmMain.nextGen;

  if chkWrap.checked then
    n:=neighboursWrap(x,y)
  else
    n:=neighboursNoWrap(x,y);


Clearly, this if...then...else will occur 79524 times for each generation. Relatively speaking its not a massive overhead, but if you have a lot of conditionals like this, you could be wasting valuable cycles. So, here is a more elegant way, that removes the overhead of the 'if...then...else', and it goes something like this...

type
  TNeighbourFunction = function(x,y:integer):integer of object;

  TfrmMain = class(TForm)
  private
    fNeighbourFunction : TNeighbourFunction;
  public
    function neighboursNoWrap(x,y:integer):integer;
    function neighboursWrap(x,y:integer):integer;
  end;

  ....

  // In procedure TfrmMain.nextGen;
  n:=fNeighbourFunction(x,y);


The value of fNeighbourFunction can be set when the 'Start' button is clicked. After that, there is no need for the if...then. As stated above, with only a single comparison the time saving achieved is negligible, but if you have lots of conditionals such as this and you replaced them all with this event handling style code, you may be able to save an awful lot of time.

Conclusion

Optimisation is an iterative process that, unless you are really really good and incredibly lucky, you will undoubtedly have to get involved in. The key things you need to do when optimising your code are...

Think ahead... design your code. Google... try finding known issues relevant to your task, any hints from other people... if you know where some of the slow bits are you can avoid them

Code it... get down to business and write your code. If your design is detailed some of the things that slow your code down may already have been taken into account... if not, then consider things like repeat...until instead of for..do. Unroll smaller loops (See below for an example). Make sure that if you are using property access methods, that they are optimised for how they are used (as detailed in the example above) and above all, consider how your code is used and how often. A 1ms delay that occurs one or twice every minute isn't too bad, but if it occurs 100 times per frame, thats a tenth of a second your wasting and all of a sudden your frame rate is down to 10fps. Think ahead.

Evaluate... test your code... find the slow bits by profiling the code with simple timing routines or if you want more precise information, get yourself a code profiler and use that.

Adapt and learn... when you find the slow parts, try and find out why they are slow... could be you pickup some valuable ideas along the way. If you can speed the slow parts up, do so and then re-evaluate. If not, consider alternative designs that may bypass the requirement for the slow code.

This obviously ties in nicely with iterative development processes and as such should be included in any project planning.

Unrolling a loop

for loop:=1 to 10 do
  x[loop]:=0;


Becomes....

x[1]:=0;
x[2]:=0;
x[3]:=0;
x[4]:=0;
x[5]:=0;
x[6]:=0;
x[7]:=0;
x[8]:=0;
x[9]:=0;
x[10]:=0;


Longer to write, but quicker to execute. Obviously there is a limit where the codespace increase/speed increase trade off may not be worth it, but for small loops, its well worth it, especially if they are called a lot.

And there we have it, a brief introduction to some of the basics of optimisation. This is in no way a definitive guide to optimisation. Its an introduction to some of the thought processes that should go on in your head when you are faced with a program that is less than speedy or when you are designing a new piece of software.

Thanks for reading. If you have any comments about the article, feel free to post them or email me.
Valid XHTML 1.0 Transitional :: Valid CSS :: Powered by WikkaWiki
 
 
Page was generated in 0.3414 seconds