• Recent Tutorials

  • Tripping The Class Fantastic: Basic Optimization

    Flexibility vs. Speed

    The rules of life are simple enough, but as I said earlier, I couldn't remember the exact values for each rule, so I cobbled the program together to use values provided by the user via edit boxes, but during my debugging process I ended up with a set of rules in the code that used hard-coded values instead of the variables.

    Now, for some reason, I expected the hard-coded value rules to be slightly quicker than the variable version. This however was not the case. I noticed practically no difference. This isn't strictly related to optimization, but it does illustrate the point that flexibility doesn't always mean slower code, but that depends on how you achieve the flexibility.

    Well Coded Flexibility Vs. Badly Coded Flexibility
    I actually went straight to what in this case can be considered well coded flexibility. To illustrate the point, I have deoptimized the code.

    When making your programs flexible, you need to consider how the flexible items are used and perhaps most importantly of all, how often. Consider this block of code.

    Code:
    if (n>strToInt(edtCrowded.text)) then
    begin
     fCurrentMap[x,y]:=0;
     if (fOldMap[x,y]>0) then
     begin
      inc(deaths);
     end;
    end
    else
    begin
     if (n=strToInt(edtNew.text)) then
     begin
      if (fOldMap[x,y]>0) then
      begin
       if (fOldMap[x,y]<strToInt(edtOldAge.text)) then
       begin
        fCurrentMap[x,y]:=fOldMap[x,y]+1;
        inc(alive);
       end
       else
       begin
        fCurrentMap[x,y]:=0;
        inc(deaths);
       end;
      end
      else
      begin
       fCurrentMap[x,y]:=1;
       inc(newcells);
      end;
     end
     else
     begin
      if (n<strToInt(edtLonely.text)) then
      begin
       fCurrentMap[x,y]:=0;
       if (fOldMap[x,y]>0) then
       begin
        inc(deaths);
       end
       else
       begin
        if ((n>=strToInt(edtLiveLower.text)) and (n<=strToInt(edtLiveUpper.text))) and (fOldMap[x,y]>0) then
        begin
         if (fOldMap[x,y]<strToInt(edtOldAge.text)) then
         begin
          inc(alive);
          fCurrentMap[x,y]:=fOldMap[x,y]+1;
         end
         else
         begin
          inc(deaths);
          fCurrentMap[x,y]:=0;
         end;
        end
        else
        begin
         fCurrentMap[x,y]:=0;
        end;
       end;
      end;
     end;
    end;
    Ignore the fact that there is no validation on the edit boxes, the data in them was basically validated and then they were locked so that when this code got called, the values were guaranteed to be acceptable.

    If this routine occurred once, things wouldn't be too bad, but consider that with our basic life implementation with a map size of 282 x 282, it gets called 79524 times for each generation, things look a lot different. The overheads associated with converting the strings contained by the edit boxes are huge by comparison to everything else the routine does. With my original code (and no other optimizations) I could achieve around 3.5-4 generations per second, with the code above, it dropped to a measly 0.8 generations per second.

    This is another instance that illustrates the value of good design and thinking ahead. Its all too easy to dive in without really thinking about what we're doing and before you know it, you have code that is far from optimal.

    To see for yourself... disable fixedrules and optimisedvariablerules and compile or run Life_UnOptRules.exe, then compare that with the optimized version (enable optimisedvariablerules and compile or run Life_EditOpt2.exe).

    The optimised version looks like this...

    Code:
    if (n>fCrowded) then
    begin
     fCurrentMap[x,y]:=0;
     if (fOldMap[x,y]>0) then
     begin
      inc(deaths);
     end;
    end
    else
    begin
     if (n=fNew) then
     begin
      if (fOldMap[x,y]>0) then
      begin
       if (fOldMap[x,y]<fOldAge) then
       begin
        fCurrentMap[x,y]:=fOldMap[x,y]+1;
        inc(alive);
       end
       else
       begin
        fCurrentMap[x,y]:=0;
        inc(deaths);
       end;
      end
      else
      begin
       fCurrentMap[x,y]:=1;
       inc(newcells);
      end;
     end
     else
     begin
      if (n<fLonely) then
      begin
       fCurrentMap[x,y]:=0;
       if (fOldMap[x,y] > 0) then
       begin
        inc(deaths);
       end;
      end
      else
      begin
       if ((n>=fLiveLower) and (n<=fLiveUpper)) and (fOldMap[x,y]>0) then
       begin
        if (fOldMap[x,y]<fOldAge) then
        begin
         inc(alive);
         fCurrentMap[x,y]:=fOldMap[x,y]+1;
        end
        else
        begin
         inc(deaths);
         fCurrentMap[x,y]:=0;
        end;
       end
       else
       begin
        fCurrentMap[x,y]:=0;
       end;
      end;
     end;
    end;
    The values of the variables fCrowded, fLonely, fNew, fLiveLower, fLiveUpper and fOldAge are defined when the user clicks the start button. After that, the code is as fast as a hard-coded rule-set. Of course, this doesn't just apply to data you get from edit boxes. Reading an INI file or the registry every time would impact performance, so the key is to think ahead... how will the code be used? and perhaps most importantly, how often will it be called?

    Here's another example...

    Consider an object that stores X and Y positional data. We require this object to be able to tell us its distance from the origin (0,0) at any time. During normal operation, we read an write its position very frequently and check its distance from the origin rarely.

    Code:
    type
     TMyPoint = class(TObject)
     private
      fX : integer;
      fY : integer;
      fDistanceToOrigin : real;
     protected
      procedure setX(value:integer);
      procedure setY(value:integer);
     public
      constructor create;
    
      property distanceToOrigin:real read fDistanceToOrigin;
     published
      property x:integer read fX write setX;
      property y:integer read fY write setY;
     end;
    
      ....
    
    procedure TMyPoint.setX(value:integer);
    begin
     if (value<>fX) then
     begin
      fX:=value;
      fDistanceToOrigin:=sqrt(sqr(fX)+sqr(fY));
     end;
    end;
    
    procedure TMyPoint.setY(value:integer);
    begin
     if (value<>fY) then
     begin
      fY:=value;
      fDistanceToOrigin:=sqrt(sqr(fX)+sqr(fY));
     end;
    end;
    
    constructor TMyPoint.create;
    begin
     inherited;
    
     fX:=0;
     fY:=0;
     fDistanceToOrigin:=0;
    end;
    For the scenario above where we read and write X and Y frequently but check DistanceToOrigin rarely, this code is far from optimal.

    The key issue is the choices we've made about where to use property access routines. Since we are planning on reading and writing the X and Y properties frequently, we've given our-self a large execution time overhead whenever we write new values to X or Y, whilst giving instant access to distanceToOrigin. To be optimal for our scenario, the object should be defined as follows.

    Code:
    type
     TMyPoint = class(TObject)
     private
      fX : integer;
      fY : integer;
     protected
      function getDistanceToOrigin:real;
     public
      constructor create;
    
      property distanceToOrigin:real read getDistanceToOrigin;
     published
      property x:integer read fX write fX;
      property y:integer read fY write fY;
     end;
    
      ....
    
    function TMyPoint.getDistanceToOrigin:real;
    begin
     result:=sqrt(sqr(fX)+sqr(fY));
    end;
    
    constructor TMyPoint.create;
    begin
     inherited;
    
     fX:=0;
     fY:=0;
    end;
    This version will provide optimal execution speed given our scenario.

    It is, I hope, now reverberating in your ears... think ahead, design your software.

    Screen Update OptimisationWhere Are We At?
    At this point, we have a nicely optimized editor and we have a flexible rule-set that doesn't impact performance, but it is still slow and suffering badly from screen flicker when the simulation is running.

    Screen Update Optimization
    The first problem to tackle is the screen flicker when the simulation is running. This is caused by the fact that the paint box blats the whole screen with white and then goes on to draw each cell in turn.

    This is acceptable when painting the whole control, since its likely that the only reason we're redrawing the whole control is because another window has been placed over us and now its gone. So, in this instance, we have no choice but to redraw the whole thing, but when the simulation is running, we end up with major screen flicker.

    The routine that calculates the next generation (TfrmMain.nextGen) initially looked like this.

    Code:
    procedure TfrmMain.nextGen;
    // variable declarations
    begin
     // Variable initialisation
     repeat
      x:=1;
      repeat
       // Calculate the next generation cells here
    
       inc(x);
      until (x>_mapX);
      inc(y);
     until (y>_mapY);
    
     paintMap;
    
     if frmOverview.visible then
     begin
      frmOverview.pbo.invalidate;
     end;
     
     // Transfer current map to old map here
     // Update labels containing generation stats here
     // Update labels containing timing data here
    end;
    Earlier I mentioned the fact that I only introduced 'drawcell' when optimizing another area. This is it. To stop the flicker, instead of calculating the whole map and then redrawing the whole thing with the paint method, I added the drawcell routine and had each cell updated as it was calculated.

    Code:
    procedure TfrmMain.nextGen;
    // variable declarations
    begin
     // Variable initialisation
     repeat
      x:=1;
      repeat
       // Calculate the next generation cells here
    
       drawCell(x,y);
    
       if frmOverview.Visible then
       begin
        col:=fCurrentMap[x,y];
        if (col>_maxCols) then
        begin
         col:=_maxCols;
        end;
    
        frmOverview.pbo.canvas.pixels[x-1,y-1]:=colors[col];
       end;
    
       inc(x);
      until (x>_mapX);
      inc(y);
     until (y>_mapY);
     // Transfer current map to old map here
     // Update labels containing generation stats here
     // Update labels containing timing data here
    end;
    You can check it out for yourself by enabling optimisedupdates1 and compiling or running Life_OptUpd1.exe. You will notice that screen flicker has stopped, but that it is still awfully slow.

    One thing you will notice about Life is that a large portion of the screen will become empty and will remain that way... or, you will end up with perpetual structures that never die and never change. And why is this relevant?

    At the moment, we are plotting EVERY cell as we calculate it, which would be fine if every cell changed with every generation, but it doesn't. This presents an opportunity to speed things up since we really only need to redraw the cells that have actually changed.

    Code:
    procedure TfrmMain.nextGen;
    // variable declarations
    begin
     // Variable initialisation
     repeat
      x:=1;
      repeat
       // Calculate the next generation cells here
       
       if (fOldMap[x,y]<>fCurrentMap[x,y]) then
       begin
        drawCell(x,y);
    
        if frmOverview.Visible then
        begin
         col:=fCurrentMap[x,y];
         if (col>_maxCols) then
         begin
          col:=_maxCols;
         end;
         frmOverview.pbo.canvas.pixels[x-1,y-1]:=colors[col];
        end;
       end;
    
       inc(x);
      until (x>_mapX);
      inc(y);
     until (y>_mapY);
     // Transfer current map to old map here
     // Update labels containing generation stats here
     // Update labels containing timing data here
    end;
    To try this one out, enable optimisedupdates2 and compile or run life_optupd2.exe. I think you'll agree, this is significantly faster than the last one, but it still has some nasty overheads.

    We already have a means of repainting the whole editor paintbox when we scroll for example, so why then are we redrawing the entire grid within the scrollbox for each generation? Its incredibly wasteful since most of the grid is off screen.

    So consider some kind of clipping that stops the program drawing cells that aren't visible. Whilst it adds the overhead of two if...then's, this is minuscule compared with the time it takes to draw a cell and so, its a very good trade off.

    Code:
    procedure TfrmMain.nextGen;
    // variable declarations
    begin
     // Variable initialisation
     xClipMin:=scrollbox1.horzScrollBar.position div _cellSize;
     xClipMax:=xClipMin+(scrollbox1.width div _cellSize)+1;
     yClipMin:=scrollbox1.vertScrollBar.position div _cellSize;
     yClipMax:=yClipMin+(scrollbox1.height div _cellSize)+1;
    
     repeat
      x:=1;
      repeat
       // Calculate the next generation cells here
    
       if (fOldMap[x,y]<>fCurrentMap[x,y]) then
       begin
        if (x>=xClipMin) and (x<=xClipMax) then
        begin
         if (y>=yClipMin) and (y<=yClipMax) then
         begin
          drawCell(x,y);
         end;
        end;
    
        if frmOverview.Visible then
        begin
         col:=fCurrentMap[x,y];
         if (col>_maxCols) then
         begin
          col:=_maxCols;
         end;
         frmOverview.pbo.canvas.pixels[x-1,y-1]:=colors[col];
        end;
       end;
    
       inc(x);
      until (x>_mapX);
      inc(y);
     until (y>_mapY);
     // Transfer current map to old map here
     // Update labels containing generation stats here
     // Update labels containing timing data here
    end;
    To check this version out, enable optimisedupdates3 and compile or run life_optupd3.exe. Now, with this final version we are only redrawing the cells that are visible on screen that have changed, and we have managed to step up from approximately 4 generations per second in life_editopt2 to approximately 25 generations per second in life_optupd3.

    Considering that the changes we've made are minimal, thats quite a hefty increase. So what can you take from this... if you can, only redraw whats changed (this was achieved with optimisedupdates1 and optimisedupdates2) and only redraw whats visible to the user (this was achieved with the introduction of clipping with optimisedupdates3).

    Comments 1 Comment
    1. Brainer's Avatar
      Brainer -
      I don't know whether it's faster, but how about this?
      Code:
      ZeroMemory(@X[0], Length(X) * SizeOf(DataType));