OpenGL optimizing with arrays



User137
22-08-2007, 10:23 AM
I made a small test application with my GLEngine that compares vertex arrays, display lists and raw rendering with glBegin-glEnd. Take this code as a tutorial or such, as it's heavily commented. In windowed 1024x768 mode, the vertex array method rendered over 20000 triangles at over 114fps, whereas the other 2 methods remained at ~94fps. Surprisingly, the display list did not beat pre-defined raw rendering.
Screenshot: http://i10.tinypic.com/6euqii1.jpg

GLEngine is available at http://www.freewebs.com/loknar
but it hasn't been updated on the web for maybe a long time, something I hope to do soon. Maybe it can run this demo anyway.
edit: I did update the website with new version :)

unit Unit1;

interface

uses
  Windows, Messages, SysUtils, Classes, Graphics, Controls, Forms,
  Dialogs, GLEngine, dglOpenGL, TLibUnit, TLib3D;

type
  TForm1 = class(TForm)
    GLEngine1: TGLEngine;
    GLTimer1: TGLTimer;
    procedure FormCreate(Sender: TObject);
    procedure GLTimer1Timer(Sender: TObject; Lag: Single);
    procedure GLEngine1Initialize(Sender: TObject);
    procedure GLEngine1MouseDown(Sender: TObject; Button: TMouseButton;
      Shift: TShiftState; X, Y: Integer);
  private
    list,vObj,cObj: cardinal;
    vArray,cArray: array[0..4319] of P3f; // 36*60*2 vertices
    vCount,mode: integer;
  public
    procedure vaSphereVertex(ax,ay: single);
    procedure SphereVertex(ax,ay: single);
  end;

var Form1: TForm1;

implementation

{$R *.dfm}

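// Converts a byte offset into the pointer form that gl*Pointer expects
// while a VBO is bound. The "absolute" overlay assumes 32-bit pointers.
function BUFFER_OFFSET(i: cardinal): Pointer;
var ptr: pointer absolute i;
begin
  result:=ptr;
end;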

procedure TForm1.SphereVertex(ax,ay: single);
var z: single;
begin
  // Raw rendering for vertex
  z:=abs(cos(ay));
  glColor3f(sin(ax)*0.5+0.5, sin(ay)*0.5+0.5, cos(ax)*0.5+0.5);
  glVertex3f(cos(ax)*z,sin(ay),sin(ax)*z);
end;

procedure TForm1.vaSphereVertex(ax,ay: single);
var z: single;
begin
  // Adding vertex to the vertex array
  z:=abs(cos(ay));
  vArray[vCount].x:=cos(ax)*z;
  vArray[vCount].y:=sin(ay);
  vArray[vCount].z:=sin(ax)*z;
  cArray[vCount].x:=sin(ax)*0.5+0.5;
  cArray[vCount].y:=sin(ay)*0.5+0.5;
  cArray[vCount].z:=cos(ax)*0.5+0.5;
  inc(vCount);
end;

procedure TForm1.GLTimer1Timer(Sender: TObject; Lag: Single);
var i,j,n: integer; s: string;
begin
  GLEngine1.ClearScreen; // Clear screen

  // Set view
  glLoadIdentity;
  glTranslatef(0,0,-4);
  glRotatef(gettickcount/30,1,0,0);
  glRotatef(gettickcount/17,0,1,0);

  // Render same sphere in 20 different positions
  for n:=1 to 20 do begin
    glTranslatef(sin(n*0.37)*0.16,cos(n*0.37)*0.16,0);

    // Rotate colors
    for i:=0 to vCount-1 do
      cArray[i].x:=0.5+0.1*sin((i div 240)*integer(GetTickCount)/10*toRad);

    if mode in [0,3] then begin
      // Render vertex array OR vertex buffer object
      glEnableClientState(GL_VERTEX_ARRAY);
      glEnableClientState(GL_COLOR_ARRAY);

      glDrawArrays(GL_TRIANGLE_STRIP,0,vCount); // Render

      glDisableClientState(GL_VERTEX_ARRAY);
      glDisableClientState(GL_COLOR_ARRAY);

    end else if mode=1 then begin
      // Render displaylist
      glCallList(list);

    end else if mode=2 then begin
      // Raw rendering
      glBegin(GL_TRIANGLE_STRIP);
      for j:=0 to vCount-1 do begin
        glColor3fv(@cArray[j]); glVertex3fv(@vArray[j]);
      end;
      glEnd;
    end;

  end; // for

  GLEngine1.Flip; // Flip screen

  // Show current mode in caption
  if mode=0 then s:='VertexArray'
  else if mode=1 then s:='DisplayList'
  else if mode=2 then s:='BeginEnd'
  else s:='VertexBufferObject';
  caption:=s+format(' - FPS: %d',[GLTimer1.FPS]); // Show fps
end;

procedure TForm1.FormCreate(Sender: TObject);
begin
  GLEngine1.Align:=alClient; // Stretch GLEngine to form
end;

procedure TForm1.GLEngine1Initialize(Sender: TObject);
var i,j: integer;
begin
  // Generate displaylist
  list:=glGenLists(1);
  glNewList(list,GL_COMPILE);
  glBegin(GL_TRIANGLE_STRIP);
  for j:=-18 to 17 do
    for i:=0 to 59 do begin
      // Make lower and upper triangle-strip vertex for displaylist
      SphereVertex(i*6*toRad,j*5*toRad);
      SphereVertex(i*6*toRad,(j+1)*5*toRad);

      // Make lower and upper triangle-strip vertex for vertex array
      vaSphereVertex(i*6*toRad,j*5*toRad);
      vaSphereVertex(i*6*toRad,(j+1)*5*toRad);
    end;
  glEnd;
  glEndList;

  // Set vertex and color arrays
  glVertexPointer(3,GL_FLOAT,0,@vArray);
  glColorPointer(3,GL_FLOAT,0,@cArray);

  // Create Vertex Buffer Object
  glGenBuffersARB(1, @vObj); // Make VBO list for vertices
  // Bind and load vertex data
  glBindBufferARB(GL_ARRAY_BUFFER_ARB, vObj);
  glBufferDataARB(GL_ARRAY_BUFFER_ARB, vCount*sizeof(single)*3,
    @vArray, GL_STATIC_DRAW_ARB);

  glGenBuffersARB(1, @cObj); // Make VBO list for colors
  // Bind and load color data
  glBindBufferARB(GL_ARRAY_BUFFER_ARB, cObj);
  glBufferDataARB(GL_ARRAY_BUFFER_ARB, vCount*sizeof(single)*3,
    @cArray, GL_STATIC_DRAW_ARB);
end;

procedure TForm1.GLEngine1MouseDown(Sender: TObject; Button: TMouseButton;
  Shift: TShiftState; X, Y: Integer);
begin
  mode:=(mode+1) mod 4;
  if mode=0 then begin
    // Reset vertex and color pointers when coming back
    // to VertexArray mode (else the VBO data would be used)
    glBindBufferARB(GL_ARRAY_BUFFER_ARB, 0);
    glVertexPointer(3, GL_FLOAT, 0, @vArray);
    glColorPointer(3, GL_FLOAT, 0, @cArray);
  end else if mode=3 then begin
    // Set vertex and color pointers to VBO
    glBindBufferARB(GL_ARRAY_BUFFER_ARB, vObj);
    glVertexPointer(3, GL_FLOAT, 0, BUFFER_OFFSET(0));
    glBindBufferARB(GL_ARRAY_BUFFER_ARB, cObj);
    glColorPointer(3, GL_FLOAT, 0, BUFFER_OFFSET(0));
  end;
end;

end.

Fixed typos. Lastly added VBO.

VilleK
22-08-2007, 10:44 AM
That's interesting. I would also have thought that display lists were faster. What kind of graphics card do you use?

User137
22-08-2007, 11:13 AM
I use a Radeon 9200, P3 933MHz, 512 MB SDRAM.

edit: Also note that raw rendering would be twice as slow if I used the SphereVertex function instead of the vertex array's precalculated values. Raw rendering often does need some calculations per frame.

pstudio
22-08-2007, 01:01 PM
Doesn't a display list only give a possible speed gain if you have to draw the same object more than once?

User137
22-08-2007, 01:18 PM
Thanks for the good point. I put even heavier stress on the graphics card this time by looping the same render 5 times, totaling 108000 triangles:

vertex array 23fps
displaylist 18fps
Raw rendering 15fps

I let each one animate for 10 seconds to stabilize the rates, so yeah, the display list did manage a bit better now. Doesn't change the fact that vertex arrays rule 8)

JernejL
22-08-2007, 04:40 PM
display lists never beat VBO or vertex arrays.

pstudio
22-08-2007, 07:58 PM
Yeah - I would like to see you try out VBO as well for a final comparison.

User137
22-08-2007, 09:57 PM
You actually made me do it :P I've never dealt with VBOs before, but with the help of OpenGL sites and NeHe it came together and I finally understood it somehow... and this time you may see the results for yourself:
http://www.freewebs.com/loknar/ArrayTest.zip
Mouse clicking will switch between 4 different rendering modes.

tip: VBOs beat vertex arrays, though their nature is like that of display lists: non-dynamic, so the color rotation doesn't apply.

I'll update the code in the first post next to show VBOs and how to switch between them and the other rendering styles. I figured VBOs bind a bit the same way as textures... as it's video memory we are speaking of.

This demo renders about 4320 triangles per sphere (one 4320-vertex triangle strip) 20 times a frame = roughly 86400 triangles per frame.

Oh, thx for BUFFER_OFFSET() :wink: That thread became useful.
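
As a rough check of the triangle figures above, here is a minimal console sketch (not part of the demo; the program and function names are made up for illustration). It assumes the loop bounds from the first post (j from -18 to 17, i from 0 to 59, two vertices per step) and uses the fact that a triangle strip of N vertices produces N-2 triangles, some of which are degenerate where the rows join.

```pascal
program StripCount;
{$IFDEF FPC}{$MODE DELPHI}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}

// A triangle strip of N vertices yields N-2 triangles (none if N < 3).
function StripTriangles(VertexCount: Integer): Integer;
begin
  if VertexCount < 3 then
    Result := 0
  else
    Result := VertexCount - 2;
end;

const
  Rows = 36;            // j runs from -18 to 17 in the demo
  Columns = 60;         // i runs from 0 to 59
  VerticesPerStep = 2;  // lower and upper strip vertex
  Spheres = 20;         // spheres drawn per frame

var
  Vertices: Integer;

begin
  Vertices := Rows * Columns * VerticesPerStep;
  WriteLn('Vertices per sphere:  ', Vertices);
  WriteLn('Triangles per sphere: ', StripTriangles(Vertices));
  WriteLn('Triangles per frame:  ', Spheres * StripTriangles(Vertices));
end.
```

So the exact numbers are 4320 vertices, 4318 triangles per sphere and 86360 triangles per frame, which is close enough to the rounded 86400.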

Sascha Willems
22-08-2007, 11:43 PM
If used correctly, display lists are still the fastest way to display geometry in OpenGL. (See e.g. the slides and programming guides provided by NVidia.)
That's simply due to the fact that the driver can optimize them like no other type of geometry data. It can re-sort them, it can remove duplicate calls and much more. VBOs are second on the list.

Also, your statement that VBOs are non-dynamic is wrong. You can specify different types of VBO, and if you use e.g. DYNAMIC_DRAW_ARB for your VBOs you can modify them every frame without sacrificing too much speed.

And if your display lists are slower than e.g. vertex arrays, you have a bottleneck somewhere else. Note that the batch size is very important for rendering geometry, so your problem may be wrong batch sizes, where the overhead of a display list will kill any performance gains. It's also not realistic to put only vertices into a display list, as in reality you have texture bindings, different shaders and vertex parameters for your objects. And ~90k triangles per frame isn't that much for modern cards.

Mirage
23-08-2007, 05:34 PM
I did this kind of test as well. The tested modes were raw (glBegin..glEnd), arrays (glDrawArrays) and display lists. In addition, I compared them with vertex buffer-based rendering through DirectX.
In my tests (~16000 vertices, triangle list with ~32000 triangles) raw mode and arrays had almost the same performance.
Display lists behaved differently on various hardware. On a Radeon 9600Pro with DLs I got the same performance as with DX (which was the fastest everywhere). On a TnT2 they were a little slower than DX and took about 0.5 second (!) to rebuild a list. On an i815 they were as slow as raw mode and much slower than DX, but list rebuilds were relatively fast. And so on.
So I will use VBOs in OpenGL, as they (probably) have the same performance as DX's vertex buffers and I understand how they work.

Luuk van Venrooij
23-10-2007, 02:33 PM
Like Sascha already pointed out, display lists are still the fastest way to render static geometry.

User137
24-10-2007, 01:03 AM
I've also tested rendering models with texture coordinates and normals, first put in a vertex array and then in a display list. The difference was not very noticeable, but the display list was slightly faster there. There is a drawback, though: a display list won't be dynamic and will consume more memory than using pointers to the data.

I also read elsewhere that glDrawElements and glDrawRangeElements are faster than glDrawArrays, and they make vertex indexing possible, leading to even less memory use.

Andreaz
25-10-2007, 05:16 AM
The problem with vertex indexing is that you can't have different texture coords or normals for the same vertex (think of a cube: only 8 vertices, but each needs 3 sets of tex coords and normals), so any memory gain is lost as you need to unshare the vertices.

They are faster, though.
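
The cube case can be checked with a small console sketch. It is only an illustration with made-up names (TVertex, CountUnique), not code from the demo: it walks the 6 faces x 4 corners of a cube and counts how many distinct vertices remain when a vertex is identified by position alone versus by position plus face normal.

```pascal
program CubeIndexing;
{$IFDEF FPC}{$MODE DELPHI}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}

type
  // Hypothetical vertex layout: integer position and face normal
  TVertex = record
    Pos, Normal: array[0..2] of Integer;
  end;

var
  Unique: array[0..23] of TVertex; // at most 6 faces * 4 corners

function SameVertex(const A, B: TVertex; CompareNormal: Boolean): Boolean;
var k: Integer;
begin
  Result := True;
  for k := 0 to 2 do
    if (A.Pos[k] <> B.Pos[k]) or
       (CompareNormal and (A.Normal[k] <> B.Normal[k])) then
    begin
      Result := False;
      Exit;
    end;
end;

// Counts the distinct vertices an index buffer could share.
function CountUnique(CompareNormal: Boolean): Integer;
var
  axis, side, u, v, i: Integer;
  Vtx: TVertex;
  Found: Boolean;
begin
  Result := 0;
  for axis := 0 to 2 do
    for side := 0 to 1 do
      for u := 0 to 1 do
        for v := 0 to 1 do
        begin
          // Face normal points along the +/- axis of this face
          Vtx.Normal[0] := 0; Vtx.Normal[1] := 0; Vtx.Normal[2] := 0;
          Vtx.Normal[axis] := side * 2 - 1;
          // Corner: fixed on the face axis, varying on the other two
          Vtx.Pos[axis] := side * 2 - 1;
          Vtx.Pos[(axis + 1) mod 3] := u * 2 - 1;
          Vtx.Pos[(axis + 2) mod 3] := v * 2 - 1;

          Found := False;
          for i := 0 to Result - 1 do
            if SameVertex(Unique[i], Vtx, CompareNormal) then
            begin
              Found := True;
              Break;
            end;
          if not Found then
          begin
            Unique[Result] := Vtx;
            Inc(Result);
          end;
        end;
end;

begin
  WriteLn('Unique by position only:   ', CountUnique(False));
  WriteLn('Unique by position+normal: ', CountUnique(True));
end.
```

With positions only, the 24 face corners collapse to 8 shared vertices; once the normal is part of the vertex, all 24 stay distinct, which is the unsharing described above. Smooth meshes such as the sphere in the first post share normals between neighbouring faces, so there indexing can still save memory.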

Display lists are a tricky subject, though. If you're not using any state changes in the list (glBindTexture, for instance), it might end up as a VBO instead of a display list. That makes them really hard to test for speed, and that's why their speed can vary so much between different GPUs and drivers.

DYNAMIC_DRAW_ARB doesn't make a vertex array more dynamic than GL_STATIC_DRAW; it's just a hint to the driver that the array will be modified. The driver has the option, nothing mandatory, to put the array in different kinds of memory depending on the hint. (For instance, GL_STATIC_DRAW might end up in GPU memory, but it could end up in RAM or even the cache.)

The fastest method I'm aware of is to put all your different meshes and models in one VBO, then use index buffers to render the various models, with a vertex shader transforming the objects. Of course, this requires a card with shader support.
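
A minimal sketch of the bookkeeping behind the "everything in one VBO" idea, under stated assumptions: the record TMeshRange, the procedure LayoutMeshes and the mesh sizes are all hypothetical, and 16-bit indices are assumed. It only computes where each mesh would start inside a shared vertex buffer and a shared index buffer; the actual OpenGL calls are left out so the program runs without a GL context.

```pascal
program SharedBufferLayout;
{$IFDEF FPC}{$MODE DELPHI}{$ELSE}{$APPTYPE CONSOLE}{$ENDIF}

type
  // Hypothetical per-mesh bookkeeping for shared vertex/index buffers
  TMeshRange = record
    VertexCount, IndexCount: Integer; // input: size of the mesh
    FirstVertex: Integer;             // output: first vertex in the shared VBO
    IndexByteOffset: Integer;         // output: byte offset into the index buffer
  end;

// Packs the meshes back to back and records where each one starts.
procedure LayoutMeshes(var Meshes: array of TMeshRange);
var
  i, NextVertex, NextIndex: Integer;
begin
  NextVertex := 0;
  NextIndex := 0;
  for i := Low(Meshes) to High(Meshes) do
  begin
    Meshes[i].FirstVertex := NextVertex;
    Meshes[i].IndexByteOffset := NextIndex * SizeOf(Word); // 16-bit indices
    Inc(NextVertex, Meshes[i].VertexCount);
    Inc(NextIndex, Meshes[i].IndexCount);
  end;
end;

var
  Meshes: array[0..2] of TMeshRange;
  i: Integer;

begin
  // Made-up example sizes: a cube, a sphere strip and a quad
  Meshes[0].VertexCount := 24;   Meshes[0].IndexCount := 36;
  Meshes[1].VertexCount := 4320; Meshes[1].IndexCount := 4320;
  Meshes[2].VertexCount := 4;    Meshes[2].IndexCount := 6;

  LayoutMeshes(Meshes);

  for i := 0 to High(Meshes) do
    WriteLn('Mesh ', i, ': first vertex ', Meshes[i].FirstVertex,
            ', index byte offset ', Meshes[i].IndexByteOffset);
end.
```

With both shared buffers bound, each mesh could then be drawn with a single glDrawElements call that passes its IndexCount and a pointer built from IndexByteOffset (for example via BUFFER_OFFSET from the first post). In this layout the indices stored for a mesh must already have FirstVertex added, and the total vertex count has to stay below 65536 for 16-bit indices.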