View Full Version : Vbo's



lithander
15-03-2004, 10:23 AM
As you probably know, the OpenGL extension GL_ARB_vertex_buffer_object lets you allocate memory in video RAM. Since there are many different ways to use it, and its limitations are unknown, I was wondering how to use it most effectively. I found no satisfying resources on the web, so I started writing a tool (http://www.phobeus.de/hosting/pixelpracht/main.php?s=opengl&t=5&p=1) to help me evaluate what VBOs can do.

As I have a Radeon 9700 in my system, that is the only card I could test. So I'd like to ask you to run the tool on your system and see which configuration works best. I'd be especially curious how the VBO implementation behaves on Nvidia cards.

On the Radeon I found some interesting things:

- If a VBO gets bigger than 32 MB it won't be allocated in video RAM but in AGP memory. So you should avoid making your VBOs too big.
- If you store your vertices one after another (interleaved) you get better performance than with a separate VBO for each component (pos, uv etc.), though separate VBOs come in handy if a single vertex buffer would otherwise exceed 32 MB.
- A vertex buffer plus an index buffer works well. The vertex cache seems big enough that you don't take a performance hit compared to a bunch of triangle strips.
- If the vertex size gets bigger than 32 bytes you suffer a performance drop.
- Texturing is pretty cheap, while lighting limits the maximum throughput to 30 MTris/s (on my card). So lighting becomes the bottleneck, with the consequence that vertex format and draw method become rather unimportant performance-wise.
- You can use hundreds of draw calls to draw a VBO without a big performance drop (let's say 300 triangle strips).
- Binding VBOs (unlike binding textures) seems to be rather cheap, too! So it might be a good idea to have a set of VBOs for every mesh instead of one big vertex buffer for all meshes.
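
To make the interleaving point and the 32 MB limit concrete, here is a small C sketch. The struct name, the uv/normal/pos layout and the helper functions are made up for illustration, and the limit is taken as 32 MiB, which is an assumption about how the driver counts. It only prepares the data on the CPU side; uploading it is left out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical interleaved vertex: uv, normal and position back to back.
   Eight 4-byte floats, i.e. 32 bytes with the usual float size and no padding. */
typedef struct {
    float uv[2];
    float normal[3];
    float pos[3];
} InterleavedVertex;

/* Limit observed in the post on a Radeon 9700, taken here as 32 MiB (assumption). */
#define VBO_LIMIT_BYTES ((size_t)32 * 1024 * 1024)

/* Merge three per-component arrays into one interleaved array.
   uv holds 2 floats per vertex, normal and pos hold 3 floats per vertex. */
static void interleave(const float *uv, const float *normal, const float *pos,
                       InterleavedVertex *out, size_t count)
{
    assert(uv && normal && pos && out);
    for (size_t i = 0; i < count; ++i) {
        for (int k = 0; k < 2; ++k) out[i].uv[k] = uv[2 * i + k];
        for (int k = 0; k < 3; ++k) out[i].normal[k] = normal[3 * i + k];
        for (int k = 0; k < 3; ++k) out[i].pos[k] = pos[3 * i + k];
    }
}

/* Returns 1 if `count` interleaved vertices stay within the assumed limit. */
static int fits_in_one_vbo(size_t count)
{
    return count <= VBO_LIMIT_BYTES / sizeof(InterleavedVertex);
}
```

With 32-byte vertices the assumed limit works out to 1,048,576 vertices per buffer; anything larger would have to be split across several buffers or stored per component.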

I found this information pretty valuable. Perhaps you can run some experiments on your systems and post your results!

-lith

Clootie
15-03-2004, 06:12 PM
Some comments:
1) IIRC VBOs are the analog of DirectX vertex buffers, so the same optimizations apply
2) "If Vertex-Size gets bigger than 32byte you suffer a performance drop." Have you tried 64-byte vertices?

lithander
15-03-2004, 07:23 PM
1.) Oh, I didn't know that :-/ It could have saved me a lot of time! Do you know any good resources that I could read?

2.) I compared GL_C4F_N3F_V3F (40 bytes) with GL_T2F_N3F_V3F (32 bytes) and some other smaller formats. Using a vertex/index buffer, the throughput reached about 90 MTris/s for bigger VBOs. With the 40-byte format I never got more than 75 MTris/s, and often much less (quite unstable from test to test).
So I guessed that it had to do with the vertex size being bigger than 32 bytes. It might be, though, that it's just an unusual format or whatever... any other theories?
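
For reference, the two sizes follow directly from the float counts in the format names. A minimal C sketch, with struct and function names invented for illustration and 4-byte floats assumed:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Layout of GL_T2F_N3F_V3F: 2 texcoord, 3 normal, 3 position floats. */
typedef struct { float t[2]; float n[3]; float v[3]; } VertexT2fN3fV3f;

/* Layout of GL_C4F_N3F_V3F: 4 colour, 3 normal, 3 position floats. */
typedef struct { float c[4]; float n[3]; float v[3]; } VertexC4fN3fV3f;

/* Bytes occupied by `count` tightly packed vertices of the given size. */
static size_t buffer_bytes(size_t count, size_t vertex_size)
{
    /* Guard against overflow of the multiplication. */
    assert(vertex_size == 0 || count <= SIZE_MAX / vertex_size);
    return count * vertex_size;
}
```

So the colour format carries two more floats per vertex, i.e. 25% more data for the same vertex count.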

-lith

Clootie
16-03-2004, 06:26 PM
My comment about the 64-byte vertex size was about caching of vertex data. In most GPUs, vertices are cached in 32-byte cache lines. So if you have fairly random access to your data (say, due to progressive mesh adaptation) and your native format is 60 bytes, it may be better to add some "garbage" and get a 64-byte vertex. But these are pretty much corner cases, just given as an example.
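
The padding argument can be checked with a bit of arithmetic. The sketch below assumes 32-byte cache lines, tightly packed vertices and a buffer that starts on a line boundary; the function names are hypothetical and real hardware may behave differently.

```c
#include <assert.h>
#include <stddef.h>

/* Cache line size mentioned in the post; real hardware may differ. */
#define CACHE_LINE_BYTES 32u

/* Number of cache lines covered by vertex `index` when vertices are
   vertex_size bytes each, tightly packed, starting on a line boundary. */
static size_t cache_lines_touched(size_t index, size_t vertex_size)
{
    assert(vertex_size > 0);
    size_t first = (index * vertex_size) / CACHE_LINE_BYTES;
    size_t last = (index * vertex_size + vertex_size - 1) / CACHE_LINE_BYTES;
    return last - first + 1;
}

/* Round a vertex size up to the next multiple of the cache line size. */
static size_t padded_vertex_size(size_t vertex_size)
{
    return (vertex_size + CACHE_LINE_BYTES - 1) / CACHE_LINE_BYTES * CACHE_LINE_BYTES;
}
```

With 60-byte vertices, vertex 1 occupies bytes 60 to 119 and straddles three lines, while a 64-byte vertex always covers exactly two.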

In your examples you are using not just differently sized vertices but different vertex formats. So not only vertex size but also triangle setup can limit you in this case (it needs to set up interpolation for 2 additional vectors). Try using the same vertex format with a different stride between vertices.
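
One way to set up that experiment is to keep the vertex data identical and only spread it out in memory. The helper below is a hypothetical sketch, not part of the tool discussed here: it copies tightly packed vertices into a buffer with a larger stride and zeroes the gaps. The same stride value would then be given as the stride argument of glInterleavedArrays or the gl*Pointer calls when drawing.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copy `count` vertices of `vertex_size` bytes from a tightly packed source
   into `dst`, placing vertex i at byte offset i * stride. Padding is zeroed.
   dst must hold at least count * stride bytes. */
static void restride(const void *src, size_t vertex_size,
                     void *dst, size_t stride, size_t count)
{
    assert(stride >= vertex_size);
    const unsigned char *s = src;
    unsigned char *d = dst;
    memset(d, 0, count * stride);
    for (size_t i = 0; i < count; ++i)
        memcpy(d + i * stride, s + i * vertex_size, vertex_size);
}
```

For example, a 32-byte format could be tested at strides of 32, 40 and 64 bytes, so that any difference in throughput comes from memory layout rather than from the extra attributes.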