Faster Rendering Ideas

**Orgun109uk** · 04-06-2009, 04:46 PM

Hi all,

I am in the process of writing an engine for a possible future MMO project, and trying to think of ways to speed up the processing and rendering times.

I have everything stored in TList classes, which works great when I have a few hundred model instances, but when I start to get into the thousands it starts to take its toll.

The engine at the moment can handle 5,000 model instances (712 polygons each) without too much of a speed impact, even with GL picking. However, things start to seriously slow down as I reach the 10,000 and 20,000+ regions.

I plan on using an octree, but since the game may have around 100,000+ instances in view at a time, this won't help too much.

Does anyone have suggestions on speeding things up?

Can I use VBO commands in a glList? And if I can, can I use vector pointers in OpenGL to update the positions and rotations?

Many Thanks

**NecroDOME** · 04-06-2009, 05:17 PM

You could use some quadtree/octree optimizations ( http://en.wikipedia.org/wiki/Octree ). This way you only access the objects that are visible to the player.

When objects are not visible and not nearby (say a few miles away) you don't see them, so you don't have to load them. You can load them in separate threads as you like to not stall your game. They call it content streaming. So the best optimization is just to load only the things that you can see or interact with.

**Orgun109uk** · 04-06-2009, 07:25 PM

Hi NecroDOME, thanks for the reply.

I am currently adding 2 octrees, one for the units and the other for the scene/world. However, at the moment this only increases the number of loops and slows the processing and rendering down even more.

I can render around 20,000 objects (in view) comfortably at the moment, but this will increase to around 100,000 objects (in view).

Hope this makes sense...

Basically the game will be an MMO-RTS sort of game, so a player could have thousands of units in view, plus structures, etc., not including other players' units coming into view.

**chronozphere** · 04-06-2009, 09:52 PM

For rendering, I'd use frustum culling. It's very easy and accurate.
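As a rough illustration of the frustum culling idea, here is a minimal sketch that tests a bounding sphere against six planes. The type and function names (`TPlane`, `TFrustum`, `SphereInFrustum`, `MakeBoxFrustum`) are made up for this example, and the "frustum" in the demo is just an axis-aligned box so the program stays self-contained; a real engine would extract the six planes from its projection and modelview matrices. The test is conservative: a sphere just outside a corner of the frustum may still be reported as visible.

```pascal
program FrustumCullSketch;
{$mode objfpc}

type
  TVec3 = record
    X, Y, Z: Single;
  end;

  // Plane A*x + B*y + C*z + D = 0 with the normal (A, B, C) assumed to be
  // unit length and pointing towards the inside of the frustum.
  TPlane = record
    A, B, C, D: Single;
  end;

  TFrustum = array[0..5] of TPlane;

// Hypothetical helper, only here to build test data.
function MakePlane(A, B, C, D: Single): TPlane;
begin
  Result.A := A;
  Result.B := B;
  Result.C := C;
  Result.D := D;
end;

// Hypothetical stand-in for a real frustum: a cube of the given half size
// centred on the origin, described by six inward-facing planes.
function MakeBoxFrustum(HalfSize: Single): TFrustum;
begin
  Result[0] := MakePlane( 1,  0,  0, HalfSize);
  Result[1] := MakePlane(-1,  0,  0, HalfSize);
  Result[2] := MakePlane( 0,  1,  0, HalfSize);
  Result[3] := MakePlane( 0, -1,  0, HalfSize);
  Result[4] := MakePlane( 0,  0,  1, HalfSize);
  Result[5] := MakePlane( 0,  0, -1, HalfSize);
end;

// Returns False as soon as the sphere lies completely behind any one plane.
function SphereInFrustum(const F: TFrustum; const Center: TVec3;
  Radius: Single): Boolean;
var
  I: Integer;
  Dist: Single;
begin
  Result := False;
  for I := 0 to 5 do
  begin
    // Signed distance from the sphere centre to the plane.
    Dist := F[I].A * Center.X + F[I].B * Center.Y + F[I].C * Center.Z + F[I].D;
    if Dist < -Radius then
      Exit; // entirely outside this plane, so the object can be skipped
  end;
  Result := True;
end;

var
  Frustum: TFrustum;
  P: TVec3;
begin
  Frustum := MakeBoxFrustum(10);

  P.X := 0; P.Y := 0; P.Z := 0;
  Writeln('Sphere at origin visible: ', SphereInFrustum(Frustum, P, 1));

  P.X := 20;
  Writeln('Sphere at x=20 visible:   ', SphereInFrustum(Frustum, P, 1));
end.
```

The per-object cost is at most six dot products, which is why the check is usually run before any draw call is issued for the object.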

Batching also helps a lot. Make sure that you group your render calls as much as possible to reduce the amount of state changes (changing model, texture, shader, etc.).

Whenever possible, try to render the stuff in front first, because it reduces the amount of overdraw. Overdraw can slow things down considerably, especially when you are using fancy shaders. Deferred rendering could help here, but you would probably need to redesign your render module if you want to use that.
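One way to picture the grouping: sort the draw list by a state key before issuing the render calls, so that objects sharing a texture and model end up next to each other. The sketch below is only an illustration. `TDrawItem`, `SortByState` and `CountStateChanges` are invented names, the IDs are plain integers rather than real GL handles, and the insertion sort is chosen for brevity (a real engine with thousands of items would want a faster sort).

```pascal
program BatchSortSketch;
{$mode objfpc}

type
  // Hypothetical draw record: which texture and which model an object uses.
  TDrawItem = record
    TextureId: Integer;
    ModelId: Integer;
  end;

function MakeItem(TextureId, ModelId: Integer): TDrawItem;
begin
  Result.TextureId := TextureId;
  Result.ModelId := ModelId;
end;

// True when A should be drawn before B: order by texture, then by model.
function DrawsBefore(const A, B: TDrawItem): Boolean;
begin
  if A.TextureId <> B.TextureId then
    Result := A.TextureId < B.TextureId
  else
    Result := A.ModelId < B.ModelId;
end;

// Simple in-place insertion sort on the state key.
procedure SortByState(var Items: array of TDrawItem);
var
  I, J: Integer;
  Key: TDrawItem;
begin
  for I := 1 to High(Items) do
  begin
    Key := Items[I];
    J := I - 1;
    while (J >= 0) and DrawsBefore(Key, Items[J]) do
    begin
      Items[J + 1] := Items[J];
      Dec(J);
    end;
    Items[J + 1] := Key;
  end;
end;

// Counts how often the texture or model differs from the previous item,
// i.e. how many times the renderer would have to rebind state.
function CountStateChanges(const Items: array of TDrawItem): Integer;
var
  I: Integer;
begin
  Result := 0;
  for I := 1 to High(Items) do
    if (Items[I].TextureId <> Items[I - 1].TextureId) or
       (Items[I].ModelId <> Items[I - 1].ModelId) then
      Inc(Result);
end;

var
  Items: array[0..5] of TDrawItem;
begin
  Items[0] := MakeItem(2, 1);
  Items[1] := MakeItem(1, 1);
  Items[2] := MakeItem(2, 1);
  Items[3] := MakeItem(1, 1);
  Items[4] := MakeItem(2, 2);
  Items[5] := MakeItem(1, 1);

  Writeln('State changes before sorting: ', CountStateChanges(Items)); // 5
  SortByState(Items);
  Writeln('State changes after sorting:  ', CountStateChanges(Items)); // 2
end.
```

With this tiny data set the number of state switches drops from 5 to 2; the saving grows with the number of objects sharing the same state.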

**Traveler** · 04-06-2009, 09:56 PM

This is not exactly an answer to your question, and I know I do not have any details about the game, but I do wonder if such numbers are really necessary. I realize that being able to render such an enormous amount sounds very cool from a technical point of view, but as a gamer I would probably have a lot of problems making out what is happening, let alone controlling (all) these units.

I do hope you have considered the gameplay element that involves these numbers as well.

**jdarling** · 05-06-2009, 04:55 AM

I have to agree with Traveler that your numbers seem very over the top. Aside from that though, and to answer your question, why are you using TList?

TList is great for quick fixes that require small amounts of data and (possibly) medium access speeds. To me a BSP, Trie, or DLL (Doubly Linked List) would be much better options. Heck, even a bucket list would be faster than a TList with that many elements.

When building your list structure, take into account WHAT you're wanting to do with the data. Thus, if you need more than one "child/neighbor" pointer, make proper optimizations for this.

As for rendering, a highly optimized list structure will aid rendering speed by quickly lowering your poly count down to ONLY the visible models. From there you can use any of the web articles or suggestions about cutting the actual rendered poly count further (back-face culling), thus speeding things up quite a bit.

- Jeremy

**NecroDOME** · 05-06-2009, 10:11 AM

To optimize speed you can do what chrono said: use batching. If you have a lot of objects to render, you could dump them all into 1 vertex array and send that to the screen with one render call.
Or you could use instancing: use 1 vertex array and render it to the screen several times using one render call.

Then again: 100,000 objects * ~750 triangles would be 75,000,000 triangles to render. Even if, with some optimizations, only 25% of that is visible, you still have 18,750,000 triangles. It will run at around 1-5 fps. Not very realistic.

However, if you want to achieve this, you should consider using sprites. See this article about true impostors: http://http.developer.nvidia.com/GPU...ems3_ch21.html

EDIT: Would it be possible to make only one octree and minimize the overhead of 2??

**Andreaz** · 05-06-2009, 01:04 PM

Originally Posted by jdarling

TList is great for quick fixes that require small amounts of data and (possibly) medium access speeds. To me a BSP, Trie, or DLL (Doubly Linked List) would be much better options. Heck, even a bucket list would be faster than a TList with that many elements.

Actually, there's nothing slow about TList as such. As long as you avoid using the getter functions, the amount of time taken to extract an element is really low, and for just stepping through the list it could be faster than a linked list because the element pointers sit next to each other in memory and are friendlier to the cache.

The only real benefit of a DLL over a TList is removal of elements: it's O(N) in a TList and O(1) in a DLL. Insertion (other than adding to the end) is possibly faster in a linked list; however, you have to cache the list nodes so you don't have to create them all the time.

BSP/quadtree/octree are a whole different deal; they sit on a much higher level than TList vs. LL or DLL, as they are a broadphase culling strategy.

So to summarize, to loop over a list without any list overhead (as fast as using an array), do it like this:
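A side note on the O(N) removal: when the order of the elements does not matter (often the case for a plain list of render instances), a TList removal can be made cheap by overwriting the removed slot with the last element and then deleting the last slot. This is a different technique from the linked list discussed above, shown here only as a sketch; `SwapRemove` is an invented helper name, and the integers stored in the list are stand-ins for real object pointers.

```pascal
program SwapRemoveSketch;
{$mode objfpc}

uses
  Classes;

// Removes List[Index] without shifting the elements behind it.
// The order of the remaining elements is NOT preserved.
procedure SwapRemove(List: TList; Index: Integer);
begin
  List[Index] := List.Last;     // move the last element into the gap
  List.Delete(List.Count - 1);  // deleting the last slot shifts nothing
end;

var
  Units: TList;
  I: Integer;
begin
  Units := TList.Create;
  try
    // Store the numbers 1..5 as fake pointers, purely for demonstration.
    for I := 1 to 5 do
      Units.Add(Pointer(PtrUInt(I)));

    SwapRemove(Units, 1);       // removes the element holding 2

    for I := 0 to Units.Count - 1 do
      Write(PtrUInt(Units[I]), ' ');  // 1 5 3 4
    Writeln;
  finally
    Units.Free;
  end;
end.
```

If the list has to stay sorted, this trick does not apply and the O(N) `Delete` or a linked structure remains the choice.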

[pascal]
for Index := 0 to List.Count - 1 do
begin
  // Read the internal pointer array directly instead of going through the Items getter
  Item := TItem(List.List^[Index]);
  // do something with Item
end;
[/pascal]

Back to topic: 100,000 visible objects seems a lot. How would you even be able to see them all at once on the screen?

If you are targeting top of the line hardware you might be able to push those numbers with hardware instancing but it is still a lot.

Some more information on what you are trying to accomplish would make it easier to see what you are trying to do and give better tips!

**jdarling** · 05-06-2009, 03:30 PM

Originally Posted by Andreaz

Actually, there's nothing slow about TList as such. As long as you avoid using the getter functions, the amount of time taken to extract an element is really low, and for just stepping through the list it could be faster than a linked list because the element pointers sit next to each other in memory and are friendlier to the cache.

<Sarcasym>
Yeah, because everyone who is using a TList isn't using the accessor methods. Why would they use the interface provided?
</Sarcasym>

At the point you describe, you might as well just use a dynamic array with buffer caching and a length keeper. Then you lose the overhead of the object as a bonus.
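To make "a dynamic array with buffer caching and a length keeper" concrete, here is one possible reading as a minimal sketch: the array is grown in large steps and never shrunk, and a separate counter tracks how many slots are in use. `TItemBuffer`, `InitBuffer`, `AddItem` and `ClearBuffer` are names invented for this example, and the starting capacity of 16 and the doubling policy are arbitrary assumptions.

```pascal
program GrowArraySketch;
{$mode objfpc}

type
  TItemBuffer = record
    Items: array of Pointer; // backing store; Length(Items) is the capacity
    Count: Integer;          // the "length keeper": slots actually in use
  end;

procedure InitBuffer(var B: TItemBuffer);
begin
  B.Items := nil;
  B.Count := 0;
end;

// Appends an item, doubling the capacity only when the buffer is full,
// so most calls do no memory allocation at all.
procedure AddItem(var B: TItemBuffer; Item: Pointer);
begin
  if B.Count = Length(B.Items) then
  begin
    if Length(B.Items) = 0 then
      SetLength(B.Items, 16)
    else
      SetLength(B.Items, Length(B.Items) * 2);
  end;
  B.Items[B.Count] := Item;
  Inc(B.Count);
end;

// Empties the buffer but keeps the allocated memory for reuse next frame.
procedure ClearBuffer(var B: TItemBuffer);
begin
  B.Count := 0;
end;

var
  Buffer: TItemBuffer;
  I: Integer;
begin
  InitBuffer(Buffer);

  // Fake pointers 1..100 stand in for real object references.
  for I := 1 to 100 do
    AddItem(Buffer, Pointer(PtrUInt(I)));
  Writeln('Count: ', Buffer.Count, '  Capacity: ', Length(Buffer.Items));

  ClearBuffer(Buffer);
  Writeln('Count: ', Buffer.Count, '  Capacity: ', Length(Buffer.Items));
end.
```

Iterating is then a plain `for I := 0 to Buffer.Count - 1` over `Buffer.Items`, with no method calls involved.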

Other benefits of (non-TList) are faster item move, faster inserts (if using node cache), faster sorts... I can go on, but most are easily found in books. Though I'm sure Borland/CodeGear has made improvements in the base code (I only have Lazarus and D6 to work with).

Originally Posted by Andreaz

BSP/quadtree/octree are a whole different deal; they sit on a much higher level than TList vs. LL or DLL, as they are a broadphase culling strategy.

My point was to show the progression from a simple, easy-to-implement concept (using a TList to manage data) to a complex structure that is highly optimized (insert here). With that many objects a broad approach may work out quite well.

BTW: One thing I haven't seen answered yet is the question of what rendering engine(s) you're using. We all took for granted that it is OGL only, but are you planning on using other rendering engines? If so, you need to apply a bit more thought to the subject if you want a general solution.

**Orgun109uk** · 05-06-2009, 04:33 PM

Hi, thanks for all the replies,

The amount of objects is really everything that gets rendered, including the terrain, particles, models, billboards, GUI, etc. But I've decided to split this up.

To explain the project a bit better: it's kind of like an online RTS (an online Command and Conquer). You can build your own army to "take over the galaxy" or join a clan and create an even bigger army (which is why I am aiming for such a high number of objects).

I was thinking of using 2 octrees mainly to split up what's interactive and what's not, so when an object is clicked the scenery is not even included in the OpenGL picking stage, as well as to shrink the amount of processing that is done on each object (e.g. I don't need to know what state a tree is in, but I do need to know the tank's state).

The models are stored in a data storage list rather than the object list, and I'm using a "mesh instance" class, which renders the models using VBOs (which get generated when the model data is loaded).

After some testing of how many models I can render on the screen, I'm thinking I'll stick around the 10,000 (max) mark; perhaps 100,000 is just a tad unrealistic and unnecessary.

Deferred rendering is a good idea, thanks, I completely overlooked it.

I prefer using lists over arrays, especially when the data will constantly be updated (adding and deleting); it just makes things a little simpler. Although if using TList is slower in Lazarus than arrays, I will need to look more into dynamic arrays.

Sorry, I must have forgotten to mention: I only plan to use SDL and OpenGL.