Faster Rendering Ideas



Orgun109uk
04-06-2009, 04:46 PM
Hi all,

I am in the process of writing an engine for a possible future MMO project, and trying to think of ways to speed up the processing and rendering times.

I have everything stored in TList classes, which works great when I have a few hundred model instances, but when I start to get into the thousands it starts to take its toll.

The engine at the moment can handle 5,000 model instances (712 polygons each) without too much of a speed impact, even with GL picking. However, things start to seriously slow down as I reach the 10,000 and 20,000+ regions.

I plan on using an octree, but since the game may have around 100,000+ instances in the view at a time, this won't help too much.

Does anyone have suggestions on speeding things up?

Can I use VBO commands in a glList? And if I can, can I use vector pointers in OpenGL to update the positions and rotations?

Many Thanks

NecroDOME
04-06-2009, 05:17 PM
You could use some quadtree/octree optimizations ( http://en.wikipedia.org/wiki/Octree ). This way you only access the objects that are visible to the player.

When objects are not visible and not nearby (say a few miles away) you don't see them, so you don't have to load them. You can load them in separate threads as you like to not stall your game. They call it content streaming. So the best optimization is just to load only the things that you can see or interact with.

Orgun109uk
04-06-2009, 07:25 PM
Hi NecroDOME, thanks for the reply.

I am currently adding 2 octrees, one for the units and the other for the scene/world. However, at the moment this only increases the number of loops and slows the processing and rendering down even more.

I can render around 20,000 objects (in view) comfortably at the moment, but this will increase to around 100,000 objects (in view).

Hope this makes sense...

Basically the game will be an MMO-RTS sort of game, so a player could have thousands of units in view, plus structures, etc. That's not including other players' units coming into view.

chronozphere
04-06-2009, 09:52 PM
For rendering, I'd use frustum culling. It's very easy and accurate. :)

And batching also helps a lot. Make sure that you group your render calls as much as possible to reduce the amount of state changes (changing model, texture, shader, etc.).

Whenever possible, try to render the stuff in front first, because it reduces the amount of overdraw. Overdraw can slow things down considerably, especially when you are using fancy shaders. Deferred rendering could help here, but you would probably need to redesign your render module if you want to use that.
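
For the frustum culling mentioned above, here is a minimal sketch of a bounding-sphere test, assuming Free Pascal in objfpc mode. All names (TVec3, TPlane, BoxFrustum, SphereInFrustum) are made up for illustration, and the box-shaped "frustum" is only a stand-in for the six planes you would normally derive from the camera matrices.

```pascal
program FrustumCullSketch;
{$mode objfpc}

type
  TVec3 = record
    X, Y, Z: Single;
  end;

  // Plane N.P + D = 0 with the unit normal N pointing into the frustum.
  TPlane = record
    N: TVec3;
    D: Single;
  end;

  TFrustum = array[0..5] of TPlane;

function Vec3(AX, AY, AZ: Single): TVec3;
begin
  Result.X := AX;
  Result.Y := AY;
  Result.Z := AZ;
end;

function MakePlane(NX, NY, NZ, D: Single): TPlane;
begin
  Result.N := Vec3(NX, NY, NZ);
  Result.D := D;
end;

// Hypothetical stand-in for a real view frustum: an axis-aligned box
// reaching HalfSize units from the origin along every axis.
function BoxFrustum(HalfSize: Single): TFrustum;
begin
  Result[0] := MakePlane( 1,  0,  0, HalfSize);
  Result[1] := MakePlane(-1,  0,  0, HalfSize);
  Result[2] := MakePlane( 0,  1,  0, HalfSize);
  Result[3] := MakePlane( 0, -1,  0, HalfSize);
  Result[4] := MakePlane( 0,  0,  1, HalfSize);
  Result[5] := MakePlane( 0,  0, -1, HalfSize);
end;

// Returns False only when the sphere lies entirely behind at least one
// plane. The test is conservative: spheres just outside a corner may
// still be reported as visible.
function SphereInFrustum(const F: TFrustum; const C: TVec3; R: Single): Boolean;
var
  I: Integer;
  Dist: Single;
begin
  for I := 0 to 5 do
  begin
    Dist := F[I].N.X * C.X + F[I].N.Y * C.Y + F[I].N.Z * C.Z + F[I].D;
    if Dist < -R then
      Exit(False);  // fully outside this plane: cull the object
  end;
  Result := True;
end;

var
  View: TFrustum;
begin
  View := BoxFrustum(10);
  WriteLn(SphereInFrustum(View, Vec3(0, 0, 0), 1));   // at the centre: kept
  WriteLn(SphereInFrustum(View, Vec3(50, 0, 0), 1));  // far to the side: culled
end.
```

One sphere test per object costs six dot products at most, so it is usually cheap enough to run over a whole list before any draw call is issued.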

Traveler
04-06-2009, 09:56 PM
This is not exactly an answer to your question, and I know I do not have any details about the game, but I do wonder if such numbers are really necessary. I realize that being able to render such an enormous amount sounds very cool from a technical point of view, but as a gamer I would probably have a lot of trouble making out what is happening, let alone controlling (all) these units.

I do hope you have considered the gameplay element that involves these numbers as well ;)

jdarling
05-06-2009, 04:55 AM
I have to agree with Traveler that your numbers seem very over the top. Aside from that though, and to answer your question, why are you using TList?

TList is great for quick fixes that require small amounts of data and (possibly) medium access speeds. To me a BSP, Trie, or DLL (Doubly Linked List) would be much better options. Heck, even a bucket list would be faster than a TList with that many elements.

When building your list structure, take into account WHAT you're wanting to do with the data. Thus, if you need more than one "child/neighbor" pointer, make proper optimizations for this.

As for rendering, a highly optimized list structure will aid rendering speed by quickly lowering your poly count down to ONLY the visible models. From there you can use any of the web articles or suggestions about limiting the actually rendered poly count (back-face culling), thus speeding things up quite a bit.

- Jeremy

NecroDOME
05-06-2009, 10:11 AM
To optimize speed you can do what chrono said: use batching. If you have a lot of objects to render, you could dump them all into 1 vertex array and send that to the screen with one render call.
Or you could use instancing: use 1 vertex array and render it to the screen several times using one render call.

Then again: 100,000 objects * ~750 triangles would be 75,000,000 triangles to render. Combine that with some optimizations so that, let's say, only 25% are visible = 18,750,000. It will run at around 1-5 fps :) . Not very realistic.
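
The triangle estimate above can be reproduced with a few lines of Free Pascal. This is only a sketch: TrianglesPerFrame is a made-up helper, and the 30 fps target in the last line is an assumption added to show the throughput a card would have to sustain.

```pascal
program TriangleBudget;
{$mode objfpc}

// Triangles submitted per frame for a given object count, triangles per
// object and visible fraction (0..1).
function TrianglesPerFrame(Objects, TrisPerObject: Int64;
  VisibleFraction: Double): Int64;
begin
  Result := Round(Objects * TrisPerObject * VisibleFraction);
end;

begin
  WriteLn('All objects:      ', TrianglesPerFrame(100000, 750, 1.0));
  WriteLn('25% visible:      ', TrianglesPerFrame(100000, 750, 0.25));
  // Assumed target of 30 frames per second.
  WriteLn('Tris/s at 30 fps: ', TrianglesPerFrame(100000, 750, 0.25) * 30);
end.
```

With these inputs the three values are 75,000,000, 18,750,000 and 562,500,000 triangles per second.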

However, if you want to achieve this, you should consider using sprites. See this article about true impostors: http://http.developer.nvidia.com/GPUGems3/gpugems3_ch21.html

EDIT: Would it be possible to make only one octree and minimize the overhead of 2??

Andreaz
05-06-2009, 01:04 PM
jdarling wrote:
> TList is great for quick fixes that require small amounts of data and (possibly) medium access speeds. To me a BSP, Trie, or DLL (Doubly Linked List) would be much better options. Heck, even a bucket list would be faster than a TList with that many elements.

Actually, there's nothing slow about TList as such. As long as you avoid using the getter functions, the amount of time taken to extract an element is really low, and for just stepping through the list it could be faster than a linked list due to the element pointers being better aligned in the cache.

The only benefit of a DLL over a TList is removal of elements: it's O(N) in a TList and O(1) in a DLL. Insertion (not adding to the end) is possibly faster in a linked list; however, you have to cache the list nodes so you don't have to create them all the time.

BSP/QuadTree/Octree are a whole different deal, but that is on a much higher level than TList vs LL or DLL, as it's a broadphase culling strategy.

So, to summarize: to loop over a list without any list overhead (as fast as using an array), do it like this:


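for Index := 0 to List.Count - 1 do
begin
  // List.List is the TList's internal pointer array; indexing it directly
  // skips the bounds-checked Items[] getter.
  Item := TItem(List.List^[Index]);
  // do something with Item
end;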

Back to topic: 100,000 visible objects seems like a lot. How would you even be able to see them all at once on the screen?

If you are targeting top of the line hardware you might be able to push those numbers with hardware instancing but it is still a lot.

Some more information on what you are trying to accomplish would make it easier to see what you are trying to do and give better tips!

jdarling
05-06-2009, 03:30 PM
Andreaz wrote:
> Actually, there's nothing slow about TList as such. As long as you avoid using the getter functions, the amount of time taken to extract an element is really low, and for just stepping through the list it could be faster than a linked list due to the element pointers being better aligned in the cache.

<Sarcasm>
Yeah, because everyone who is using a TList isn't using the accessor methods. Why would they use the interface provided?
</Sarcasm>

At the point you describe, you might as well just use a dynamic array with buffer caching and a length keeper. Then you lose the overhead of the object as a bonus :)

Other benefits of non-TList structures are faster item moves, faster inserts (if using a node cache), faster sorts... I can go on, but most are easily found in books. Though I'm sure Borland/CodeGear has made improvements in the base code (I only have Lazarus and D6 to work with).




Andreaz wrote:
> BSP/QuadTree/Octree are a whole different deal, but that is on a much higher level than TList vs LL or DLL, as it's a broadphase culling strategy.


My point was to show the progression from a simple, easy-to-implement concept (using a TList to manage data) to a complex structure that is highly optimized (insert here). With that many objects a broad approach may work out quite well.

BTW: One thing I haven't seen answered yet is the question of what rendering engine(s) you're using. We all took for granted that it is OGL only, but are you planning on using other rendering engines? If so, you need to apply a bit more thought to the subject if you want a general solution.

Orgun109uk
05-06-2009, 04:33 PM
Hi, thanks for all the replies,

The number of objects is really everything that gets rendered, including the terrain, particles, models, billboards, GUI, etc. But I've decided to split this up.

To explain the project a bit better, it's kind of like an online RTS (online Command and Conquer): you can build your own army to "take over the galaxy" or join a clan and create an even bigger army (which is why I am aiming for such a high number of objects).

I was thinking of using 2 octrees mainly to split up what's interactive and what's not, so when an object is clicked the scenery is not even included in the OpenGL picking stage, as well as to shrink the amount of processing that is done on each object (e.g. I don't need to know what state a tree is in, but I do need to know the tank's state).

The models are stored in a data storage list rather than the object list, and I'm using a "mesh instance" class, which renders the models using a VBO (which gets generated when the model data is loaded).

After some testing of how many models I can render on the screen, I'm thinking I'll stick around the 10,000 (max) mark; perhaps 100,000 is just a tad unrealistic and unnecessary ::).

The deferred rendering is a good idea, thanks, I completely overlooked it :-[.

I prefer using lists over arrays, especially when they will constantly be updated (adding and deleting); it just makes things a little simpler. Although if using TList is slower in Lazarus than arrays, I will need to look more into dynamic arrays.

Sorry, I must have forgotten to mention: I only plan to use SDL and OpenGL.

User137
05-06-2009, 06:31 PM
700 polygons per object is maybe too much for large numbers on screen. It is also possible to make extra-low-poly models or, as mentioned before, simple sprites for objects in the distance.
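
Switching to low-poly models or sprites by distance could look roughly like this. It is a sketch only: TLodLevel, PickLod and both switch distances are invented for the example and would need tuning for a real game.

```pascal
program LodSketch;
{$mode objfpc}

type
  TLodLevel = (lodFull, lodLow, lodSprite);

const
  // Hypothetical switch distances in world units.
  LowPolyDistance = 200.0;
  SpriteDistance  = 800.0;

// Picks a representation from the squared distance to the camera, which
// avoids one Sqrt per object per frame.
function PickLod(DistSq: Single): TLodLevel;
begin
  if DistSq >= Sqr(SpriteDistance) then
    Result := lodSprite
  else if DistSq >= Sqr(LowPolyDistance) then
    Result := lodLow
  else
    Result := lodFull;
end;

begin
  // Ord() prints the level as a number: 0 = full, 1 = low poly, 2 = sprite.
  WriteLn(Ord(PickLod(Sqr(50.0))));
  WriteLn(Ord(PickLod(Sqr(300.0))));
  WriteLn(Ord(PickLod(Sqr(1000.0))));
end.
```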

How are you going to handle combat between 10,000 ships in multiplayer? In an event like that, rendering is only about 10-30% of the processing power :) (Curious because I'm making a very similar game myself)

How many players should be able to take part at the same time with their armies?

User137
05-06-2009, 06:44 PM
Orgun109uk wrote:
> I prefer using lists over arrays, especially when they will constantly be updated (adding and deleting); it just makes things a little simpler. Although if using TList is slower in Lazarus than arrays, I will need to look more into dynamic arrays.

Imagine I have a list of 10,000 particles and particle #500 gets old and is due to be killed. What happens in TList I don't know, but common behavior includes moving all 9,499 following list items 1 step backwards. Now imagine these particles being constantly spawned and killed: it would put the whole CPU's power into that loop, being very, very slow.

What I like is control in my own hands. You only need to move index #(N-1) into the position of the removed item. (It may be possible to do this manually with TList too, but then wouldn't it be just the same as using a dynamic array?)

Dynamic arrays tend to be slow to increase/reduce in capacity, so that should only be done in bigger chunks, like every 1000 units or more.
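
The two ideas above (move the last item into the removed slot, and grow capacity in chunks) fit together in a small dynamic-array wrapper. This is a sketch in Free Pascal; TParticleList, AddParticle, RemoveParticle and the GrowBy value are all illustrative names and numbers, not anything from an existing library.

```pascal
program SwapRemoveSketch;
{$mode objfpc}

const
  GrowBy = 1024;  // capacity grows in big steps, not one element at a time

type
  TParticle = record
    X, Y, Z: Single;
    Life: Single;
  end;

  TParticleList = record
    Items: array of TParticle;  // Length(Items) is the capacity
    Count: Integer;             // number of live entries
  end;

procedure AddParticle(var L: TParticleList; const P: TParticle);
begin
  if L.Count = Length(L.Items) then
    SetLength(L.Items, Length(L.Items) + GrowBy);
  L.Items[L.Count] := P;
  Inc(L.Count);
end;

// O(1) removal: overwrite the dead slot with the last live entry.
// The order of the remaining items is not preserved.
procedure RemoveParticle(var L: TParticleList; Index: Integer);
begin
  L.Items[Index] := L.Items[L.Count - 1];
  Dec(L.Count);
end;

var
  Particles: TParticleList;
  P: TParticle;
  I: Integer;
begin
  Particles.Count := 0;
  for I := 0 to 9999 do
  begin
    P.X := I;
    P.Y := 0;
    P.Z := 0;
    P.Life := 1;
    AddParticle(Particles, P);
  end;
  RemoveParticle(Particles, 500);
  // Slot 500 now holds what used to be the last particle (X = 9999).
  WriteLn(Particles.Count, ' ', Particles.Items[500].X:0:0);
end.
```

When removing inside a loop, iterating from the last index down to 0 avoids skipping the item that was just swapped into the current slot.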

Orgun109uk
05-06-2009, 09:24 PM
Hi User137,



User137 wrote:
> 700 polygons per object is maybe too much for large numbers on screen. It is also possible to make extra-low-poly models or, as mentioned before, simple sprites for objects in the distance.

The 700 polygons was just for the model I was testing with. The models I'm planning on using vary depending on what they are. The structures shouldn't be that big, maybe 50-100 max. The units however are different: some are around 500 and they go up to around 3000.

I have swapped my original test model (700 polygons) for another one (3076 polygons) and there is not a huge difference in the FPS while rendering 10,000.

I'm only loading the model once and storing it in the GFX card's memory, then I'm using the mesh instance class to render it with glDrawElements.



User137 wrote:
> How are you going to handle combat between 10,000 ships in multiplayer? In an event like that, rendering is only about 10-30% of the processing power :)


I'm limiting the number of calculations that need to be done; for example, I only calculate the bounding box once (when the model is loaded), and then any updates to the position are done using pointers.

The main problem I can see will be the network updates, which I'm looking into using threads for.

On a side note, I'm also looking into the possibility of moving the majority of the math functions into separate threads.



User137 wrote:
> (Curious because I'm making a very similar game myself)

Cool, how's yours going?



User137 wrote:
> How many players should be able to take part at the same time with their armies?

This will depend on many things; I'm hoping for at least 30. But we shall see when I get to that stage.



User137 wrote:
> Imagine I have a list of 10,000 particles and particle #500 gets old and is due to be killed. What happens in TList I don't know, but common behavior includes moving all 9,499 following list items 1 step backwards. Now imagine these particles being constantly spawned and killed: it would put the whole CPU's power into that loop, being very, very slow.
>
> What I like is control in my own hands. You only need to move index #(N-1) into the position of the removed item. (It may be possible to do this manually with TList too, but then wouldn't it be just the same as using a dynamic array?)

I'm doing something similar when an object is destroyed. When an object is to be removed, it is done like this:


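Value := List[500];                 // the object to destroy
List[500] := List[List.Count - 1];  // move the last item into the freed slot
List.Delete(List.Count - 1);        // drop the duplicate last entry so Count shrinks
FreeAndNil(Value);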

User137
05-06-2009, 11:56 PM
Orgun109uk wrote:
> Cool, how's yours going?

It's this project: :)
http://www.pascalgamedevelopment.com/forum/index.php?topic=5597.0
But it's going slowly now, sorry... I've been playing an awful lot of World of Warcraft.


Orgun109uk wrote:
> I'm only loading the model once and storing it in the GFX card's memory, then I'm using the mesh instance class to render it with glDrawElements.

I rendered most objects in display lists as well, using glDrawElements but without VBOs.



Orgun109uk wrote:
> To explain the project a bit better, it's kind of like an online RTS (online Command and Conquer): you can build your own army to "take over the galaxy" or join a clan and create an even bigger army (which is why I am aiming for such a high number of objects).

I guess projects hit their technical limits eventually, but this sounds like it'd need a supercomputer to run... I mean, a galaxy, even if it's not on the scale of Spore but more like Master of Orion, with tens of stars with 0-5 planets each, means quite a lot of units for the server to hold. Sending tight battles across planets to even 8 players at the same time would be comparable to uploading and downloading a full-length movie.

There is a theory I've tried to explain in my project thread; to me it seems like the only way to handle the masses over the network.

And so as not to go too far off topic in a "graphics related post", you may want to create another topic about the project too :)

Mirage
06-06-2009, 10:46 AM
An octree is good for static objects. For dynamic ones a spatial grid is better.
With such a big number of objects visible it may be a good idea to use impostors.

P.S. It's good that OpenGL allows you to make 5K render calls per frame without slowdown. DirectX doesn't.
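
A spatial grid for moving objects can be as simple as a 2D array of buckets that is cleared and refilled every frame. The sketch below is in Free Pascal and assumes a bounded world on the X/Z plane; CellSize, GridCells and every type and routine name are invented for the example.

```pascal
program GridSketch;
{$mode objfpc}

const
  CellSize  = 64.0;  // hypothetical cell edge length in world units
  GridCells = 16;    // the grid covers 0 .. CellSize * GridCells on X and Z

type
  TCell = record
    Ids: array of Integer;  // object indices stored in this cell
    Count: Integer;
  end;

  TGrid = array[0..GridCells - 1, 0..GridCells - 1] of TCell;

// Maps a world coordinate to a cell index, clamped to the grid bounds.
function CellIndex(Coord: Single): Integer;
begin
  Result := Trunc(Coord / CellSize);
  if Result < 0 then
    Result := 0;
  if Result > GridCells - 1 then
    Result := GridCells - 1;
end;

// Empties every cell but keeps the allocated memory for the next frame.
procedure ClearGrid(var G: TGrid);
var
  CX, CZ: Integer;
begin
  for CX := 0 to GridCells - 1 do
    for CZ := 0 to GridCells - 1 do
      G[CX, CZ].Count := 0;
end;

procedure InsertObject(var G: TGrid; Id: Integer; X, Z: Single);
var
  CX, CZ: Integer;
begin
  CX := CellIndex(X);
  CZ := CellIndex(Z);
  if G[CX, CZ].Count = Length(G[CX, CZ].Ids) then
    SetLength(G[CX, CZ].Ids, Length(G[CX, CZ].Ids) + 16);
  G[CX, CZ].Ids[G[CX, CZ].Count] := Id;
  Inc(G[CX, CZ].Count);
end;

// Number of objects sharing the cell that contains the point (X, Z).
function CountInCell(const G: TGrid; X, Z: Single): Integer;
begin
  Result := G[CellIndex(X), CellIndex(Z)].Count;
end;

var
  Grid: TGrid;
begin
  ClearGrid(Grid);
  InsertObject(Grid, 0, 10, 10);
  InsertObject(Grid, 1, 20, 30);
  InsertObject(Grid, 2, 500, 500);
  WriteLn(CountInCell(Grid, 5, 5));      // objects 0 and 1 share this cell
  WriteLn(CountInCell(Grid, 480, 490));  // object 2 sits here
end.
```

Re-inserting every moving object each frame is a linear pass with no tree rebalancing, which is the reason a grid tends to suit dynamic objects better than an octree.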

Orgun109uk
07-06-2009, 10:42 PM
User137 wrote:
> But it's going slowly now, sorry... I've been playing an awful lot of World of Warcraft.

Ahh yes, the WoW bug :D.



User137 wrote:
> I guess projects hit their technical limits eventually, but this sounds like it'd need a supercomputer to run... I mean, a galaxy, even if it's not on the scale of Spore but more like Master of Orion, with tens of stars with 0-5 planets each, means quite a lot of units for the server to hold. Sending tight battles across planets to even 8 players at the same time would be comparable to uploading and downloading a full-length movie.
>
> There is a theory I've tried to explain in my project thread; to me it seems like the only way to handle the masses over the network.

Yeah, I have thought about this a lot, and have a few ideas I'm going to try out. But we shall see when I get to that point.



User137 wrote:
> And so as not to go too far off topic in a "graphics related post", you may want to create another topic about the project too :)

Yep, I shall do so very soon.
Now that I have managed to speed things up (thanks to everyone), I will take some screenshots and then post it (it doesn't look like much at the moment).



Mirage wrote:
> An octree is good for static objects. For dynamic ones a spatial grid is better.
> With such a big number of objects visible it may be a good idea to use impostors.

Thanks for the input, I shall look more into those options.

Orgun109uk
09-06-2009, 05:12 PM
Just to give an update: I have managed to reduce the TList overhead a little by directly accessing the dynamic array (TList.List) and by using TList.Sort to sort objects by frustum visibility and by distance from the camera.
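
Sorting by camera distance with TList.Sort might look something like the sketch below. It is not the actual engine code: TUnitObj, UpdateDistances and CompareByDistance are invented names, the squared distance is cached on each object as an assumption, and the list is read through the default Items property to keep the example portable.

```pascal
program SortByDistanceSketch;
{$mode objfpc}

uses
  Classes;

type
  // Hypothetical render object with a cached squared camera distance.
  TUnitObj = class
    X, Y, Z: Single;
    DistSq: Single;
  end;

// TList.Sort callback: nearer objects first (front to back).
function CompareByDistance(Item1, Item2: Pointer): Integer;
var
  A, B: TUnitObj;
begin
  A := TUnitObj(Item1);
  B := TUnitObj(Item2);
  if A.DistSq < B.DistSq then
    Result := -1
  else if A.DistSq > B.DistSq then
    Result := 1
  else
    Result := 0;
end;

// Refreshes the cached distance once per frame, before sorting.
procedure UpdateDistances(List: TList; CamX, CamY, CamZ: Single);
var
  I: Integer;
  U: TUnitObj;
begin
  for I := 0 to List.Count - 1 do
  begin
    U := TUnitObj(List[I]);
    U.DistSq := Sqr(U.X - CamX) + Sqr(U.Y - CamY) + Sqr(U.Z - CamZ);
  end;
end;

var
  Units: TList;
  U: TUnitObj;
  I: Integer;
begin
  Units := TList.Create;
  try
    // Three objects at X = 30, 20, 10, i.e. farthest first.
    for I := 0 to 2 do
    begin
      U := TUnitObj.Create;
      U.X := (3 - I) * 10;
      U.Y := 0;
      U.Z := 0;
      Units.Add(U);
    end;
    UpdateDistances(Units, 0, 0, 0);
    Units.Sort(@CompareByDistance);
    // After sorting the nearest object (X = 10) comes first.
    for I := 0 to Units.Count - 1 do
      WriteLn(TUnitObj(Units[I]).X:0:0);
  finally
    for I := 0 to Units.Count - 1 do
      TUnitObj(Units[I]).Free;
    Units.Free;
  end;
end.
```

Caching the distance keeps the comparison callback cheap, since Sort calls it many times per object.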

Chesso
08-07-2009, 08:43 AM
I'm a little rusty, but to look at this from a general perspective.

If you are creating and killing stuff like there is no tomorrow, then from the perspective of a list or array of some sort, why not just have a static array that exceeds what you need, zero out a slot when it becomes available, and reuse it when you need it?

I know it requires some more processing on the side to deal with this, but surely it beats resizing the poor thing a billion times per second.
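
A fixed-size pool along those lines might be sketched like this in Free Pascal. The names (TParticlePool, InitPool, Spawn, Kill) and the MaxParticles bound are assumptions for the example; the "processing on the side" is a small stack of free slot indices.

```pascal
program PoolSketch;
{$mode objfpc}

const
  MaxParticles = 4096;  // hypothetical bound, sized above peak demand

type
  TParticle = record
    X, Y, Z: Single;
    Alive: Boolean;
  end;

  TParticlePool = record
    Items: array[0..MaxParticles - 1] of TParticle;
    FreeSlots: array[0..MaxParticles - 1] of Integer;  // stack of unused indices
    FreeCount: Integer;
  end;

procedure InitPool(var P: TParticlePool);
var
  I: Integer;
begin
  for I := 0 to MaxParticles - 1 do
  begin
    P.Items[I].Alive := False;
    P.FreeSlots[I] := MaxParticles - 1 - I;  // slot 0 ends up on top
  end;
  P.FreeCount := MaxParticles;
end;

// Returns the slot index used, or -1 when the pool is exhausted.
function Spawn(var P: TParticlePool; X, Y, Z: Single): Integer;
begin
  if P.FreeCount = 0 then
    Exit(-1);
  Dec(P.FreeCount);
  Result := P.FreeSlots[P.FreeCount];
  P.Items[Result].X := X;
  P.Items[Result].Y := Y;
  P.Items[Result].Z := Z;
  P.Items[Result].Alive := True;
end;

// Marks the slot dead and pushes its index back for reuse; no memory
// is allocated or released.
procedure Kill(var P: TParticlePool; Index: Integer);
begin
  if not P.Items[Index].Alive then
    Exit;
  P.Items[Index].Alive := False;
  P.FreeSlots[P.FreeCount] := Index;
  Inc(P.FreeCount);
end;

var
  Pool: TParticlePool;
  A, B, C: Integer;
begin
  InitPool(Pool);
  A := Spawn(Pool, 1, 2, 3);
  B := Spawn(Pool, 4, 5, 6);
  Kill(Pool, A);
  C := Spawn(Pool, 7, 8, 9);  // reuses the slot that A just gave up
  WriteLn(A, ' ', B, ' ', C);
end.
```

The trade-off is that a render or update loop has to skip dead slots (or keep a separate list of live indices), in exchange for never resizing anything at run time.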