Jengine - OpenGL 3.3/4.x engine & JUI progress



phibermon
13-06-2011, 12:43 PM
Hi All,

I've been out of the general loop for quite some time due to personal reasons but I thought I'd give you a little snapshot of the framework/engine (call it what you will) that I've been working on.

"Jengine" (oh how creative) is a fully OpenGL 3.3/4.x 'core' targeted code base that is basically a mash-up of concepts I like from various engines / 3D applications.

So there's no immediate mode, people: say goodbye to glBegin, it's the end for glEnd, and no more ancient fixed-function pipeline.

Everything from the UI (JUI for those of you who remember) down to the 3D rendering code is all shader based, utilizing the core profiles of GL3.3/4.x only.

There are no GL2.x/ES code paths, for simple reasons: I am but one man, and I don't have the time to back-port techniques within the time-frame I'm working to. Besides, many techniques, including various culling operations, are entirely GPU-bound and would require re-implementing all the old CPU methods from scratch. And finally, I got a new, capable graphics card and I do this for my own enjoyment :)

Ok, here's a screenie of the basic interface, with Terrain, Sky and Camera nodes slapped into the scene-graph:

[attachment 427: screenshot of the editor interface]

The 'Sky' node is capable of various techniques including sky-boxes and domes, as well as approximate atmospheric scattering for planetary atmospheres. (still deciding on features here; I'm not sure if I want the engine to render from outer space, so this may be replaced entirely)

The 'Terrain' node is still under development, but here you can see a displacement map (height-field) loaded along with textures. It will also support normal-maps and light-maps in the release, though I'm still deciding which lighting system to use for the terrain. Anyway, it can page in tiled terrain of theoretically infinite size; what you see here is actually a 32768 x 32768 tiled terrain made up of a single-channel 32-bit floating-point displacement map (for 'Y' resolution) along with a lower-resolution tiled diffuse map.
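
To put that in perspective: at 32768 x 32768 texels of 4 bytes each, the full displacement map alone is 4 GiB, so paging is unavoidable. Here's a minimal sketch of the tile-lookup arithmetic (the tile size and names are illustrative, not the engine's actual API):

uses Math, Types;

const
  TileSize = 1024; { illustrative: texels per tile edge }

{ Which terrain tile does a world-space position fall into?
  MetresPerTexel is the horizontal spacing of the height-field samples. }
function TileAt(WorldX, WorldZ, MetresPerTexel: Single): TPoint;
begin
  Result.X := Floor(WorldX / (TileSize * MetresPerTexel));
  Result.Y := Floor(WorldZ / (TileSize * MetresPerTexel));
end;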

Oh, and it uses the tessellation features added in GL4.0, based upon various Nvidia white-papers on the subject. Here's a better view of it in operation:

[attachment 428: screenshot of the tessellated terrain]

In the case of GL 3.3, "Jengine" currently just uses a lower-resolution mesh than the tessellation produces, which matches the GL 4.0 performance on my system.

If I end up implementing a classic LOD scheme then I'm not sure what to go for yet.

I'd make the engine exclusively GL4.x, but it's pretty much only the tessellation I'm using so far, and it seems a shame to cut out so many 3.3 cards just for that.

Anyway, there's tons of other stuff that I won't go into detail about just yet, including: MRT deferred rendering, full GPU skinning (skeletal animation), Variance Shadow Maps and physics integration (currently using Newton), as well as all the sandwich fillings like a streaming virtual file system, networking (*anything* in the engine can be serialized), a font sub-system, a compositing manager for future effects, OpenAL audio and path-finding / steering algorithms (I've ported and improved OpenSteer), to mention a few.

---

JUI (the windowing interface) has come along in leaps and bounds. Everything (or nothing) about it can be threaded, it has loads more widgets, and it also makes use of an FBO rendering mode that in turn allows for full compositing, a la Compiz on POSIX systems.

Utilizing FBOs and minimizing re-draws solves many problems, but that's another story :)
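
For the curious, the idea is just standard render-to-texture: each JUI window draws into its own FBO-backed texture and only re-renders when dirty; the compositor then blends the cached textures. A minimal sketch of the setup (plain GL through e.g. the dglOpenGL headers; the procedure name is illustrative):

procedure CreateWindowSurface(Width, Height: GLsizei; out FBO, ColorTex: GLuint);
begin
  { One colour texture holds the window's cached contents }
  glGenTextures(1, @ColorTex);
  glBindTexture(GL_TEXTURE_2D, ColorTex);
  glTexImage2D(GL_TEXTURE_2D, 0, GL_RGBA8, Width, Height, 0, GL_RGBA, GL_UNSIGNED_BYTE, nil);
  glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MIN_FILTER, GL_LINEAR);
  glTexParameteri(GL_TEXTURE_2D, GL_TEXTURE_MAG_FILTER, GL_LINEAR);

  { Widget redraws render into the FBO, not to the screen }
  glGenFramebuffers(1, @FBO);
  glBindFramebuffer(GL_FRAMEBUFFER, FBO);
  glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0, GL_TEXTURE_2D, ColorTex, 0);

  { Back to the default framebuffer; the compositor later samples
    ColorTex like any other texture, so a clean window costs one quad }
  glBindFramebuffer(GL_FRAMEBUFFER, 0);
end;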

Hope you're all well!

Johnny

chronozphere
13-06-2011, 12:57 PM
Those are very exciting developments. Since I don't have much time to develop my own engine, I might as well give this one a try when it's more or less finished. ;D

Is it easy to set up and tear down the framework? I'd like to see some code examples that show how everything fits together. :)

phibermon
13-06-2011, 01:12 PM
Yes, quite a lot of the engine can be used stand-alone with minimal dependencies. Here are various snippets so you can see the coding style:

Window Initialization

WindowMan := TJWindowMan.Create(JDefaultWindowMan);
WindowStartupProperties := JDefaultWindow;
WindowStartupProperties.FullScreen := false;
WindowStartupProperties.GLMajorVersion := 4;
WindowStartupProperties.GLMinorVersion := 1;
WindowStartupProperties.GLCoreProfile := true;
WindowStartupProperties.GLForwardCompatible := false;

WindowStartupProperties.OnCreate := @OnWindowCreate;
WindowStartupProperties.OnDestroy := @OnWindowDestroy;
WindowStartupProperties.OnExecute := @OnWindowExecute;
WindowStartupProperties.Threaded := false;
JWindow := WindowMan.CreateWindow(WindowStartupProperties);

Scene & JUI Initialization

Scene := TJScene.Create(JWindow.ImageMan, JWindow.ShaderMan, JWindow.ModelMan);
SceneView := TJSceneView.Create(JWindow, Scene);

JUIWindowMan := TJUIWindowMan.Create(JWindow);

JUIWindowMan.Theme := TJUITheme.Create;
JUIGlobal.WindowMan := JUIWindowMan;

JUIWindowMan.AttachJViewPort(SceneView);
JAPTaskbar := TJAPTaskbar.Create(JUIWindowMan);
JAPSceneTree := TJAPSceneTree.Create(JUIWindowMan, Scene);

Example Render Loop

FCamera.Look;
{Set Shader Uniform buffers}
FJWindow.ShaderMan.ModelViewMatrix := FCamera.ModelViewMatrix;
FJWindow.ShaderMan.ProjectionMatrix := FCamera.ProjectionMatrix;
FJWindow.PrimativeRenderer.Grid(vec3(-128,0,-128));
FScene.DoRender;


-----

You may have noticed that JWindow contains the image, model and shader managers.

While this might seem like an odd design decision, the reason is simple enough: JWindowMan supports multiple windows across multiple monitors, and the user might want each of those windows to have its own resources/context. So everything that is specific to a context lives in JWindow. Shared contexts across multiple windows are handled by JWindowMan; you simply tell it which window you want to share with.
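
Roughly speaking (the exact property name is still in flux, so treat this as a sketch rather than the final API), a second window sharing the first one's context would look something like:

SecondProps := JDefaultWindow;
SecondProps.ShareWith := JWindow; { share textures/VBOs/shaders with the first context }
SecondWindow := WindowMan.CreateWindow(SecondProps);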

Traveler
13-06-2011, 01:21 PM
Looks pretty great! Too bad you're aiming for top-of-the-line gfx cards only. I understand your reasoning, but you're targeting a small audience if you eventually plan to make a game with this engine.

phibermon
13-06-2011, 01:34 PM
You are quite right; my reasoning doesn't make for a sound design decision in the present day, but:

- there will come a day when it's no longer a problem

- I want to make an MMORPG, so by the time I've finished all this may very well be old news

- I secretly harbor the desire to get lots of developers on board and perhaps open up GL2.x/ES code paths (ok, so not so secret now)

WILL
13-06-2011, 05:27 PM
I wonder how compatible this might be with, say, iOS? Not to sound redundant, but it's the way a lot of developers are going these days.

Traveler
13-06-2011, 06:49 PM
I believe iOS still uses GL2.x/ES

phibermon
13-06-2011, 07:50 PM
I'm afraid that iOS doesn't interest me enough at the moment. There can be no doubt that iOS, with its staggering user-base, is attracting a large number of developers, but that's not where the cutting edge of graphics is, and that's where I would like to be personally.

If anything, iOS is a huge step backwards for developers in terms of GPU technology.

While the mobile platforms have performance characteristics that are very impressive for their size and power footprint, they simply don't compare to the high-end PC spectrum, or indeed to the soon-to-be-released next generation of consoles.

The iPad 2 for instance, with its out-of-order, dual-core 1GHz Cortex-A9 chip and PowerVR GPU, would barely hold its own against average gaming PCs of 8 years ago (especially given the small cache that normally accompanies A9 implementations; 'GHz to GHz' comparisons against desktop CPUs of similar frequencies are woefully ill-founded).

I feel there's work to be done on the Object Pascal front, targeting the newer versions of OpenGL.

And assuming I continue to work alone, I should be finished right about the time that mobile devices see the next generation of OpenGL ES ;)

My reasoning would be this: major companies (Apple, Samsung, Sony etc.) have bought into ES fully; they're all now bound to OpenGL (something that I bet Microsoft is regretting, given they've had plenty of opportunity to make DirectX cross-platform ;) )

When the next iteration of ES comes, that'll be what these companies use, and it will almost certainly be a sub-set of GL4.0 or some future version. Khronos will not diverge ES further away from the mainline GL versions; if anything, they'll aim to merge them again.

But I digress.

Really it's because I want to, I'm not interested in 'following the money' or even targeting the largest audience. If that was my goal I'd be coding in C/C++ :)

code_glitch
13-06-2011, 09:08 PM
Ah, but the iPad 2 is supposed to be '9x faster' than the iPad 1, which is 'faster' than the iPhone, which is 'faster' than... well, you get the idea. :D Overall, looking pretty good. Looks like some very nice performance there (455fps). What card is this running on? My HD4330 only has OpenGL 3.2, so I'm at a loss there, and most GMA chips come with 2.1 or somewhere in that ballpark. Even the newest Sandy Bridge 'cards' shipping on i7s etc. only sport OpenGL 3.1/3.2. Unfortunately, i7 is 'the future' according to Intel. So yes, I agree it will not be an issue for gamers that have a GTX460 (or even casual people like me who buy a mid-range discrete card; no debates please, I get >30fps out of almost everything on medium settings, and will just be on that bandwagon). Just don't make the game too popular, k? ;) lol. You might end up getting Intel to make real graphics chips one day.

Anyway, I get what you're saying in a sense: if no one uses the latest technology, then why is it there in the first place, and why make new things, right? Shame no one likes the word 'new'. In programming, new = crashes, bugs, trouble and more trouble, so I can understand some of the reasons. But OGL 1.5 on Windows 7 platforms? Come on...

Carver413
14-06-2011, 12:28 PM
I have to agree with phibermon, it is not very wise to build a new engine on old code.

phibermon
14-06-2011, 02:03 PM
Hmm, well, giving it some thought, the one feature I just can't justify losing is Uniform Buffer Objects (change the buffer once and it applies to all shaders that use it, as opposed to setting uniforms on each individual shader; think of them like a single instance of a customizable record that you can share across multiple shaders), which were introduced in GL3.1. So I'll do some damage control and see how much work it would be to make 3.1 the minimum dependency. You made an excellent point about Sandy Bridge: I was not aware that the on-chip graphics were 3.1/3.2; I assumed 2.1. For that reason I shall have to look into it. I can sit pretty knowing that cheap 4.x cards will soon dominate, but that on-die Intel monstrosity is going to be the only solution a lot of laptop users will have for the next few years.
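
To illustrate what UBOs buy you (plain GL through e.g. the dglOpenGL headers; the 'Matrices' block name is illustrative, and Prog is assumed to be a linked shader program handle):

var
  UBO, BlockIndex: GLuint;
begin
  { One buffer holds, say, the shared view/projection matrices }
  glGenBuffers(1, @UBO);
  glBindBuffer(GL_UNIFORM_BUFFER, UBO);
  glBufferData(GL_UNIFORM_BUFFER, 2 * 16 * SizeOf(GLfloat), nil, GL_DYNAMIC_DRAW);

  { Attach it to binding point 0, once }
  glBindBufferBase(GL_UNIFORM_BUFFER, 0, UBO);

  { Point each program's 'Matrices' block at binding 0; from then on a
    single glBufferSubData updates every shader that uses the block }
  BlockIndex := glGetUniformBlockIndex(Prog, 'Matrices');
  glUniformBlockBinding(Prog, BlockIndex, 0);
end;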

Carver: I wouldn't like to offend those pursuing ES, as their route to GL3/4 will be a lot easier than that of those coding in immediate-mode 2.x, but yes, I'd agree with that statement. It's not just the performance gains; it's the usability too.

My terrain engine was nearly effortless with GL4.0.

LOD, low-level culling etc. are all done on the GPU, and as a result can sit exactly where they need to for the simplest approach. Older CLOD systems (ROAM etc.) are far more complex, doing all they can to minimize the bottleneck of constantly transferring vertices from the system to the card. That's just not an issue with tessellation; you send a sparse patch mesh, and tessellation plus displacement does the rest, with more or less free seamless welding of patch edges.
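
The host side of that really is sparse: you tag the vertex stream as patches and let the control/evaluation shaders do the subdivision and height-field lookup. Something like this (plain GL 4 calls; the VAO and patch count are illustrative):

{ Each terrain patch is just 4 corner vertices; the tessellation control
  shader picks a LOD per edge, and the evaluation shader displaces the
  generated vertices by sampling the height-field texture }
glPatchParameteri(GL_PATCH_VERTICES, 4);
glBindVertexArray(PatchVAO);
glDrawArrays(GL_PATCHES, 0, 4 * PatchCount);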

code_glitch
14-06-2011, 03:03 PM
Or you could have a go at Intel's market share with another strategy: do it all in 4.x and make a 'crapo' mode with a very basic, quickly implemented set of shaders for 2.x/3.x, and make a really good game. That way, either Intel gets some serious OpenGL oomph, or ATI/Nvidia get some market boosts. Either way, everybody wins :D

But yes, I was disappointed when Sandy Bridge (the creme de la creme from Intel) came out with 3.1/3.2 support when ATI/Nvidia cards have had that since... well, the dawn of time. OK, not really, but for a while now.

Mind you, Sandy Bridge is the only GMA chip that can render something fast enough for it to even be visible to humans. (sorry, GMA fans, whoever you may be)

Anyway, good luck and those features do indeed sound tempting.

phibermon
14-06-2011, 04:41 PM
hehe :) you might have something there. I've been looking at various CLOD techniques that could be used for <GL4.0. The only ones I'd be happy with from a technical stand-point are either a GPU-optimized geo-clipmapping:

http://research.microsoft.com/~hoppe/gpugcm.pdf

or this :

http://vertexasylum.com/2010/07/11/oh-no-another-terrain-rendering-paper/

The latter looks surprisingly similar in wireframe mode to my GL 4 technique, and by my rough estimates is not that far off in FPS. However, it requires extensive pre-processing of the terrain dataset and does *far* more work on the CPU (which makes the comparison not quite fair: the techniques I've employed (very nearly) don't use the CPU at all).

And to top it all off, it's complicated to implement, although a port of the provided source would be possible given enough Direct3D research.

So to support older cards for the terrain, I'll simply brute-force render (with a bit of culling) and drop both the poly count and the draw distance (like I'm doing now) until the FPS matches.

If I did implement an alternative CLOD for older cards, I would most likely choose geo-clipmapping, as I could use the same dataset I use now without the pre-processing the preferred technique requires. (it really is very impressive though; check it out if you have the time)

code_glitch
14-06-2011, 06:01 PM
Hmm... Although I am totally for GPGPU and favouring the GPU over the CPU (just look at those giga/teraflops of an advantage), it raises one problem I've had to contend with a few times, and many gamers have too: a top-spec CPU is then not the bottleneck, so you can still make a decent gaming rig out of C2Ds since it's all on the GPU. That way you save £100 or so on your CPU; just make sure you buy the extra HD5990 to make up for it ;)

It comes down to the old CPU time vs RAM trade-off again, doesn't it? You can put up a loading screen, compute most of it all beforehand and store it in RAM at the expense of many MBs (not too much of an issue for the many people who now have >2GB). Or you can compute it on the fly and save RAM (as in the XP days of 512MB). On low-GPU machines it's a case of: RAM and wait up front, or CPU and wait a little bit the whole time. Heck, I'm running into this with my libs now: indexing lists, for example. Just how much do you index? What is a good CPU:RAM usage ratio? Why couldn't it ever be simple, right?

phibermon
15-06-2011, 10:05 AM
Again, good point. I suppose all these different techniques are really just making different compromises between CPU/GPU performance and available memory/bandwidth.

In my case I'm lucky enough to own an i7 950, which is surprisingly fast (after moving from my previous fastest, an Atom 330). The GPU I have is an Nvidia GTX 460, which is towards the lower end of the budget spectrum and should represent the average card within, say, a year (in terms of performance and functionality).

My general goal was to implement GPU-heavy techniques until a bunch of average entries in the scene graph brings the framerate below 200 FPS.

Then I'll focus on getting the most CPU-heavy techniques isolated in separate threads; primarily that'll be physics, path-finding and steering.
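
For the threading side, nothing exotic is needed; FPC's TThread does the job. A minimal sketch of a physics worker (the class name and StepSimulation call are illustrative, not the engine's actual API):

uses Classes, SysUtils;

type
  { Runs the physics step off the render thread; the render thread
    reads the last completed state snapshot each frame }
  TPhysicsThread = class(TThread)
  protected
    procedure Execute; override;
  end;

procedure TPhysicsThread.Execute;
begin
  while not Terminated do
  begin
    StepSimulation(1 / 60); { advance the world by a fixed timestep }
    Sleep(1);               { yield; real code would sync to a clock }
  end;
end;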

I've not yet examined OpenCL/CUDA and probably won't. There's enough C syntax in the shaders without polluting the rest of the engine/framework. And again, as you stated, OpenCL/CUDA is yet another compromise, taking compute power away from the shaders. While that's an excellent choice in dual-GPU setups, Mafia 2 for instance is not as fast as I'd hoped when PhysX is turned up to max (PhysX is just a CUDA program underneath). But GTA 4, while not having quite the same level of physical interaction, is silky smooth with its CPU-side physics and arguably higher poly counts, and that's with the incredible Euphoria engine doing all its funky inverse-kinematics kind of things as well.

So you've hit the nail on the head there: a good engine is a good balance between the various bottlenecks on average systems. A great engine can utilize different techniques to balance the bottlenecks across a wide range of hardware (and I think we've seen a big shift in recent years from CPU-side to GPU-side bottlenecks in terms of what is demanded of a game).

code_glitch
15-06-2011, 03:16 PM
Hem... I believe Intel set the price according to the 'i7 950, which is surprisingly fast' part... Ok, so it's one of THE fastest processors around (I can't afford one, so the gaming rig build will most likely be an overclocked Phenom II 955 BE) and the GTX460 might be cheap(er) than at launch, but let me say this: it still trumps a good portion of the market - and I would be worried if that's the average market in a year :) my 4330 really would be up the wall by that time...

Now OpenCL is interesting in itself - the way I see it, it's a case of GPGPU: do we want that? Anyone? Ah yes, you sir, Mr Higher-End ATI and Nvidia. Any others? No? Oh well, a portion of the market can enjoy it. I'm totally for making use of the teraflops of GPU power many people now have, or at the very least the hundreds of megaflops - but it's only fine if it fits into every language, is nicer than the first iterations of OpenGL and works on everything (e.g. no Nvidia bugs in early iterations with EXTs and FBOs). And would you like to learn a whole new ideology for 20% of the market to get better performance than the best available (which they are already getting), and then write it all again the normal way too? Probably not.

Besides that: since when has CPU power been the limit in games? Many apt gaming rigs still sport C2Ds, and we talk more about GPUs for games than CPUs, so GPGPU is a problem: take load from where there isn't enough of it and move it to an area that is already under too much load? Logical, isn't it? How come ATI cards do fine for gaming without Nvidia PhysX? For the very reason you mentioned: GPU bottlenecks.

The approach you're taking is nice indeed, though; one thing at a time and all that, and I'll definitely be taking a look at the code - perhaps even running it (once I get a system that can :p). Have to say, though, the i7 is future-proofing gone a long way on the CPU side. What I would like to see, though, is a different matter: OpenGCL - Open Generalised Computing Language. AKA a language/set of headers that pools together ALL the resources of a system (audio processor, GPU and CPU) and runs everything on whatever is best suited/has the most available computing power. That way your system has one speed rating: generalised power. No RAM, CPU, GPU etc. Just one figure, one language, and one bottleneck: overall power. Now that would really make my day.