OpenGL GLSL - Text rendering query

**AthenaOfDelphi** · 21-01-2018, 03:44 PM

Well, whilst we're talking about OpenGL... would someone like to suggest a reason why glGetUniformLocation doesn't appear to be initialised in dglOpenGL? As best as I can tell, when it's loading the address of the routine, it always gets NIL. I've tried starting up the normal way (i.e. not specifying a library) and I've also tried to force NVOGL32.DLL. In both cases, it doesn't appear to know about glGetUniformLocation. Either that, or it is somehow getting initialised properly and it can't find the uniform, which is a 2D sampler.

Any thoughts?

**Chebmaster** · 21-01-2018, 04:22 PM

have you played Starcraft 1 recently?

I have not, but Open Arena takes up to 60% of my gaming time.
It runs perfectly.

NVIDIA values OpenGL API backward compatibility

That they do, and I believe glBegin is there to stay forever, BUT with a nasty catch: these technologies are only kept good enough to keep old games running. There is no need to make them efficient.

It's a web site that hosts a PowerPoint presentation

Oh!
So it was working with only originating domain scripts enabled! I just was expecting a video.

On most desktop hardware we've tested, actually the opposite happens - glBegin/glEnd is close in performance to glDrawArrays

Were you measuring overall frame time or just the time of calling the functions?

You see, I found out experimentally that the driver sort of stores the commands you issue somewhere inside itself, and the real work begins (usually) only after you call SwapBuffers (e.g. wglSwapBuffers, glXSwapBuffers, eglSwapBuffers). So to see what is really going on you have to measure how long the SwapBuffers call takes, with vSync disabled, of course.

Preferably with Aero desktop composition disabled too, as any semi-transparent effect overlapping your window usually adds an extra 8..9 ms.

I found that the *vast majority* of my thread's time is usually spent inside that call, exceptions being FBO creation and GLSL program linking.
And it is where the cost of glBegin is paid.
This was true for all platforms I tried, including various Windows, Linux and wine.

My game engine has a built-in profiler and displays thread time charts alongside the FPS counter. I watch them like a hawk, and it draws SwapBuffers time in bright red.
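For anyone who wants to try the measurement described above, here is a minimal Free Pascal sketch of timing the submit phase and the swap call separately. `IssueDrawCalls` and `SwapBuffersStandIn` are hypothetical stand-ins (they just sleep) so the program runs without a GL context; in a real engine they would be your draw code and the platform swap call, with vSync disabled. `GetTickCount64` is coarse on some systems, so a higher-resolution timer is probably a better choice for real profiling.

```pascal
program SwapTiming;
{$mode objfpc}{$H+}

uses
  SysUtils;

type
  TFrameTimes = record
    SubmitMs: QWord; // time spent issuing draw calls
    SwapMs: QWord;   // time spent inside the swap call
  end;

// Hypothetical stand-in for the code that issues GL draw calls.
procedure IssueDrawCalls;
begin
  Sleep(1);
end;

// Hypothetical stand-in for the platform swap call.
procedure SwapBuffersStandIn;
begin
  Sleep(5);
end;

// Times both phases of one frame separately.
function MeasureFrame: TFrameTimes;
var
  t0, t1, t2: QWord;
begin
  t0 := GetTickCount64;
  IssueDrawCalls;
  t1 := GetTickCount64;
  SwapBuffersStandIn;
  t2 := GetTickCount64;
  Result.SubmitMs := t1 - t0;
  Result.SwapMs := t2 - t1;
end;

var
  ft: TFrameTimes;
begin
  ft := MeasureFrame;
  WriteLn('submit: ', ft.SubmitMs, ' ms, swap: ', ft.SwapMs, ' ms');
end.
```

Feeding the two numbers into separate chart series is what makes it obvious where the frame time really goes.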

**Chebmaster** · 21-01-2018, 04:33 PM

, or it is somehow getting initialised properly and it can't find the uniform, which is a 2D sampler.

1. It was so NICE of them to declare GLcharARB = Char; PGLcharARB = ^GLcharARB; when Char could be UnicodeChar and OpenGL *only* understands 8-bit strings.

2. It's capricious. Try

Code:

glGetUniformLocation(<program>,  PAnsiChar(RawByteString('my_uniform_name'#0)));

3. It can return -1 if your uniform was not used and the GLSL compiler eliminated it.
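To illustrate points 1 and 2, here is a small Free Pascal sketch (no OpenGL needed) of what goes wrong with a 16-bit string. If `Char` is a 16-bit type, an ASCII name is stored as `m`, `#0`, `y`, `#0`, ..., so a C API that reads up to the first zero byte would most likely see only `m`. Converting to an 8-bit string first avoids that. `ToGLName` is a hypothetical helper name, not part of dglOpenGL.

```pascal
program UniformName;
{$mode objfpc}{$H+}

uses
  SysUtils;

// Hypothetical helper: converts a UTF-16 name to the 8-bit form
// that OpenGL entry points expect.
function ToGLName(const Name: UnicodeString): RawByteString;
begin
  Result := UTF8Encode(Name);
end;

var
  Wide: UnicodeString;
  Narrow: RawByteString;
  P: PAnsiChar;
begin
  Wide := 'my_uniform_name';
  Narrow := ToGLName(Wide);
  // Casting a long string to PAnsiChar yields a null-terminated pointer.
  P := PAnsiChar(Narrow);
  WriteLn('UTF-16 size in bytes: ', Length(Wide) * SizeOf(WideChar));
  WriteLn('8-bit size in bytes:  ', Length(Narrow));
  WriteLn('C-style length:       ', StrLen(P));
end.
```

The pointer `P` is what you would hand to `glGetUniformLocation`, and it stays valid only as long as `Narrow` is alive and unmodified.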

P.S. I always play it overly safe using wrappers like this one:

Code:

class function TGAPI.SafeGetUniformLocation(prog: GLuint; name: RawByteString): GLint;
begin
  Result:= -1; // same value GL uses for "not found"; keeps Result defined on every path
  try
    case Mother^.GAPI.Mode of
     {$ifndef glesonly}
      gapi_GL21,
     {$endif glesonly}
      gapi_GLES2: begin
        // OpenGL wants a null-terminated 8-bit string, hence RawByteString + PAnsiChar
        Result:= glGetUniformLocation(prog, PAnsiChar(name + #0));
        CheckGLError(true); // true = treat any GL error as fatal
      end;
    else
      DieUnsupportedGLMode;
    end;
  except
    Die(RuEn(
      'Не удалось получить расположение постоянной %1 для программы %0',
      'Failed to get uniform %1 location for program %0'
      ), [prog, name]);
  end;
end;

where

Code:

procedure CheckGlError(DieAnyway: boolean);
var ec: GLenum;
begin
  {$if defined(cpuarm) and not defined(debug)}
    //Raspberry Pi
    //some shit of a driver spams console with "glGetError 0x500"
    // thus bringing FPS to its knees
    if not DieAnyway then Exit;
  {$endif}

  ec:= glGetError();

  if ec <> 0 then
    if DieAnyway or Mother^.Debug.StopOnOpenGlErrors
      then Die(RuEn(
          'Ошибка OpenGL, %0',
          'OpenGL error, %0'
        ),
        [GlErrorCodeToString(ec)])
      else
        if Mother^.Debug.Verbose then AddLog(RuEn(
            'Ошибка OpenGL, %0',
            'OpenGL error, %0'
          ),
          [GlErrorCodeToString(ec)]);
end;

-- flying over paranoiac's nest

**LP** · 21-01-2018, 05:33 PM

Originally Posted by Chebmaster

I have not, but Open Arena takes up to 60% of my gaming time.
That they do, and I believe glBegin is there to stay forever, BUT with a nasty catch: these technologies are only kept good enough to keep old games running. There is no need to make them efficient.

I would put it slightly differently: yes, they (Nvidia, AMD, etc.) need to keep the old API interface, and for that they are likely providing a fixed-function pipeline wrapper on top of the programmable pipeline. However, since they know their own architecture very well, it is very likely they are using the most optimal approach to implement such a wrapper. In contrast, when you jump directly to the programmable pipeline and try to build an engine of your own, it is much more difficult for you to achieve at least *the same* level of performance as the legacy fixed-function pipeline built as a wrapper, because you have to optimize it for many architectures, vendors, drivers, OS versions, etc.

Granted, if you use the programmable pipeline properly, you can do much more than you could do with FFP: in a new framework that we're in the process of publishing, you can use Compute shaders to produce a 3D volume surface, which is then processed by Geometry/Tessellation shaders - the whole beautiful 3D scene is built and rendered without sending a single vertex to the GPU! Nevertheless, it doesn't mean you can't start learning with FFP; it is just as good as any other solution when what you need is to render something simple in 2D or 3D using the GPU.

Besides, you never know, maybe some day the whole OpenGL will be made as a wrapper on top of Vulkan API, who knows...

Originally Posted by Chebmaster

Were you measuring overall frame time or just the time of calling the functions?

Either use external profiling tools (Visual Studio has a GPU profiler, and Nvidia and other GPU vendors also provide their own tools) or at least measure the average frame latency (not frame rate).
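A quick illustration of why latency is the better number, as a Free Pascal sketch with made-up frame times: nine smooth 5 ms frames plus one 100 ms stall. Averaging the per-frame FPS values hides the stall almost completely (about 181 FPS), while the average frame time (14.5 ms, i.e. roughly 69 frames per second) does not. The data and function names are hypothetical.

```pascal
program FrameLatency;
{$mode objfpc}{$H+}

const
  // Hypothetical frame times in ms: nine smooth frames and one long stall.
  FrameMs: array[0..9] of Double = (5, 5, 5, 5, 5, 5, 5, 5, 5, 100);

// Mean time per frame in milliseconds.
function AverageFrameMs(const Times: array of Double): Double;
var
  i: Integer;
  Sum: Double;
begin
  Sum := 0;
  for i := Low(Times) to High(Times) do
    Sum := Sum + Times[i];
  Result := Sum / Length(Times);
end;

// Mean of the instantaneous FPS values (the misleading metric).
function AverageOfInstantFps(const Times: array of Double): Double;
var
  i: Integer;
  Sum: Double;
begin
  Sum := 0;
  for i := Low(Times) to High(Times) do
    Sum := Sum + 1000.0 / Times[i];
  Result := Sum / Length(Times);
end;

var
  AvgMs: Double;
begin
  AvgMs := AverageFrameMs(FrameMs);
  WriteLn('average frame time:     ', AvgMs:0:1, ' ms');
  WriteLn('frames per second:      ', (1000.0 / AvgMs):0:1);
  WriteLn('average of instant FPS: ', AverageOfInstantFps(FrameMs):0:1);
end.
```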

Originally Posted by Chebmaster

You see, I found out experimentally that the driver sort of stores the commands you issue somewhere inside itself, and the real work begins (usually) only after you call SwapBuffers (e.g. wglSwapBuffers, glXSwapBuffers, eglSwapBuffers). So to see what is really going on you have to measure how long the SwapBuffers call takes, with vSync disabled, of course.

This is driver-dependent, so they are free to choose whatever approach is more efficient, but commonly you may expect that the work begins (GPU works in parallel) immediately when you issue a draw call. "Swap Buffers" has to wait for all GPU work to finish before swapping surfaces, which is why you feel it taking most of the time.

**Chebmaster** · 22-01-2018, 07:56 AM

maybe some day the whole OpenGL will be made as a wrapper on top of Vulkan API,

I am already using Google's GL ES over Direct3D wrapper.

it is much more difficult for you to achieve at least *the same* level of performance as the legacy fixed-function pipeline

And sometimes flat out impossible.
I have some museum pieces of the GeForce line (7025, I think?) where even barebones GLSL is noticeably slower than FFP. That's the main reason I haven't abandoned FFP completely: a situation may arise where my engine renders the scene in really low resolution, then stretches the result using FFP. I consider that a better alternative to not running.

Either use external profiling tools [...] or at least measure the average frame latency

Ok, I really need to get to that benchmarking. Now curiosity is gnawing at me.

(not frame rate)

Yesss, measuring FPS is the primary noob marker.

"Swap Buffers" has to wait for all GPU work to finish before swapping surfaces, which is why you feel it taking most of the time.

That, yes, but this shows that the function calls themselves are (more often than not) negligible in the grand scheme of things.

I suggest we return to this when I have my tool set working again.

**phibermon** · 23-01-2018, 05:26 PM

glFlush and glFinish are the correct calls to make to handle the GL command buffer. The swap operation essentially calls these, schedules the swap on v-sync and then blocks until that's complete.

GL commands might be buffered across a network, or to a display server (think remote GL on X11) or in the GL implementation you're using (mesa, nvidia drivers etc)

To correctly handle a v-synced GL application you should be setting up your frame so you spend the minimum amount of time possible on the blocking swap call.

You can use GL timers to get the 'server' side timing, calculate the command buffer overhead by comparing that against your 'client' side GL calls followed by flush and finish and then you know that you can call swap and meet the v-sync 'deadline' with the minimum amount of time blocked in the swap call.
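The arithmetic behind that can be sketched in a few lines of Free Pascal. This is only one reading of the approach, with hypothetical measurements: `MeasuredServerMs` stands for what a GL timer query reports, `MeasuredClientMs` for the wall-clock time of the GL calls followed by flush and finish. The difference is the command buffer overhead, and whatever is left of the refresh interval is time you can spend on other work before calling swap.

```pascal
program VSyncBudget;
{$mode objfpc}{$H+}

const
  // All values are hypothetical measurements, in milliseconds or Hz.
  DisplayHz = 60.0;
  MeasuredServerMs = 6.0; // GPU-side time, e.g. from a GL timer query
  MeasuredClientMs = 8.0; // client-side GL calls + flush + finish

// Length of one refresh interval in milliseconds.
function FrameBudgetMs(RefreshHz: Double): Double;
begin
  Result := 1000.0 / RefreshHz;
end;

// Cost of getting the commands to the hardware.
function CommandOverheadMs(ClientMs, ServerMs: Double): Double;
begin
  Result := ClientMs - ServerMs;
end;

// Time left for other work before swap must be called.
function SpareMs(RefreshHz, ClientMs: Double): Double;
begin
  Result := FrameBudgetMs(RefreshHz) - ClientMs;
end;

begin
  WriteLn('frame budget:     ', FrameBudgetMs(DisplayHz):0:2, ' ms');
  WriteLn('command overhead: ',
    CommandOverheadMs(MeasuredClientMs, MeasuredServerMs):0:2, ' ms');
  WriteLn('spare before swap: ', SpareMs(DisplayHz, MeasuredClientMs):0:2, ' ms');
end.
```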

If you stream resources to graphics memory, such a setup is crucial for getting the maximum bandwidth without dropping frames.

As far as legacy GL is concerned, or to give it the correct name, "immediate mode", the clue is in the name. GL3+ (in the context of performance) is all about batching - minimizing the API calls and efficiently aligning data boundaries.

It's not about fixed function or programmable hardware - it's about how the data is batched and sent to that hardware, regardless of the actual functionality.

Display lists in older GL versions were a herald of things to come - they allow you to bunch together all your commands and allow the GL implementation to store the entire display list in whatever native format it pipes to the actual hardware, effectively storing the commands 'server' side so they don't have to be sent over the pipe every time. How they are handled is vendor specific, some display list implementations are no faster than immediate calls (some old ATI + intel drivers long ago were known for it)

So yes - IMMEDIATE mode GL, that is, versions before 3.x which are referred to here as 'legacy' - will always be slower than the modern APIs - it's not about vendor implementation regardless of how optimal or not their code is.

Other than display lists and any driver specific optimizations the vendor may have made - there's no 'batching' in legacy GL. This is why we have VBOs, instancing - all server side optimizations - this is why Vulkan and newer APIs are capable of better performance in various situations.

The bottleneck is call overhead, 'instruction latency and bandwidth' - call it what you will. It's not what you move or what is done with it when it gets there - it's how you move it.

GL2 is moving a ton of dirt with a teaspoon. GL3+ is moving a ton of dirt with a pneumatic digger. (Vulkan is moving a ton of dirt with a bespoke digger that has been built right next to the pile of dirt)

If you're only moving a teaspoon of dirt, use a teaspoon, it's easier to use than a digger - but don't defend your teaspoon when you need to move a ton of dirt.

Or something like that. Something something teaspoon. I think we can all agree : teaspoon.

**Chebmaster** · 24-01-2018, 12:37 PM

If you're only moving a teaspoon of dirt, use a teaspoon [...] Or something like that. Something something teaspoon.

Much better said than me. Thank you.
I'll add that there is also GLES (an unruly, hard to tame beast) that does not have immediate mode functions but is required to support some platforms.

Me, I created a wrapper class for small to medium loads. It hides the actual implementation (may be glVertexAttribPointer() or may be older, more basic vertex arrays) and I only use it to move my teaspoons.
Why? Because this way it lets you think of *all* your data as meshes. And it contains much, much less API-related code to overhaul if you have to change your rendering mechanisms later.
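A rough Free Pascal sketch of that kind of wrapper, under heavy assumptions: all class names here are hypothetical, and the two backends only return a description string where a real one would issue GL calls. The point is just that calling code deals with a mesh and a backend, never with the API directly.

```pascal
program MeshWrapper;
{$mode objfpc}{$H+}

uses
  SysUtils;

type
  TVertex = record
    X, Y, Z: Single;
  end;

  // API-agnostic container: the rest of the engine only sees meshes.
  TMesh = class
  private
    FVertices: array of TVertex;
  public
    procedure AddVertex(AX, AY, AZ: Single);
    function VertexCount: Integer;
  end;

  // Hypothetical backend interface.
  TMeshBackend = class
  public
    function Draw(AMesh: TMesh): string; virtual; abstract;
  end;

  TAttribPointerBackend = class(TMeshBackend)
  public
    function Draw(AMesh: TMesh): string; override;
  end;

  TLegacyArrayBackend = class(TMeshBackend)
  public
    function Draw(AMesh: TMesh): string; override;
  end;

procedure TMesh.AddVertex(AX, AY, AZ: Single);
var
  n: Integer;
begin
  n := Length(FVertices);
  SetLength(FVertices, n + 1);
  FVertices[n].X := AX;
  FVertices[n].Y := AY;
  FVertices[n].Z := AZ;
end;

function TMesh.VertexCount: Integer;
begin
  Result := Length(FVertices);
end;

function TAttribPointerBackend.Draw(AMesh: TMesh): string;
begin
  // A real backend would call glVertexAttribPointer and glDrawArrays here.
  Result := Format('generic attributes: %d vertices', [AMesh.VertexCount]);
end;

function TLegacyArrayBackend.Draw(AMesh: TMesh): string;
begin
  // A real backend would call glVertexPointer and glDrawArrays here.
  Result := Format('legacy vertex arrays: %d vertices', [AMesh.VertexCount]);
end;

var
  Mesh: TMesh;
  Backend: TMeshBackend;
begin
  Mesh := TMesh.Create;
  Backend := TLegacyArrayBackend.Create; // swap the class to change the path
  try
    Mesh.AddVertex(0, 0, 0);
    Mesh.AddVertex(1, 0, 0);
    Mesh.AddVertex(0, 1, 0);
    WriteLn(Backend.Draw(Mesh));
  finally
    Backend.Free;
    Mesh.Free;
  end;
end.
```

Changing the rendering mechanism later then means writing one new backend class rather than touching every place that draws something.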

glFinish()

I observed modern intel drivers flat out ignore this. But legacy drivers (say, Windows XP + Gf 5200 FX) *require* this to be called *after* SwapBuffers, otherwise you won't get an accurate measurement
(my engine uses this value for controlling adaptive detail, alongside the fps meter).

so you spend the minimum amount of time possible on the blocking swap call.

So, rendering, *then* sh-- stuff like background uploading of higher LODs and only then SwapBuffers...? That's... such a good idea. Why am I doing it backwards?

You can use GL timers to get the 'server' side timing,

Me, I usually get "yer ghetto intel video doesn't know that trick" and stop bothering

**paul_nicholls** · 24-01-2018, 09:20 PM

Originally Posted by Chebmaster

Me, I usually get "yer ghetto intel video doesn't know that trick" and stop bothering

hahah! That's gold

**laggyluk** · 26-01-2018, 08:38 AM

On the topic, using shaders: if you are drawing text on background you don't need to use transparency, just lerp between font and background color based on font alpha.
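In a fragment shader that is a single call to the GLSL built-in `mix`. The same arithmetic, written out as a CPU-side Free Pascal sketch with hypothetical names (`TRGB`, `MakeRGB`, `Mix`) so it can be run without a GL context:

```pascal
program TextLerp;
{$mode objfpc}{$H+}

type
  TRGB = record
    R, G, B: Single;
  end;

function MakeRGB(AR, AG, AB: Single): TRGB;
begin
  Result.R := AR;
  Result.G := AG;
  Result.B := AB;
end;

// Linear interpolation: returns A when T = 0 and B when T = 1,
// the same formula as GLSL's mix(a, b, t).
function Mix(const A, B: TRGB; T: Single): TRGB;
begin
  Result.R := A.R + (B.R - A.R) * T;
  Result.G := A.G + (B.G - A.G) * T;
  Result.B := A.B + (B.B - A.B) * T;
end;

var
  Background, FontColor, Pixel: TRGB;
  FontAlpha: Single;
begin
  Background := MakeRGB(0.0, 0.0, 1.0); // blue background
  FontColor := MakeRGB(1.0, 1.0, 1.0);  // white text
  FontAlpha := 0.25;                    // glyph coverage sampled from the font texture
  Pixel := Mix(Background, FontColor, FontAlpha);
  WriteLn('R=', Pixel.R:0:2, ' G=', Pixel.G:0:2, ' B=', Pixel.B:0:2);
end.
```

Because the output is already the final opaque colour, no blending state needs to be enabled for the text pass.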

I don't know anything about OpenGL 2.0 because I started with 3.0 using tutorials in C++. You don't really need to know much about C++ to apply this to Pascal, as the OpenGL API calls stay the same. I guess that's why no one really bothers to make tutorials in Pascal.

Learned mainly from this one with zero prior OpenGL knowledge and managed to pull off deferred renderer with SSAO and stuff: http://ogldev.atspace.co.uk/

**AthenaOfDelphi** · 26-01-2018, 04:09 PM

Originally Posted by laggyluk

On the topic, using shaders: if you are drawing text on background you don't need to use transparency, just lerp between font and background color based on font alpha.

I don't know anything about OpenGL 2.0 because I started with 3.0 using tutorials in C++. You don't really need to know much about C++ to apply this to Pascal, as the OpenGL API calls stay the same. I guess that's why no one really bothers to make tutorials in Pascal.

Learned mainly from this one with zero prior OpenGL knowledge and managed to pull off deferred renderer with SSAO and stuff: http://ogldev.atspace.co.uk/

Thanks for that, I'll take a look. For the time being, my approach is working, allowing me to focus on the more interesting elements of the game I'm trying to write
