
Thread: OpenGL GLSL - Text rendering query

  1. #1
    And this is the output....

    I'm glad you got it working!
    Try experimenting with glDisable(GL_BLEND) and see if it still works (unless your intention was blending the background with what was already rendered there).
    I think my advice earlier was wrong and you only need the test, without blending.
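    To be concrete, the kind of "test" I mean is the alpha test. A rough sketch (untested here; assumes dglOpenGL and a legacy/compatibility context - with core-profile GLSL you would discard low-alpha fragments in the fragment shader instead):
    Code:
    // Alpha-tested text: fragments below the threshold are rejected outright,
    // so no blending with the background is needed.
    procedure DrawTextAlphaTested;
    begin
      glDisable(GL_BLEND);
      glEnable(GL_ALPHA_TEST);
      glAlphaFunc(GL_GREATER, 0.5);   // keep only fragments with alpha > 0.5
      // ... bind the font texture and draw the glyph quads here ...
      glDisable(GL_ALPHA_TEST);
    end;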

    you didn't see the link I've posted
    Sorry. You posted a link to a page with one paragraph of title text and some embedded rectangle that doesn't work without JavaScript. Okay, I temporarily enabled JS for slideshare.net and slidesharecdn.com. And what do I see?

    A veritable wall of sites that suddenly want to run their JavaScript in my browser.
    How about NO?
    I always ignore such dumps of websites full of suspicious third-party scripts.

    Added this to my hosts file:
    Code:
    127.0.0.1 www.slideshare.net
    127.0.0.1 slideshare.net
    127.0.0.1 slidesharecdn.com
    127.0.0.1 public.slidesharecdn.com
    so that that dump is always blocked.

    Besides, judging by the title, that was something an Nvidia dev said 9 years ago. A lot has happened since then, and I am more interested in the situation today.

    On most desktop hardware we've tested, it's actually the opposite - glBegin/glEnd is close in performance to glDrawArrays
    That's a very interesting result and I can't wait until I'm done with my overhaul to do some more benchmarking.
    There must be some interesting dependency on driver version, operating system, the way you initialize GL, or even whether your app is 32- or 64-bit.
    Because I got what I got, both on Win7/HD3000/i5 and Win10/GTX460/Phenom II.

  2. #2
    Quote Originally Posted by Chebmaster View Post

    Sorry. You posted a link to a page with one paragraph of title text and some embedded rectangle that doesn't work without JavaScript. Okay, I temporarily enabled JS for slideshare.net and slidesharecdn.com. And what do I see?
    A veritable wall of sites that suddenly want to run their JavaScript in my browser.
    How about NO?
    I always ignore such dumps of websites full of suspicious third-party scripts.
    It's a web site that hosts a PowerPoint presentation made by the aforementioned Nvidia person. I also use a JavaScript blocker in Firefox myself, but for occasional viewing I open links in Chromium (Edge on Windows), which is something you can do for such exceptional cases. It's too bad you didn't care to check the presentation, especially for your own sake, as your attitude was a classic example of the Dunning-Kruger effect: you have barely learned shader basics (as you've said yourself), yet as a beginner you try to give advice on whether to use a particular technology, while not being an expert on the topic; and you are afraid to view a presentation without enabling JavaScript, so you can't actually learn something new. Please don't do that; such an attitude hinders the real talent you may have.

    Here's a quote from that Nvidia presentation:
    NVIDIA values OpenGL API backward compatibility
    - We don't take API functionality away from you
    - We aren't going to force you to re-write apps
    Does deprecated functionality "stay fast"?
    - Yes, of course - and stays fully tested
    - Bottom-line: Old & new features run fast
    Quote Originally Posted by Chebmaster View Post
    Besides, judging by the title that was something a nVidia dev said 9 years ago. Lots happened since then and I am more interested in the situation of today.
    "You have to know the past to understand the present." This Nvidia presentation talks about OpenGL 3.2, which is exact moment the whole deprecation thing happened. I obviously can't speak for graphics vendors, but I believe their commitment to continue supporting all OpenGL features in their graphics drivers is driven both by video games industry (there are a lot of old games that use legacy code; have you played Starcraft 1 recently? It still works and that's DirectDraw with direct primary surface access ) and enterprise sector, where you may find large code bases dating back to 90s.

    So it boils down to the same thing people said in the Pascal vs C thread: use the tool that you find most appropriate for your current task.

  3. #3
    PGD Community Manager AthenaOfDelphi
    Well, whilst we're talking about OpenGL... would someone like to suggest a reason why glGetUniformLocation doesn't appear to be initialised in dglOpenGL? As best as I can tell, when it's loading the address of the routine, it always gets NIL. I've tried the normal means of starting up (i.e. not specifying a library) and I've also tried forcing NVOGL32.DLL. In both cases, it doesn't appear to know about glGetUniformLocation. Either that, or it is somehow getting initialised properly and it can't find the uniform, which is a 2D sampler.

    Any thoughts?
    :: AthenaOfDelphi :: My Blog :: My Software ::

  4. #4
    have you played Starcraft 1 recently?
    I have not, but Open Arena takes up to 60% of my gaming time.
    It runs perfectly.

    NVIDIA values OpenGL API backward compatibility
    That they do, and I believe glBegin is there to stay forever, BUT with a nasty catch: these technologies are only kept good enough to keep old games running. There is no need to make them efficient.

    It's a web site that hosts a PowerPoint presentation
    Oh!
    So it was working with only the originating domain's scripts enabled! I was just expecting a video.

    On most desktop hardware we've tested, it's actually the opposite - glBegin/glEnd is close in performance to glDrawArrays
    Were you measuring overall frame time or just the time of calling the functions?

    You see, I found experimentally that the driver sort of stores the commands you issue somewhere inside itself, and the real work (usually) begins only after you call SwapBuffers (e.g. wglSwapBuffers, glXSwapBuffers, eglSwapBuffers). So to see what is really going on, you have to measure how long the SwapBuffers call takes, with vsync disabled, of course.

    Preferably with Aero desktop composition disabled, as any semi-transparent effect overlapping your window usually adds an extra 8-9 ms.

    I found that the *vast majority* of my thread's time is usually spent inside that call, the exceptions being FBO creation and GLSL program linking.
    And that is where the cost of glBegin is paid.
    This was true for all platforms I tried, including various Windows, Linux and wine.
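    If you want to try this yourself, here is a rough sketch of how I time it (Windows flavour, untested as posted; assumes dglOpenGL, an active context on DC, and vsync already off, e.g. via wglSwapIntervalEXT(0) if that extension is present):
    Code:
    // needs: uses Windows, dglOpenGL;
    procedure MeasureSwap(DC: HDC);
    var
      freq, t0, t1: Int64;
      swapMs: Double;
    begin
      // ... render the frame here ...
      QueryPerformanceFrequency(freq);
      QueryPerformanceCounter(t0);
      SwapBuffers(DC);   // this is where the queued-up work actually gets paid for
      glFinish;          // make sure the GPU is really done before stopping the clock
      QueryPerformanceCounter(t1);
      swapMs := (t1 - t0) * 1000.0 / freq;
      // log or chart swapMs here
    end;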

    My game engine has a built-in profiler and displays thread time charts alongside the fps counter. I watch them like a hawk, and it draws the SwapBuffers time in bright red.

  5. #5
    , or it is somehow getting initialised properly and it can't find the uniform, which is a 2D sampler.
    1. It was so NICE of them to declare GLcharARB = Char; PGLcharARB = ^GLcharARB; when Char could be UnicodeChar and OpenGL *only* understands 8-bit strings.

    2. It's capricious. Try
    Code:
    glGetUniformLocation(<program>,  PAnsiChar(RawByteString('my_uniform_name'#0)));
    3. It can return -1 if your uniform was not used and the GLSL compiler eliminated it.
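    4. If the function pointer itself really is NIL, the usual cause is calling it before the entry points have been loaded. Roughly, the startup order dglOpenGL expects looks like this (a sketch from memory of the dglOpenGL header, so double-check the parameter list; MyWindowHandle is just a placeholder):
    Code:
    // needs: uses Windows, dglOpenGL;
    // glGetUniformLocation and the other GL 2.0+ entry points stay nil until
    // a context is current and the extension reader has run.
    var
      DC: HDC;
      RC: HGLRC;
    begin
      InitOpenGL;                         // load opengl32.dll and the wgl entry points
      DC := GetDC(MyWindowHandle);        // placeholder for your window handle
      RC := CreateRenderingContext(DC, [opDoubleBuffered], 32, 24, 8, 0, 0, 0);
      ActivateRenderingContext(DC, RC);   // makes RC current and reads the extensions,
                                          // filling in glGetUniformLocation and friends
      // only from this point on is glGetUniformLocation assigned
    end;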

    P.S. I always play it overly safe using wrappers like this one:
    Code:
    class function TGAPI.SafeGetUniformLocation(prog: GLuint; name: RawByteString): GLint;
    var
      error: UnicodeString;
    begin
      try
        case  Mother^.GAPI.Mode of
         {$ifndef glesonly}
          gapi_GL21,
         {$endif glesonly}
          gapi_GLES2: begin
            Result:= glGetUniformLocation(prog, PAnsiChar(name + #0));
            CheckGLError(true);
          end;
        else
          DieUnsupportedGLMode;
        end;
      except
        Die(RuEn(
          'Не удалось получить расположение постоянной %1 для программы %0',
          'Failed to get uniform %1 location for program %0'
          ), [prog, name]);
      end;
    end;
    where

    Code:
    procedure CheckGlError(DieAnyway: boolean);
    var ec: GLenum;
    begin
      {$if defined(cpuarm) and not defined(debug)}
        //Raspberry Pi
        //some shit of a driver spams console with "glGetError 0x500"
        // thus bringing FPS to its knees
        if not DieAnyway then Exit;
      {$endif}
    
      ec:= glGetError();
    
      if ec <> 0 then
        if DieAnyway or Mother^.Debug.StopOnOpenGlErrors
          then Die(RuEn(
              'Ошибка OpenGL, %0',
              'OpenGL error, %0'
            ),
            [GlErrorCodeToString(ec)])
          else
            if Mother^.Debug.Verbose then AddLog(RuEn(
                'Ошибка OpenGL, %0',
                'OpenGL error, %0'
              ),
              [GlErrorCodeToString(ec)]);
    end;
    -- flying over paranoiac's nest
    Last edited by Chebmaster; 21-01-2018 at 04:38 PM.

  6. #6
    Quote Originally Posted by Chebmaster View Post
    I have not, but Open Arena takes up to 60% of my gaming time.
    That they do, and I believe glBegin is there to stay forever, BUT with a nasty catch: these technologies are only kept good enough to keep old games running. There is no need to make them efficient.
    I would advocate it slightly differently: yes, they (Nvidia, AMD, etc.) need to keep the old API interface, and for that they are likely providing a fixed-function pipeline wrapper on top of the programmable pipeline. However, since they know their own architecture very well, it is very likely they are using the most optimal approach to implement such a wrapper. In contrast, when you jump directly to the programmable pipeline and try to build an engine of your own, it is much more difficult for you to achieve at least *the same* level of performance as the legacy fixed-function pipeline built as a wrapper, because you have to optimize for many architectures, vendors, drivers, OS versions, etc.

    Granted, if you use the programmable pipeline properly, you can do much more than you could with FFP: in a new framework that we're in the process of publishing, you can use compute shaders to produce a 3D volume surface, which is then processed by geometry/tessellation shaders - the whole beautiful 3D scene is built and rendered without sending a single vertex to the GPU! Nevertheless, that doesn't mean you can't start learning with FFP; it is just as good as any other solution when all you need is to render something simple in 2D or 3D using the GPU.

    Besides, you never know, maybe some day the whole OpenGL will be made as a wrapper on top of Vulkan API, who knows...

    Quote Originally Posted by Chebmaster View Post
    Were you measuring overall frame time or just the time of calling the functions?
    Either use external profiling tools (Visual Studio has a GPU profiler, and the GPU vendors also provide their own tools) or at least measure the average frame latency (not frame rate).
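    For example, something as simple as this already tells you more than an FPS counter does (a rough sketch, the names are made up):
    Code:
    // Average frame latency in milliseconds over a sliding window of frames.
    const
      WINDOW = 120;
    var
      frameMs: array[0..WINDOW - 1] of Double;
      frameIndex: Integer = 0;

    procedure RecordFrame(elapsedMs: Double);
    var
      i: Integer;
      sum: Double;
    begin
      frameMs[frameIndex mod WINDOW] := elapsedMs;
      Inc(frameIndex);
      if frameIndex >= WINDOW then
      begin
        sum := 0;
        for i := 0 to WINDOW - 1 do
          sum := sum + frameMs[i];
        WriteLn('avg frame latency: ', (sum / WINDOW):0:2, ' ms');
      end;
    end;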

    Quote Originally Posted by Chebmaster View Post
    You see, I found the experimental way that driver sort of stores commands you issue somewhere inside itself and the real work begins (usually) only after you call SwapBuffers (e.g. wglSwapBuffers, glxSwapBuffers, eglSwapBuffers). So to see what is really going on you have to measure how long the SwapBuffers call takes, with vSync disabled, of course.
    This is driver-dependent, so they are free to choose whatever approach is more efficient, but commonly you may expect that the work begins (the GPU works in parallel) immediately when you issue a draw call. "Swap Buffers" has to wait for all GPU work to finish before swapping surfaces, which is why it feels like it takes most of the time.

  7. #7
    maybe some day the whole OpenGL will be made as a wrapper on top of Vulkan API,
    I am already using Google's GL ES over Direct3D wrapper (ANGLE)

    it is much more difficult for you to achieve at least *the same* level of performance as the legacy fixed-function pipeline
    And sometimes flat-out impossible.
    I have some museum pieces of the GeForce line (a 7025, I think?) where even barebones GLSL is noticeably slower than FFP. That's the main reason I haven't abandoned FFP completely: a situation may arise where my engine renders the scene in really low resolution, then stretches the result using FFP. I consider that a better alternative to not running at all.

    Either use external profiling tools [...] or at least measure the average frame latency
    Ok, I really need to get to that benchmarking. Now curiosity is gnawing at me.

    (not frame rate)
    Yesss, measuring FPS is the primary noob marker.

    "Swap Buffers" has to wait for all GPU work to finish before swapping surfaces, which is why you feel it taking most of the time.
    That, yes, but it shows that the function calls themselves are (more often than not) negligible in the grand scheme of things.

    I suggest we return to this when I have my tool set working again.

  8. #8
    PGD Staff / News Reporter phibermon
    glFlush and glFinish are the correct calls to make to handle the GL command buffer. The swap operation essentially calls these, schedules the swap on v-sync and then blocks until that's complete.

    GL commands might be buffered across a network, or to a display server (think remote GL on X11), or in the GL implementation you're using (Mesa, Nvidia drivers, etc.).

    To correctly handle a v-synced GL application you should be setting up your frame so you spend the minimum amount of time possible on the blocking swap call.

    You can use GL timers to get the 'server' side timing, calculate the command buffer overhead by comparing that against your 'client' side GL calls followed by flush and finish, and then you know you can call swap and meet the v-sync 'deadline' with the minimum amount of time blocked in the swap call.
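    In GL terms that means timer queries - roughly like this (a sketch; assumes a GL 3.3+ context and dglOpenGL; the blocking fetch is for brevity, real code should poll GL_QUERY_RESULT_AVAILABLE first):
    Code:
    var
      q: GLuint;
      gpuNs: GLuint64;
    begin
      glGenQueries(1, @q);
      glBeginQuery(GL_TIME_ELAPSED, q);
      // ... issue the GL calls you want to measure ...
      glEndQuery(GL_TIME_ELAPSED);
      glGetQueryObjectui64v(q, GL_QUERY_RESULT, @gpuNs);   // nanoseconds of GPU ('server') time
      glDeleteQueries(1, @q);
      // compare gpuNs against the CPU ('client') time measured around the same calls
    end;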

    If you stream resources to graphics memory? such a setup is crucial for getting the maximum bandwidth without dropping frames.

    As far as legacy GL is concerned, or to give it the correct name : "immediate mode"? the clue is in the name. GL3+ (in the context of performance) is all about batching - minimizing the API calls and efficiently aligning data boundaries.

    It's not about fixed function or programmable hardware - it's about how the data is batched and sent to that hardware, regardless of the actual functionality.

    Display lists in older GL versions were a herald of things to come - they allow you to bunch together all your commands and allow the GL implementation to store the entire display list in whatever native format it pipes to the actual hardware, effectively storing the commands 'server' side so they don't have to be sent over the pipe every time. How they are handled is vendor-specific; some display list implementations are no faster than immediate calls (some old ATI and Intel drivers long ago were known for it).
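    For anyone who hasn't used them, the pattern is simply (a sketch, assuming dglOpenGL and a compatibility context):
    Code:
    var
      list: GLuint;
    begin
      // once, at load time: record a batch of immediate-mode commands
      list := glGenLists(1);
      glNewList(list, GL_COMPILE);
        glBegin(GL_TRIANGLES);
          glVertex3f(0.0, 0.0, 0.0);
          glVertex3f(1.0, 0.0, 0.0);
          glVertex3f(0.0, 1.0, 0.0);
        glEnd;
      glEndList;

      // every frame: replay the stored, pre-translated batch
      glCallList(list);
    end;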

    So yes - immediate mode GL, that is, versions before 3.x, which are referred to here as 'legacy' - will always be slower than the modern APIs; it's not about the vendor implementation, regardless of how optimal or not their code is.

    Other than display lists and any driver-specific optimizations the vendor may have made, there's no 'batching' in legacy GL. This is why we have VBOs and instancing - all server-side optimizations - and this is why Vulkan and newer APIs are capable of better performance in various situations.
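    The VBO path is the batched counterpart to the display list above - upload the data once, draw it with a single call per frame (a sketch; assumes dglOpenGL, a GL 2.0+ context and a shader program with the position at attribute 0):
    Code:
    const
      Verts: array[0..8] of GLfloat = (0.0, 0.0, 0.0,  1.0, 0.0, 0.0,  0.0, 1.0, 0.0);
    var
      vbo: GLuint;
    begin
      // once, at load time:
      glGenBuffers(1, @vbo);
      glBindBuffer(GL_ARRAY_BUFFER, vbo);
      glBufferData(GL_ARRAY_BUFFER, SizeOf(Verts), @Verts[0], GL_STATIC_DRAW);

      // every frame:
      glBindBuffer(GL_ARRAY_BUFFER, vbo);
      glEnableVertexAttribArray(0);
      glVertexAttribPointer(0, 3, GL_FLOAT, GLboolean(GL_FALSE), 0, nil);
      glDrawArrays(GL_TRIANGLES, 0, 3);
      glDisableVertexAttribArray(0);
    end;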

    The bottleneck is call overhead, 'instruction latency and bandwidth' - call it what you will. It's not what you move or what is done with it when it gets there - it's how you move it.

    GL2 is moving a ton of dirt with a teaspoon. GL3+ is moving a ton of dirt with a pneumatic digger. (Vulkan is moving a ton of dirt with a bespoke digger that has been built right next to the pile of dirt)

    If you're only moving a teaspoon of dirt? use a teaspoon, it's easier to use than a digger - but don't defend your teaspoon when you need to move a ton of dirt.

    Or something like that. Something something teaspoon. I think we can all agree : teaspoon.
    Last edited by phibermon; 23-01-2018 at 05:55 PM.
    When the moon hits your eye like a big pizza pie - that's an extinction level impact event.

  9. #9
    If you're only moving a teaspoon of dirt? use a teaspoon [...] Or something like that. Something something teaspoon.
    Much better said than me. Thank you.
    I'll add that there is also GLES (an unruly, hard-to-tame beast) that does not have immediate mode functions but is required if you want to support some platforms.

    Me, I created a wrapper class for small to medium loads. It hides the actual implementation (which may be glVertexAttribPointer() or older, more basic vertex arrays) and I only use it to move my teaspoons.
    Why? Because this way it lets you think of *all* your data as meshes. And it contains much, much less API-related code to overhaul if you have to change your rendering mechanisms later.
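    Something along these lines - not my actual engine code, just a sketch of the idea with made-up names:
    Code:
    type
      TTinyMesh = class
      private
        FData: array of GLfloat;   // interleaved x, y, z
      public
        procedure SetVertices(const V: array of GLfloat);
        procedure Draw;
      end;

    procedure TTinyMesh.SetVertices(const V: array of GLfloat);
    begin
      SetLength(FData, Length(V));
      if Length(V) > 0 then
        Move(V[0], FData[0], Length(V) * SizeOf(GLfloat));
    end;

    procedure TTinyMesh.Draw;
    begin
      // glVertexAttribPointer is a procedural variable in dglOpenGL,
      // so Assigned() tells us whether the modern path is available.
      if Assigned(glVertexAttribPointer) then
      begin
        // modern path: assumes a shader program with the position at attribute 0
        glEnableVertexAttribArray(0);
        glVertexAttribPointer(0, 3, GL_FLOAT, GLboolean(GL_FALSE), 0, @FData[0]);
        glDrawArrays(GL_TRIANGLES, 0, Length(FData) div 3);
        glDisableVertexAttribArray(0);
      end
      else
      begin
        // legacy path: classic client-side vertex array
        glEnableClientState(GL_VERTEX_ARRAY);
        glVertexPointer(3, GL_FLOAT, 0, @FData[0]);
        glDrawArrays(GL_TRIANGLES, 0, Length(FData) div 3);
        glDisableClientState(GL_VERTEX_ARRAY);
      end;
    end;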

    glFinish()
    I observed that modern Intel drivers flat-out ignore this. But legacy drivers (say, Windows XP + a GeForce FX 5200) *require* it to be called *after* SwapBuffers, otherwise you won't get an accurate measurement
    (my engine uses this value for controlling adaptive detail, alongside the fps meter).

    so you spend the minimum amount of time possible on the blocking swap call.
    So, rendering, *then* sh-- stuff like background uploading of higher LODs, and only then SwapBuffers...? That's... such a good idea. Why am I doing it backwards?

    You can use GL timers to get the 'server' side timing,
    Me, I usually get "yer ghetto intel video doesn't know that trick" and stop bothering
