maybe some day the whole OpenGL will be made as a wrapper on top of Vulkan API,
I am already using Google's GL ES-over-Direct3D wrapper.

it is much more difficult for you to achieve at least *the same* level of performance as the legacy fixed-function pipeline
And sometimes flat out impossible.
I have some museum pieces of the GeForce line (7025, I think?) where even barebones GLSL is noticeably slower than FFP. That's the main reason I haven't abandoned FFP completely: a situation may arise where my engine renders the scene at a really low resolution, then stretches the result using FFP. I consider that a better alternative to not running at all.

Either use external profiling tools [...] or at least measure the average frame latency
Ok, I really need to get to that benchmarking. Now curiosity is gnawing at me.

(not frame rate)
Yesss, measuring FPS is the primary noob marker.

"Swap Buffers" has to wait for all GPU work to finish before swapping surfaces, which is why you feel it taking most of the time.
That, yes, but this shows that the function calls themselves are (more often than not) negligible in the grand scheme of things.

I suggest we return to this when I have my tool set working again.