As you noticed yourself, the CPU synchronization primitives mean close to nothing to the GPU. What you were initially attempting could be seen as "undefined behaviour" in GPU-land.
The problem that not many people realize when they start out with such things is that OpenGL was never meant to be used in multi-threaded scenarios. Back in the 90s when OpenGL came to be, people were happy just to get something on their screen with their single-core processor running their application on a single thread. All the facilities around OpenGL contexts and the like were just a half-baked patch on top of an already broken programming model. That model is so broken that even today multi-threading with OpenGL can still be considered a dark art, subject to the mood of the driver in any given situation. Because of this, serious engines such as Unreal Engine never even attempted to multi-thread any part of their OpenGL implementations, because it would never work properly.
Now you might be wondering: Why does glFinish() or glFlush() seem to fix the problem?
This is the other thing that many beginners in this field seem to misunderstand: glFinish() and glFlush() were never intended to be, and will never be, synchronization mechanisms.
Again, back in the 90s, when dedicated graphics hardware was more or less limited to privileged (i.e. rich) companies doing e.g. CAD work, the idea was that it would be a waste for such hardware to be used by only a single workstation, so it should be shareable among many users (a little like modern render farms). If I had to guess, the experienced engineers working on the solution had grown so accustomed to working with mainframes in the prior decades that they figured access to the expensive graphics hardware could be modelled after access to a mainframe as well. This is why, even today, the OpenGL specification always refers to the GPU as the "Server" and your application as the "Client", and the naming of some OpenGL functions reflects this as well. More modern APIs use the more sensible terms "Device" and "Host" instead.
The problem with modelling something after a client-server architecture is always going to be latency, buffering, head-of-line blocking and all the other issues that one is more accustomed to dealing with in the networking world.
Just like in the networking world, because every packet has some overhead attached to it, any efficient implementation is going to try to group data together so it can be sent in bigger chunks at a time. In the world of TCP this is known as "Nagle's Algorithm". The problem comes when there isn't enough data to reach the threshold at which a new packet would be sent out: either you wait for the internal timeout to expire, or you force the implementation to send the data out straight away. This is known as flushing the buffer and is more or less what glFlush() was always intended to do.
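To make the analogy concrete, here is a minimal sketch (the texture id, pixel buffer and sf::Context instance are placeholders, not your actual code) of what calling glFlush() from a worker thread's context actually promises: the buffered commands get handed to the driver, nothing more.

[code]
#include <SFML/OpenGL.hpp>
#include <SFML/Window/Context.hpp>
#include <cstdint>

// Hedged sketch: a worker thread records an upload into its own context's
// command buffer and then calls glFlush(). The flush only hands the buffered
// commands to the driver; it says nothing about when, or in what order, the
// GPU actually executes them.
void submitUploadFromWorker(sf::Context& workerContext, unsigned int texture,
                            const std::uint8_t* pixels, int width, int height)
{
    workerContext.setActive(true); // make this thread's GL context current

    glBindTexture(GL_TEXTURE_2D, texture);
    glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, width, height,
                    GL_RGBA, GL_UNSIGNED_BYTE, pixels);

    glFlush(); // "send the packet now" -- no completion or ordering guarantee

    workerContext.setActive(false);
}
[/code]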
Now obviously, if you open up connections to a web server on 2 different computers and force them to send their requests as fast as possible, common sense will tell you that that still doesn't guarantee the order in which the computers will receive their responses from the server because you can't predict which request will actually reach the server first due to the unpredictability of the internet. If you replace "computer" with "OpenGL context" and "internet" with "graphics driver" you basically end up with what is going on here. The fact that you are calling glFlush() doesn't have to mean much. It might work or it might not work, but there is never going to be any guarantee.
The only real guarantee you are going to get is by using glFinish(). It goes a step further than glFlush(): it blocks execution of your code until the graphics driver can guarantee that all issued commands have been completed by the GPU. It's like saying you won't do anything else in your life until the web page finishes loading on your screen. If a single person did this with 2 computers, the order in which the requests are processed by the web server would obviously be guaranteed. The main downside of glFinish(), and also the only reason people need to stay far away from it, is that it completely and utterly destroys any performance you might gain from accelerating your rendering with a GPU. I would go so far as to say you might as well render your graphics solely on your CPU if you intend to use glFinish().
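For illustration, a hedged sketch (the window and sprite are placeholders, and the window is assumed to be active on the calling thread) of what that blocking behaviour looks like in practice; the measured time is only meaningful because glFinish() stalls the CPU until the GPU is done:

[code]
#include <SFML/Graphics.hpp>
#include <SFML/OpenGL.hpp>
#include <iostream>

// Hedged sketch: glFinish() does not return until the GPU has executed every
// command issued so far. That is the only reason the elapsed time below means
// anything -- and also exactly why calling it every frame throws away the
// CPU/GPU parallelism the hardware is built around.
void drawAndWait(sf::RenderWindow& window, const sf::Sprite& sprite)
{
    sf::Clock clock;

    window.draw(sprite);   // issues the GL commands; the GPU hasn't necessarily run them yet
    window.display();      // hands the finished frame to the driver
    glFinish();            // block until the GPU has actually completed everything

    std::cout << "frame took " << clock.getElapsedTime().asMilliseconds()
              << " ms (the CPU stalled for all of it)\n";
}
[/code]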
So, now you are asking yourself what you can even do in the current situation if glFlush() and glFinish() are obviously not the way to go.
I hate to say it, but your current architecture shows some signs of premature optimization. Because any graphics API is just queueing up commands to be sent to the GPU at a later point, timing the duration spent calling graphics API functions makes little sense, so I assume you didn't base your current solution on such data. What does show up in profiling data is the time spent doing CPU-intensive work like loading and decompressing resources from disk. It is these tasks that you should try to off-load to multiple threads if you feel like it, as in the sketch below.
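As a hedged sketch of that idea (the file list and the std::async strategy are assumptions, not your code): the disk I/O and image decoding run on worker threads, while the actual GL uploads stay on the thread that owns the rendering context.

[code]
#include <SFML/Graphics.hpp>
#include <future>
#include <string>
#include <vector>

// Hedged sketch: the expensive part -- reading and decompressing the files --
// runs on worker threads via std::async. Only the comparatively cheap GPU
// upload (sf::Texture::loadFromImage) happens on the thread that owns the
// rendering context.
std::vector<sf::Texture> loadTextures(const std::vector<std::string>& files)
{
    std::vector<std::future<sf::Image>> pending;
    for (const auto& file : files)
        pending.push_back(std::async(std::launch::async, [file]
        {
            sf::Image image;
            image.loadFromFile(file); // pure CPU work: disk I/O + decoding
            return image;
        }));

    std::vector<sf::Texture> textures(files.size());
    for (std::size_t i = 0; i < pending.size(); ++i)
        textures[i].loadFromImage(pending[i].get()); // GL upload, single thread
    return textures;
}
[/code]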
I must admit, SFML tends to hide the performance costs of some of its API a bit too well at times, making it seem like throwing everything into threads will magically make everything faster.
As a first step, I would really break all the tasks down into CPU-side tasks and GPU-side tasks. The GPU-side tasks will have to be submitted via a single dedicated thread for the reasons above. How you optimize the CPU-side tasks will be up to you.
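One possible shape for that split, as a rough sketch rather than a prescription: worker threads push CPU-prepared work as callbacks into a thread-safe queue, and only the one thread that owns the OpenGL context ever drains it, so no other thread touches the GL API at all.

[code]
#include <functional>
#include <mutex>
#include <queue>
#include <utility>

// Hedged sketch: workers push callbacks from any thread; the dedicated GL
// thread drains the queue (e.g. once per frame) and is the only place GL
// commands are ever issued, so no context juggling or glFlush()/glFinish()
// tricks are needed for correctness.
class GpuSubmissionQueue
{
public:
    // Safe to call from any worker thread.
    void push(std::function<void()> task)
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        m_tasks.push(std::move(task));
    }

    // Called only from the dedicated GL thread.
    void drain()
    {
        std::queue<std::function<void()>> local;
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            std::swap(local, m_tasks);
        }
        while (!local.empty())
        {
            local.front()(); // execute the queued GL work on this thread only
            local.pop();
        }
    }

private:
    std::mutex m_mutex;
    std::queue<std::function<void()>> m_tasks;
};
[/code]

A worker that has finished decoding an image would then push a callback that performs the texture upload, and the render thread executes it between frames.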