Author Topic: [Solved] Update fails in sf::VertexBuffer (Read 3779 times)

GerhardKorken · « **on:** January 25, 2021, 09:13:58 pm »

Hello everyone!

I want to address a problem that I have encountered with sf::VertexBuffer and am not able to figure out myself. Maybe one of you has an idea of what is going on. Sorry for the long text, but I want to describe the issue as precisely as possible.

Conditions
This is a problem that seems to occur only in a multi-threaded setup. In the main thread, a sf::VertexBuffer is drawn to a window inside a render loop. Meanwhile, a second thread manipulates vertices in a vertex array (sf::Vertex*) and updates the corresponding memory of the vertex buffer using its update() method. Additionally, the main thread's render loop also calls sf::VertexBuffer::update(), updating the same vertices that the second thread is about to manipulate and update next.

Behavior
Sometimes, updating vertices in the sf::VertexBuffer from the second thread fails if the same vertices were updated by the main thread just before. For example, calling sf::VertexBuffer::update(array + i, 4, i), which should copy four vertices from array, starting at index i, to position i in the vertex buffer, usually works fine in the second thread. But if the main thread also calls sf::VertexBuffer::update(array + i, 4, i) shortly before, the second thread's update can fail silently: update() still returns true and no OpenGL error is raised, yet the update does not seem to have actually occurred.

Demonstration
Below is some example code that reproduces the issue.
After the sf::RenderWindow and sf::VertexBuffer are created, a grid of blockResolution x blockResolution quad primitives is initialized using a sf::Vertex array. A second thread is then launched, which loops through all primitives, changing their color from blue to red, back to blue after reaching the end, then to red again, and so on. Simultaneously, in the main thread's render loop, the function Interfere() updates the buffer at position nextVertex, which is also the position the second thread will update next. To exclude race conditions and to ensure the correct sequence of update() calls, all manipulations of both the vertex array and the buffer are protected by a std::mutex (although I understand that sf::VertexBuffer::update() is already made thread-safe by some kind of lock). When running the code, you can update the whole buffer by pressing G. By pressing H, you can toggle the calling of Interfere() to observe the difference.

Without Interfere(), the visual output will look something like this:

[Screenshot: output without Interfere()]

With Interfere(), you can clearly see the primitives where update() did not work, because their color is not updated. (It gets worse at higher frame rates, because the render loop runs faster.)

[Screenshot: output with Interfere()]

Why use this setup
In case you find the described conditions stupid and far-fetched: they probably are, but I want to achieve a specific functionality. The second thread will load geometry from a file, put primitives into the lowest free position of the vertex array one by one, and update the corresponding memory of the buffer. However, I also want to be able to add primitives to and remove them from the same vertex array in the main thread. When a primitive is removed in the main thread, sf::VertexBuffer::update() has to be called, which works, but this makes the freed space in the vertex array the lowest free position. Hence, the next primitive added there by the second thread can fail to show up, because update() was called on the same position shortly before by the main thread. Of course, there are several obvious workarounds or different designs that could achieve the same functionality, but I feel this should work nevertheless.
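The "lowest free position" bookkeeping described here is purely CPU-side and can be kept apart from any buffer update. The following is a minimal sketch, assuming one slot per quad (vertices 4*slot to 4*slot + 3) and that a slot is never released twice; the class name QuadSlotAllocator is hypothetical and not part of SFML.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <queue>
#include <stdexcept>
#include <vector>

// Hypothetical helper: hands out quad slots, always the lowest free one.
// Slot s covers vertices 4*s .. 4*s + 3 of the vertex array.
class QuadSlotAllocator
{
public:
    explicit QuadSlotAllocator(std::size_t capacity) : m_capacity(capacity) {}

    // Returns the lowest free slot; throws std::runtime_error when full.
    std::size_t acquire()
    {
        // Every released slot is below m_nextUnused, so the smallest
        // released slot (if any) is always the lowest free position.
        if (!m_released.empty())
        {
            std::size_t slot = m_released.top();
            m_released.pop();
            return slot;
        }
        if (m_nextUnused == m_capacity)
            throw std::runtime_error("no free quad slot");
        return m_nextUnused++;
    }

    // Makes a previously acquired slot available again.
    // Assumes each slot is released at most once per acquire().
    void release(std::size_t slot)
    {
        assert(slot < m_nextUnused);
        m_released.push(slot);
    }

private:
    std::size_t m_capacity;
    std::size_t m_nextUnused = 0;
    // Min-heap: top() yields the smallest released slot index.
    std::priority_queue<std::size_t, std::vector<std::size_t>,
                        std::greater<std::size_t>> m_released;
};
```

Released slots sit in a min-heap, so acquire() reuses the smallest freed index before touching never-used slots. The allocator itself never touches the sf::VertexBuffer; whichever thread owns the buffer can then call update() at offset 4 * slot.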

Do you have any idea what the issue could be? Is it some limitation of the underlying GL functions, or am I just missing something crucial here? I haven't read about anyone having a similar problem, so I might just be overlooking a stupid error in the code...

Thanks in advance!

EDIT: So, after some more reading, I understand a little better what is happening. Apparently, CPU synchronisation does not in any way guarantee GPU synchronisation (I didn't know that). When adding

glFlush();

or

glFinish();

after the Interfere() call, the problem no longer seems to occur. But is this a legitimate workaround, or does it potentially add a large overhead? Sorry, but my knowledge of OpenGL is too shallow to judge :-\

#include "SFML/Graphics.hpp"
#include <iostream> // std::cout is used by both threads below
#include <mutex>


/* Settings */

const int blockResolution = 32;
const int windowResolution = 1024;
const int maxFramerate = 120;

/* Derived constants */

const float blockSize = (float)windowResolution / blockResolution;

const int blockCount = blockResolution * blockResolution;
const int vertexCount = blockCount * 4;

/* Custom vertex colors */

sf::Color colorOn = sf::Color(220, 85, 50);
sf::Color colorOff = sf::Color(60, 120, 200);

/* Global variables, accessible by both threads */

sf::VertexBuffer buffer;
std::mutex bufferMutex;

sf::Vertex* vertices = new sf::Vertex[vertexCount];
int nextVertex = 0;



void InitQuads()
{
    int vertexIndex = 0;

    for (int x = 0; x < blockResolution; x++)
    {
        for (int y = 0; y < blockResolution; y++)
        {
            sf::FloatRect blockRect;

            blockRect.left = x*blockSize;
            blockRect.top = y*blockSize;
            blockRect.width = blockSize;
            blockRect.height = blockSize;

            vertices[vertexIndex+0].position =
                sf::Vector2f(blockRect.left, blockRect.top);

            vertices[vertexIndex+1].position =
                sf::Vector2f(blockRect.left, blockRect.top + blockRect.height);

            vertices[vertexIndex+2].position =
                sf::Vector2f(blockRect.left + blockRect.width, blockRect.top + blockRect.height);

            vertices[vertexIndex+3].position =
                sf::Vector2f(blockRect.left + blockRect.width, blockRect.top);

            vertices[vertexIndex+0].color = colorOff;
            vertices[vertexIndex+1].color = colorOff;
            vertices[vertexIndex+2].color = colorOff;
            vertices[vertexIndex+3].color = colorOff;

            vertexIndex += 4;
        }
    }

    buffer.update(vertices);
}

void ThreadLoop()
{
    bool switchOn = true;

    while (true)
    {
        bufferMutex.lock();

        sf::Color setColor = switchOn ? colorOn : colorOff;

        vertices[nextVertex+0].color = setColor;
        vertices[nextVertex+1].color = setColor;
        vertices[nextVertex+2].color = setColor;
        vertices[nextVertex+3].color = setColor;

        bool success = buffer.update(vertices + nextVertex, 4, nextVertex);
        if (!success)
            std::cout << "Buffer update has failed" << std::endl;

        nextVertex += 4;
        if (nextVertex == vertexCount) // Start from the beginning and toggle target vertex color
        {
            nextVertex = 0;
            switchOn = !switchOn;
        }

        bufferMutex.unlock();

        sf::sleep(sf::milliseconds(10)); // Pause to create some time between vertex buffer updates
    }
}

void Interfere()
{
    bufferMutex.lock();

    bool success = buffer.update(vertices + nextVertex, 4, nextVertex);
    if (!success)
        std::cout << "Buffer update has failed (main thread)" << std::endl;

    bufferMutex.unlock();
}

int main()
{

    sf::RenderWindow window(
        sf::VideoMode(windowResolution, windowResolution), "");
    window.setFramerateLimit(maxFramerate);

    buffer.create(vertexCount);
    buffer.setPrimitiveType(sf::Quads);
    buffer.setUsage(sf::VertexBuffer::Dynamic);


    InitQuads(); // Create grid of blockResolution^2 quad primitives

    sf::Thread th(ThreadLoop);
    th.launch();

    bool callInterfere = true;


    while (window.isOpen())
    {
        sf::Event event;
        while (window.pollEvent(event))
        {
            if (event.type == sf::Event::Closed)
                window.close();

            if (event.type == sf::Event::KeyPressed)
            {
                if (event.key.code == sf::Keyboard::G) // Key G updates the whole buffer
                {
                    bufferMutex.lock(); // the vertex array is shared with the second thread
                    buffer.update(vertices);
                    bufferMutex.unlock();
                    std::cout << "Buffer updated" << std::endl;
                }

                if (event.key.code == sf::Keyboard::H) // Key H toggles main thread buffer updates
                {
                    callInterfere = !callInterfere;
                    std::cout << "Main thread buffer updates " << callInterfere << std::endl;
                }
            }
        }

        if (callInterfere)
            Interfere();

        /* Draw all primitives to window */

        window.clear();

        bufferMutex.lock();
        window.draw(buffer);
        bufferMutex.unlock();

        window.display();

    }

    th.terminate();

    delete[] vertices;

    return 0;
}
 

(Specs: i7-8700, GTX 1080 with latest driver version 461.09, Asus Prime Z370-P)

binary1248 · « **Reply #1 on:** January 27, 2021, 02:07:31 am »

As you noticed yourself, the CPU synchronization primitives mean close to nothing to the GPU. What you were initially attempting could be seen as "undefined behaviour" in GPU-land.

The problem that not many people realize when they start out with such things is that OpenGL was never meant to be used in any multi-threading scenarios. Back in the 90s, when OpenGL came to be, people were happy that they could get something on their screen using their single-core processor running their application on a single thread. All these facilities around OpenGL contexts and the like were just a half-baked solution for an already broken programming model. The programming model was so broken that, to this day, multi-threading with OpenGL can still be considered a dark art and subject to the mood of the driver in any given situation. Because of this, serious engines, e.g. Unreal Engine, never even attempted to multi-thread any part of their OpenGL implementation, because it would never work properly.

Now you might be wondering: Why does glFinish() or glFlush() seem to fix the problem?

This is the other thing that many beginners in this field seem to misunderstand: glFinish() and glFlush() were never intended to be, and never will be, synchronization mechanisms.

Again, back in the 90s, when dedicated graphics hardware was more or less limited to privileged (i.e. rich) companies that had to do e.g. CAD work, the idea was that it would be a waste for such hardware to serve only a single workstation, so it should be shared by many users (a little like modern render farms). If I had to guess, the experienced engineers working on the solution had become so accustomed to working with mainframes in the prior decades that they thought access to the expensive graphics hardware could be modelled after access to a mainframe as well. This is the reason that, to this day, the OpenGL specification always refers to the GPU as the "Server" and your application as the "Client"; this can be seen in the naming of some OpenGL functions as well. More modern APIs use the more sensible terms "Device" and "Host" instead.

The problem with modelling something using a client-server model is always going to be latency, buffering, head-of-line blocking and all the other problems that one might be more accustomed to in the networking world.

Just like in the networking world, because every packet has some overhead attached to it, any efficient implementation is going to try to group data together to be able to send in bigger chunks at a time. In the world of TCP this is known as "Nagle's Algorithm". The problem comes when there isn't enough data to satisfy the threshold at which a new packet would be sent out. Either you wait for the internal timeout to expire or you force the implementation to send the data out straight away. This is known as flushing the buffer and is more or less what glFlush() was always intended to do.
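The buffering-and-flushing idea can be illustrated with a toy model. This is only an analogy in plain C++, assuming a fixed batch size; BatchingQueue is a hypothetical name, and the sketch says nothing about how a real OpenGL driver actually batches commands.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Toy model of a client-side command queue: commands accumulate locally and
// are only handed to the "server" once a batch fills up or flush() is called.
class BatchingQueue
{
public:
    explicit BatchingQueue(std::size_t batchSize) : m_batchSize(batchSize)
    {
        assert(batchSize > 0);
    }

    void submit(const std::string& command)
    {
        m_pending.push_back(command);
        if (m_pending.size() >= m_batchSize)
            flush(); // batch is full: send it out automatically
    }

    // Sends whatever is pending right now, even a partial batch.
    // It only hands the commands over; it does not wait for them to run.
    void flush()
    {
        if (m_pending.empty())
            return;
        m_sent.insert(m_sent.end(), m_pending.begin(), m_pending.end());
        m_pending.clear();
        ++m_packetsSent;
    }

    std::size_t pendingCount() const { return m_pending.size(); }
    std::size_t packetsSent() const { return m_packetsSent; }
    const std::vector<std::string>& sent() const { return m_sent; }

private:
    std::size_t m_batchSize;
    std::size_t m_packetsSent = 0;
    std::vector<std::string> m_pending; // buffered on the "client"
    std::vector<std::string> m_sent;    // what the "server" has received
};
```

submit() sends automatically once the batch is full, while flush() forces out a partial batch early. Even after flush(), nothing in the model says when, or in what order relative to other clients, the receiver will process the commands.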

Now obviously, if you open connections to a web server from 2 different computers and force them to send their requests as fast as possible, common sense will tell you that this still doesn't guarantee the order in which the computers will receive their responses from the server, because you can't predict which request will actually reach the server first due to the unpredictability of the internet. If you replace "computer" with "OpenGL context" and "internet" with "graphics driver", you basically end up with what is going on here. The fact that you are calling glFlush() doesn't have to mean much. It might work or it might not, but there is never going to be any guarantee.

The only real guarantee you are going to get is by using glFinish(). It goes a step further than glFlush(): glFinish() basically blocks execution of your code until the graphics driver can guarantee that all issued commands have been completed by the GPU. It's like saying you won't do anything else in your life until the web page finishes loading on your screen. If a single person did this with 2 computers, the order in which the web server processes the requests would obviously be guaranteed. The main downside of glFinish(), and also the only reason people need to stay far away from it, is that it completely and utterly destroys any performance you might gain from accelerating your rendering using a GPU. I would go so far as to say you might as well render your graphics solely on your CPU if you intend to use glFinish().
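The difference between handing work over and waiting for it to complete can be modelled with a worker thread standing in for the GPU. A minimal sketch, assuming a single "device" thread that executes queued std::function commands in order; ToyDevice and its submit()/finish() methods are hypothetical and only mimic the semantics described above, not any driver's implementation.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Toy "device" that runs submitted commands on its own thread.
class ToyDevice
{
public:
    ToyDevice() : m_worker([this] { run(); }) {}

    ~ToyDevice()
    {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_stop = true;
        }
        m_wake.notify_all();
        m_worker.join(); // remaining commands are drained before exit
    }

    // Queues a command and returns immediately (asynchronous submission).
    void submit(std::function<void()> command)
    {
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            m_queue.push_back(std::move(command));
            ++m_submitted;
        }
        m_wake.notify_all();
    }

    // Blocks the caller until every submitted command has completed.
    void finish()
    {
        std::unique_lock<std::mutex> lock(m_mutex);
        m_done.wait(lock, [this] { return m_completed == m_submitted; });
    }

    std::size_t completed()
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        return m_completed;
    }

private:
    void run()
    {
        std::unique_lock<std::mutex> lock(m_mutex);
        while (true)
        {
            m_wake.wait(lock, [this] { return m_stop || !m_queue.empty(); });
            if (m_queue.empty())
                return; // stop requested and nothing left to execute
            std::function<void()> command = std::move(m_queue.front());
            m_queue.pop_front();
            lock.unlock();
            command(); // the "GPU work" runs outside the lock
            lock.lock();
            ++m_completed;
            m_done.notify_all();
        }
    }

    std::mutex m_mutex;
    std::condition_variable m_wake;
    std::condition_variable m_done;
    std::deque<std::function<void()>> m_queue;
    std::size_t m_submitted = 0;
    std::size_t m_completed = 0;
    bool m_stop = false;
    std::thread m_worker; // declared last so it starts after the members above
};
```

submit() returns immediately, whereas finish() stalls the calling thread until the device's queue has been fully executed, so CPU and "GPU" stop overlapping their work. In this model, that serialization is the performance cost of a finish-style call.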

So, now you are asking yourself what you can even do in the current situation if glFlush() and glFinish() are obviously not the way to go.

I hate to say it, but your current architecture shows some signs of premature optimization. Because any graphics API is just queueing up commands to be sent to the GPU at a later point, timing the duration spent calling graphics API functions makes little sense. As such I assume that you didn't base your current solution around such data. What does show up in profiling data would be the time spent doing CPU intensive work like loading and decompressing resources from disk. It is these tasks that you should try to off-load to multiple threads if you feel like it.
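Moving the CPU-heavy part (reading and parsing geometry) to another thread can be sketched with std::async. This assumes a hypothetical plain-text format with one quad per line ("left top width height"); QuadData, parseQuads() and loadQuadsAsync() are illustrative names, and no graphics API is touched on the worker thread.

```cpp
#include <cassert>
#include <future>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Plain CPU-side geometry; no graphics API involved.
struct QuadData
{
    float left, top, width, height;
};

// Hypothetical loader: parses one quad per line as "left top width height".
// In a real program, reading/decompressing the file would also happen here.
std::vector<QuadData> parseQuads(const std::string& text)
{
    std::vector<QuadData> quads;
    std::istringstream in(text);
    QuadData q{};
    while (in >> q.left >> q.top >> q.width >> q.height)
        quads.push_back(q);
    return quads;
}

// Starts parsing on another thread; the caller keeps rendering and later
// collects the result from the returned future.
std::future<std::vector<QuadData>> loadQuadsAsync(std::string text)
{
    return std::async(std::launch::async, parseQuads, std::move(text));
}
```

The render thread could poll the future each frame with wait_for(std::chrono::seconds(0)) and, once it reports std::future_status::ready, build the sf::Vertex data and call update() on the buffer itself.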

I must admit, SFML tends to muddy the performance costs of some of its API a bit too well at times, making it seem like throwing everything into threads will magically make everything faster.

As a first step, I would really break all the tasks down into CPU-side tasks and GPU-side tasks. The GPU-side tasks will have to be submitted via a single dedicated thread for the reasons above. How you optimize the CPU-side tasks will be up to you.
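One way to apply this split is a queue that worker threads fill with CPU-side vertex data and that only the render thread drains. A minimal sketch, assuming a simplified Vertex struct as a stand-in for sf::Vertex so that it compiles without SFML; UpdateQueue and PendingUpdate are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <utility>
#include <vector>

// Stand-in for sf::Vertex so the sketch compiles without SFML.
struct Vertex
{
    float x = 0, y = 0;
    unsigned color = 0;
};

// A pending GPU-side write: "copy these vertices to this offset".
struct PendingUpdate
{
    std::vector<Vertex> vertices;
    std::size_t offset;
};

// Worker threads only touch CPU memory and enqueue updates; the render
// thread is the only one that drains the queue and talks to the graphics API.
class UpdateQueue
{
public:
    void push(std::vector<Vertex> vertices, std::size_t offset)
    {
        std::lock_guard<std::mutex> lock(m_mutex);
        m_pending.push_back({std::move(vertices), offset});
    }

    // Called once per frame on the render thread. The queue is swapped out
    // under the lock, so apply() runs without blocking the producers.
    template <typename ApplyFn>
    std::size_t drain(ApplyFn apply)
    {
        std::vector<PendingUpdate> batch;
        {
            std::lock_guard<std::mutex> lock(m_mutex);
            batch.swap(m_pending);
        }
        for (const PendingUpdate& update : batch)
            apply(update);
        return batch.size();
    }

private:
    std::mutex m_mutex;
    std::vector<PendingUpdate> m_pending;
};
```

With SFML, PendingUpdate would hold sf::Vertex, and the render thread would call something like queue.drain([&](const PendingUpdate& u) { buffer.update(u.vertices.data(), u.vertices.size(), static_cast<unsigned int>(u.offset)); }); before drawing. Since every update() and draw() then happens on one thread and one context, the commands reach the driver in the order that thread issues them.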

GerhardKorken · « **Reply #2 on:** January 27, 2021, 03:01:25 pm »

Hi binary, thank you so much for the detailed response and explanation. That was very informative.

Okay, that's a little sad but understandable, given the server/client analogy. Following your suggestions, I will not use glFlush(); instead, I will reconsider the architecture and, at least for now, only do the file reading in another thread, and then see whether anything can be optimized further in that regard without pulling GPU-related tasks into multi-threading territory.
