Topic: Handling SetActive efficiently in multithreaded environments

Wizzard · « **on:** April 16, 2011, 12:58:18 pm »

I have a game engine I'm writing with SFML that sets up an interface for the Lua scripting language to provide game logic. I've set up functions for the scripting language that directly map to OpenGL calls. The problem is that sometimes the scripting language executes inside of threads invisibly to the scripting language. So, sometimes calls to OpenGL would be made without an OpenGL context. To remedy this, I internally lock/unlock a mutex and call SetActive(true)/SetActive(false) on the window each time I run any scripting language callback from any thread. However, SetActive is too expensive a function to call as frequently as I am (~200 times per second while idling). It's maxing out my 2.13GHz dual core CPU usage to 100%. When commented out on Windows XP, my CPU usage goes down to 1-12%.

A few ideas have run through my head on how to fix this performance issue:

1.) Reduce the number of times I call SetActive
Instead of doing SetActive(true)/SetActive(false) each time I run a callback from the scripting language, have the current thread check whether it is the thread the window is active in and, if it is not, make a request for that thread to release the window so we can set it active for the current thread. All threads will call a polling function that checks if they need to release the window. This should reduce the number of calls to SetActive drastically, especially while idling, but it still won't be at an acceptable level in complex scripts that run many callbacks on different threads every second. It can also create a lot of wait time before a thread finally reads the requests of other threads.

2.) Drop threading completely (or at least the invisible non-scripted threading)
This really simplifies things. I actually don't know why threads are such a hot topic. The only thing I know of that can't be done smoothly without threads is loading resources while displaying something else. There is also a hack in my application that puts input into a different thread so that the window still performs display updates while being dragged on Windows, but I can live without that.

3.) Stop calling SetActive(true)/SetActive(false) at all
Instead, create a new OpenGL context for every thread with the color buffer starting/clearing color set to be completely transparent and, when the thread finishes running a scripting callback, display the result on top of the window's (the real) OpenGL context and clear the color buffer of the thread's OpenGL context.

I'm just looking for some feedback on my problem. I've been thinking about it all day and am still not sure if I am approaching it the right way. I'll probably go attempt to implement option 3 when I get some motivation to see how it will work. I'd be completely fine with doing option 2 though. This whole thing is sort of low priority since this is just an optimization thing.

P.S. It's 4AM so I'm not sure if this post makes complete sense. :lol:

Groogy · « **Reply #1 on:** April 16, 2011, 03:51:30 pm »

Quote from: "Wizzard"

However, SetActive is too expensive a function to call as frequently as I am (~200 times per second while idling). It's maxing out my 2.13GHz dual core CPU usage to 100%. When commented out on Windows XP, my CPU usage goes down to 1-12%.

It's not SetActive that does that; Laurent calls SetActive every time you tell a render target to draw something (I think?). Having the CPU maxed out at 100% isn't necessarily a bad thing. I can max out the CPU without having any advanced graphics or doing any heavy operations at all. The reason the CPU usage goes down is that somewhere, somehow, your application sleeps.

Quote from: "Wizzard"

2.) Drop threading completely (or at least the invisible non-scripted threading)
This really simplifies things. I actually don't know why threads are such a hot topic. The only thing I know of that can't be done smoothly without threads is loading resources while displaying something else. There is also a hack in my application that puts input into a different thread so that the window still performs display updates while being dragged on Windows, but I can live without that.

If you don't need it, don't use it. If you don't know how to use it, absolutely don't use it. Experiment first before moving it into a game engine.

Quote from: "Wizzard"

I'm just looking for some feedback on my problem. I've been thinking about it all day and am still not sure if I am approaching it the right way. I'll probably go attempt to implement option 3 when I get some motivation to see how it will work. I'd be completely fine with doing option 2 though. This whole thing is sort of low priority since this is just an optimization thing.

Premature optimization is the root of all evil!

Laurent · « **Reply #2 on:** April 16, 2011, 05:18:55 pm »

Quote

It's not SetActive that does that, Laurent does a call to SetActive every time you tell a render target to draw something (I think?)

Not exactly. Switching the current OpenGL context is indeed a very expensive operation, and the only reason why I can usually avoid this in SFML is because I don't activate a context if it's already active (I never call SetActive(false) for windows, so basically the context never changes).
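The "don't activate a context if it's already active" idea can be sketched roughly as follows. This is only an illustration, not SFML's actual implementation: `FakeContext`, `currentContext` and `setActive` are hypothetical names, and the real platform call (for example `wglMakeCurrent` on Windows) is replaced by a counter so the effect of the check is visible.

```cpp
// Hypothetical stand-in for an OpenGL context; not an SFML type.
struct FakeContext
{
    int switchCount = 0; // how many times the costly switch really ran
};

// Context currently bound on the calling thread (nullptr means none).
thread_local FakeContext* currentContext = nullptr;

// Activates `context` on the calling thread, skipping the costly switch
// when it is already the current one. Returns true if a switch happened.
bool setActive(FakeContext& context)
{
    if (currentContext == &context)
        return false; // already active on this thread: nothing to do

    // A real implementation would make the platform call here.
    ++context.switchCount;
    currentContext = &context;
    return true;
}
```

With a single window used from a single thread, only the first call pays for a switch; every later call returns early. Calling SetActive(false) after each callback defeats exactly this shortcut.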

My opinion is that you should drop threads. They always make things complicated, especially when OpenGL rendering is involved. So if you don't need them... drop them.

Wizzard · « **Reply #3 on:** April 17, 2011, 10:37:29 am »

I'll just remove all threads from the C++ side of my game engine. It may be naive for me to say, but I do feel like I understand threads. It's just that I think I'm losing more performance than I am gaining by using threads. I want my game engine to run fast on multiple core CPUs and single core CPUs as well. Single core CPUs shouldn't have to suffer to make multiple core CPUs faster. Managing SetActive as well as all the synchronization I was already doing just puts it at a point where I'd rather cut my losses and settle on not using threads.

Quote from: "Groogy"

Having maxed out to 100% on a CPU isn't necessarily a bad thing. I can max out the CPU without having any advanced graphics or doing any heavy operations at all. Why the CPU usage goes down is because somewhere somehow your application Sleeps.

My CPU usage measuring was done when the application was idling. It was calling no more than it needed to display a few hundred textured rectangles on the screen. If I want my game engine to be compatible with computers as old as I can possibly handle, I feel that I need that basic loop not to put my PC, which is way above the specs I'm aiming for, at a constant 100% CPU usage.

Quote from: "Laurent"

Not exactly. Switching the current OpenGL context is indeed a very expensive operation, and the only reason why I can usually avoid this in SFML is because I don't activate a context if it's already active (I never call SetActive(false) for windows, so basically the context never changes).

What is the point of calling SetActive(true) in that case at all then? I wasn't aware the SFML graphics library did anything to the OpenGL context. I thought it was all left up to the user. I assume that this has something to do with ensuring there is a valid OpenGL context, and if there isn't, you create one?

I'm interested in this because if I make a scripting side implementation of threads, I want textures to be able to be loaded from those. I remember reading somewhere that all SFML OpenGL contexts share resources such as textures. If this is true, all I need to do is give the thread an OpenGL context and then it will be able to load textures, right?

Thanks for the responses Groogy, Laurent!

Laurent · « **Reply #4 on:** April 17, 2011, 11:35:40 am »

Quote

What is the point of calling SetActive(true) in that case at all then?

You can't expect a context to be already active. You might have multiple render targets (windows or render images), or you might have deactivated your window manually, etc.
What I said was only describing the simplest situation where a user has a single window in a single thread.

Quote

I assume that this has something to do with ensuring there is a valid OpenGL context, and if there isn't, you create one?

This is indeed what I do.

Quote

I'm interested in this because if I make a scripting side implementation of threads, I want textures to be able to be loaded from those. I remember reading somewhere that all SFML OpenGL contexts share resources such as textures. If this is true, all I need to do is give the thread an OpenGL context and then it will be able to load textures, right?

Correct

Groogy · « **Reply #5 on:** April 17, 2011, 04:45:06 pm »

Erhm, when I get to school I think I can have a solution for you that we went through, if you still want it to be multithreaded and also compatible with older platforms. It's quite advanced and will require you to rewrite your whole structure, but it is exactly what you are looking for. It will also handle the problem with SetActive.

What you do is divide everything up into different jobs with different dependencies. So running your "scripts" will be one job, for instance.

(Ignoring the job solution) Also, giving scripts direct access to the rendering context is not an appropriate way to go. What you want is for the script thread to generate output that is sent to whatever thread is handling the rendering. I have it kind of described here and can give you the source if you want: http://sfml-dev.org/forum/viewtopic.php?p=29514#29514 (minus the Create-Init pattern; I removed it as it just added unnecessary complexity)
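A minimal sketch of "the script thread generates output that is sent to the rendering thread", under some assumptions: it uses the C++11 standard library rather than SFML's threading classes, and `CommandQueue`, `push` and `flush` are hypothetical names. Worker threads only record closures; the thread that owns the OpenGL context is the only one that runs them.

```cpp
#include <cstddef>
#include <functional>
#include <mutex>
#include <utility>
#include <vector>

// Hypothetical command buffer: any thread may push, but only the thread
// that owns the OpenGL context calls flush().
class CommandQueue
{
public:
    using Command = std::function<void()>;

    // Called from any thread: record a command for later execution.
    void push(Command command)
    {
        std::lock_guard<std::mutex> lock(myMutex);
        myPending.push_back(std::move(command));
    }

    // Called from the rendering thread only. The lock is held just long
    // enough to swap buffers, so producers are not blocked while the
    // commands themselves run. Returns the number of commands executed.
    std::size_t flush()
    {
        std::vector<Command> batch;
        {
            std::lock_guard<std::mutex> lock(myMutex);
            batch.swap(myPending);
        }
        for (Command& command : batch)
            command();
        return batch.size();
    }

private:
    std::mutex           myMutex;
    std::vector<Command> myPending;
};
```

The rendering thread would call `flush()` once per frame, right before displaying the window. Since no other thread ever touches OpenGL, SetActive never has to be called to move the context between threads.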

Wizzard · « **Reply #6 on:** April 18, 2011, 06:16:53 am »

That sounds pretty interesting Groogy, but I've decided to keep it simple, at least for now, and just get rid of threads on the C++ side for my game engine.

Quote from: "Groogy"

Also the whole thing with scripts having direct access to rendering context is not an appropriate way to go.

They don't have direct access to a rendering context. In fact, threads that are run from the scripting language will not be able to draw anything to the screen in my current concept of how things will work. I just want textures to be able to be loaded from threads. So, to achieve that, I'll make a call to SetActive(true) to see if a thread has an OpenGL context and create a valid OpenGL context for threads that don't have one when attempting to load textures.

Quote from: "Groogy"

Erhm, when I get to school I think I can have a solution for you that we went through if you still want it to be multi threaded and also be compatible with older platforms. It's quite advanced and will require you to rewrite your whole structure but it is exactly what you are looking for. It will also handle the problem with SetActive.

What you do is divide everything up into different jobs with different dependencies. So running your "scripts" will be one job for instance.

... What you want to do is the [script] thread to generate output that is sent to whatever thread is handling the rendering...

For closure, since this forum thread is named "Handling SetActive efficiently in multithreaded environments" and I'm still interested (but don't plan on implementing it), I would like to understand what you're saying. This implies that I would want threads to be able to draw to the screen. Is there a way to, without hindering single core processors, forward OpenGL commands to the main thread?

The following code is how I'd do it if I accepted hindering single core processors through lock/unlock (SFML 2.0). It's quite long, but I wanted to sacrifice brevity for realism. It basically just sets up a class that can queue OpenGL commands and then runs them all in the main thread.

glwindow.cpp

Code:

////////////////////////////////////////////////////////////
// Headers
////////////////////////////////////////////////////////////
#include "glwindow.hpp"
#include <SFML/OpenGL.hpp>


////////////////////////////////////////////////////////////
// ByteColor defines a color for the SetColor function
////////////////////////////////////////////////////////////
struct ByteColor
{
    GLubyte r;
    GLubyte g;
    GLubyte b;
};


////////////////////////////////////////////////////////////
// IntRect defines a rectangle for the DrawRectangle function
////////////////////////////////////////////////////////////
struct IntRect
{
    GLint x1;
    GLint y1;
    GLint x2;
    GLint y2;
};


GLWindow::GLWindow() :
    myShouldClose(false)
{
    Initialize();
}


GLWindow::GLWindow(sf::VideoMode mode, const std::string& title, unsigned long style, const sf::ContextSettings& settings) :
    sf::Window(mode, title, style, settings),
    myShouldClose(false)
{
    Initialize();
}


void GLWindow::Clear()
{
    // Add a call to ClearColorBuffer to the renderer calls buffer with no argument
    myRendererCalls.push_back(RendererCall(ClearColorBuffer, 0));
}


void GLWindow::Close()
{
    // We don't directly close the window here because that will make the OpenGL context invalid for any subsequent OpenGL calls
    myShouldClose = true;
}


void GLWindow::Display()
{
    // Take the buffered calls while holding the lock, then release it immediately
    // so other threads are not blocked while the OpenGL calls execute
    RendererCalls calls;
    myRendererCallsMutex.Lock();
    calls.swap(myRendererCalls);
    myRendererCallsMutex.Unlock();

    // Flush buffered renderer calls (iterating once instead of erasing from the
    // front of the vector, which costs O(n) per erased element)
    for (RendererCalls::iterator it = calls.begin(); it != calls.end(); ++it)
    {
        switch (it->first)
        {
            case ClearColorBuffer:
            {
                // Clear the OpenGL color buffer
                glClear(GL_COLOR_BUFFER_BIT);

                break;
            }
            case Color:
            {
                // Get ByteColor argument
                ByteColor* color = static_cast<ByteColor*>(it->second);

                // Set the current color with the given arguments
                glColor3ub(color->r, color->g, color->b);

                // Deallocate the ByteColor argument
                delete color;

                break;
            }
            case Rect:
            {
                // Get IntRect argument
                IntRect* rect = static_cast<IntRect*>(it->second);

                // Draw a rectangle with the given arguments
                glRecti(rect->x1, rect->y1, rect->x2, rect->y2);

                // Deallocate the IntRect argument
                delete rect;

                break;
            }
        }
    }

    // Call the underlying Display function
    sf::Window::Display();

    // If the window should close, call the underlying Close function
    if (myShouldClose)
        sf::Window::Close();
}


void GLWindow::DrawRectangle(int x1, int y1, int x2, int y2)
{
    // Allocate an IntRect with the given coordinates
    IntRect* rect = new IntRect;
    rect->x1 = x1;
    rect->y1 = y1;
    rect->x2 = x2;
    rect->y2 = y2;

    // Add a call to Rect to the renderer calls buffer with an IntRect argument
    myRendererCalls.push_back(RendererCall(Rect, rect));
}


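void GLWindow::Initialize()
{
    // Initialize any OpenGL render states
    // glOrtho multiplies the current matrix, so select and reset the projection matrix first
    glMatrixMode(GL_PROJECTION);
    glLoadIdentity();
    glOrtho(0, GetWidth(), GetHeight(), 0, -1, 1);
    glMatrixMode(GL_MODELVIEW);
}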


void GLWindow::Lock()
{
    // Lock other threads' writing access to the renderer calls buffer
    myRendererCallsMutex.Lock();
}


void GLWindow::SetColor(unsigned char r, unsigned char g, unsigned char b)
{
    // Allocate a ByteColor with the given RGB
    ByteColor* color = new ByteColor;
    color->r = r;
    color->g = g;
    color->b = b;

    // Add a call to Color to the renderer calls buffer with a ByteColor argument
    myRendererCalls.push_back(RendererCall(Color, color));
}


void GLWindow::Unlock()
{
    // Unlocks other threads' writing access to the renderer calls buffer
    myRendererCallsMutex.Unlock();
}

glwindow.hpp

Code:

#pragma once

////////////////////////////////////////////////////////////
// Headers
////////////////////////////////////////////////////////////
#include <SFML/Window.hpp>
#include <SFML/System.hpp>
#include <string>
#include <utility>
#include <vector>


////////////////////////////////////////////////////////////
// GLWindow member functions are used instead of directly
// using OpenGL functions to give commands to the hardware
////////////////////////////////////////////////////////////
class GLWindow : public sf::Window
{
public :

    ////////////////////////////////////////////////////////////
    // Default constructor
    ////////////////////////////////////////////////////////////
    GLWindow();

    ////////////////////////////////////////////////////////////
    // Constructs a new GL window
    ////////////////////////////////////////////////////////////
    GLWindow(sf::VideoMode mode, const std::string& title, unsigned long style = sf::Style::Default, const sf::ContextSettings& settings = sf::ContextSettings());

    ////////////////////////////////////////////////////////////
    // Clears all pixels to black
    ////////////////////////////////////////////////////////////
    void Clear();

    ////////////////////////////////////////////////////////////
    // Tells the window that it should close
    ////////////////////////////////////////////////////////////
    void Close();

    ////////////////////////////////////////////////////////////
    // Do the renderer calls that have been buffered so far
    ////////////////////////////////////////////////////////////
    void Display();

    ////////////////////////////////////////////////////////////
    // Draws a rectangle
    ////////////////////////////////////////////////////////////
    void DrawRectangle(int x1, int y1, int x2, int y2);

    ////////////////////////////////////////////////////////////
    // Perform renderer initializations
    ////////////////////////////////////////////////////////////
    void Initialize();

    ////////////////////////////////////////////////////////////
    // Locks the renderer buffer
    ////////////////////////////////////////////////////////////
    void Lock();

    ////////////////////////////////////////////////////////////
    // Sets the color of the renderer
    ////////////////////////////////////////////////////////////
    void SetColor(unsigned char r, unsigned char g, unsigned char b);

    ////////////////////////////////////////////////////////////
    // Unlocks the renderer buffer
    ////////////////////////////////////////////////////////////
    void Unlock();

private :

    ////////////////////////////////////////////////////////////
    // Enumeration of the different rendering functions the
    // GLWindow is capable of
    ////////////////////////////////////////////////////////////
    enum RendererFunction
    {
        ClearColorBuffer,
        Color,
        Rect
    };

    typedef std::pair<RendererFunction, void*> RendererCall;
    typedef std::vector<RendererCall>          RendererCalls;

    RendererCalls myRendererCalls;      // Container of buffered renderer calls to be processed by the GLWindow
    sf::Mutex     myRendererCallsMutex; // Mutex for synchronizing threads' writing to the renderer calls buffer
    bool          myShouldClose;        // If window should close, this variable is true, else false
};

main.cpp

Code:

#include "glwindow.hpp"


bool threadShouldRun = true;


void Thread(GLWindow* window)
{
    while (threadShouldRun)
    {
        // Draw an expanding red rectangle in the top left in this thread
        static int Width  = 0;
        static int Height = 0;
        ++Width;
        ++Height;
        window->Lock();
        window->SetColor(255, 0, 0);
        window->DrawRectangle(0, 0, Width, Height);
        window->Unlock();

        sf::Sleep(0.01);
    }
}


int main()
{
    GLWindow window(sf::VideoMode(640, 480), "Test", sf::Style::Close);
    window.SetFramerateLimit(100);

    sf::Thread thread(&Thread, &window);
    thread.Launch();

    while(window.IsOpened())
    {
        sf::Event event;
        while (window.PollEvent(event))
            if (event.Type == sf::Event::Closed)
                window.Close();

        // Draw an expanding blue rectangle in the bottom right in this thread
        static int Width  = 640;
        static int Height = 480;
        --Width;
        --Height;
        window.Lock();
        window.SetColor(0, 0, 255);
        window.DrawRectangle(640, 480, Width, Height);
        window.Unlock();

        window.Display();
    }

    threadShouldRun = false;

    return 0;
}

It's pretty long. In hindsight, it might have been better to upload the files to a separate location and provide a download link. :lol:

Groogy · « **Reply #7 on:** April 18, 2011, 01:45:42 pm »

Well yeah, the Lock and Unlock will lower your framerate. The thing is, you're pretty close to what you want.

Here's test code for my threading library:

Code:

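#include <iostream>
#include <string>
#include <SFML/System.hpp> // sf::Clock
// (the header of the Threading library itself is not shown here)

class TextMessage : public Threading::Message
{
public:
    explicit TextMessage( const std::string &aString ) :
        myString( aString )
    {
    }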

    const std::string &GetString() const
    {
        return myString;
    }
private:
    std::string myString;
};

class MyProducer : public Threading::Looper
{
private:
    void Frame()
    {
        GetSynchro().GetOutput().GetList()->AddElement( new TextMessage( "Hello world!" ) );
    }
};

class MyConsumer : public Threading::Looper
{
private:
    void Frame()
    {
        const Threading::MessageList *inputList = GetSynchro().GetInput().GetList();
        for( unsigned int index = 0, end = inputList->NumElements(); index < end; index++ )
        {
            const TextMessage *message = inputList->GetElement< TextMessage >( index );
            std::cout << message->GetString() << std::endl;
        }
    }
};

int main()
{
    MyProducer producer;
    MyConsumer consumer;
    producer.Launch();
    consumer.Launch();

    bool running = true;
    sf::Clock clock;
    while( running == true )
    {
        if( producer.GetSynchro().IsFinished() == true && consumer.GetSynchro().IsFinished() == true )
        {
            producer.GetSynchro().SynchWith( consumer.GetSynchro() );
            producer.GetSynchro().StartThread();
            consumer.GetSynchro().StartThread();
        }

        if( clock.GetElapsedTime() > 10 )
        {
            running = false;
            producer.GetSynchro().ExitThread();
            consumer.GetSynchro().ExitThread();
        }
    }

    producer.Terminate();
    consumer.Terminate();
    return 0;
}

This is the first step. This way no thread will be locked out by mutexes. Anything you want rendered is transferred using messages when the threads synch with each other (like I do with the text message). Now the thing is that at least one thread will always finish first and have to wait for the other one. This can be partially eliminated using a Task/Job manager. This will of course require a complete makeover of how you build your engine, but it is the best way to achieve parallelism while still supporting single core CPUs (this is what I'm working on currently).
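The hand-over at the synchronization point might look something like the sketch below. It is a guess at the general shape only, not the Threading library's real code: `Mailbox` and `synchWith` are hypothetical names, plain strings stand in for messages, and the part that waits until both threads have finished their frame is left out.

```cpp
#include <string>
#include <vector>

using MessageList = std::vector<std::string>;

// Hypothetical per-thread mailbox: during a frame the owning thread only
// touches its own two lists, so no mutex is needed while it works.
struct Mailbox
{
    MessageList input;  // read by the owner during its frame
    MessageList output; // written by the owner during its frame
};

// Must only run at the synchronization point, while both threads are idle:
// hands the producer's output to the consumer and leaves the producer with
// an empty output list for its next frame.
void synchWith(Mailbox& producer, Mailbox& consumer)
{
    consumer.input.clear();               // drop last frame's messages
    consumer.input.swap(producer.output); // O(1) hand-over, no copying
}
```

Each thread waits at most once per frame, at the synchronization point, instead of once per message as with a mutex around a shared list.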

Do you need me to go more in-depth on this part or do you want me to continue with Tasks/Jobs? I have more details here: http://sfml-dev.org/forum/viewtopic.php?p=29514#29514

Wizzard · « **Reply #8 on:** April 18, 2011, 03:25:29 pm »

To clarify, this is similar to treating threads sort of like blocking sockets, right? I'd process a thread until it needs data from other threads and then make a blocking call to request data from another thread. It seems too similar to mutexes to me. Do you have a compilable implementation I can toy with in order to see the difference? Does this really give better performance than using mutexes?

Groogy · « **Reply #9 on:** April 18, 2011, 06:04:29 pm »

Yes, you do get better performance, but you have to look at the whole context and not the small pieces. The next step is to create a Job/Task system. In there you can say "this task depends on data from the task blah blah blah", so those tasks have to be run first. But there will be some tasks that don't depend on any other input, or whose input has already been generated, so they can be run and generate their own output, which some other job can then receive. Thus the Job/Task manager's mission is to assign jobs to the available threads. Let's say we have a single core system; then we have 2 threads available, which means we can have 2 jobs working at the same time. The reason we pick 2 threads is not hyper-threading or anything; it's that most jobs won't utilize the core to 100%. This can easily scale up to any number of cores as long as we separate the jobs from each other as much as possible.
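For illustration, a very small job system with dependencies could be sketched like this. It is not the library described above (which has no published code yet); `JobGraph`, `add` and `run` are hypothetical names, it uses C++11 standard threads, and it can only be run once.

```cpp
#include <cassert>
#include <condition_variable>
#include <cstddef>
#include <functional>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical single-shot job graph: a job becomes runnable once every job
// it depends on has finished, and idle worker threads pick up runnable jobs.
class JobGraph
{
public:
    using JobId = std::size_t;

    // Registers a job. Dependencies must already be registered, which also
    // rules out dependency cycles.
    JobId add(std::function<void()> work, const std::vector<JobId>& dependencies = {})
    {
        const JobId id = myJobs.size();
        for (JobId dependency : dependencies)
        {
            assert(dependency < id);
            myJobs[dependency].dependents.push_back(id);
        }
        myJobs.push_back(Job{std::move(work), {}, dependencies.size()});
        return id;
    }

    // Runs all jobs on `threadCount` workers; returns when every job is done.
    void run(unsigned threadCount)
    {
        assert(threadCount > 0);
        myRemaining = myJobs.size();
        for (JobId id = 0; id < myJobs.size(); ++id)
            if (myJobs[id].unmetDependencies == 0)
                myReady.push(id);

        std::vector<std::thread> workers;
        for (unsigned i = 0; i < threadCount; ++i)
            workers.emplace_back([this] { workerLoop(); });
        for (std::thread& worker : workers)
            worker.join();
    }

private:
    struct Job
    {
        std::function<void()> work;
        std::vector<JobId>    dependents;        // jobs waiting on this one
        std::size_t           unmetDependencies; // jobs this one still waits on
    };

    void workerLoop()
    {
        std::unique_lock<std::mutex> lock(myMutex);
        for (;;)
        {
            // Sleep until a job is runnable or everything has finished
            myWakeUp.wait(lock, [this] { return !myReady.empty() || myRemaining == 0; });
            if (myReady.empty())
                return; // nothing left to do

            const JobId id = myReady.front();
            myReady.pop();

            lock.unlock(); // run the job itself without holding the lock
            myJobs[id].work();
            lock.lock();

            // Release the jobs that were only waiting for this one
            for (JobId dependent : myJobs[id].dependents)
                if (--myJobs[dependent].unmetDependencies == 0)
                    myReady.push(dependent);
            --myRemaining;
            myWakeUp.notify_all();
        }
    }

    std::vector<Job>        myJobs;
    std::queue<JobId>       myReady;
    std::size_t             myRemaining = 0;
    std::mutex              myMutex;
    std::condition_variable myWakeUp;
};
```

A frame could then register "run scripts" and "update physics" as independent jobs and "build render commands" as a job depending on both, and call `run(2)` on a single core machine as suggested above.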

This is what I am currently working on for my library so I don't have any example code.

I am only summarizing very poorly what it really is as I'm in a hurry at the moment.

Note: The reason to dump locks and make threads wait my way instead is that a lock can be locked several times a frame, meaning several threads wait several times, while in mine they only wait once.

Wizzard · « **Reply #10 on:** April 20, 2011, 10:09:29 am »

How do you make a thread wait for another thread?

In SFML, the only way I can think of is like this

Code:

while (!threadReady)
    sf::Sleep(0.1);

Anyways, I will just curb the majority of my curiosity until your threading library is complete.

Groogy · « **Reply #11 on:** April 20, 2011, 10:16:21 am »

Quote from: "Wizzard"

How do you make a thread wait for another thread?

In SFML, the only way I can think of is like this
Code:
while (!threadReady) sf::Sleep(0.1);

Something like that but a little more advanced

Code:


while (!threadReady)
    sf::Sleep(0);

is more appropriate (the thread kind of says: I'm done if anyone else wants to work, but if not I'll go again).

Laurent · « **Reply #12 on:** April 20, 2011, 10:23:24 am »

This is not very efficient; a "wait condition" would be much better. Or a semaphore, if the thread waits for resources (messages, commands) to be available.
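As a rough sketch of what a wait condition looks like in place of the sleep loop, here is a version using the C++11 standard library's `std::condition_variable` (SFML itself has no equivalent class). `ReadySignal` and `waitForWorkerResult` are hypothetical names chosen for this example.

```cpp
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical one-shot "ready" signal built on a wait condition.
class ReadySignal
{
public:
    // Blocks the calling thread, without polling, until signal() was called.
    void wait()
    {
        std::unique_lock<std::mutex> lock(myMutex);
        myCondition.wait(lock, [this] { return myReady; });
    }

    // Marks the signal as ready and wakes every waiting thread.
    void signal()
    {
        {
            std::lock_guard<std::mutex> lock(myMutex);
            myReady = true;
        }
        myCondition.notify_all();
    }

private:
    std::mutex              myMutex;
    std::condition_variable myCondition;
    bool                    myReady = false;
};

// Usage: the caller sleeps until the worker has produced its result.
int waitForWorkerResult()
{
    ReadySignal ready;
    int result = 0;
    std::thread worker([&] { result = 42; ready.signal(); });

    ready.wait();            // replaces the while (!threadReady) sleep loop
    const int seen = result; // safe: signal() and wait() synchronize via the mutex
    worker.join();
    return seen;
}
```

The waiting thread is woken by the operating system as soon as `signal()` is called, so it neither burns CPU time like `sf::Sleep(0)` nor adds up to 100 ms of latency like `sf::Sleep(0.1)`.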

SFML provides only the minimum threading features; my opinion is that developers should use a real threading library for more complex stuff like this. You can't really do something clean or efficient with SFML only.