Author Topic: Main thread execution pauses waiting on opengl thread?  (Read 3739 times)

omnomasaur

  • Newbie
  • *
  • Posts: 8
    • View Profile
    • http://www.omnomasaur.com
Main thread execution pauses waiting on opengl thread?
« on: September 27, 2014, 01:42:11 am »
I'm at my wits' end with a problem I ran into while trying to optimize and smooth out the frame rate of my game.

The problem is that every once in a while the application's main thread pauses for ~15 milliseconds while waiting on a resource held by what I assume is the OpenGL worker thread (it is created at around the same time as the sf::RenderWindow, and not directly by my code).

In Visual Studio's Concurrency Visualizer these pauses look like this:

[Screenshot of the Concurrency Visualizer timeline; image not preserved]

Here, 5728 Worker Thread is (I think) the OpenGL thread, 8996 Main Thread is my event loop + render thread, and 2076 Worker Thread is my logic thread.

The data and stack on the Main Thread look like this:
Category = Synchronization
Api = WaitForSingleObject
Delay = 13.8911 ms
Unblocked by thread 5728; click 'Unblocking Stack' for details.
kernel32.dll!_WaitForSingleObject@8+0x12
nvoglv32.dl!0x8588ff
nvoglv32.dl!0x851cde
nvoglv32.dl!0x851bcd
opengl32.dll!___DrvSwapBuffers@8+0x37
opengl32.dll!_wglSwapBuffers@4+0x6f
gdi32.dll!_SwapBuffers@4+0x25
sfml-window-2.dll!0x4e24
kami.exe!kami::Client::Draw+0x251 - d:\documents\programming\kami\kami\src\base\client.cpp(718, 718)
kami.exe!kami::Client::Run+0xbb7 - d:\documents\programming\kami\kami\src\base\client.cpp(517, 517)
kami.exe!kami::Game::RunNetGame+0xe3 - d:\documents\programming\kami\kami\src\base\game.cpp(357, 357)
kami.exe!kami::Game::Play+0x12d8 - d:\documents\programming\kami\kami\src\base\game.cpp(324, 324)
kami.exe!_main+0x38b - d:\documents\programming\kami\kami\src\main.cpp(114, 114)
kami.exe!_WinMain@16+0x16

The unblocking stack looks like this:
Thread 8996 was unblocked by thread 5728
The unblocking call stack follows:
ntoskrnl.exe! ?? ::FNODOBFM::`string'+0xb49b
ntoskrnl.exe!NtSetEvent+0x90
ntoskrnl.exe!KiSystemServiceCopyEnd+0x13
wow64cpu.dll!CpupSyscallStub+0x9
wow64cpu.dll!Thunk0Arg+0x5
wow64.dll!RunCpuSimulation+0xa
wow64.dll!Wow64LdrpInitialize+0x42a
ntdll.dll! ?? ::FNODOBFM::`string'+0x6a77
ntdll.dll!LdrInitializeThunk+0xe
ntdll.dll!_NtSetEvent@8+0x15
kernelbase.dll!_SetEvent@4+0x10
nvoglv32.dl!0x8588dc
nvoglv32.dl!0x865742
nvoglv32.dl!0x711504
nvoglv32.dl!0x71136c
nvoglv32.dl!0x85a6cc

From these stacks, it seems to me that the problem is related to the SwapBuffers call waiting on a resource that can be held by the (supposed) OpenGL thread, but I don't know enough about OpenGL at that level to understand what I'm dealing with.

I have tried disabling nearly all of my own code (the entire logic thread + all of the event handling in the main thread), and still couldn't see any change in these pauses. 

I have tried using both setFramerateLimit() and setVerticalSyncEnabled() to limit the execution of the main thread, hoping that the waits would fall naturally into the main thread's idle time. However, both of these make the issue much more frequent and noticeable.

I also tested three of my older SFML projects in SFML 1.6, 2.0, and 2.1 respectively and noticed that they all shared the same problem, even though I had not noticed it before. 

I am currently working on a minimal code example of this issue, but wanted to post this first in case it is a simple problem that I'm just missing for some reason. 

Edit:
I have created two very minimal code examples which I have found to cause this issue, one with threaded logic and one without. 
The threaded logic example causes the issue more frequently. 

Without threaded logic:
#include <SFML/Graphics.hpp>
#include <cmath>

int main()
{
    sf::Vector2i screenSize(1280, 720);
    sf::CircleShape circle;
    sf::RenderWindow window;
    window.create(sf::VideoMode(screenSize.x, screenSize.y), "test");

    circle = sf::CircleShape(40);
    circle.setOrigin(20, 20);
    circle.setFillColor(sf::Color::Black);
    circle.setOutlineColor(sf::Color::White);
    circle.setOutlineThickness(2.0f);
    float circlePosMod = 0.0f, circleDistance = 200.0f, timestepMod = 2.0f;

    sf::Clock updateClock;

    while (window.isOpen())
    {
        sf::Event event;
        while (window.pollEvent(event))
        {
            if (event.type == sf::Event::Closed)
                window.close();
        }

        //Update start
        float updateTime = updateClock.restart().asMicroseconds() / 1000000.0f;
        circlePosMod += updateTime * timestepMod;

        circle.setPosition((cos(circlePosMod) * circleDistance) + (float)screenSize.x / 2.0f, (sin(circlePosMod) * circleDistance) + (float)screenSize.y / 2.0f);
        //Update end

        //Draw start
        window.clear();

        //Draw stuff here
        window.draw(circle);

        window.display();
        //Draw end
    }

    return 0;
}

With threaded logic:
#include <SFML/Graphics.hpp>
#include <SFML/System.hpp>
#include <cmath>

sf::Vector2i screenSize(1280, 720);
sf::CircleShape circle;
bool endLogic = false;

void logicThread()
{
        circle = sf::CircleShape(40);
        circle.setOrigin(20, 20);
        circle.setFillColor(sf::Color::Black);
        circle.setOutlineColor(sf::Color::White);
        circle.setOutlineThickness(2.0f);
        float circlePosMod = 0.0f, circleDistance = 200.0f, timestepMod = 2.0f;

        sf::Clock updateClock;

        while (!endLogic)
        {
                //Update start
                float updateTime = updateClock.restart().asMicroseconds() / 1000000.0f;
                circlePosMod += updateTime * timestepMod;

                circle.setPosition((cos(circlePosMod) * circleDistance) + (float)screenSize.x / 2.0f, (sin(circlePosMod) * circleDistance) + (float)screenSize.y / 2.0f);
                //Update end

                //Don't lock up the machine
                sf::sleep(sf::microseconds(500));
        }
}

int main()
{
        sf::RenderWindow window;
        window.create(sf::VideoMode(screenSize.x, screenSize.y), "test");
        //window.setVerticalSyncEnabled(true);

        endLogic = false;
        sf::Thread thread(&logicThread);
        thread.launch();

        while (window.isOpen())
        {
                sf::Event event;
                while (window.pollEvent(event))
                {
                        if (event.type == sf::Event::Closed)
                                window.close();
                }

                //Draw start
                window.clear();

                //Draw stuff here
                window.draw(circle);

                window.display();
                //Draw end
        }

        endLogic = true;
        thread.wait();
}
« Last Edit: September 27, 2014, 02:41:27 am by omnomasaur »

wintertime

  • Sr. Member
  • ****
  • Posts: 255
    • View Profile
Re: Main thread execution pauses waiting on opengl thread?
« Reply #1 on: September 28, 2014, 12:51:24 am »
It looks suspicious to me that you access the circle and the bool from 2 threads without a mutex.
Though if using threads, I would use the main thread only for event handling anyway and move clear, draw, display to the other thread where you access the drawn objects.
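
A minimal sketch of the kind of synchronization this points at, assuming C++11 and standard-library threads rather than sf::Thread and sf::Mutex. The names Position, SharedPosition, logicLoop, and sampleAfter are hypothetical, and the plain Position struct stands in for the sf::CircleShape from the threaded example so that the snippet builds without SFML:

```cpp
#include <atomic>
#include <chrono>
#include <cmath>
#include <functional>
#include <mutex>
#include <thread>

// Hypothetical stand-in for the state the logic thread hands to the renderer.
struct Position {
    float x;
    float y;
};

// Every read and write goes through the mutex, so the render thread can
// never observe a position that is only half updated.
class SharedPosition {
public:
    void store(Position p) {
        std::lock_guard<std::mutex> lock(mutex_);
        value_ = p;
    }

    Position load() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return value_;
    }

private:
    mutable std::mutex mutex_;
    Position value_{0.0f, 0.0f};
};

// Logic loop: moves a point around a circle until `stop` becomes true.
// std::atomic<bool> replaces the plain bool flag, which is itself a data race.
void logicLoop(SharedPosition& shared, const std::atomic<bool>& stop) {
    const float centerX = 640.0f, centerY = 360.0f, radius = 200.0f;
    float angle = 0.0f;
    auto last = std::chrono::steady_clock::now();
    do {
        const auto now = std::chrono::steady_clock::now();
        angle += std::chrono::duration<float>(now - last).count() * 2.0f;
        last = now;
        shared.store(Position{centerX + std::cos(angle) * radius,
                              centerY + std::sin(angle) * radius});
        std::this_thread::sleep_for(std::chrono::microseconds(500));
    } while (!stop.load());
}

// Runs the logic thread for `runFor`, then returns the last published position.
// A render loop would call shared.load() once per frame instead of sleeping.
Position sampleAfter(std::chrono::milliseconds runFor) {
    SharedPosition shared;
    std::atomic<bool> stop(false);
    std::thread worker(logicLoop, std::ref(shared), std::cref(stop));
    std::this_thread::sleep_for(runFor);
    stop.store(true);
    worker.join();
    return shared.load();
}
```

Copying a small value in and out under the lock keeps the critical section short, so the render thread is only ever blocked for the duration of one assignment. This removes the data race but is not expected to change the SwapBuffers wait shown in the stack trace.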

omnomasaur

  • Newbie
  • *
  • Posts: 8
    • View Profile
    • http://www.omnomasaur.com
Re: Main thread execution pauses waiting on opengl thread?
« Reply #2 on: September 28, 2014, 04:25:12 am »
As far as the lack of thread safety goes, I can see how that would be a problem, but it doesn't really explain why I still get issues without multithreading. 

For the second suggestion, I honestly don't even see how that would be possible without having the threads block each other when accessing the window. 

After a bit more testing, I am completely unable to produce smooth motion without running into this issue; even ignoring events entirely and only drawing doesn't help. I even tried SDL 2.0 and ran into the same thing when using hardware-accelerated rendering. At this point I am convinced that this isn't a bug, but rather normal behavior that I just don't understand how to work around.

eXpl0it3r

  • SFML Team
  • Hero Member
  • *****
  • Posts: 10826
    • View Profile
    • development blog
    • Email
Re: Main thread execution pauses waiting on opengl thread?
« Reply #3 on: September 28, 2014, 08:59:00 am »
Using multithreading to separate update and rendering is rarely a useful approach. One should only start thinking about multithreaded applications once the project is complex enough (for a simple tilemap game or similar, the overhead of multithreading will most likely have a negative effect) and once you know all the ins and outs of parallel programming, including the problems surrounding shared memory, race conditions, locks, mutexes, etc.

As for smooth rendering, have you tried implementing a fixed timestep (including proper interpolation)?
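
For reference, a minimal sketch of the accumulator pattern commonly used for a fixed timestep with interpolation. FixedTimestep, interpolate, and the 0.25 s clamp are hypothetical choices made for illustration, not code from this thread or from SFML:

```cpp
#include <cassert>

// Accumulates real frame time and hands it out in fixed-size logic steps.
class FixedTimestep {
public:
    // maxFrameSeconds clamps very long frames so that a single stall cannot
    // trigger an unbounded burst of catch-up updates (assumed default: 0.25 s).
    explicit FixedTimestep(double stepSeconds, double maxFrameSeconds = 0.25)
        : step_(stepSeconds), maxFrame_(maxFrameSeconds), accumulator_(0.0) {
        assert(stepSeconds > 0.0 && maxFrameSeconds > 0.0);
    }

    // Adds one frame's elapsed time and returns how many fixed updates are due.
    int advance(double frameSeconds) {
        if (frameSeconds > maxFrame_) frameSeconds = maxFrame_;
        accumulator_ += frameSeconds;
        int steps = 0;
        while (accumulator_ >= step_) {
            accumulator_ -= step_;
            ++steps;
        }
        return steps;
    }

    // Fraction of a step left over, in [0, 1); used to blend the two latest states.
    double alpha() const { return accumulator_ / step_; }

private:
    double step_;
    double maxFrame_;
    double accumulator_;
};

// Linear blend between the previous and the current logic state.
float interpolate(float previous, float current, double alpha) {
    return static_cast<float>(previous + (current - previous) * alpha);
}

// Per frame, a render loop would roughly do:
//   int steps = timestep.advance(secondsSinceLastFrame);
//   for each step: previous = current; current = update(current);
//   draw at interpolate(previous, current, timestep.alpha());
```

Drawing the blended state means the picture always lags the simulation by up to one step, which is the usual price for decoupling the logic rate from the render rate.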

You could also try to give your process a higher priority, so you can make "sure" that the OS isn't pausing the thread to let another process do stuff.

Did you run a profiler to see where in the code time is lost?

binary1248

  • SFML Team
  • Hero Member
  • *****
  • Posts: 1405
  • I am awesome.
    • View Profile
    • The server that really shouldn't be running
Re: Main thread execution pauses waiting on opengl thread?
« Reply #4 on: September 28, 2014, 12:16:25 pm »
You know... since this is happening inside the driver thread, this isn't directly related to SFML.

There are a few things you should know:
  • Since (you think) you didn't enable vertical synchronization, the driver shouldn't be pausing for the vertical refresh. Double check both your code and your Nvidia control panel settings to make sure it is disabled.
  • Since the only wait is inside the driver, this cannot be caused by SFML's framerate limiter either.
  • Nvidia has a tradition of providing settings that nobody really understands, including the infamous "threaded optimization". Disable it unless you know exactly what it does.
And lastly and most importantly: The driver rightfully makes the application wait if it is already X number of frames ahead of the GPU. It is not uncommon to measure the GPU being 2-3 frames behind the CPU, and that is also the sweet spot in regards to potential GPU-driver optimization. If your GPU has more work to do than the CPU, and the driver didn't limit anything, the ratio would go out of control and the GPU would end up being seconds behind the CPU's state of things, with an enormous command buffer size. I think it is obvious that this isn't desirable. I am almost sure this is the case, because if you haven't noticed, this pause doesn't just occur "once in a while", it occurs at exactly the same time every second from what I can tell by looking at the concurrency viewer screenshot.

Like eXpl0it3r already said, this isn't the cause of "unsmooth" FPS. Most of the time, people expect their applications to run at exactly the same framerate the whole way. This is just unrealistic and can probably only be achieved in systems designed for real-time computing. You are already pretty lucky that the driver is doing a somewhat good job at this, and judging by the concurrency viewer, your application isn't necessarily doing anything too complicated either. If you haven't already done so, I would focus on writing more code that actually puts some real load on both CPU and GPU before worrying about smoothing out the FPS. It is easy to call a 15ms pause out as a spike when your application is running at over 1000 FPS, whereas at 60FPS, it probably won't even be noticeable.
« Last Edit: September 28, 2014, 12:18:31 pm by binary1248 »

omnomasaur

  • Newbie
  • *
  • Posts: 8
    • View Profile
    • http://www.omnomasaur.com
Re: AW: Main thread execution pauses waiting on opengl thread?
« Reply #5 on: October 02, 2014, 03:41:32 am »
tl;dr: Attempted to address all suggestions.  Discovered the problem only exists in windowed mode (fullscreen is fine).  Still working on windowed though. 

Quote from: eXpl0it3r
Using multithreading to separate update and rendering is rarely a useful approach. One should only start thinking about multithreaded applications once the project is complex enough (for a simple tilemap game or similar, the overhead of multithreading will most likely have a negative effect) and once you know all the ins and outs of parallel programming, including the problems surrounding shared memory, race conditions, locks, mutexes, etc.

To be honest, the reason I separated logic and rendering into threads in the first place was an earlier attempt to alleviate a visually similar issue. At the time I wasn't properly handling garbage collection for my Lua scripts, which caused them to eat up processor time and block execution, but I didn't realize that was the cause. Multithreading helped because it kept these stalls from affecting the render thread. I have since fixed the garbage collection. When I tested running everything on the main thread, the game logic initially behaved strangely (as I was typing this I realized what caused it and fixed it), but this is definitely something I have been, and continue to be, wary of.

Quote from: eXpl0it3r
As for smooth rendering, have you tried implementing a fixed timestep (including proper interpolation)?

I have of course read the famous article and implemented a nearly identical solution in my project, though I have noticed that with multithreading the interpolation is only performed based on logic updates and doesn't take rendering into account at all. I have been testing to determine whether this could cause the issues, but have yet to come to a real conclusion.

Quote from: eXpl0it3r
You could also try to give your process a higher priority, so you can make "sure" that the OS isn't pausing the thread to let another process do stuff.

Higher priority doesn't help at all... though I have discovered something along these lines with windowed/fullscreen which I have put at the end of this post.

Quote from: eXpl0it3r
Did you run a profiler to see where in the code time is lost?

Nearly all of the time is spent in rendering, specifically in the Nvidia driver, nvoglv32.dll.
Filtered by "Just My Code" it's mostly sfml-graphics-2.dll (probably sf::RenderWindow::display()).

Quote from: binary1248
You know... since this is happening inside the driver thread, this isn't directly related to SFML.

I figured this was true, as the problem persisted when I made a simple test for it in SDL.

Quote from: binary1248
  • Since (you think) you didn't enable vertical synchronization, the driver shouldn't be pausing for the vertical refresh. Double check both your code and your Nvidia control panel settings to make sure it is disabled.
  • Nvidia has a tradition of providing settings that nobody really understands, including the infamous "threaded optimization". Disable it unless you know exactly what it does.

Both of these are definitely disabled.

Quote from: binary1248
And lastly and most importantly: The driver rightfully makes the application wait if it is already X number of frames ahead of the GPU. It is not uncommon to measure the GPU being 2-3 frames behind the CPU, and that is also the sweet spot in regards to potential GPU-driver optimization. If your GPU has more work to do than the CPU, and the driver didn't limit anything, the ratio would go out of control and the GPU would end up being seconds behind the CPU's state of things, with an enormous command buffer size. I think it is obvious that this isn't desirable. I am almost sure this is the case, because if you haven't noticed, this pause doesn't just occur "once in a while", it occurs at exactly the same time every second from what I can tell by looking at the concurrency viewer screenshot.

While the pauses in the screenshot above do seem to occur exactly one second apart, overall they are much more irregular: sometimes several occur within a second, and sometimes none occur for multiple seconds.
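
One way to check whether the pauses are periodic or irregular without reading them off a screenshot is to log every frame's duration and look at the spacing between the long ones. The sketch below assumes the frame times have already been recorded in milliseconds; Stall, findStalls, gapsBetween, and the threshold are hypothetical names and values chosen for illustration:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// One frame that took at least the threshold, with its start time in the log.
struct Stall {
    double startMs;
    double durationMs;
};

// Walks the recorded frame times and collects every frame that took at least
// thresholdMs, together with the elapsed time at which that frame began.
std::vector<Stall> findStalls(const std::vector<double>& frameMs, double thresholdMs) {
    assert(thresholdMs > 0.0);
    std::vector<Stall> stalls;
    double elapsed = 0.0;
    for (double ms : frameMs) {
        if (ms >= thresholdMs) stalls.push_back(Stall{elapsed, ms});
        elapsed += ms;
    }
    return stalls;
}

// Time between the starts of consecutive stalls. Nearly constant gaps suggest
// a periodic cause; widely varying gaps suggest something irregular.
std::vector<double> gapsBetween(const std::vector<Stall>& stalls) {
    std::vector<double> gaps;
    for (std::size_t i = 1; i < stalls.size(); ++i) {
        gaps.push_back(stalls[i].startMs - stalls[i - 1].startMs);
    }
    return gaps;
}
```

For a loop that normally runs at about 1 ms per frame, a threshold of around 10 ms would separate the ~14 ms waits reported above from ordinary frames.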

Quote from: binary1248
Like eXpl0it3r already said, this isn't the cause of "unsmooth" FPS. Most of the time, people expect their applications to run at exactly the same framerate the whole way. This is just unrealistic and can probably only be achieved in systems designed for real-time computing. You are already pretty lucky that the driver is doing a somewhat good job at this, and judging by the concurrency viewer, your application isn't necessarily doing anything too complicated either. If you haven't already done so, I would focus on writing more code that actually puts some real load on both CPU and GPU before worrying about smoothing out the FPS. It is easy to call a 15ms pause out as a spike when your application is running at over 1000 FPS, whereas at 60FPS, it probably won't even be noticeable.

It is actually quite the opposite: the pause is significantly more noticeable at 60FPS than at the >1000FPS the game runs at without vsync.
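
One possible reading of this, sketched as arithmetic under a simplified model that is an assumption and not something measured in this thread: with vsync, a frame can only be shown on a refresh boundary, so a frame whose work plus stall exceeds one refresh interval stays on screen for two. The function names are hypothetical:

```cpp
#include <cassert>
#include <cmath>

// Number of whole refresh intervals a frame occupies when presentation is
// locked to the display refresh (simplified double-buffered vsync model).
int refreshIntervalsUsed(double frameMs, double refreshHz) {
    assert(frameMs > 0.0 && refreshHz > 0.0);
    const double intervalMs = 1000.0 / refreshHz;
    return static_cast<int>(std::ceil(frameMs / intervalMs));
}

// How long the frame stays on screen under the same model, in milliseconds.
double displayedMs(double frameMs, double refreshHz) {
    return refreshIntervalsUsed(frameMs, refreshHz) * (1000.0 / refreshHz);
}
```

In this model, at 60 Hz a frame with 4 ms of work fits in one interval, but the same frame plus a 14 ms stall takes 18 ms and is shown for two intervals (about 33 ms), which would appear as a repeated frame. Without vsync at over 1000 FPS the same stall delays the picture by roughly 14 ms once and does not round up to a full extra refresh.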


After several hours of testing in both the game proper and the test examples (I implemented fixed timesteps for the tests as well), I noticed that this only happens in windowed mode. When running fullscreen the issue is entirely gone, both with and without vsync. Comparing fullscreen and windowed results in the Concurrency Visualizer shows that in fullscreen the driver thread is almost never blocked, unlike in windowed mode. Performance analysis also shows that in fullscreen the driver thread does significantly more work, even while vsync limits it to 60FPS.

Some of the driver thread's blocks in windowed mode are due to preemption (though these are less common than synchronization blocks). That would indicate the problem is related to OS scheduling, as eXpl0it3r suggested. I don't know what could be done if that is the case, especially since increasing the process priority didn't help.

Currently I am somewhat consoled, because at the very least I can simply force the game to only run in fullscreen mode.  I would like to support windowed mode however, so I am continuing to investigate this. 
« Last Edit: October 02, 2014, 03:48:56 am by omnomasaur »