Welcome, Guest. Please login or register. Did you miss your activation email?

Author Topic: Multithreaded Loader sometimes(?) failing  (Read 2308 times)

0 Members and 1 Guest are viewing this topic.

Theshooter7

  • Newbie
  • *
  • Posts: 21
    • View Profile
Multithreaded Loader sometimes(?) failing
« on: February 28, 2015, 06:13:17 am »
(Wasn't sure where to put this so I put the topic in General)
Hi,

In the game I am working on, I have a series of gamestates, and when a level is supposed to be loaded, a 'loading' state is set that shows and animates some graphics, while a separate thread loads the level.

More often than not, it works perfectly. However, on some occasions, when the loader finishes, the console output is flooded with GL_INVALID_OPERATION errors (related to SFML's Texture.cpp calls). So my guess is that the contexts are not sharing texture names properly. The strange part is that it isn't entirely reproducible; most of the time it works just fine and the level loads without problems, but on some seemingly random times, it will as mentioned, seemingly fail to share them.

I figure it has to be something in my multithreaded function, so here is the code in particular:
(click to show/hide)
Obviously it's a lambda function, which then gets placed into a Thread Pool that queues it up on an open thread.

The game uses the TMX Loader library, creating sf::Texture instances for tiles and such. World::LoadWorld() mostly just passes on the level to said TMX loader, and does some parsing. As can be seen from the code, the thread is definitely running because the Gamestate will not be changed until the loader has finished (or returns some error). In short, there shouldn't be anything out of the ordinary...right?

[EDIT] As I've read around a bit and see that it could be relevant, I'm running on Windows 7 Home Premium 64-bit, using SFML 2.2.
« Last Edit: February 28, 2015, 06:14:49 am by Theshooter7 »

binary1248

  • SFML Team
  • Hero Member
  • *****
  • Posts: 1405
  • I am awesome.
    • View Profile
    • The server that really shouldn't be running
Re: Multithreaded Loader sometimes(?) failing
« Reply #1 on: February 28, 2015, 09:05:58 am »
1. It would be nice to know exactly where the GL_INVALID_OPERATION comes from. SFML tells you the exact line number when building in debug.

2. Threads + (it works sometimes and sometimes not) sounds like your typical race condition to me. SFML wasn't designed to be thread safe, and although OpenGL contexts are supposed to enable multi-threading without explicit synchronization, I really wouldn't rely on it. Where do you synchronize? I don't see it in the code you provided.

3. Why are you using glFlush() and glFinish() everywhere? glFlush() makes sense, but only when you tell OpenGL to actually do something prior, which you do in 2 of 3 blocks. glFlush() followed directly by glFinish() makes no sense at all. glFinish() implies glFlush(). glFinish() is evil, try to never have to use it, it will kill any advantage the asynchronous nature of OpenGL provides you. In your application, it will even synchronize with all other contexts (and threads) since it is a GPU synchronization command. If they are there just for debugging then you could have mentioned it too.

I assume that the loader takes a bit to load and parse the files, which is the reason why you threw that part into a thread. It sure won't help with the time it takes OpenGL to do stuff (which happens immediately from our perspective). Did you measure where most of the time is spent? Because if the file loading and parsing doesn't take any time at all, I don't see how throwing all that stuff into a separate thread can help.

My initial guess is that your code relies on "unreliable" cross-context behaviour that even the specification doesn't guarantee. In order to tell what is really going on, we will need much more information than you have provided us. We need to know what contexts elsewhere are doing as well.

If you can't provide more code for certain reasons, try to construct a minimal example that mimics the way your application uses contexts in a multi-threaded way. Run it multiple times, and if the same error occurs you can post that here instead.
SFGUI # SFNUL # GLS # Wyrm <- Why do I waste my time on such a useless project? Because I am awesome (first meaning).

Theshooter7

  • Newbie
  • *
  • Posts: 21
    • View Profile
Re: Multithreaded Loader sometimes(?) failing
« Reply #2 on: February 28, 2015, 06:28:15 pm »
Yes, sorry about that, my post was heavily rushed as it was getting really late for me. At least I know someone is reading. :p

1. It would be nice to know exactly where the GL_INVALID_OPERATION comes from. SFML tells you the exact line number when building in debug.
Texture.cpp, lines 490, 516, 517, and 520. Looking at the source, these are when calls to glBindTexture, glMatrixMode, and glLoadMatrixf are made.

2. Threads + (it works sometimes and sometimes not) sounds like your typical race condition to me. SFML wasn't designed to be thread safe, and although OpenGL contexts are supposed to enable multi-threading without explicit synchronization, I really wouldn't rely on it. Where do you synchronize? I don't see it in the code you provided.
Yeah, definitely a typical data race. The thing that confuses me the most however is that (as I tried to explain in my initial post, but failed horribly), the thread has practically exclusive access to the data; other than the use of the sf::RenderTexture to draw the animated loading screen stuff (which was already loaded before the thread is even launched), the main thread and the loader thread do not interact with the same data, because the main thread is more-or-less doing nothing until the loader thread signals it is done (by calling Gamestate::Notify()).

As for synchronization, the threads are never synchronized (i.e. calling join()); after completing their task they are put to sleep until notified to run another task. This is, at its core, how a thread pool works. Do you think adding synchronization could have an effect, seeing as how SFML is not particularly thread-safe as you mentioned?

3. Why are you using glFlush() and glFinish() everywhere? glFlush() makes sense, but only when you tell OpenGL to actually do something prior, which you do in 2 of 3 blocks. glFlush() followed directly by glFinish() makes no sense at all. glFinish() implies glFlush(). glFinish() is evil, try to never have to use it, it will kill any advantage the asynchronous nature of OpenGL provides you. In your application, it will even synchronize with all other contexts (and threads) since it is a GPU synchronization command. If they are there just for debugging then you could have mentioned it too.
Yeah it was just an attempt at debugging the situation. I had read that glFinish() and/or glFlush() would solve potential issues where a texture wasn't fully uploaded but something tried to access it, causing the data to corrupt and potentially causing these problems. Although I appreciate that insight into the difference of the two.

I assume that the loader takes a bit to load and parse the files, which is the reason why you threw that part into a thread. It sure won't help with the time it takes OpenGL to do stuff (which happens immediately from our perspective). Did you measure where most of the time is spent? Because if the file loading and parsing doesn't take any time at all, I don't see how throwing all that stuff into a separate thread can help.
Relatively speaking, the loader does take some time to complete (relatively being about 3-5 seconds on my machine), on a level that isn't fairly large (84x32 tiles of size 32x32), using a tileset of only 25 different tiles. Because of the time it takes (which will likely scale up with the size and complexity of levels), I placed it in another thread so that a loading screen could be done and animated without being blocked (since the loader will not return until it has finished or failed at some point).

On the other hand, I have not taken exact measurements of where the majority of time is spent, and I could do that. Although my best guess would be in loading the textures in the TMX loader. It seems that the TMX loader actually loads the files into sf::Image before uploading them to an sf::Texture, which is probably helpful in regards to OpenGL, but I doubt that one way or the other it should have an impact here.

My initial guess is that your code relies on "unreliable" cross-context behaviour that even the specification doesn't guarantee. In order to tell what is really going on, we will need much more information than you have provided us. We need to know what contexts elsewhere are doing as well.

If you can't provide more code for certain reasons, try to construct a minimal example that mimics the way your application uses contexts in a multi-threaded way. Run it multiple times, and if the same error occurs you can post that here instead.
It certainly seems to be a context problem. I'll try to come up with some sort of example, although I may have narrowed the problem down slightly. It's a problem with LTBL (and I probably should have mentioned the use of that library in my hastily-written first post :-[). To try to sum things up, this error started happening when I switched over to the use of RenderTextures instead of drawing to the RenderWindow directly (because I wanted to remove the visual distortion I was getting, and found that drawing to a RenderTexture, then applying the texture to a sprite and scaling it up to the size of the window gave pixel-perfect results).

However, LTBL only accepted a RenderWindow as one of its parameters for the LightSystem class. See here:
(click to show/hide)
I figured they both utilized sf::RenderTarget, so changing the parameter to a RenderTexture (and any other references needed) should do the trick.
(click to show/hide)
Then for the parameter, I just pass in the RenderTexture I use that is defined in my App class, using GetRenderTexture(). The class looks like this:
(click to show/hide)
(the extra sf::Context is from some debugging, although I have yet to test removing it actually. I'll give that a shot right now; it is instantiated before the RenderWindow and RenderTexture are, after all. But since Contexts are supposed to be shared by default, this shouldn't affect anything should it?)
I'm unsure if you're familiar with the LTBL source--it uses a lot of raw OpenGL. When stepping through the debugger, this function in particular is where the glErrors are thrown:
(click to show/hide)
The line for m_pWin->setActive(); is something I added for debugging an earlier problem where the LightSystem would only draw a full-white quad across the screen (i.e. nothing was being drawn, or Texturing was disabled or something). I mention this because this is clearly where a context switch happens. Note however that nothing related to the LightSystem is called before the loader finishes.

binary1248

  • SFML Team
  • Hero Member
  • *****
  • Posts: 1405
  • I am awesome.
    • View Profile
    • The server that really shouldn't be running
Re: Multithreaded Loader sometimes(?) failing
« Reply #3 on: February 28, 2015, 06:46:49 pm »
You really need to minimize your scenario, there is just too much stuff going on.

Bear in mind that SFML might internally also create/destroy its own contexts at certain times, so relying on the fact that the context doesn't change within an SFML function call is not a good idea. I'm guessing since LTBL was written a fairly long time ago, it hasn't really kept up-to-date with SFML in that regard either. All those raw OpenGL calls in the code snippet you posted could be replaced with SFML VertexArray drawing, probably even resulting in better performance.
SFGUI # SFNUL # GLS # Wyrm <- Why do I waste my time on such a useless project? Because I am awesome (first meaning).

Theshooter7

  • Newbie
  • *
  • Posts: 21
    • View Profile
Re: Multithreaded Loader sometimes(?) failing
« Reply #4 on: March 01, 2015, 09:12:02 am »
I think I figured out the problem, as dumb as the solution may have been. It didn't occur to me that, in my World::LoadWorld() function, the LightSystem is being instantiated. After crawling through the LTBL source, and finally realizing that at that point is where the RenderTextures used for lighting are created, that this is probably indeed causing a context race between the threads.

My solution was to move the creation of the light system and data-parsing stuff out of the loader thread into a separate function so that it can run on the main thread. The post-load parsing takes very little time (if anything noticeable at all) so this is more than acceptable for fixing this problem.

Thank you so much anyway; the things you mentioned did help me figure out the issue eventually. I'm just disappointed that I didn't actually catch this sooner.