Hi, I'm TheEvent from
this post. I'm a really impatient person, so when I didn't feel that anyone was listening, I simply rage quit. Ironically, that made Laurent look somewhat insane, repeatedly talking to himself; I apologize for that.
Anyway, I have a long history with this exact issue. I'll try to write up a summary of my experiences, but it won't be short.
I first noticed this way back when XNA 2.0 had just been released. My very crude game didn't flow smoothly. I graphed the FPS, which confirmed my observations, but at the time I eventually blamed Microsoft and moved on to open source. I then stepped up my game by repeatedly shooting myself in the foot with C++ and Ogre3D. Again I noticed some irregular FPS. I did some troubleshooting, but with my custom game loop and the DirectX/OpenGL cross-platform abstraction, debugging was hard for me.
At one point I got tired of 3D because of the insane time sink related to content creation. I started researching pretty 2D games with deferred rendering. I did some prototyping in XNA 4.0, with poor performance, but continued refining the techniques with SFML. What can I say, I'm stupid enough to obsess about premature optimization.
So, the problem actually consists of several more complicated issues. The sudden decrease in draw time after 1400 frames is probably due to nVidia, but it can be further provoked through SFML.
More specifically, nVidia has a well-hidden and sparsely documented feature called threaded optimization. On computers with multiple CPU cores, which is most gaming computers today, the nVidia driver can offload some processing to another core. You would think that disabling this feature, or even disabling hyper-threading, would fix the problem, but that actually just makes matters worse. When enabled, this feature significantly increases performance, but it does so in a staged manner. As the image above shows, the draw time drops significantly at one point, but on rare occasions it may actually increase again.
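If you want to see these stages on your own machine, logging the per-frame draw time is enough. A bare-bones probe along these lines (a throwaway example, not my actual game loop) should show the same sudden drop after a while:

#include <SFML/Graphics.hpp>
#include <cstdio>

int main()
{
    sf::RenderWindow window(sf::VideoMode(800, 600), "draw time probe");
    sf::CircleShape shape(50.f);
    sf::Clock clock;

    for (unsigned int frame = 0; window.isOpen(); ++frame)
    {
        sf::Event event;
        while (window.pollEvent(event))
        {
            if (event.type == sf::Event::Closed)
                window.close();
        }

        clock.restart();
        window.clear();
        window.draw(shape);
        window.display();

        // Log the time spent clearing, drawing and displaying this frame
        std::printf("frame %u: %lld us\n", frame,
                    static_cast<long long>(clock.getElapsedTime().asMicroseconds()));
    }
    return 0;
}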
There are two interesting observations about these two or three stages. The first is the CPU usage. On a dual-core CPU, the load will usually fall immediately from 100% to 50%, freeing up almost an entire core, even though the FPS increases. However, this stage is not the same as disabling the threaded optimization feature, which brings me to this picture.
This is a cutout from Intel VTune which shows communication between two threads, where the first one represents my game and the second one is the nVidia OpenGL driver, nvoglv32.dll. I use a lot of shaders and FBOs in my game, and upon further investigation, the OpenGL calls that begin with "glGet" actually request information back from the GPU/driver. This messes up the timing, because GPUs are usually designed and optimized for one-way communication. It causes the SFML library to busy-wait when calling methods such as sf::Shader::setParameter and sf::Texture::update. I've modified my local copy of SFML to work around this problem, as shown in this
github pull request. There is a similar open issue
here.
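Just to illustrate the general idea behind the workaround (this is not the actual patch): instead of asking the driver for state you have already set yourself, keep a shadow copy on the CPU side so the render thread never has to wait for an answer from the driver thread. Something like this hypothetical helper:

#include <GL/glew.h> // assuming GLEW (or any extension loader) for glUseProgram

namespace
{
    GLuint currentProgram = 0; // CPU-side shadow of the currently bound program
}

void useProgramCached(GLuint program)
{
    // A naive save/restore would call glGetIntegerv(GL_CURRENT_PROGRAM, ...)
    // here - exactly the kind of glGet that stalls against nvoglv32.dll.
    // The shadow copy avoids that round trip entirely.
    if (program != currentProgram)
    {
        glUseProgram(program);
        currentProgram = program;
    }
}

The same principle applies to anything you would otherwise read back from the driver every frame.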
So SFML is partly to blame for the issues, but I also have two good reasons not to blame SFML alone. I've tried using OpenGL directly and exclusively, where some, but not all, of the issues remain. In addition, while I don't have any images from ATI hardware, the issue is still present there. The ATI equivalent, atioglxx.dll, shows behaviour similar to that of nVidia, although nowhere near as bad. To be completely safe, I have this little trick somewhere in my copy of the SFML source code, which forces the event to trigger 99% of the times I start my game:
GLint maxUnits;
// Spam glGet calls at startup to provoke the driver's threaded optimization
// into switching stages right away, instead of after ~1400 frames.
for (int i = 0; i < 10000; ++i)
    glCheck(glGetIntegerv(GL_MAX_TEXTURE_COORDS_ARB, &maxUnits));
At this point I was satisfied with the performance, until I started to get really bothered by two related issues. The second issue is that once the workload of the GPU surpasses that of the CPU, the benefits of nVidia's threaded optimization suddenly disappear and the FPS tanks hard. There isn't really a lot I can do about this, but it causes some irregular behaviour when you're hovering around the point of equal GPU/CPU workload.
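If you want to see which side of that point a given frame lands on, a rough split like this helps (only a sketch, and it assumes vsync is off): the swap inside display() blocks when the GPU is behind, so the time spent there hints at how GPU-bound the frame was.

#include <SFML/Graphics.hpp>
#include <cstdio>

void renderAndProfile(sf::RenderWindow& window, const sf::Drawable& scene)
{
    sf::Clock clock;

    window.clear();
    window.draw(scene);
    sf::Time cpuTime = clock.getElapsedTime();   // time spent submitting the frame

    window.display();                            // blocks here when the GPU is behind
    sf::Time gpuWait = clock.getElapsedTime() - cpuTime;

    std::printf("cpu: %lld us, waiting on gpu: %lld us\n",
                static_cast<long long>(cpuTime.asMicroseconds()),
                static_cast<long long>(gpuWait.asMicroseconds()));
}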
This brings me to the last issue I'm aware of: the random but regular spikes. I noted this issue back when I wrote my last post, as shown in this image.
At that point I hadn't paid much attention to it, but now it's really becoming an issue for me. I currently play around with a lot of FBOs. While my FPS is a somewhat stable 300+, I get these spikes of 10-40 milliseconds. That's around 30 FPS, which is really noticeable. Today I traced this back to its source, which is why Google found this thread for me. It has to do with the GL context. As you may know, each FBO has its own sf::Context. Upon calling sf::RenderTarget::clear or draw, you eventually end up in WglContext::makeCurrent and wglMakeCurrent, which is the root cause of the huge lag spikes. There could be other causes as well, but I've yet to work around this issue. If there is any way to implement an FBO without using additional contexts, I'd like to know.
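One possible direction, which I haven't verified yet (a sketch only, and it assumes GLEW or some other extension loader provides the FBO entry points): create the FBO with raw OpenGL inside the window's own context, so rendering to it never involves another sf::Context and therefore never calls wglMakeCurrent.

#include <GL/glew.h>         // assumed extension loader; call glewInit() once after creating the window
#include <SFML/Graphics.hpp>

// Creates an FBO that renders into an existing sf::Texture, using the
// window's own GL context. The window must be active (window.setActive()).
GLuint createFramebufferFor(const sf::Texture& target)
{
    // Grab the GL texture id. This is a one-time glGet at setup, which is
    // harmless compared to doing it every frame.
    sf::Texture::bind(&target);
    GLint textureId = 0;
    glGetIntegerv(GL_TEXTURE_BINDING_2D, &textureId);

    GLuint fbo = 0;
    glGenFramebuffers(1, &fbo);
    glBindFramebuffer(GL_FRAMEBUFFER, fbo);
    glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0,
                           GL_TEXTURE_2D, static_cast<GLuint>(textureId), 0);

    if (glCheckFramebufferStatus(GL_FRAMEBUFFER) != GL_FRAMEBUFFER_COMPLETE)
    {
        glDeleteFramebuffers(1, &fbo);
        fbo = 0; // fall back to sf::RenderTexture
    }

    glBindFramebuffer(GL_FRAMEBUFFER, 0); // back to the window's framebuffer
    return fbo;
}

Drawing into it is then a matter of glBindFramebuffer, glViewport to the texture size, your raw GL draw calls, and binding framebuffer 0 again; you lose the sf::RenderTarget conveniences, but everything stays on a single context.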