Hi, I'm TheEvent from
this post. I'm a really impatient person, so when I didn't feel that anyone was listening, I simply rage quit. Ironically, that made Laurent look somewhat insane, repeatedly talking to himself; I apologize for that.
Anyway, I have a long history with this exact issue. I'll try to write up a summary of my experiences, but it won't be short.
I first noticed this way back when XNA 2.0 had just been released. My very crude game didn't flow smoothly. I graphed the FPS, which confirmed my observations, but at the time I eventually blamed Microsoft and moved on to open source. I then stepped up my game by repeatedly shooting myself in the foot with C++ and Ogre3D. Again I noticed some irregular FPS. I did some troubleshooting, but with my custom game loop and the DirectX/OpenGL cross-platform abstraction, debugging was hard for me.
At one point I got tired of 3D because of the insane time sink related to content creation. I started researching pretty 2D games with deferred rendering. I did some prototyping in XNA 4.0, with poor performance, but continued refining the techniques with SFML. What can I say, I'm stupid enough to obsess about premature optimization.
So, the problem actually consists of several more complicated issues. The sudden decrease in draw time after 1400 frames is probably due to nVidia, but it can be further provoked through SFML.
More specifically, nVidia has a well-hidden and sparsely documented feature called threaded optimization. On computers with multiple CPU cores, which is most gaming computers today, the nVidia driver can offload some processing to another core. You would think that disabling this feature, or even disabling hyper-threading, would fix the problem, but that actually just makes matters worse. When enabled, this feature significantly increases performance, but it does so in a staged manner. As the image above shows, the draw time drops significantly at one point, but on rare occasions it may actually increase again.
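If you want to see these stages on your own machine, logging the per-frame draw time is enough. A bare-bones probe along these lines (a throwaway example, not my actual game loop) should show the same sudden drop after a while:

#include <SFML/Graphics.hpp>
#include <cstdio>

int main()
{
    sf::RenderWindow window(sf::VideoMode(800, 600), "draw time probe");
    sf::CircleShape shape(50.f);
    sf::Clock clock;

    for (unsigned int frame = 0; window.isOpen(); ++frame)
    {
        sf::Event event;
        while (window.pollEvent(event))
        {
            if (event.type == sf::Event::Closed)
                window.close();
        }

        clock.restart();
        window.clear();
        window.draw(shape);
        window.display();

        // Log the time spent clearing, drawing and displaying this frame
        std::printf("frame %u: %lld us\n", frame,
                    static_cast<long long>(clock.getElapsedTime().asMicroseconds()));
    }
    return 0;
}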
There are two interesting observations about these two or three stages. The first is the CPU usage. On a dual-core CPU, the load will usually fall immediately from 100% to 50%, freeing up almost an entire core, even though the FPS increases. However, this stage is not the same as disabling the threaded optimization feature, which brings me to this picture.
This is a cutout from Intel VTune which shows communication between two threads, where the first one represents my game and the second one is the nVidia OpenGL driver, nvoglv32.dll. I use a lot of shaders and FBOs in my game, and upon further investigation, the OpenGL calls that begin with "glGet" actually request information back from the GPU/driver. This messes up the timing, because GPUs are usually designed and optimized for one-way communication. It causes the SFML library to busy-wait when calling methods such as sf::Shader::setParameter and sf::Texture::update. I've modified my local copy of SFML to work around this problem, as shown in this
github pull request. There is a similar open issue
here.
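Just to illustrate the general idea behind the workaround (this is not the actual patch): instead of asking the driver for state you have already set yourself, keep a shadow copy on the CPU side so the render thread never has to wait for an answer from the driver thread. Something like this hypothetical helper:

#include <GL/glew.h> // assuming GLEW (or any extension loader) for glUseProgram

namespace
{
    GLuint currentProgram = 0; // CPU-side shadow of the currently bound program
}

void useProgramCached(GLuint program)
{
    // A naive save/restore would call glGetIntegerv(GL_CURRENT_PROGRAM, ...)
    // here - exactly the kind of glGet that stalls against nvoglv32.dll.
    // The shadow copy avoids that round trip entirely.
    if (program != currentProgram)
    {
        glUseProgram(program);
        currentProgram = program;
    }
}

The same principle applies to anything you would otherwise read back from the driver every frame.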
So SFML is partly to blame for the issues, but I also have two good reasons not to blame SFML alone. I've tried using OpenGL directly and exclusively, where some, but not all, of the issues remain. In addition, while I don't have any images from ATI hardware, the issue is still present there. The ATI equivalent, atioglxx.dll, shows behaviour similar to that of nVidia, although nowhere near as bad. To be completely safe, I have this little trick somewhere in my copy of the SFML source code, which forces the event to trigger 99% of the times I start my game:
GLint maxUnits;
// Spam glGet calls at startup to provoke the driver's threaded optimization
// into switching stages right away, instead of after ~1400 frames.
for (int i = 0; i < 10000; ++i)
    glCheck(glGetIntegerv(GL_MAX_TEXTURE_COORDS_ARB, &maxUnits));
At this point I was satisfied with the performance, until I started to get really bothered by two related issues. The second issue is that once the workload of the GPU surpasses that of the CPU, the benefits of nVidia's threaded optimization suddenly disappear and the FPS tanks hard. There isn't really a lot I can do about this, but it causes some irregular behaviour when you're hovering around the point of equal GPU/CPU workload.
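If you want to see which side of that point a given frame lands on, a rough split like this helps (only a sketch, and it assumes vsync is off): the swap inside display() blocks when the GPU is behind, so the time spent there hints at how GPU-bound the frame was.

#include <SFML/Graphics.hpp>
#include <cstdio>

void renderAndProfile(sf::RenderWindow& window, const sf::Drawable& scene)
{
    sf::Clock clock;

    window.clear();
    window.draw(scene);
    sf::Time cpuTime = clock.getElapsedTime();   // time spent submitting the frame

    window.display();                            // blocks here when the GPU is behind
    sf::Time gpuWait = clock.getElapsedTime() - cpuTime;

    std::printf("cpu: %lld us, waiting on gpu: %lld us\n",
                static_cast<long long>(cpuTime.asMicroseconds()),
                static_cast<long long>(gpuWait.asMicroseconds()));
}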
This brings me to the last issue I'm aware of: the random but regular spikes. I noted this issue back when I wrote my last post, as shown in this image.
At that point I hadn't paid much attention to it, but now it's really becoming an issue for me. I currently play around with a lot of FBOs. While my FPS is a somewhat stable 300+, I get these spikes of 10-40 milliseconds. That's around 30 FPS, which is really noticeable. Today I traced this back to its source, which is why Google found this thread for me. It has to do with the GL context. As you may know, each FBO has its own sf::Context. Upon calling sf::RenderTarget::clear or draw, you eventually end up in WglContext::makeCurrent and wglMakeCurrent, which is the root cause of the huge lag spikes. There could be other causes as well, but I've yet to work around this issue. If there is any way to implement an FBO without using additional contexts, I'd like to know.
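One possible direction, which I haven't verified yet (a sketch only, and it assumes GLEW or some other extension loader provides the FBO entry points): create the FBO with raw OpenGL inside the window's own context, so rendering to it never involves another sf::Context and therefore never calls wglMakeCurrent.

#include <GL/glew.h>         // assumed extension loader; call glewInit() once after creating the window
#include <SFML/Graphics.hpp>

// Creates an FBO that renders into an existing sf::Texture, using the
// window's own GL context. The window must be active (window.setActive()).
GLuint createFramebufferFor(const sf::Texture& target)
{
    // Grab the GL texture id. This is a one-time glGet at setup, which is
    // harmless compared to doing it every frame.
    sf::Texture::bind(&target);
    GLint textureId = 0;
    glGetIntegerv(GL_TEXTURE_BINDING_2D, &textureId);

    GLuint fbo = 0;
    glGenFramebuffers(1, &fbo);
    glBindFramebuffer(GL_FRAMEBUFFER, fbo);
    glFramebufferTexture2D(GL_FRAMEBUFFER, GL_COLOR_ATTACHMENT0,
                           GL_TEXTURE_2D, static_cast<GLuint>(textureId), 0);

    if (glCheckFramebufferStatus(GL_FRAMEBUFFER) != GL_FRAMEBUFFER_COMPLETE)
    {
        glDeleteFramebuffers(1, &fbo);
        fbo = 0; // fall back to sf::RenderTexture
    }

    glBindFramebuffer(GL_FRAMEBUFFER, 0); // back to the window's framebuffer
    return fbo;
}

Drawing into it is then a matter of glBindFramebuffer, glViewport to the texture size, your raw GL draw calls, and binding framebuffer 0 again; you lose the sf::RenderTarget conveniences, but everything stays on a single context.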