Topic: [SOLVED] Speeding Up Rendering Performance

Chaia*

  • Newbie
  • *
  • Posts: 21
    • View Profile
[SOLVED] Speeding Up Rendering Performance
« on: November 08, 2016, 08:32:53 pm »
Hi, I've been working with SFML on and off for a couple of years, so I'd say I'm an experienced user.

I'm working on a strategy game and currently I'm improving graphics and graphics performance. Most of my map is covered by grass which is animated by vertex shaders (see attached screenshot to get an idea about looks and perspective etc.; looks a bit crappy on the screenshot somehow :().

This grass is simply implemented by untextured triangles in vertex arrays. To counteract the resulting horrible aliasing I'm downsampling from a 4k rendertexture in which I draw everything.

Of course there are NPCs and other things moving in my game, so I can't draw all grass triangles in one vertex array: things need to be drawn in a specific order. Therefore I divided my grass into vertex arrays each covering 1024x32 pixels. Every frame I collect everything (grass vertex arrays, NPCs, etc.) in a vector, sort it and draw it.

Unsurprisingly when zooming out I sooner or later run into performance problems.
To be specific, here are some measurements:
- 2000 Drawcalls (mostly from grass vertex arrays)
- 300k Triangles (each representing one grass blade)
-> resulting in 50-60 FPS

I want to achieve about 100 FPS under these conditions without losing too much graphics quality.

To get a deeper insight I did some more measurements:
- I increased the area covered by each grass vertex array to 1024x64, thereby reducing the number of draw calls. This only slightly improved performance: 1300 draw calls, 330k triangles -> 65 FPS --> I'm likely not draw-call limited
- I decreased the area covered by a single triangle while keeping the number of triangles constant. When zoomed out, this gave the same FPS in both cases: 1000 draw calls, 275k triangles, 75 FPS --> likely not fill-rate limited
- I looked at CPU and GPU utilization: the CPU is at 12% (about 1 1/2 cores), while the GPU jumps between 99% and 0% and sits in the 0-50% range most of the time, without the scene changing, which is really strange. Additionally, neither the GPU nor the CPU boosts to its maximum clock.

This led me to the conclusion that I may be bandwidth limited? I'm not sure about SFML's internal implementation of vertex arrays but are they sent from CPU to GPU every frame?

Therefore I have a couple of questions about possibilities to improve performance:
- if vertex arrays are sent to the GPU every frame, is there a possibility for me to change SFML to keep vertex arrays in GPU memory (without breaking SFML's whole design)?
- would you try to use geometry shaders to ideally cut bandwidth by two thirds by generating the triangles on the GPU?
- are there other ways for me to improve SFML's performance internally (without needing to break the design or recode everything)?
- can you suggest profiling programs etc. to find the bottleneck?
- do you have other suggestions (maybe a completely different design?) to achieve the same result with better performance? Does OpenGL offer something I could use without too much implementation effort? (I have to say I'm not really experienced with pure OpenGL)
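
The bandwidth question above can be put into rough numbers. The sketch below is only an estimate under stated assumptions: it assumes every vertex is re-sent from RAM on each draw (as the post suspects) and that one vertex takes 20 bytes, which matches the SFML 2.x `sf::Vertex` layout of a 2D float position, an RGBA color and 2D float texture coordinates. The function names are made up for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Assumed vertex size: 8 bytes position + 4 bytes color + 8 bytes texture
// coordinates (the SFML 2.x sf::Vertex layout). Adjust if yours differs.
constexpr std::uint64_t kAssumedBytesPerVertex = 20;

// Bytes uploaded per frame if every vertex is re-sent on each draw.
std::uint64_t bytesPerFrame(std::uint64_t primitives,
                            std::uint64_t verticesPerPrimitive) {
    assert(verticesPerPrimitive > 0);
    return primitives * verticesPerPrimitive * kAssumedBytesPerVertex;
}

// Sustained upload rate in bytes per second at a given frame rate.
std::uint64_t bytesPerSecond(std::uint64_t perFrame,
                             std::uint64_t framesPerSecond) {
    return perFrame * framesPerSecond;
}
```

With the figures from the post, `bytesPerFrame(300000, 3)` is 18,000,000 bytes (about 18 MB) per frame, or roughly 1.08 GB/s at 60 FPS. Sending one vertex per blade, as a geometry shader would allow, gives 6 MB per frame, which is the two-thirds saving asked about.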

I think there is a huge bottleneck somewhere because I think my PC should be able to manage at least 5k draw calls and a million triangles at the same performance:
i7 5820k
16 GB RAM
R9 290

Thanks in advance and kind regards
Chaia*
« Last Edit: November 15, 2016, 01:47:29 pm by Chaia* »

Hapax

  • Hero Member
  • *****
  • Posts: 3379
  • My number of posts is shown in hexadecimal.
    • View Profile
    • Links
Re: Speeding Up Rendering Performance
« Reply #1 on: November 09, 2016, 02:49:05 am »
Why are there 2000 draw calls from grass vertex arrays? If you have strips of 1024x32, and your target render is 4096x4096 (I assume it's actually smaller than this though), that's 512 strips and therefore 512 draw calls.
Furthermore, why are they covering a width of 1024 instead of the entire 4096? Draw calls then drop to 128 (for 4096x32 sized strips).
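
The strip arithmetic above can be checked with a small helper. This is a sketch with hypothetical function names, assuming one draw call per strip and strips laid out on a regular grid.

```cpp
#include <cassert>

// Number of tiles of length `tile` needed to cover `extent`, rounded up.
int tilesToCover(int extent, int tile) {
    assert(extent >= 0 && tile > 0);
    return (extent + tile - 1) / tile;
}

// Strips (and therefore draw calls, one per strip) needed to cover an area.
int stripCount(int areaW, int areaH, int stripW, int stripH) {
    return tilesToCover(areaW, stripW) * tilesToCover(areaH, stripH);
}
```

For a 4096x4096 target, `stripCount(4096, 4096, 1024, 32)` is 4 x 128 = 512, and widening the strips to 4096 gives `stripCount(4096, 4096, 4096, 32)` = 128.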

It could be possible to combine everything into just one vertex array. The vertex array can include the other objects with the grass triangles by just ordering them correctly. Of course, all of the objects would need to share a texture. You may need to give the shader information about which triangles to actually transform; you wouldn't want NPCs etc. to act like grass!

I am no expert on optimisation but these questions/clarifications stood out and might help others to understand the problem they are helping you with :)
Selba Ward -SFML drawables
Cheese Map -Drawable Layered Tile Map
Kairos -Timing Library
Grambol
 *Hapaxia Links*

Chaia*

Re: Speeding Up Rendering Performance
« Reply #2 on: November 09, 2016, 12:24:38 pm »
Okay, I may have explained it a bit unclearly.
So my RenderTexture, which I use as a "back buffer", actually has a size of 3840x2160 (twice 1920x1080 in each dimension). The actual RenderWindow has a size of 1920x1080, and my sf::View has a default size of 1920x1080. So basically I create a RenderTexture with twice the width and height (four times the number of pixels), draw everything onto this RenderTexture and then draw the RenderTexture onto the actual RenderWindow, downsampling to 1920x1080 in a fragment shader to achieve a kind of supersampling.

Therefore with the default view size I have roughly 2x34=68 GrassVertexArrays drawn. But when I zoom out, the sf::View's size increases, and at maximum zoom level I get about 2000 draw calls. The GrassVertexArrays are only 1024 wide for culling reasons. My entire map is divided into "chunks" each covering 1024x1024 pixels, which allows me to efficiently cull chunks and the GrassVertexArrays they contain when they are not visible.
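
The chunk culling described above comes down to finding which 1024x1024 chunks a view rectangle overlaps. The following is a minimal sketch, not the poster's actual code: `ChunkRange`, `visibleChunks` and `chunkCount` are hypothetical names, and the view is assumed to be an axis-aligned rectangle in world pixels.

```cpp
#include <cassert>
#include <cmath>

// Inclusive range of chunk indices overlapped by the view.
struct ChunkRange {
    int minX, minY, maxX, maxY;
};

// Chunk indices touched by a view rectangle given in world pixels.
ChunkRange visibleChunks(float left, float top, float width, float height,
                         float chunkSize = 1024.f) {
    assert(width > 0.f && height > 0.f && chunkSize > 0.f);
    ChunkRange r;
    r.minX = static_cast<int>(std::floor(left / chunkSize));
    r.minY = static_cast<int>(std::floor(top / chunkSize));
    // ceil(...) - 1 keeps a view that ends exactly on a chunk border
    // from pulling in the neighbouring chunk.
    r.maxX = static_cast<int>(std::ceil((left + width) / chunkSize)) - 1;
    r.maxY = static_cast<int>(std::ceil((top + height) / chunkSize)) - 1;
    return r;
}

// Number of chunks inside the range.
int chunkCount(const ChunkRange& r) {
    return (r.maxX - r.minX + 1) * (r.maxY - r.minY + 1);
}
```

Only the chunks inside the returned range need to be asked for their grass strips; everything else is skipped without touching its vertex arrays.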

I already considered your suggestion of building one vertex array that contains everything in the correct order, but I have the feeling that with multiple moving objects in the scene this approach would cost more CPU time than would be saved by reducing the number of draw calls.

Edit: I started reading up on pure OpenGL and thought vertex buffer objects (VBOs) might be a solution for me, ideally combined with vertex array objects (VAOs) to save API calls. But as mentioned before I'm not really experienced with pure OpenGL, so what do you guys think about this approach? Will it work together with SFML without too much effort?

Here is some pseudocode to clarify:
create RenderWindow (size = fullscreen resolution, in my case 1920x1080)

create RenderTexture (size = 2x fullscreen resolution)

game loop
{
        // calculate the chunks currently visible and store them in a std::vector
        BuildVisibleChunkVector()

        // give each chunk a reference to a vector of structs, each holding an
        // sf::VertexArray* and an int used for sorting
        // each chunk then checks which GrassVertexArrays are visible this frame and adds them to the std::vector
        // later, NPCs and other objects will be added to this vector too
        for (every visible Chunk)
                AddVisibleGrassVertexArrayToGlobalVector()

        // a simple std::sort on the vector puts everything into the correct order
        SortVector()

        RenderTexture.clear()

        for (every GrassVertexArray in the vector)
                RenderTexture.draw(GrassVertexArray)

        RenderTexture.display()

        RenderWindow.clear()

        sf::Sprite spr;
        spr.setTexture(RenderTexture.getTexture());

        sf::RenderStates st;
        st.shader = &downsampleShader;

        RenderWindow.draw(spr, st)

        RenderWindow.display()

        // here the game is updated, e.g. camera movement, zooming etc.
        UpdateGame()
}
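
The sorting step in the pseudocode above could look roughly like this. It is a sketch under assumptions: `DrawItem` and `sortDrawList` are hypothetical names, `drawableId` stands in for the sf::VertexArray* the pseudocode stores, and std::stable_sort is used instead of the std::sort mentioned in the post so that items with equal keys keep their insertion order.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// One entry of the per-frame draw list. `drawableId` is a stand-in for the
// pointer to the grass strip, NPC or other drawable; `sortKey` is the int
// used for ordering.
struct DrawItem {
    int drawableId;
    int sortKey;
};

// Order items by ascending sortKey; equal keys keep their insertion order.
void sortDrawList(std::vector<DrawItem>& items) {
    auto byKey = [](const DrawItem& a, const DrawItem& b) {
        return a.sortKey < b.sortKey;
    };
    std::stable_sort(items.begin(), items.end(), byKey);
    assert(std::is_sorted(items.begin(), items.end(), byKey));
}
```

After sorting, the list can be walked front to back and each item drawn in turn, so an NPC with a key between two grass strips ends up drawn between them.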
 
« Last Edit: November 09, 2016, 12:29:16 pm by Chaia* »

Hapax

Re: Speeding Up Rendering Performance
« Reply #3 on: November 09, 2016, 01:50:53 pm »
My first impression was that you were rendering to a UHD "4K" resolution texture and then displaying that texture in a Full HD window, which you have now confirmed. :)

Note that some cards cannot handle textures 4K wide and so would not be able to deal with this render texture. You may consider these cards to be just "too old" for your game, so make sure you mention minimum requirements when you release it ;)

It may be a better (more optimal) option to create the vertex arrays after you decide what is going to be displayed. Using a draw call on a small area because that vertex array is far away seems wasteful. Chunks could be used to decide whether an area (containing grass/objects) is visible, with its contents then added to the vertex array (whether that covers the entire scene or a horizontal strip).

Dynamically creating a single vertex array to cover everything would increase CPU work, of course, but it can significantly reduce GPU load, including draw calls and texture switches. The increased CPU workload is likely to be much less noticeable and therefore the trade-off could be well worth it. The thing to consider would be where your bottleneck is. If it's graphics, off-load GPU work to the CPU. If it's logic, off-load CPU work to the GPU. Of course, it's most likely that graphics becomes the bottleneck in such intensely populated scenes, as is your situation.
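
Building one vertex array per frame, as suggested above, mostly means concatenating the visible batches in draw order. The sketch below is only an illustration: `Vertex` is a hypothetical stand-in for sf::Vertex, `mergeBatches` is a made-up name, and the batches are assumed to be sorted already.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for sf::Vertex; only the position is kept here.
struct Vertex {
    float x, y;
};

// Concatenate already-sorted batches into one contiguous buffer so the
// whole scene could be submitted with a single draw call.
std::vector<Vertex> mergeBatches(
        const std::vector<std::vector<Vertex>>& sortedBatches) {
    std::size_t total = 0;
    for (const auto& batch : sortedBatches) total += batch.size();

    std::vector<Vertex> merged;
    merged.reserve(total);  // one allocation instead of repeated regrowth
    for (const auto& batch : sortedBatches)
        merged.insert(merged.end(), batch.begin(), batch.end());

    assert(merged.size() == total);
    return merged;
}
```

The CPU cost is one copy of every visible vertex per frame, which is the trade-off discussed above: more CPU work in exchange for fewer draw calls.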

There are features in direct OpenGL - especially in newer versions of OpenGL - that can help with optimisations, sure, but working with OpenGL directly can be less simple than using SFML and is often not necessary. Mixing OpenGL and SFML is possible but it has its limitations, and there have been numerous threads on this forum about problems people have run into trying to do just that. One thing to note is that SFML 'plays more nicely' with older versions of OpenGL.

Chaia*

Re: Speeding Up Rendering Performance
« Reply #4 on: November 11, 2016, 12:18:29 am »
Thanks for your answer :)

In my game I have the option to easily switch to default resolution instead of UHD, so it should not be a big problem to run it on older machines.

Over the next few days I will try two things: first, dynamically building the GrassVertexArrays for each strip every frame, and second, using VBOs to reduce the required bandwidth. Then I'll report my results.

Thanks for your efforts so far!

Chaia*

Re: Speeding Up Rendering Performance
« Reply #5 on: November 14, 2016, 10:20:02 pm »
Hey, I'm back with nice results :D
In a small test application I compared the two options (one large vertex array versus many VBOs and VAOs) and found that the solution with many VBOs and VAOs was about 25% faster than the large vertex array. In addition, CPU utilization went down to around 1-2% when using VBOs + VAOs.

Therefore I implemented the VBO/VAO solution in my game. Unexpectedly, my performance increased far more than I thought it would, and I now reach 250 FPS with 1300 draw calls and 400k grass triangles. Before, I had 65 FPS with 330k grass triangles. So I improved by a factor of 3.8! I think a big part of the performance increase comes from the fact that I no longer have to bind my vertex shader for the grass animation again for every VBO/VAO, unlike SFML, which does not cache shader binding. Another part of the increase comes from my CPU, which can now actually do something useful while the GPU is rendering instead of constantly shuffling vertices from RAM to VRAM as before.
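
The shader-binding point above can be illustrated with a tiny state cache. This is a sketch, not the poster's code: `ShaderBindCache` and `bindsForGrassPass` are hypothetical names, and the real bind call (for example `glUseProgram`) is replaced by a counter so the example runs without a GL context.

```cpp
#include <cassert>

// Remembers the last bound shader id and skips redundant binds.
class ShaderBindCache {
public:
    // Returns true if a real bind was needed.
    bool bind(unsigned int shaderId) {
        if (hasBound_ && shaderId == current_) return false;  // already active
        current_ = shaderId;
        hasBound_ = true;
        ++bindCount_;  // the actual bind call would go here
        return true;
    }

    int bindCount() const { return bindCount_; }

private:
    unsigned int current_ = 0;
    bool hasBound_ = false;
    int bindCount_ = 0;
};

// Simulate drawing `strips` grass strips that all use the same shader and
// report how many real binds were issued.
int bindsForGrassPass(int strips, unsigned int grassShaderId) {
    assert(strips >= 0);
    ShaderBindCache cache;
    for (int i = 0; i < strips; ++i) cache.bind(grassShaderId);
    return cache.bindCount();
}
```

With 1300 strips sharing one grass shader, the cache issues a single bind for the whole pass instead of one per draw call.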

Hapax

Re: Speeding Up Rendering Performance
« Reply #6 on: November 15, 2016, 12:49:13 am »
Glad you managed to get the results you were hoping for! :)

"Going behind SFML's back" can help in more specific scenarios with better performance, especially if there is a massive amount of heavy graphics work or a requirement for newer OpenGL features.
That said, though, SFML - and its vertex arrays - is a lot simpler to work with than OpenGL directly. I, personally, have very little OpenGL knowledge and would have quickly got bored of research to achieve that ;D

Did your test with one large vertex array result in 200FPS with one draw call of 400k grass-triangles?

Chaia*

Re: Speeding Up Rendering Performance
« Reply #7 on: November 15, 2016, 12:49:53 pm »
In my test application I got around 130 FPS with one single sf::VertexArray and 400k triangles. Using 1000 VBOs/VAOs with the same number of triangles was around 25% faster. I can't say for certain why I get much better results in my actual game now; I think it's because I cache the shader in my game while I didn't in the test application. So with all optimizations the VAO/VBO solution is around 90% faster, but the main gain comes from caching the shader binding.
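
The percentages in this thread are easier to compare as ratios and frame times. A small sketch with hypothetical helper names:

```cpp
#include <cassert>

// Frame time in milliseconds for a given frame rate.
double frameTimeMs(double fps) {
    assert(fps > 0.0);
    return 1000.0 / fps;
}

// How many times faster `newFps` is compared with `oldFps`.
double speedup(double oldFps, double newFps) {
    assert(oldFps > 0.0);
    return newFps / oldFps;
}
```

Going from 130 FPS to 250 FPS is a ratio of about 1.92, i.e. the "around 90% faster" quoted above, and going from 65 FPS to 250 FPS is a ratio of about 3.85, which cuts the frame time from roughly 15.4 ms to 4 ms.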

Generally, I think the problem with sf::VertexArray is the shuffling of vertex data from RAM to VRAM. Additionally, the framerate with sf::VertexArray is much less stable; I think whenever there is slightly more load on the CPU, the framerate drops.

Edit: Another advantage of using VAOs/VBOs is that I can easily implement a dynamic and smooth level-of-detail system for when the user zooms further out. With glDrawArrays I can simply tell the graphics card to draw only triangles 0 to 100, or 0 to the end, or 0 to any value :)
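
The level-of-detail idea above amounts to choosing the `count` argument of `glDrawArrays(GL_TRIANGLES, 0, count)` per strip. The sketch below computes that count without touching OpenGL. Both function names and the zoom-to-detail mapping are assumptions made for illustration, not taken from the post.

```cpp
#include <algorithm>
#include <cassert>

// Vertex count to pass to glDrawArrays for a strip holding `totalTriangles`
// blades when only the fraction `detail` (0..1) of them should be drawn.
int lodVertexCount(int totalTriangles, float detail) {
    assert(totalTriangles >= 0);
    const float clamped = std::max(0.f, std::min(1.f, detail));
    const int triangles =
        static_cast<int>(static_cast<float>(totalTriangles) * clamped);
    return triangles * 3;  // always a whole number of triangles
}

// Hypothetical mapping: full detail up to zoom factor 1, then inversely
// proportional to how far the view is zoomed out.
float detailForZoom(float zoom) {
    assert(zoom > 0.f);
    return zoom <= 1.f ? 1.f : 1.f / zoom;
}
```

Dropping the tail of the buffer only thins the grass evenly if the blades in each buffer are stored in no particular spatial order; if they are sorted by position, one end of the strip would go bald first.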

At first I thought using VBOs/VAOs would be complex to implement, but it turned out to be much easier.
Here is the relevant code in case anybody might want to use it:


(click to show/hide)

Basically you can also leave out the VAOs and use only VBOs, which are already provided by OGL 1.5 I think (?). VAOs are OGL 3.0 (?). But then you need a few more calls when rendering, which with VAOs you can move to the initialization part. In my tests I got the same framerate with VBOs only and with VAOs + VBOs, but with VAOs + VBOs the CPU utilization was about 1-2% lower (8.3% equals one fully utilized thread on my machine).

I have to admit I'm not quite sure about all the OGL basics either. Especially the different matrices and blend modes are a bit confusing to me, but I did not really look into them. The code above works fine for me, and judging by various VAO/VBO examples I think everything is correct. For my special case this is enough, therefore I will now switch back to game development and pure SFML instead of pure OGL :)

Thanks for your help. 
« Last Edit: November 15, 2016, 01:12:19 pm by Chaia* »