SFML community forums

General => Feature requests => Topic started by: binary1248 on December 04, 2015, 11:26:23 pm

Title: Get rid of the state cache inside RenderTarget
Post by: binary1248 on December 04, 2015, 11:26:23 pm
I never really liked the idea of this state cache. It makes interoperation between sfml-graphics and other libraries painful, and just complicates RenderTarget code.

I did some testing, and it seems that whether or not the cache is used, the performance stays identical, in both high and low graphics load situations. On first glance, it would seem like saving 1 matrix upload and having to manipulate a few more GL states every frame might have some form of impact. The fact that it doesn't, seems to me like an indication that we just move the work from the calls into GL to constantly having to pre-transform the vertex data on the CPU instead of leaving it to the shader units which are more suited for the job anyway.

Finally getting rid of the cache would make current maintenance of RenderTarget and future extension much easier. I'm already having a hard time figuring out how to support custom shader attributes with this cache around, and it also seems to have an effect on #1015 (https://github.com/SFML/SFML/pull/1015) as well.

For reference, this is the code I used to generate a synthetic load that would actually trigger the state caching:
#include <SFML/Graphics.hpp>
#include <algorithm>
#include <sstream>

int main()
{
    sf::RenderWindow window(sf::VideoMode(800, 600, 32), "Test");

    sf::Clock clock;
    sf::Event event;
    std::stringstream sstr;

    sf::Texture texture;
    texture.loadFromFile("resources/texture.jpg");
    sf::Sprite sprites[4096];

    for (int i = 0; i < 4096; i++)
        sprites[i].setTexture(texture);

    float deltas[256];
    int index = 0;

    while (window.isOpen())
    {
        deltas[index = ((index + 1) % 256)] = clock.restart().asSeconds();

        if (!index)
        {
            float avg = std::accumulate(deltas, deltas + 256, 0.f) / 256.f;
            sstr.str("");
            sstr << "Frame Time: " << avg << "s -- " << (1.f / avg) << " FPS";
            window.setTitle(sstr.str());
        }

        while (window.pollEvent(event))
        {
            if (event.type == sf::Event::Closed)
                return EXIT_SUCCESS;
        }

        window.clear();

        for (int i = 0; i < 4096; i++)
            window.draw(sprites[i]);

        window.display();
    }
}
You can test the performance with and without caching enabled by temporarily hardcoding this line (https://github.com/SFML/SFML/blob/master/src/SFML/Graphics/RenderTarget.cpp#L224) to false.
Title: Re: Get rid of the state cache inside RenderTarget
Post by: Laurent on December 05, 2015, 12:59:07 pm
Are you talking about the vertex cache only, or everything (view, blend mode, texture, shader)?
Title: Re: Get rid of the state cache inside RenderTarget
Post by: binary1248 on December 05, 2015, 02:59:45 pm
Everything. Most of the "time saved" (not really) is supposed to come from the vertex cache, so I mentioned disabling that. The same applies if you disable everything else as well. For me the performance stays exactly identical (in this synthetic load situation) whether they are on or off.

What you also have to consider is that when people decide to optimize themselves by e.g. using VertexArrays, the vertex cache checks would no longer need to be performed, which means a potential improvement for them. Right now, this vertex cache "optimization" is mostly targeted towards those who render primarily using quads. I think it is safe to say that more advanced users know that there is a lot of potential if they batch themselves into VertexArrays, so this optimization doesn't even affect the right audience if you ask me.
Title: Re: Get rid of the state cache inside RenderTarget
Post by: hobby on December 05, 2015, 07:50:58 pm
Quote
Quote
Are you talking about the vertex cache only, or everything (view, blend mode, texture, shader)?
Everything.
Do you suggest to bind the texture every draw call (i.e. comment out this line (https://github.com/SFML/SFML/blob/master/src/SFML/Graphics/RenderTarget.cpp#L255))?
The recommendation to minimize state changes can be found not only in official guides of OpenGL 1.1 (which SFML fortunately still supports) but also on more contemporary ones (Apple (https://developer.apple.com/library/ios/documentation/3DDrawing/Conceptual/OpenGLES_ProgrammingGuide/OpenGLESApplicationDesign/OpenGLESApplicationDesign.html), ARM (http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0555c/CHDCCCEG.html)).

Quote
I think it is safe to say that more advanced users know that there is a lot of potential if they batch themselves into VertexArrays, so this optimization doesn't even affect the right audience if you ask me.
Due to various considerations (e.g. undocumented limit for # of vertices, application data structures that need to be contiguous in memory, etc.), even when batching is used there may be multiple draw() calls sharing the same state. And even if not completely, perhaps partially.

Quote
Finally getting rid of the cache would make current maintenance of RenderTarget and future extension much easier. I'm already having a hard time figuring out how to support custom shader attributes with this cache around, and it also seems to have an effect on #1015 (https://github.com/SFML/SFML/pull/1015) as well.
As a user, I would prefer SFML to stay on the safe side with regard to best practices for performance, just like it does so well for C++. I'm not familiar with the new feature you're implementing or the overall maintenance efforts of the Graphics library, but I can say that I found RenderTarget quite elegant in its design when I implemented #1015.
Title: Re: Get rid of the state cache inside RenderTarget
Post by: binary1248 on December 05, 2015, 10:18:10 pm
Do you suggest to bind the texture every draw call (i.e. comment out this line (https://github.com/SFML/SFML/blob/master/src/SFML/Graphics/RenderTarget.cpp#L255))?
The recommendation to minimize state changes can be found not only in official guides of OpenGL 1.1 (which SFML fortunately still supports) but also on more contemporary ones (Apple (https://developer.apple.com/library/ios/documentation/3DDrawing/Conceptual/OpenGLES_ProgrammingGuide/OpenGLESApplicationDesign/OpenGLESApplicationDesign.html), ARM (http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.dui0555c/CHDCCCEG.html)).
I know that changing state often is a bad thing, but what I also know is that changing client state has much less of an impact than changing server state. Anything that has to do with client-side arrays is purely client side i.e. is performed in the driver. Whether we choose to do a bit of caching ourselves or leave it up to the driver process is just a matter of taste. In the end, it is executed by the same CPU anyway, which is why there is no difference. Moving this responsibility over to the driver will let it optimize much better if it sees any opportunities. Not only that, but moving it out of RenderTarget will also make room for newer RenderTarget implementations (using newer OpenGL), which will result in much better performance anyway. There is no point talking about optimizing that last 1% when we know that OpenGL 1.1 is far from the optimal way of doing things anyway.

Due to various considerations (e.g. undocumented limit for # of vertices, application data structures that need to be contiguous in memory, etc.), even when batching is used there may be multiple draw() calls sharing the same state. And even if not completely, perhaps partially.
I'm not saying that it is always possible to draw your whole scene in a single draw call, but the people who count on the current optimization are literally doing nothing themselves to make the situation any better. Cutting down the number of batches will have a greater (and more noticeable) impact than having to enable and disable a few states more than necessary. You should normally start to optimize where it has a greater impact, and not where you don't have to think about it because it is taken care of by someone else.

If you want to see how far batching can go, you can go look at the SFGUI renderers. If you look at the SFGUI test application, there are well over a thousand primitives that get rendered every frame, and that using 3 draw calls. People who say batching isn't possible at all in their application without even trying either have no idea what a batch is, or are just too lazy to optimize performance themselves.

As a user, I would prefer SFML to stay on the safe side with regard to best practices for performance, just like it does so well for C++. I'm not familiar with the new feature you're implementing or the overall maintenance efforts of the Graphics library, but I can say that I found RenderTarget quite elegant in its design when I implemented #1015.
Like I said above, this "best practice" is written more with newer OpenGL in mind, and also has more noticeable effects when applied to newer OpenGL. I'm not a friend of going to great depths just to follow "best practice" even though it doesn't really benefit us and even hampers our efforts to maintain/extend the implementation. Getting rid of the state cache is purely a removal, and I don't know about you, but to me removing code which does not do what it was intended to do is an improvement. Encapsulation even had to be broken in Texture because the state cache had to track which texture was last active, already a minus point.
Title: Re: Get rid of the state cache inside RenderTarget
Post by: Laurent on December 06, 2015, 01:32:44 pm
Feel free to test new things, but all I can say is that every single optimization written there has been tested and proved (at the time I did it) to improve performances a lot.
Title: Re: Get rid of the state cache inside RenderTarget
Post by: hobby on December 06, 2015, 09:14:52 pm
If you want to see how far batching can go, you can go look at the SFGUI renderers. If you look at the SFGUI test application, there are well over a thousand primitives that get rendered every frame, and that using 3 draw calls. People who say batching isn't possible at all in their application without even trying either have no idea what a batch is, or are just too lazy to optimize performance themselves.
No doubt you've gone to great lengths to optimize rendering in SFGUI. I've noticed that you call OpenGL directly, while I draw exclusively using SFML.

If I used glDrawElements() like SFGUI, I wouldn't be able to use SFML for drawing. I believe that my use case is very similar to GUIs, so perhaps I can draw from your experience? (sorry if it's a bit off-topic.)

In my application, I render a large number of convex shapes (fill & outline, with batching). I was sure that glDrawElements() would not have any advantage over RenderTarget:draw() with TrianglesStrip since my shapes were mostly disconnected.
I understand that when drawing text, SFML's rendering does not produce rectangles with shared vertices (due to this (https://github.com/SFML/SFML/blob/master/src/SFML/Graphics/Text.cpp#L372,L373)). Is it not the case with GUIs in general, given the amount of text they draw? Do you find many of your vertices shared among primitives? Should I abandon RenderTarget::draw() in favor of glDrawElements()?

One more question (and this is more related to RenderTarget state cache), aren't many of the primitives that SFGUI draws opaque (backgrounds, borders, etc.)?
Title: Re: Get rid of the state cache inside RenderTarget
Post by: binary1248 on December 07, 2015, 02:09:36 am
I was sure that glDrawElements() would not have any advantage over RenderTarget:draw() with TrianglesStrip since my shapes were mostly disconnected.
This is not a reason not to use a single triangle strip. Degenerate triangles let you jump from primitive to primitive even if they are disconnected without producing any fragments.

I understand that when drawing text, SFML's rendering does not produce rectangles with shared vertices (due to this (https://github.com/SFML/SFML/blob/master/src/SFML/Graphics/Text.cpp#L372,L373)). Is it not the case with GUIs in general, given the amount of text they draw? Do you find many of your vertices shared among primitives? Should I abandon RenderTarget::draw() in favor of glDrawElements()?
There is no point trying to share vertex positions if at least one attribute (in this case the texture coordinate) is going to change each time. Even if you use indexed rendering, you will still have a bit of duplication in one of your streams. SFML keeps it simple and doesn't even try to share.

One more question (and this is more related to RenderTarget state cache), aren't many of the primitives that SFGUI draws opaque (backgrounds, borders, etc.)?
Text is always non-opaque, due to the anti-aliased glyph backgrounds having to be transparent, the same as in SFML (from which the glyph texture data comes from). The rest of the primitives are mostly opaque by default, because that just happens to be the default theme. If SFGUI had to draw non-opaque widgets, it would be able to as well, using the same rendering methods. You can see this by switching the theme in SFGUI's test application.
Title: Re: Get rid of the state cache inside RenderTarget
Post by: hobby on December 07, 2015, 06:32:05 am
If it wasn't clear, I was referring to SFGUI's using glDrawElements() with GL_TRIANGLE_STRIP as opposed to glDrawArrays() with GL_TRIANGLE_STRIP.

I use the latter (indirectly using RenderTarget::draw()), and of course, already make use of degenerate triangles (otherwise I couldn't batch anything). I was wondering if you gained any benefit from using glDrawElements() in SFGUI, as it might have some overhead (CPU/GPU) due to the need to maintain vertex indices. Why not use glDrawArrays() instead?
Title: Re: Get rid of the state cache inside RenderTarget
Post by: binary1248 on December 09, 2015, 12:56:42 am
There is a slight difference between glDrawElements() and glDrawRangeElements(). The latter gives the driver/GPU more opportunities to optimize things, which is why SFGUI uses it. There is no glDrawRangeArrays() (and it wouldn't make much sense anyway) so...
Title: Re: Get rid of the state cache inside RenderTarget
Post by: hobby on December 09, 2015, 12:50:05 pm
I see. I was looking only at VertexArrayRenderer (for 1.1 support) so I hadn't noticed SFGUI's usage of glDrawRangeElements.

Anyway, thanks to this discussion with you now I understand my options for optimization better. It seems that even the simpler glDrawElements could provide some benefit in my case if I were ready to do my own rendering. For example, I could cut the number of vertices for drawing text by 1/3. It seems that even SFGUI does not bother to do that, so surely it's less relevant to someone trying to use only SFML for drawing. I'm sticking to RenderTarget.draw(VertexArray) :).
Title: Re: Get rid of the state cache inside RenderTarget
Post by: eXpl0it3r on December 10, 2015, 08:37:50 am
I'll still have to test the code and see whether it makes a difference, but personally I think it makes sense to remove the optimization in favor of cleaner and better maintainable code, even if there are some scenarios where it might run worse after the removal.

As for future optimization, I think it would be beneficial to document the reasoning behind them somewhere, so when we get back to it years later, it still is known why it was added. ;D
Title: Re: Get rid of the state cache inside RenderTarget
Post by: Laurent on December 10, 2015, 09:17:10 am
Quote
I'll still have to test the code and see whether it makes a difference, but personally I think it makes sense to remove the optimization in favor of cleaner and better maintainable code, even if there are some scenarios where it might run worse after the removal.
I don't think we can decide to do it (or not) before a significant part of the SFML users have tested and validated it. Especially users with low-end hardware.

Quote
As for future optimization, I think it would be beneficial to document the reasoning behind them somewhere, so when we get back to it years later, it still is known why it was added.
The reasoning is simple here, it is the same as usual: less OpenGL calls yields better performances.
Title: Re: Get rid of the state cache inside RenderTarget
Post by: binary1248 on December 10, 2015, 12:36:02 pm
I don't think we can decide to do it (or not) before a significant part of the SFML users have tested and validated it. Especially users with low-end hardware.
This is always the case. Nothing different here.

The reasoning is simple here, it is the same as usual: less OpenGL calls yields better performances.
This is like saying: Less standard library calls yields better performance.

I don't think this is the case if we end up implementing the algorithms ourselves instead. ;)

The same can be said about OpenGL. We don't know whether we are doing exactly the same work as the driver would have done. As such, I don't think it is as simple as "minimize the number of calls". That is just a hint at where to look for bottlenecks... The only way to get true performance data is to do the measurements ourselves, and these measurements tell me that this optimization doesn't end up behaving like one.
Title: Re: Get rid of the state cache inside RenderTarget
Post by: Laurent on December 10, 2015, 01:47:10 pm
We could start arguing endlessly, but I think this topic will be much more useful if we focus on the important facts:
- the current code is the result of precise and intensive tests, not random code that I wrote in the hope that it might improve performances
- I'll be the first one to be happy about a total removal of the cache stuff, so go ahead, but don't draw conclusions to quickly, make as many users as possible to test the modifications (not just a few (or even none) as usual)
Title: Re: Get rid of the state cache inside RenderTarget
Post by: eXpl0it3r on December 17, 2015, 07:58:17 pm
SFML master: ~1100 FPS
useVertexCache = false: ~450 FPS

Anything else to test?
Title: Re: Get rid of the state cache inside RenderTarget
Post by: Tank on December 18, 2015, 08:59:45 am
With caching: ~57 FPS
Without caching: ~59 FPS

I have *no* idea why it's so damn slow in both cases, but at least there's a slight effect.

02:00.0 VGA compatible controller: NVIDIA Corporation GK104 [GeForce GTX 660 Ti] (rev a1)
Title: Re: Get rid of the state cache inside RenderTarget
Post by: SpeCter on December 18, 2015, 11:07:27 am
With caching: ~57 FPS
Without caching: ~59 FPS

I have *no* idea why it's so damn slow in both cases, but at least there's a slight effect.

02:00.0 VGA compatible controller: NVIDIA Corporation GK104 [GeForce GTX 660 Ti] (rev a1)

Driver enabled vsync maybe?
I think in this case the 2 fps difference is more likely just noise and not something which comes from caching/non-caching.
Title: Re: Get rid of the state cache inside RenderTarget
Post by: Tank on December 18, 2015, 01:40:52 pm
Quote
Driver enabled vsync maybe?
Nope, it's not vsync. With 1024 sprites I get ~147.

Quote
I think in this case the 2 fps difference is more likely just noise and not something which comes from caching/non-caching.
I don't think so, as I've watched the numbers for like a minute. It's reproducible as well.
Title: Re: Get rid of the state cache inside RenderTarget
Post by: Nexus on December 20, 2015, 03:31:34 pm
SFML master: ~186 FPS
Disabled vertex cache: ~75 FPS

Intel HD Graphics 3000
(I had to use a small image or the framerate was abysmal)

By the way, the code needs a #include <numeric> for std::accumulate(). It should allocate the sprites dynamically; I got a stack overflow.
Title: Re: Get rid of the state cache inside RenderTarget
Post by: Mario on December 21, 2015, 05:44:26 pm
Which texture are you guys using? These values don't sound right to me. I've used the OpenGL example's "texture.png":

Unmodified master branch: ~81 FPS
Vertex cache disabled: ~92 FPS

Geforce GTX 680 in a x64 release build of SFML.

Updated code:
(click to show/hide)
Title: Re: Get rid of the state cache inside RenderTarget
Post by: Tank on December 21, 2015, 09:12:44 pm
Mario: Used the same.
Title: Re: Get rid of the state cache inside RenderTarget
Post by: hobby on December 24, 2015, 07:21:48 pm
Updated code:
...

GeForce GT610 2GB.
Using above code on master of 3 weeks ago, OpenGL example texture.jpg, x32 Release:
- With & without vertex cache ~ 4.6 FPS

I suspected that there's no difference due to fillrate limitation vs. low # of OpenGL calls, so I reduced texture size to 64x64:
sprites[i].setTextureRect(sf::IntRect(0, 0, 256/4, 256/4));

Results:
- With vertex cache - 140 FPS
- Without vertex cache - 83.5

Those of you that got negligible difference, could you re-run your tests with smaller texture sizes?