Author Topic: Just wondering (performance)  (Read 4749 times)

danikaze

  • Newbie
  • *
  • Posts: 38
Just wondering (performance)
« on: April 03, 2013, 03:28:50 pm »
Although several classes for drawing tile maps already exist, I wanted to test 3 options and compare their performance:
  • filling the map with plain sf::Vertex quads, one draw() call per tile
  • pre-generating an image and using it as a chunk
  • using an sf::VertexArray

I expected drawing individual vertices to be slow, so I thought about pre-rendering bigger chunks and using them to save draw() calls.

What was my surprise when I saw the test results (debug mode)...

Filling an 800x600 window with 32x32 tiles (and a single 800x600 chunk):
  • filling a map using Vertex (475 calls): 4706 ms
  • pregenerated image used as a chunk (1 call): 4442 ms
  • vertexArray (1 call to draw): 57 ms

Why is there such a big difference between drawing a single 800x600 image and making one draw call with 1900 vertices?
My common sense says that [1 draw / 800x600px / 4 vertices] should be faster than [1 draw / 800x600px / 1900 vertices].

In release mode the results are quite different...
  • filling a map using Vertex (475 calls): 544 ms
  • pregenerated image used as a chunk (1 call): 7 ms
  • vertexArray (1 call to draw): 5399 ms

But again, I don't know why the vertexArray (1 call / 1900 vertices) is so much slower than the per-tile Vertex calls (475 calls / 1900 vertices).

Here is the code of the test, if you wanna try...
        sf::Clock clock;

        // prepare vertices[4]: a single 32x32 quad
        // (coordinates are edges, not pixel indices, so a full tile spans 0..32)
        sf::Texture* tilesTexture = ResourceManager::getTexture("tiles.png");
        sf::Vertex vertices1[4];
        vertices1[0].position = sf::Vector2f(0, 0);
        vertices1[1].position = sf::Vector2f(32, 0);
        vertices1[2].position = sf::Vector2f(32, 32);
        vertices1[3].position = sf::Vector2f(0, 32);

        vertices1[0].texCoords = sf::Vector2f(0, 0);
        vertices1[1].texCoords = sf::Vector2f(32, 0);
        vertices1[2].texCoords = sf::Vector2f(32, 32);
        vertices1[3].texCoords = sf::Vector2f(0, 32);

        sf::RenderStates VertexStates;
        VertexStates.texture = tilesTexture;

        // test vertices[4]
        clock.restart();
        for(int x=0; x<25; x++)
        {
                for(int y=0; y<19; y++)
                {
                        vertices1[0].position = sf::Vector2f(32*x, 32*y);
                        vertices1[1].position = sf::Vector2f(32*(x+1), 32*y);
                        vertices1[2].position = sf::Vector2f(32*(x+1), 32*(y+1));
                        vertices1[3].position = sf::Vector2f(32*x, 32*(y+1));
                        window.draw(vertices1, 4, sf::Quads, VertexStates);
                }
        }
        cout << "vertex[4]: " << clock.getElapsedTime().asMicroseconds() << endl;

        // prepare 1 draw with an 800x600 image
        // (the quad spans 0..800 x 0..600 so the whole image is covered)
        sf::Texture* fondo = ResourceManager::getTexture("bg800x600.png");
        vertices1[0].position = sf::Vector2f(0, 0);
        vertices1[1].position = sf::Vector2f(800, 0);
        vertices1[2].position = sf::Vector2f(800, 600);
        vertices1[3].position = sf::Vector2f(0, 600);

        vertices1[0].texCoords = sf::Vector2f(0, 0);
        vertices1[1].texCoords = sf::Vector2f(800, 0);
        vertices1[2].texCoords = sf::Vector2f(800, 600);
        vertices1[3].texCoords = sf::Vector2f(0, 600);

        // test the image chunk
        clock.restart();
        VertexStates.texture = fondo;
        window.draw(vertices1, 4, sf::Quads, VertexStates);
        cout << "chunk800x600: " << clock.getElapsedTime().asMicroseconds() << endl;

        // prepare the VertexArray call
        VertexStates.texture = tilesTexture;
        sf::VertexArray vertices2(sf::Quads, 25*19*4);
        for(int x=0; x<25; x++)
        {
                for(int y=0; y<19; y++)
                {
                        vertices2[(x + y*25)*4].position = sf::Vector2f(32*x, 32*y);
                        vertices2[(x + y*25)*4+1].position = sf::Vector2f(32*(x+1), 32*y);
                        vertices2[(x + y*25)*4+2].position = sf::Vector2f(32*(x+1), 32*(y+1));
                        vertices2[(x + y*25)*4+3].position = sf::Vector2f(32*x, 32*(y+1));

                        // tiles.png is 256x256, i.e. 8x8 tiles of 32 px, so valid tile indices are 0..7
                        int i = rand()%8;
                        int j = rand()%8;
                        vertices2[(x + y*25)*4].texCoords = sf::Vector2f(i*32, 32*j);
                        vertices2[(x + y*25)*4+1].texCoords = sf::Vector2f(32*(i+1), 32*j);
                        vertices2[(x + y*25)*4+2].texCoords = sf::Vector2f(32*(i+1), 32*(j+1));
                        vertices2[(x + y*25)*4+3].texCoords = sf::Vector2f(32*i, 32*(j+1));
                }
        }

        // test the VertexArray call
        clock.restart();
        window.draw(vertices2, VertexStates);
        cout << "vertexArray: " << clock.getElapsedTime().asMicroseconds() << endl;

 

BTW bg800x600.png is a 800x600 image, and tiles.png a 256x256 image.

Laurent

  • Administrator
  • Hero Member
  • *****
  • Posts: 32504
    • SFML's website
Re: Just wondering (performance)
« Reply #1 on: April 03, 2013, 03:49:29 pm »
Your tests are not fair.

First, you should run each solution separately for a few seconds and measure the average framerate, rather than doing a one-time execution which may be disturbed by many external factors (like the order of the tests).
Doing single measurements rather than testing the whole application is also meaningless because OpenGL is an asynchronous API: the graphics driver may enqueue commands and send them to the graphics card later (most likely in window.display()).

Secondly, you measure both the init time and the run time of solutions; it's obvious that solutions that are faster to draw require a longer initialization time, because you must prepare, pre-allocate, or pre-render stuff. This time shouldn't be counted, only the draw time should be measured.

And lastly, don't look at results in debug mode. There will be huge differences between solutions that use mostly the CPU, versus solutions that use mostly the GPU. It's irrelevant.
« Last Edit: April 03, 2013, 03:59:40 pm by Laurent »
Laurent Gomila - SFML developer
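
The measurement approach described in this reply (exclude setup, run many frames, average) can be sketched roughly as follows. This is only an illustration in plain C++: averageFrameMillis and its parameters are hypothetical names, not part of SFML, and the sketch assumes the callable passed in performs one complete frame.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>

// Hypothetical helper: runs `frame` repeatedly and returns the average
// wall-clock time per frame in milliseconds. All setup (loading textures,
// filling vertex arrays) must happen before this is called, so it is not timed.
template <typename Frame>
double averageFrameMillis(std::size_t warmupFrames, std::size_t measuredFrames, Frame&& frame)
{
    assert(measuredFrames > 0);

    // Warm-up frames are executed but not timed, so one-time costs
    // (first texture upload, driver caches) do not skew the average.
    for (std::size_t i = 0; i < warmupFrames; ++i)
        frame();

    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < measuredFrames; ++i)
        frame();
    const auto stop = std::chrono::steady_clock::now();

    const std::chrono::duration<double, std::milli> elapsed = stop - start;
    return elapsed.count() / static_cast<double>(measuredFrames);
}
```

With SFML, the callable would presumably wrap a whole frame, for example window.clear(), then window.draw(vertices2, VertexStates), then window.display(), so that commands queued by the driver are included in the measurement. Each drawing strategy would be measured in its own run, in release mode.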

Nexus

  • SFML Team
  • Hero Member
  • *****
  • Posts: 6287
  • Thor Developer
    • Bromeon
Re: Just wondering (performance)
« Reply #2 on: April 03, 2013, 03:50:15 pm »
Quote
What was my surprise when I saw the test results (debug mode)...

All the conclusions in the following part of your post are meaningless.

Simple rule: One does not measure in debug mode. It doesn't represent reality, and it leads to distortions because of disabled optimizations and additional error checking. Debug mode is to find bugs, not to optimize code.
Zloxx II: action platformer
Thor Library: particle systems, animations, dot products, ...
SFML Game Development:

danikaze

  • Newbie
  • *
  • Posts: 38
Re: Just wondering (performance)
« Reply #3 on: April 03, 2013, 04:39:33 pm »
That's the thing. Besides debug mode, I also ran the test in release mode, and the results are even more surprising, because the solution that is supposed to be faster (the one that requires a longer initialization time) isn't faster...

As a reminder, the release-mode results:
- filling a map using Vertex (475 calls): 544 ms
- pregenerated image used as a chunk (1 call): 7 ms
- vertexArray (1 call to draw): 5399 ms

Why is the vertexArray the slowest one?

OK, I agree with this:
Quote
First, you should run each solution separately for a few seconds and measure the average framerate, rather than doing a one-time execution which may be disturbed by many external factors (like the order of the tests).
Doing single measurements rather than testing the whole application is also meaningless because OpenGL is an asynchronous API: the graphics driver may enqueue commands and send them to the graphics card later (most likely in window.display()).

That's why I repeated the test with 1000 iterations and several seconds between them... here are the new results (release mode)
- 100x filling a map using Vertex (475 calls): 292248 ms
- 100x pregenerated image used as a chunk (1 call): 605 ms
- 100x vertexArray (1 call to draw): 71439 ms

Much better, I think...
Anyway, it's still not a realistic test, because lots of other things happen between two map draws (such as changing the texture to draw objects, characters, etc.), but well, it's something.

So, as I thought, the best way to draw a tile map is, indeed, precaching it into bigger chunks and drawing less vertices.

Oh, and it's true that faster solutions require more initialization time, but that work is done while loading the level, not in the main game loop, so I was counting on it; that was one of my goals.

Nexus

  • SFML Team
  • Hero Member
  • *****
  • Posts: 6287
  • Thor Developer
    • Bromeon
Re: Just wondering (performance)
« Reply #4 on: April 03, 2013, 05:21:12 pm »
Did you also disable the debugger? In Visual studio, that's independent from Release/Debug mode. By pressing F5, you start with the debugger, using Ctrl+F5 you start without. And if you want a fair comparison, show the new code and apply the points mentioned by Laurent.

Anyway, it makes sense that drawing 4 vertices (a texture) is faster than a whole vertex array... There are however other criterions to keep in mind, like animated tiles or texture size limits.

danikaze

  • Newbie
  • *
  • Posts: 38
Re: Just wondering (performance)
« Reply #5 on: April 03, 2013, 05:34:04 pm »
Quote
Did you also disable the debugger? In Visual studio, that's independent from Release/Debug mode. By pressing F5, you start with the debugger, using Ctrl+F5 you start without. And if you want a fair comparison, show the new code and apply the points mentioned by Laurent.
Anyway, it makes sense that drawing 4 vertices (a texture) is faster than a whole vertex array... There are however other criterions to keep in mind, like animated tiles or texture size limits.

Yes, it's without the debugger. Anyway, running with F5 or with Ctrl+F5 doesn't make much difference in this test.

About the new code, I just added a "wait" before each test
while(clock.getElapsedTime() < sf::seconds(5));
 

and wrapped each test with
for(int t=0; t<1000; t++)
 

About animated tiles, you have a point there, but I guess that if there are only a few animated tiles, they can be represented by separate AnimatedSprites drawn over the floor tiles. Easy to do and probably still faster. Or at least that's what I think :P

Nexus

  • SFML Team
  • Hero Member
  • *****
  • Posts: 6287
  • Thor Developer
    • Bromeon
Re: Just wondering (performance)
« Reply #6 on: April 03, 2013, 05:53:29 pm »
Yes, it totally depends on the use case. If you don't have too many tiles, you shouldn't even worry about complicated optimization techniques.

For example, my Jump'n'Run Zloxx II uses a sf::Sprite for each tile (it was developed before sf::VertexArray existed) -- and it doesn't store the sprite, it recreates it every frame. There are no performance problems.

krzat

  • Full Member
  • ***
  • Posts: 107
Re: Just wondering (performance)
« Reply #7 on: April 03, 2013, 07:54:15 pm »
Quote
So, as I thought, the best way to draw a tile map is, indeed, precaching it into bigger chunks and drawing less vertices.

Not if your tilemap is very big, or when it changes dynamically. The most universal solution would be to batch your calls every frame (XNA style).
SFML.Utils - useful extensions for SFML.Net

danikaze

  • Newbie
  • *
  • Posts: 38
Re: Just wondering (performance)
« Reply #8 on: April 04, 2013, 05:55:14 am »
Quote
The most universal solution would be to batch your calls every frame (XNA style).

So, I guess, a VertexArray is the prepared batch...

krzat

  • Full Member
  • ***
  • Posts: 107
Re: Just wondering (performance)
« Reply #9 on: April 04, 2013, 09:52:23 am »
Only if you are rebuilding it every frame from currently visible tiles.
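
A rough sketch of what "rebuilding the batch every frame from the currently visible tiles" could look like. It is written in plain C++ so it stands alone: Vec2 and Vertex are stand-ins for sf::Vector2f and sf::Vertex, and visibleTiles, rebuildBatch, and the row-major tileIds layout are assumptions made for illustration, not something taken from SFML or from this thread.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <vector>

struct Vec2 { float x, y; };                       // stand-in for sf::Vector2f
struct Vertex { Vec2 position; Vec2 texCoords; };  // stand-in for sf::Vertex
struct TileRange { int x0, y0, x1, y1; };          // half-open: [x0, x1) x [y0, y1)

static int clampInt(int v, int lo, int hi) { return std::max(lo, std::min(v, hi)); }

// Tile indices covered by the view rectangle, clamped to the map bounds.
TileRange visibleTiles(float left, float top, float width, float height,
                       float tileSize, int cols, int rows)
{
    assert(tileSize > 0.f);
    TileRange r;
    r.x0 = clampInt(static_cast<int>(std::floor(left / tileSize)), 0, cols);
    r.y0 = clampInt(static_cast<int>(std::floor(top / tileSize)), 0, rows);
    r.x1 = clampInt(static_cast<int>(std::ceil((left + width) / tileSize)), 0, cols);
    r.y1 = clampInt(static_cast<int>(std::ceil((top + height) / tileSize)), 0, rows);
    return r;
}

// Refills `batch` with one quad (4 vertices) per visible tile.
// tileIds is row-major; tilesPerRow is the number of tiles per tileset row.
void rebuildBatch(std::vector<Vertex>& batch, const std::vector<int>& tileIds, int cols,
                  const TileRange& r, float tileSize, int tilesPerRow)
{
    batch.clear();  // keeps the allocated capacity, so later frames rarely reallocate
    for (int y = r.y0; y < r.y1; ++y)
    {
        for (int x = r.x0; x < r.x1; ++x)
        {
            const int id = tileIds[y * cols + x];
            const float u = (id % tilesPerRow) * tileSize;
            const float v = (id / tilesPerRow) * tileSize;
            const float px = x * tileSize;
            const float py = y * tileSize;
            batch.push_back(Vertex{{px, py}, {u, v}});
            batch.push_back(Vertex{{px + tileSize, py}, {u + tileSize, v}});
            batch.push_back(Vertex{{px + tileSize, py + tileSize}, {u + tileSize, v + tileSize}});
            batch.push_back(Vertex{{px, py + tileSize}, {u, v + tileSize}});
        }
    }
}
```

With real sf::Vertex elements, the resulting vector could be drawn in a single call through the pointer overload already used earlier in the thread, along the lines of window.draw(batch.data(), batch.size(), sf::Quads, VertexStates).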

danikaze

  • Newbie
  • *
  • Posts: 38
Re: Just wondering (performance)
« Reply #10 on: April 04, 2013, 11:51:43 am »
Quote
Only if you are rebuilding it every frame from currently visible tiles.

Hmm, that's true.
Doesn't sf::View skip non-visible images as an optimization? If not, is there a better approach? Maybe keeping several VertexArrays as chunks and only drawing the visible ones...
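
The chunk idea in the question above could be sketched as follows. As far as I know, sf::View only defines which region of the world is shown and does not skip off-screen geometry on the CPU side, so the selection has to be done by hand. The sketch is plain C++ under stated assumptions: Rect stands in for sf::FloatRect, and Chunk, makeChunkGrid, and visibleChunkIndices are hypothetical names; in real code each chunk would also own a prebuilt sf::VertexArray.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct Rect { float left, top, width, height; };  // stand-in for sf::FloatRect

// True if the two rectangles overlap (touching edges do not count).
bool intersects(const Rect& a, const Rect& b)
{
    return a.left < b.left + b.width && b.left < a.left + a.width &&
           a.top < b.top + b.height && b.top < a.top + a.height;
}

struct Chunk
{
    Rect bounds;  // world-space area covered by this chunk
    // In real code: an sf::VertexArray holding the chunk's tiles, built once at load time.
};

// Builds a row-major grid of square chunks, each chunkSize pixels wide.
std::vector<Chunk> makeChunkGrid(int chunkCols, int chunkRows, float chunkSize)
{
    assert(chunkCols > 0 && chunkRows > 0 && chunkSize > 0.f);
    std::vector<Chunk> chunks;
    chunks.reserve(static_cast<std::size_t>(chunkCols * chunkRows));
    for (int cy = 0; cy < chunkRows; ++cy)
        for (int cx = 0; cx < chunkCols; ++cx)
            chunks.push_back(Chunk{Rect{cx * chunkSize, cy * chunkSize, chunkSize, chunkSize}});
    return chunks;
}

// Returns the indices of the chunks that overlap the view rectangle.
std::vector<std::size_t> visibleChunkIndices(const std::vector<Chunk>& chunks, const Rect& view)
{
    std::vector<std::size_t> visible;
    for (std::size_t i = 0; i < chunks.size(); ++i)
        if (intersects(chunks[i].bounds, view))
            visible.push_back(i);
    return visible;
}
```

With SFML, the view rectangle could be derived from sf::View::getCenter() and sf::View::getSize(), and only the vertex arrays of the returned chunks would be passed to window.draw(). The chunk size is a trade-off: larger chunks mean fewer draw calls but more off-screen tiles submitted.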