Author Topic: A single draw call severely slows thing down  (Read 313 times)

nogoodname

  • Newbie
  • *
  • Posts: 13
    • View Profile
A single draw call severely slows thing down
« on: July 30, 2019, 10:46:38 pm »
I'm currently trying to make a simple bullet hell game, and I've noticed that a single draw call slows things down a lot when there are a lot of vertices. I'm not sure how to fix it.

I've made this simple sample version to illustrate the issue.
#include <SFML/Graphics.hpp>
#include <cmath> // std::atan, std::cos, std::sin; SFML/Graphics.hpp already includes SFML/Window.hpp
#include <chrono>
#include <iostream>
#include <memory>
static const sf::IntRect rect = {0,0,64,64};
static const sf::Vector2f initPos = {400,150};
class Bullet {
public:
    Bullet(sf::Vector2f vel_) : velocity{vel_} {
        vertices[0].texCoords = sf::Vector2f{0,0};
        vertices[1].texCoords = sf::Vector2f{64,0};
        vertices[2].texCoords = sf::Vector2f{64,64};
        vertices[3].texCoords = sf::Vector2f{0,64};
        for(size_t i=0; i<4; ++i)
            vertices[i].position = initPos + vertices[i].texCoords;
        vertices[4] = vertices[0];
        vertices[5] = vertices[2];
    }
    void update() {
        for(size_t i=0; i<6; ++i) {
            vertices[i].position += velocity;
        }
    }
    sf::Vertex* getVertices() {
        return vertices;
    }
private:
    sf::Vertex vertices[6];
    sf::Vector2f velocity;
};
int main() {
    constexpr size_t TOTAL_BULLETS = 20000;
    const float pi = 4*std::atan(1.0f); // std::atan is not guaranteed to be constexpr, so use const
    std::unique_ptr<Bullet> bullets[TOTAL_BULLETS];
    sf::VertexArray bulletVertices{sf::Triangles,TOTAL_BULLETS*6};
    sf::Texture tex;
    //tex.loadFromFile("bullets.png");
    for(size_t i=0; i<TOTAL_BULLETS; ++i) {
        float angle = 2*pi * static_cast<float>(i)/TOTAL_BULLETS;
        auto vel = sf::Vector2f{std::cos(angle),std::sin(angle)} * 1.0f;
        bullets[i] = std::make_unique<Bullet>(vel);
    }
    sf::RenderWindow window(sf::VideoMode(800,600), "Test");
    while( window.isOpen() ) {
        auto tp = std::chrono::steady_clock::now();
        sf::Event event;
        while( window.pollEvent(event) ) {
            switch(event.type) {
            case sf::Event::Closed:
                window.close();
            break;
            default:
            break;
            }
        }
        for(size_t j=0; j<TOTAL_BULLETS; ++j) {
            bullets[j]->update();
            auto vertice = bullets[j]->getVertices();
            for(size_t i=0; i<6; ++i)
                bulletVertices[j*6+i] = vertice[i];
        }
        window.clear();
        window.draw(bulletVertices,&tex); //This line slows things down a lot
        window.display();
        auto dur = std::chrono::duration_cast<std::chrono::microseconds>(std::chrono::steady_clock::now()-tp);
        std::cout<<"Frame takes: "<< dur.count() <<" microseconds.\n";
    }
    return 0;
}
 

The main point of interest is line 64, which is:
window.draw(bulletVertices,&tex); //This line slows things down a lot
 
This draws 120,000 vertices, or 40,000 triangles. With that line commented out, the program becomes a lot faster: the frame time is about 45,000 microseconds with the draw call and about 5,000 microseconds without it.
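For readers wondering where those counts come from: each bullet is a quad drawn as two triangles, so it needs six vertices in a triangle list. Below is a minimal, SFML-free sketch of that expansion. `Vec2`, `quadToTriangles`, `verticesFor` and `trianglesFor` are hypothetical names used only for illustration; they are not part of SFML.

```cpp
#include <array>
#include <cstddef>

// Hypothetical stand-in for sf::Vector2f so the sketch compiles without SFML.
struct Vec2 {
    float x;
    float y;
};

// Expands the four corners of a quad (top-left, top-right, bottom-right,
// bottom-left) into six triangle-list vertices, in the same order the
// Bullet constructor above uses: corners {0, 1, 2} and {3, 0, 2}.
std::array<Vec2, 6> quadToTriangles(const std::array<Vec2, 4>& corners) {
    return {corners[0], corners[1], corners[2],
            corners[3], corners[0], corners[2]};
}

// Vertices and triangles submitted per frame for a given bullet count.
constexpr std::size_t verticesFor(std::size_t bullets) { return bullets * 6; }
constexpr std::size_t trianglesFor(std::size_t bullets) { return bullets * 2; }
```

With 20,000 bullets, `verticesFor` gives 120,000 and `trianglesFor` gives 40,000, matching the numbers in the post.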
« Last Edit: August 01, 2019, 02:35:35 pm by nogoodname »

Hapax

  • Hero Member
  • *****
  • Posts: 2765
  • My number of posts is shown in hexadecimal.
    • View Profile
Re: A single draw call severely slows thing down
« Reply #1 on: July 31, 2019, 09:10:22 pm »
Drawing stuff will always be slower than not drawing stuff. It takes time.

A single draw call has to transfer all the vertex data to the graphics card and also prepare the texture for use (if one is used). Given that your frame already takes 5k microseconds for the normal calculations, the remaining 40k microseconds, draw call setup included, seems pretty quick for 40k triangles.

5k microseconds is 200 FPS whereas 45k microseconds is about 22 FPS.
200 seems quite slow for no drawing; is it really that computationally loaded?
22 does seem "slow" to be fair but it's a lot of triangles to be transferring every frame.
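The conversion between frame time and FPS used above is just a reciprocal. A small sketch of it, where `framesPerSecond` is a hypothetical helper name and not something from the original code:

```cpp
#include <chrono>

// Converts a frame duration to frames per second.
// A non-positive duration returns 0 to avoid dividing by zero.
double framesPerSecond(std::chrono::microseconds frameTime) {
    if (frameTime.count() <= 0) {
        return 0.0;
    }
    return 1e6 / static_cast<double>(frameTime.count());
}
```

Calling `framesPerSecond(std::chrono::microseconds{5000})` yields 200, and passing 45,000 microseconds yields a little over 22.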

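The solution for this many triangles would be to use a vertex buffer (sf::VertexBuffer) instead of a vertex array (sf::VertexArray), so the vertex data can stay in graphics memory instead of being sent again every frame.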
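To get a rough feel for how much data a vertex array sends each frame, here is a back-of-the-envelope sketch. It assumes a vertex laid out like sf::Vertex in SFML 2.x (a 2-float position, a 4-byte color, and 2-float texture coordinates); `VertexLayout` and `bytesPerFrame` are hypothetical names, and the exact size may differ on unusual platforms.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical mirror of the sf::Vertex members (an assumption, not SFML code):
// position (2 floats), color (4 bytes), texCoords (2 floats).
struct VertexLayout {
    float position[2];
    std::uint8_t color[4];
    float texCoords[2];
};

// Bytes of vertex data submitted per frame when every vertex is re-sent.
constexpr std::size_t bytesPerFrame(std::size_t vertexCount) {
    return vertexCount * sizeof(VertexLayout);
}
```

On typical desktop platforms `sizeof(VertexLayout)` is 20 bytes, so 120,000 vertices come to about 2.4 MB per frame.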
Selba Ward - SFML drawables
Kairos - Timing Library
Rectangular Boundary Collision - Rectangular SAT Collision

@Hapaxiation - Hapaxia on Twitter

nogoodname

  • Newbie
  • *
  • Posts: 13
    • View Profile
Re: A single draw call severely slows thing down
« Reply #2 on: August 01, 2019, 03:42:49 am »
Thanks for replying.
So I went and tried the vertex buffer, and I didn't notice any real increase in draw speed. The only difference I saw in Process Explorer was lower CPU usage compared to the regular vertex array; GPU usage hasn't gone down noticeably (it still remains at >80%).

Here's an edited version (only the main function, as everything else is the same as before). I've moved a bunch of stuff out of the while loop; the main points of interest come after the line marked //HERE.
int main() {
    constexpr size_t TOTAL_BULLETS = 20000;
    sf::VertexArray bulletVertices{sf::Triangles,TOTAL_BULLETS*6};
    sf::RenderWindow window{sf::VideoMode{800,600}, "Test"};
    const float pi = 4*std::atan(1.0f); // std::atan is not guaranteed to be constexpr, so use const
    std::unique_ptr<Bullet> bullets[TOTAL_BULLETS];  

    for(size_t j=0; j<TOTAL_BULLETS; ++j) {
        float angle = 2*pi * static_cast<float>(j)/TOTAL_BULLETS;
        auto vel = sf::Vector2f{std::cos(angle),std::sin(angle)} * 1.0f;
        bullets[j] = std::make_unique<Bullet>(vel);
        auto vertice = bullets[j]->getVertices();
        for(size_t i=0; i<6; ++i)
            bulletVertices[j*6+i] = vertice[i];
    }

    //HERE
    sf::VertexBuffer bufferVertices{sf::Triangles};
    bufferVertices.create(bulletVertices.getVertexCount() );
    bufferVertices.update(&bulletVertices[0]);

    while( window.isOpen() ) {
        auto tp = std::chrono::steady_clock::now();
        sf::Event event;
        while( window.pollEvent(event) ) {
            switch(event.type) {
            case sf::Event::Closed:
                window.close();
            break;
            default:
            break;
            }
        }
        window.clear();
        //window.draw(bulletVertices);
        window.draw(bufferVertices);
        window.display();
        auto dur = std::chrono::duration_cast<std::chrono::microseconds>(std::chrono::steady_clock::now()-tp);
        std::cout<<"Frame takes: "<< dur.count() <<" microseconds.\n";
    }
    return 0;
}
 
« Last Edit: August 01, 2019, 03:51:26 am by nogoodname »

Nexus

  • SFML Team
  • Hero Member
  • *****
  • Posts: 6193
  • Thor Developer
    • View Profile
    • Bromeon
Re: A single draw call severely slows thing down
« Reply #3 on: August 01, 2019, 12:47:35 pm »
Quote from nogoodname: "So I went and tried to use the vertex buffer, and I didn't notice any real increase in draw speed."
Which usage mode did you try with? Did you experiment with different modes?

std::unique_ptr<Bullet> bullets[TOTAL_BULLETS];
This is rather unusual:
  • The stack has a limited size. Allocating TOTAL_BULLETS objects in automatic storage may cause a stack overflow, depending on how big TOTAL_BULLETS is.
  • Having each element as a std::unique_ptr means that every bullet is stored in a separate memory location. Even if this is maybe not the problem of this thread, iteration will be unnecessarily slow because a) non-contiguous memory iteration prevents prefetching and leads to cache misses and b) you have an extra indirection on every element access. Furthermore, allocation is slower and you have a per-element overhead for dynamic memory bookkeeping.
Why not simply std::vector<Bullet>?
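The contiguous layout suggested here can be sketched without SFML as follows. `SimpleBullet`, `makeBullets` and `updateAll` are hypothetical names for illustration; the point is only that all elements live in one allocation and are updated in a linear pass.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical, SFML-free bullet: just a position and a velocity.
struct SimpleBullet {
    float x;
    float y;
    float vx;
    float vy;
};

// Creates `count` bullets at (startX, startY), all moving right at 1 unit
// per frame. Every element lives in one contiguous heap allocation, so
// nothing large is placed on the stack.
std::vector<SimpleBullet> makeBullets(std::size_t count, float startX, float startY) {
    std::vector<SimpleBullet> bullets;
    bullets.reserve(count);
    for (std::size_t i = 0; i < count; ++i) {
        bullets.push_back(SimpleBullet{startX, startY, 1.0f, 0.0f});
    }
    return bullets;
}

// Advances every bullet by its velocity; the loop walks memory linearly,
// with no pointer indirection per element.
void updateAll(std::vector<SimpleBullet>& bullets) {
    for (SimpleBullet& b : bullets) {
        b.x += b.vx;
        b.y += b.vy;
    }
}
```

Compared with an array of std::unique_ptr, this needs a single allocation instead of one per bullet, and neighbouring bullets are adjacent in memory.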
Zloxx II: action platformer
Thor Library: particle systems, animations, dot products, ...
SFML Game Development: first SFML book

nogoodname

  • Newbie
  • *
  • Posts: 13
    • View Profile
Re: A single draw call severely slows thing down
« Reply #4 on: August 02, 2019, 02:24:32 am »
Oh, a std::vector of Bullets should be better than an array of Bullet pointers. Here's an updated version of the code (all the points of interest come after the line marked //HERE).

#include <SFML/Graphics.hpp>
#include <cmath> // std::atan, std::cos, std::sin; SFML/Graphics.hpp already includes SFML/Window.hpp
#include <chrono>
#include <iostream>
#include <vector>
static const sf::IntRect rect = {0,0,64,64};
static const sf::Vector2f initPos = {400,150};
class Bullet {
public:
    Bullet() = default;
    Bullet(sf::Vector2f vel_) : velocity{vel_} {
        vertices[0].texCoords = sf::Vector2f{0,0};
        vertices[1].texCoords = sf::Vector2f{64,0};
        vertices[2].texCoords = sf::Vector2f{64,64};
        vertices[3].texCoords = sf::Vector2f{0,64};
        for(size_t i=0; i<4; ++i)
            vertices[i].position = initPos + vertices[i].texCoords;
        vertices[4] = vertices[0];
        vertices[5] = vertices[2];
    }
    void update() {
        for(size_t i=0; i<6; ++i) {
            vertices[i].position += velocity;
        }
    }
    sf::Vertex* getVertices() {
        return vertices;
    }
private:
    sf::Vertex vertices[6];
    sf::Vector2f velocity;
};
int main() {
    sf::RenderWindow window{sf::VideoMode{800,600}, "Test"};
    constexpr size_t TOTAL_BULLETS = 20000;
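    const float pi = 4*std::atan(1.0f); // std::atan is not guaranteed to be constexpr, so use const
    std::vector<Bullet> bullets(TOTAL_BULLETS); // parentheses make it clear this is a size, not an element list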
    sf::VertexArray bulletVertices{sf::Triangles,TOTAL_BULLETS*6};
    for(size_t j=0; j<TOTAL_BULLETS; ++j) {
        float angle = 2*pi * static_cast<float>(j)/TOTAL_BULLETS;
        auto vel = sf::Vector2f{std::cos(angle),std::sin(angle)} * 1.0f;
        bullets[j] = Bullet{vel};
        auto vertice = bullets[j].getVertices();
        for(size_t i=0; i<6; ++i)
            bulletVertices[j*6+i] = vertice[i];
    }
    //HERE
    sf::VertexBuffer bufferVertices{sf::Triangles,sf::VertexBuffer::Static};
    bufferVertices.create(bulletVertices.getVertexCount() );
    bufferVertices.update(&bulletVertices[0]);
    while( window.isOpen() ) {
        auto tp = std::chrono::steady_clock::now();
        sf::Event event;
        while( window.pollEvent(event) ) {
            switch(event.type) {
            case sf::Event::Closed:
                window.close();
            break;
            default:
            break;
            }
        }
        window.clear();
        //window.draw(bulletVertices);
        window.draw(bufferVertices);
        window.display();
        auto dur = std::chrono::duration_cast<std::chrono::microseconds>(std::chrono::steady_clock::now()-tp);
        std::cout<<"Frame takes: "<< dur.count() <<" microseconds.\n";
    }
    return 0;
}
 

I've tried all three vertex buffer usage modes (Stream, Dynamic and Static), and I couldn't see a noticeable improvement in draw speed with any of them. A frame still takes about 47,000 microseconds, and according to Process Explorer my program still uses more than 80% of the GPU but only 1% of the CPU.

eXpl0it3r

  • SFML Team
  • Hero Member
  • *****
  • Posts: 9300
    • View Profile
    • development blog
    • Email
Re: A single draw call severely slows thing down
« Reply #5 on: August 02, 2019, 09:59:34 am »
What hardware are you running this on?
Official FAQ: https://www.sfml-dev.org/faq.php
Nightly Builds: https://www.nightlybuilds.ch/
——————————————————————
Dev Blog: https://dev.my-gate.net/
Thor: http://www.bromeon.ch/libraries/thor/

nogoodname

  • Newbie
  • *
  • Posts: 13
    • View Profile
Re: A single draw call severely slows thing down
« Reply #6 on: August 03, 2019, 02:41:32 am »
Quote from eXpl0it3r: "What hardware are you running this on?"

My CPU is an Intel Pentium G640, and my GPU is Intel HD Graphics (Sandy Bridge).

eXpl0it3r

  • SFML Team
  • Hero Member
  • *****
  • Posts: 9300
    • View Profile
    • development blog
    • Email
Re: A single draw call severely slows thing down
« Reply #7 on: August 03, 2019, 03:11:37 am »
That's not exactly a strong CPU or "GPU", so I think these frame times are to be expected.

binary1248

  • SFML Team
  • Hero Member
  • *****
  • Posts: 1384
  • I am awesome.
    • View Profile
    • The server that really shouldn't be running
Re: A single draw call severely slows thing down
« Reply #8 on: August 04, 2019, 12:00:14 am »
On my RX Vega 64 the frame time is < 2 milliseconds, which is > 500 FPS or 20 million triangles per second.

This seems pretty normal to me. I don't think many AAA games even draw that much, and if they really need to, they have to resort to some pretty advanced tricks like instancing.

As eXpl0it3r said, your IGP at 129.6 GFLOPS just isn't that powerful.
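The throughput comparison in this thread can be reproduced with a one-line calculation: triangles per frame divided by the frame time in seconds. A hedged sketch, where `trianglesPerSecond` is a hypothetical helper name:

```cpp
#include <chrono>
#include <cstddef>

// Triangles drawn per second, given the triangles submitted per frame and
// the time one frame takes. A non-positive frame time returns 0.
double trianglesPerSecond(std::size_t trianglesPerFrame,
                          std::chrono::microseconds frameTime) {
    if (frameTime.count() <= 0) {
        return 0.0;
    }
    return static_cast<double>(trianglesPerFrame) * 1e6 /
           static_cast<double>(frameTime.count());
}
```

For 40,000 triangles per frame, a 2,000 microsecond frame (500 FPS) gives 20 million triangles per second, while the 45,000 microsecond frame reported on the integrated GPU gives roughly 0.9 million.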
SFGUI # SFNUL # GLS # Wyrm <- Why do I waste my time on such a useless project? Because I am awesome (first meaning).

nogoodname

  • Newbie
  • *
  • Posts: 13
    • View Profile
Re: A single draw call severely slows thing down
« Reply #9 on: August 05, 2019, 01:59:40 am »
Oh well, I just wanted to see how many bullets I could draw at once.