Author Topic: Simple SpriteBatch desired interface/features (Read 8756 times)

SuperV1234 · « **on:** June 03, 2015, 03:48:22 pm »

I'm working on a simple SpriteBatch class with the goal of automatically reducing the number of draw calls.

It currently supports layering and texture binding.
The interface looks like this, so far:

// ...load some textures and store them somewhere,
// then create some const references to them.
const auto& txApple(someAssetManager.get(...));
const auto& txBanana(someAssetManager.get(...));
const auto& txOrange(someAssetManager.get(...));
const auto& txSky(someAssetManager.get(...));



// Create a batch sprite manager.
// This would ideally go in your `Game` class or
// in your rendering system instance.
Batch::Manager bm;



// Bind the textures to the batch manager.
// Textures must be bound in advance for maximum
// performance. (No associative lookups!)
// Binding returns a handle-like object.
auto bthApple(bm.bind(txApple));
auto bthBanana(bm.bind(txBanana));
auto bthOrange(bm.bind(txOrange));
auto bthSky(bm.bind(txSky));



// Create some layers to manage Z-ordering.
// Layer creation returns handle-like objects.
auto btlBackground(bm.createLayer(0));
auto btlForeground(bm.createLayer(1));



// To draw stuff, use `Batch::Sprite` instances.
// They have a similar interface to `sf::Sprite`,
// but require a layer handle and a texture handle.
std::vector<Batch::Sprite> sprites;

for(auto i(0u); i < 10000; ++i)
{
    // Make 10000 apple sprites.
    sprites.emplace_back(btlForeground, bthApple);

    // Make 10000 banana sprites.
    sprites.emplace_back(btlForeground, bthBanana);

    // Make 10000 orange sprites.
    sprites.emplace_back(btlForeground, bthOrange);
}

// Make 1 sky sprite, in the background layer.
sprites.emplace_back(btlBackground, bthSky);



// In the game loop, you need to clear, fill and
// render the batch manager.
while(true)
{
    // ...stuff...

    bm.clear();
    for(const auto& s : sprites) bm.enqueue(s);
    bm.drawOn(someRenderTarget);

    // ...stuff...
}

// The `bm.drawOn(someRenderTarget);` call
// will result in 4 draw calls, implemented with
// 4 different vertex arrays.

Could the API/interface be improved?
Is layering sufficient for Z-ordering needs?
Is the ownership model of textures/layers fine?
What other features would you expect from a sprite batch?

binary1248 · « **Reply #1 on:** June 05, 2015, 02:01:24 am »

Quote from: SuperV1234 on June 03, 2015, 03:48:22 pm

Could the API/interface be improved?

I don't really understand what the purpose of those texture and layer handles is.

According to the comments "Textures must be bound in advance for maximum performance. (No associative lookups!)". Why is this necessary? You end up passing the texture reference to .draw() as a sf::RenderStates anyway, so you might as well let the user simply construct the texture themselves and just pass it on to SFML when the time comes. This also doesn't require any associative lookups.

Also, why do we need handles for layers? All layers are meant to do is specify an order in which the buckets of draw calls are actually dispatched. Simply storing a numerical value would suffice if you ask me.

Quote from: SuperV1234 on June 03, 2015, 03:48:22 pm

Is layering sufficient for Z-ordering needs?

See above. If you aren't constrained by a "layer concept" and could specify an arbitrary ordering using numerical values, you will have just as much power as raw Z-ordering without having to specify up front which logical layer a sprite should belong to.

Quote from: SuperV1234 on June 03, 2015, 03:48:22 pm

Is the ownership model of textures/layers fine?

As stated above, I don't think they are even necessary. Just let the user use their own sf::Textures as usual and specify numerical values instead of layer handles.

Quote from: SuperV1234 on June 03, 2015, 03:48:22 pm

What other features would you expect from a sprite batch?

Better batching.

In terms of OpenGL draw calls, yes, they do get reduced, but only in optimal scenarios. Since you don't re-order the sprites in order to minimize state changes, a user specifying their sprites in a really disadvantageous way will not benefit at all from your batching. In fact, it will even add additional overhead in that case.

Where your batcher does save time is within the sf::RenderTarget methods. When batching does work, less time is spent in there and potentially on the GPU since you pre-transform vertices (this leads to more optimistic paths being taken when the GPU realizes it doesn't have to do anything). However, in exchange for reducing the time spent in those locations, we need to consider the extra time that will be spent in your batcher. From your example, it seems like it will scale linearly with the number of sprites that you actually intend to draw with it since you seem to have to reconstruct the draw queue again every frame. I just don't see the advantage your class is supposed to provide over "manual batching" via sf::VertexArray.

Have you run any performance tests using your batcher? Where does it save time? On the GPU, in the driver or in the application?

The current state of the batcher is a start, but there is still much more to do in order to be useful in real world scenarios if you ask me. When writing my own batchers (similar to how I designed the SFGUI renderers) I like to measure the number of draw calls that are actually issued to OpenGL in total every frame. I estimate that for a typical 2D SFML application that doesn't make too much use of shaders, it can easily be dropped below 10 in total per frame. This of course requires more advanced techniques such as texture atlasing, but that is what a batcher is there for...
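To make the texture atlasing idea a bit more concrete, here is a minimal sketch of a "shelf" packer that assigns regions of one large texture to individual images, so that sprites sharing the atlas no longer force a texture change. `Region` and `ShelfPacker` are hypothetical names invented for this illustration; they are not part of SFML or of the batcher discussed in this thread, and a real packer would usually sort images by height first.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical: pixel rectangle reserved inside the atlas texture.
struct Region
{
    unsigned x, y, w, h;
};

// Very simple shelf packer: fills rows ("shelves") left to right and
// opens a new shelf below the current one when a row is full.
class ShelfPacker
{
public:
    ShelfPacker(unsigned width, unsigned height)
        : width_(width), height_(height)
    {
    }

    // Tries to reserve a w x h region. Returns false (and leaves the
    // packer unchanged) if the region does not fit.
    bool place(unsigned w, unsigned h, Region& out)
    {
        assert(w > 0 && h > 0);

        unsigned x = cursorX_, y = cursorY_, shelf = shelfHeight_;

        // Row is full: move to a new shelf below the tallest item so far.
        if(x + w > width_)
        {
            y += shelf;
            x = 0;
            shelf = 0;
        }

        if(w > width_ || y + h > height_) return false;

        out = Region{x, y, w, h};
        cursorX_ = x + w;
        cursorY_ = y;
        shelfHeight_ = std::max(shelf, h);
        return true;
    }

private:
    unsigned width_, height_;
    unsigned cursorX_ = 0, cursorY_ = 0, shelfHeight_ = 0;
};
```

The returned region would then be used as the texture rectangle of every sprite that draws that image.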

SuperV1234 · « **Reply #2 on:** June 05, 2015, 04:29:44 pm »

Quote from: binary1248 on June 05, 2015, 02:01:24 am

I don't really understand what the purpose of those texture and layer handles is.

According to the comments "Textures must be bound in advance for maximum performance. (No associative lookups!)". Why is this necessary? You end up passing the texture reference to .draw() as a sf::RenderStates anyway, so you might as well let the user simply construct the texture themselves and just pass it on to SFML when the time comes. This also doesn't require any associative lookups.

Also, why do we need handles for layers? All layers are meant to do is specify an order in which the buckets of draw calls are actually dispatched. Simply storing a numerical value would suffice if you ask me.

The design of this sprite batch allows it to be quickly integrated in projects using `sf::Sprite` without any batching.

Whenever you bind a texture to the batch manager, you get a unique integer (starting from 0) identifying that texture.

Whenever you create a layer in the batch manager, the layer automatically creates contiguous data structures (`std::vector` instances, for now) for every bound texture.

When you want to draw a Batch::Sprite on a specific layer, having the sprite know the ID of the texture it has and the ID of the layer it needs to be drawn onto results in some contiguous memory direct access lookups and 4 `sf::Vertex` emplacements.
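Roughly, the idea can be sketched like this. This is not the code from the repository, only a simplified stand-alone model: `Vertex` stands in for `sf::Vertex` so it compiles without SFML, and `BucketManager`, `bindTexture`, `enqueueQuad`, `drawCallCount` and `vertexCount` are names made up for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Stand-in for `sf::Vertex` (position + texture coordinates).
struct Vertex
{
    float x, y, u, v;
};

using TextureId = std::size_t; // hypothetical: dense ID returned on bind
using LayerId = std::size_t;   // hypothetical: dense ID returned on creation

class BucketManager
{
public:
    // Registers a texture and returns its ID (0, 1, 2, ...).
    // Existing layers get one more bucket so indices stay valid.
    TextureId bindTexture()
    {
        for(auto& layer : layers_) layer.emplace_back();
        return textureCount_++;
    }

    // Creates a layer holding one vertex bucket per bound texture.
    LayerId createLayer()
    {
        layers_.emplace_back(textureCount_);
        return layers_.size() - 1;
    }

    // Two plain index operations select the bucket, then the four
    // corners of the quad are appended to it.
    void enqueueQuad(LayerId layer, TextureId texture,
                     float x, float y, float w, float h)
    {
        assert(layer < layers_.size() && texture < textureCount_);
        auto& bucket = layers_[layer][texture];
        bucket.push_back({x, y, 0.f, 0.f});
        bucket.push_back({x + w, y, w, 0.f});
        bucket.push_back({x + w, y + h, w, h});
        bucket.push_back({x, y + h, 0.f, h});
    }

    void clear()
    {
        for(auto& layer : layers_)
            for(auto& bucket : layer) bucket.clear();
    }

    // One draw call would be issued per non-empty bucket.
    std::size_t drawCallCount() const
    {
        std::size_t count = 0;
        for(const auto& layer : layers_)
            for(const auto& bucket : layer)
                if(!bucket.empty()) ++count;
        return count;
    }

    std::size_t vertexCount(LayerId layer, TextureId texture) const
    {
        return layers_[layer][texture].size();
    }

private:
    std::size_t textureCount_ = 0;
    std::vector<std::vector<std::vector<Vertex>>> layers_;
};
```

In this model, the number of draw calls per frame is bounded by the number of layers times the number of bound textures, no matter how many sprites are enqueued.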

The user can simply store a Batch::Sprite instance in its game entity class (or replace `sf::Sprite` instances), and can fire-and-forget multiple Batch::Sprite draw calls - the batch manager, thanks to the unique IDs of textures and layers, will deal with minimizing draw calls.

The code for the sprite batch is here (still very primitive), but I hope it clarifies the idea:
https://github.com/SuperV1234/Experiments/blob/master/Random/batching.cpp

When you call `Batch::Sprite::draw()` you're just asking the manager to "enqueue" the sprite in the right layer, in the right vertices container for its texture.

Quote from: binary1248 on June 05, 2015, 02:01:24 am

As stated above, I don't think they are even necessary. Just let the user use their own sf::Textures as usual and specify numerical values instead of layer handles.

The user manages the lifetime of their own `sf::Textures`. Binding them to the Batch::Manager is pure convenience - having a handle object that refers to the texture and can be used in Batch::Sprites means the user does not have to specify the wanted texture during the draw call.

I'm trying to make the system as easy as possible to substitute for existing `sf::Sprite`-based code.

The idea behind layers is that the user does not care about the drawing order of sprites in the same layer - but that may not be realistic.
One thing I'm considering adding is another layer type where, instead of having separate buffers for every texture, there is a single buffer in which vertices are sorted by a user-specified Z-order.

If the user requires more fine-tuning of the Z-order, that type of layer can be used, but it will definitely result in more draw calls.

Quote from: binary1248 on June 05, 2015, 02:01:24 am

Better batching.

In terms of OpenGL draw calls, yes, they do get reduced, but only in optimal scenarios. Since you don't re-order the sprites in order to minimize state changes, a user specifying their sprites in a really disadvantageous way will not benefit at all from your batching. In fact, it will even add additional overhead in that case.

Where your batcher does save time is within the sf::RenderTarget methods. When batching does work, less time is spent in there and potentially on the GPU since you pre-transform vertices (this leads to more optimistic paths being taken when the GPU realizes it doesn't have to do anything). However, in exchange for reducing the time spent in those locations, we need to consider the extra time that will be spent in your batcher. From your example, it seems like it will scale linearly with the number of sprites that you actually intend to draw with it since you seem to have to reconstruct the draw queue again every frame. I just don't see the advantage your class is supposed to provide over "manual batching" via sf::VertexArray.

Have you run any performance tests using your batcher? Where does it save time? On the GPU, in the driver or in the application?

The current state of the batcher is a start, but there is still much more to do in order to be useful in real world scenarios if you ask me. When writing my own batchers (similar to how I designed the SFGUI renderers) I like to measure the number of draw calls that are actually issued to OpenGL in total every frame. I estimate that for a typical 2D SFML application that doesn't make too much use of shaders, it can easily be dropped below 10 in total per frame. This of course requires more advanced techniques such as texture atlasing, but that is what a batcher is there for...

Having only minimally used OpenGL without SFML, I do not really have a lot of experience/knowledge on the subject. Maybe I'm approaching this in the wrong way...

You're correct when you say that "it will scale linearly with the number of sprites". I am re-creating the draw queue every frame.

But I was under the impression that calling `sf::Sprite::draw()` does actually execute an OpenGL draw call.

Drawing 10000 `sf::Sprite` instances with the same texture would result in 10000 OpenGL draw calls.
Drawing 10000 `Batch::Sprite` instances with the same texture would result in a single OpenGL draw call.

The advantage of my system, over `sf::VertexArray`, is pure convenience - binding textures and layers to the manager, and having `Batch::Sprite` instances store their texture ID and layer ID, allows the user to "think in terms of `sf::Sprite`" and still get some performance benefits from automatic batching.

binary1248 · « **Reply #3 on:** June 05, 2015, 08:52:14 pm »

Quote from: SuperV1234 on June 05, 2015, 04:29:44 pm

Whenever you bind a texture to the batch manager, you get a unique integer (starting from 0) identifying that texture.

Whenever you create a layer in the batch manager, the layer automatically creates contiguous data structures (`std::vector` instances, for now) for every bound texture.

When you want to draw a Batch::Sprite on a specific layer, having the sprite know the ID of the texture it has and the ID of the layer it needs to be drawn onto results in some contiguous memory direct access lookups and 4 `sf::Vertex` emplacements.

Are you sure you are actually reducing the number of indirect memory accesses by doing this?

With your system you are essentially going to do this when issuing the final SFML draw call:
Batch -> std::vector -> sf::Texture -> OpenGL Texture ID

By simply storing pointers to the textures just like how sf::Sprite already does it, it would look like this:
Batch -> sf::Texture -> OpenGL Texture ID

Sure, a std::vector lookup is cheap, but it still costs something, and if it can be left out, I don't see why it shouldn't.

Also, if you consider a "typical" scenario where the user queues multiple sprites to your batcher, in any thought out entity system, the user will often already specify the sprites almost in the right order for drawing. Like you said, since draw order matters when drawing sf::Sprites yourself, there will be nothing to divide into layers/buckets. I really think just specifying a numerical value as a layer identifier and using a well suited (performs well for lists which are already almost sorted) sorting algorithm on the final queue would still be more efficient than how it is currently implemented.
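As a rough illustration of that suggestion (not code from either poster), the sketch below stores a plain numeric layer and a texture pointer per queued sprite and sorts the queue with a stable insertion sort, which is one algorithm that behaves well on almost-sorted input. `Texture` is an empty stand-in for `sf::Texture`, and `QueuedSprite` and `sortByLayer` are hypothetical names.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

struct Texture {}; // stand-in for `sf::Texture`

// Hypothetical queue entry: numeric layer + pointer to the texture,
// similar to how `sf::Sprite` refers to its texture.
struct QueuedSprite
{
    int layer;
    const Texture* texture;
};

// Stable insertion sort by layer. Runs in close to linear time when the
// queue is already almost sorted, and in quadratic time in the worst case.
// Sprites within the same layer keep their submission order.
inline void sortByLayer(std::vector<QueuedSprite>& queue)
{
    for(std::size_t i = 1; i < queue.size(); ++i)
    {
        const QueuedSprite current = queue[i];
        std::size_t j = i;

        // Shift entries with a strictly larger layer one slot to the right.
        while(j > 0 && queue[j - 1].layer > current.layer)
        {
            queue[j] = queue[j - 1];
            --j;
        }

        queue[j] = current;
    }

    // Debug-only postcondition check.
    assert(std::is_sorted(queue.begin(), queue.end(),
        [](const QueuedSprite& a, const QueuedSprite& b)
        { return a.layer < b.layer; }));
}
```

After sorting, consecutive entries that share a texture could be merged into one vertex array, which is where the actual batching would happen.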

Quote from: SuperV1234 on June 05, 2015, 04:29:44 pm

The user can simply store a Batch::Sprite instance in its game entity class (or replace `sf::Sprite` instances), and can fire-and-forget multiple Batch::Sprite draw calls - the batch manager, thanks to the unique IDs of textures and layers, will deal with minimizing draw calls.

Draw calls are only 1 side of the story. It is what many laypeople/gamers/etc. think is the main bottleneck of graphics APIs/GPUs because of certain misinformation (*cough* excuses *cough*) that game developers happen to come up with to explain why their software performs so poorly. If you want some good information about how to cut down on the OpenGL overhead and redundant state changes I recommend watching . It is aimed primarily at OpenGL developers, but I think the parts that might interest you start from around 31:55. In order to write a good batcher, you mustn't only think about reducing draw calls or saving a few CPU memory accesses here and there, you need to look at the whole picture (and OpenGL is a really big part of that picture). I estimate that a well implemented batcher can make at least an order of magnitude difference, especially when you throw some initially really poorly optimized drawing implementations at it.

Quote from: SuperV1234 on June 05, 2015, 04:29:44 pm

The code for the sprite batch is here (still very primitive), but I hope it clarifies the idea:
https://github.com/SuperV1234/Experiments/blob/master/Random/batching.cpp

I've already looked at the code, quite hard to read if you are not used to it.

Quote from: SuperV1234 on June 05, 2015, 04:29:44 pm

When you call `Batch::Sprite::draw()` you're just asking the manager to "enqueue" the sprite in the right layer, in the right vertices container for its texture.

You are basically making the user pre-sort the sprites in the right order already by giving them multiple buckets. The same could be done by just using multiple batchers (1 per layer) and drawing them in the right order when you are done constructing the queues. This kind of wastes potential optimization possibilities between layers.

Quote from: SuperV1234 on June 05, 2015, 04:29:44 pm

The user manages the lifetime of their own `sf::Textures`. Binding them to the Batch::Manager is pure convenience - having a handle object that refers to the texture and can be used in Batch::Sprites means the user does not have to specify the wanted texture during the draw call.

As stated above, the same could be done by saving a pointer to the sf::Texture along with each sprite in the queue instead.

Quote from: SuperV1234 on June 05, 2015, 04:29:44 pm

The idea behind layers is that the user does not care about the drawing order of sprites in the same layer - but that may not be realistic.

This is definitely not realistic.

What SFML users perform when ordering their sprites themselves is called the painter's algorithm: you draw back to front. Since SFML doesn't support depth, this is the only option they have. It is a mistake to assume that providing a batcher allows them to all of a sudden forget about ordering altogether. They will always want to order sprites, even within the same layer.

You might not know this, but the very fact that SFML doesn't support depth can have a significant impact on raw GPU (not driver) performance. Overdraw is the phenomenon that any experienced graphics programmer will always try to hunt down and exterminate. This means that ironically, drawing front to back actually yields higher performance if you have depth enabled, especially in scenes where you have many non-transparent entities overlapping each other. If you have transparent entities, you are better off sticking to back to front draw order unless you are very very experienced and know how to do it front to back as well.
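A minimal sketch of that ordering, assuming a renderer that does have a depth buffer (which, as said, is not what SFML gives you here): opaque items are drawn first, front to back, and transparent items afterwards, back to front. `DrawItem` and `orderForDepthTestedDraw` are hypothetical names, and "smaller depth means closer to the viewer" is an assumption of the example.

```cpp
#include <algorithm>
#include <vector>

// Hypothetical draw item.
struct DrawItem
{
    float depth;      // assumption: smaller means closer to the viewer
    bool transparent;
};

inline void orderForDepthTestedDraw(std::vector<DrawItem>& items)
{
    // Opaque items first, transparent items last.
    const auto firstTransparent = std::stable_partition(
        items.begin(), items.end(),
        [](const DrawItem& d) { return !d.transparent; });

    // Opaque: front to back, so fragments hidden behind already drawn
    // geometry can be rejected by the depth test.
    std::stable_sort(items.begin(), firstTransparent,
        [](const DrawItem& a, const DrawItem& b)
        { return a.depth < b.depth; });

    // Transparent: back to front, so blending composes in the usual
    // painter's-algorithm order.
    std::stable_sort(firstTransparent, items.end(),
        [](const DrawItem& a, const DrawItem& b)
        { return a.depth > b.depth; });
}
```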

Quote from: SuperV1234 on June 05, 2015, 04:29:44 pm

One thing I'm considering adding is another layer type where, instead of having separate buffers for every texture, there is a single buffer in which vertices are sorted by a user-specified Z-order.

If the user requires more fine-tuning of the Z-order, that type of layer can be used, but it will definitely result in more draw calls.

You should really just combine this into a single queue that is sorted before drawing like I described above. Introducing too many separate concepts that are only there to solve specific edge cases will clutter up what could be a simple and intuitive API.

Quote from: SuperV1234 on June 05, 2015, 04:29:44 pm

Having only minimally used OpenGL without SFML, I do not really have a lot of experience/knowledge on the subject. Maybe I'm approaching this in the wrong way...

I hope it has become obvious from what I just said that having at least a basic understanding of OpenGL is essential in order to target the real bottlenecks.

It isn't that hard if you are willing to commit a bit of time to it. Some people might not agree with me, but I think that the modern API is easier to learn and completely understand than the legacy API. There are far fewer functions and states to know about, and if you start out with familiarizing yourself with the pipeline and the concepts surrounding it, you will quickly realize that it isn't as complicated as some might think at first glance.

Quote from: SuperV1234 on June 05, 2015, 04:29:44 pm

But I was under the impression that calling `sf::Sprite::draw()` does actually execute an OpenGL draw call.

Drawing 10000 `sf::Sprite` instances with the same texture would result in 10000 OpenGL draw calls.
Drawing 10000 `Batch::Sprite` instances with the same texture would result in a single OpenGL draw call.

This only works out in very very very optimal scenarios. In reality you would probably still have a few thousand draw calls for those 10000 sprites since there will often be incompatible state changes that will break the batches. This could be solved by re-ordering the sprites in order to minimize the state changes, but like I said above, this might not be what the user wants/expects. You will have to perform a lot of behind-the-scenes "magic" to reduce state changes and still produce the same final image as if (yes... just like as-if in C++) the sprites were individually drawn using the standard method. I don't know what you still have planned, but I don't really see any of this "magic" yet.
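To put a number on how batches break, here is a deliberately simplified model in which the texture is the only piece of state and draw order must be preserved: a new draw call starts whenever the texture differs from the previous sprite's. `countDrawCalls` is a hypothetical helper written for this illustration; real batch breaks also come from shaders, blend modes and other state.

```cpp
#include <cstddef>
#include <vector>

// Counts the draw calls needed for sprites given in draw order, where
// each entry is the ID of the texture used by that sprite. Consecutive
// sprites with the same texture share one draw call.
inline std::size_t countDrawCalls(const std::vector<int>& textureIds)
{
    if(textureIds.empty()) return 0;

    std::size_t calls = 1;
    for(std::size_t i = 1; i < textureIds.size(); ++i)
        if(textureIds[i] != textureIds[i - 1]) ++calls;

    return calls;
}
```

With 10000 sprites that all use one texture this gives a single draw call, while 10000 sprites that strictly alternate between two textures still need 10000 draw calls unless they are re-ordered.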
