Author Topic: Drawing sf::Sprite alpha only to an sf::RenderTexture for use in lighting shader (Read 9071 times)

Fewes · « **on:** May 05, 2015, 03:58:20 am »

Hi all!

I'm working on a 2D lighting system for sprites which runs on GLSL shaders. Currently the lighting itself is very fast, as it's done almost entirely in the fragment shader. However, I have run into a bottleneck: the way I create the input mask the shader works from.
Basically what I am doing is drawing every sprite I want to put in the mask one extra time by setting its color to black before drawing it to the sf::RenderTexture. My solution for rendering to different channels is a bit hacky as I couldn't figure out how to do it any other way (I started looking at a custom blend mode but didn't manage to create anything that worked) and as such I have one RenderTexture for every channel(!) which is then combined into a single RenderTexture using BlendAdd. The obvious problem with this being that there are quite a lot of draw calls going into screen sized RenderTextures which quickly stack up depending on how many sprites I put in spriteVector.

Now I realize this might not even be possible, but if anyone could shed some light on anything I could do to gain performance, that would be greatly appreciated. The thread title implies a certain solution, but I'm open to any suggestions, as I can probably make do with a lot of different things inside the shader (heck, I'd even take a 1-bit single-channel mask at this point...).
My gut feeling tells me I should go lower level, and I started looking into stencil and depth buffers in OpenGL, but I feel like I misunderstand how those tie into a 2D pipeline. I should have spent all that time I put into writing shaders on learning C++ instead, I suppose...

Anyway, here's my code for drawing up the mask (the MaskedSprite class is just a wrapper for an sf::Sprite containing some extra values):

        // Clear mask channels
        rt_sceneMask_red.clear(sf::Color::White);
        rt_sceneMask_green.clear(sf::Color::White);
        rt_sceneMask_blue.clear(sf::Color::White);
        rt_sceneBuffer.clear(sf::Color(30, 30, 30));
        
        rt_sceneBuffer.draw(s_background);
        // Sprite masking
        for (std::vector<LightDemo::MaskedSprite*>::const_iterator it = spriteVector.begin(); it < spriteVector.end(); it++) {
                // Get sprite pointer
                sf::Sprite* s_ptr = (*it)->getSprite();
                // Save color so we can restore it at end
                sf::Color colorTemp = s_ptr->getColor();
                // Set color to black for masking
                s_ptr->setColor(sf::Color::Black);
                // Draw to red mask (light rim 1st pass, shadows, SSAO)
                if ((*it)->drawToRed())
                        rt_sceneMask_red.draw(*s_ptr);
                // Draw to green mask (light rim 2nd pass, shadows, SSAO)
                if ((*it)->drawToGreen())
                        rt_sceneMask_green.draw(*s_ptr);
                // Draw to blue mask (light sprite blocking)
                if ((*it)->drawToBlue())
                        rt_sceneMask_blue.draw(*s_ptr);
                // Restore color before drawing to scene buffer
                s_ptr->setColor(colorTemp);
                // Draw to scene buffer
                if ((*it)->drawToScene())
                        rt_sceneBuffer.draw(*s_ptr);
        }
        rt_sceneMask_red.display();
        rt_sceneMask_green.display();
        rt_sceneMask_blue.display();
        rt_sceneBuffer.display();
        
        // Combine masks
        rt_sceneMask_RGB.clear(sf::Color::Black);
        rt_sceneMask_RGB.draw(s_sceneMask_red, sf::BlendAdd);
        rt_sceneMask_RGB.draw(s_sceneMask_green, sf::BlendAdd);
        rt_sceneMask_RGB.draw(s_sceneMask_blue, sf::BlendAdd);
        rt_sceneMask_RGB.display();
 

And here's what the lighting and mask look like:

eXpl0it3r · « **Reply #1 on:** May 05, 2015, 08:11:09 am »

What kind of performance issue are you talking about? Run a profiler to see where the bottleneck actually is.

What is the reason again for having separate color channels?

Nexus · « **Reply #2 on:** May 05, 2015, 10:18:08 am »

Quote from: Fewes on May 05, 2015, 03:58:20 am

Basically what I am doing is drawing every sprite I want to put in the mask one extra time by setting its color to black before drawing it to the sf::RenderTexture. My solution for rendering to different channels is a bit hacky as I couldn't figure out how to do it any other way (I started looking at a custom blend mode but didn't manage to create anything that worked) and as such I have one RenderTexture for every channel(!) which is then combined into a single RenderTexture using BlendAdd.

Why don't you draw them directly to the same render texture? You can use a fragment shader that draws only one color, and just change a uniform to switch colors.
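
A minimal sketch of that idea, not code from this thread. The uniform name `channelMask` and the helper names are made up for illustration, and the GLSL is written in the old-style GLSL that SFML 2 sprite shaders typically use. With SFML, the string could be loaded via `sf::Shader::loadFromMemory`, with the `texture` uniform set to `sf::Shader::CurrentTexture` and the mask set through `setParameter` (or `setUniform` in SFML 2.4 and later). The two C++ functions are only a CPU reference for what the shader and `sf::BlendAdd` compute per pixel, so the arithmetic can be checked without a GPU.

```cpp
#include <algorithm>
#include <cassert>
#include <string>

// Fragment shader source. "channelMask" is a hypothetical uniform name.
const std::string kMaskFragmentShader = R"(
uniform sampler2D texture;
uniform vec4 channelMask; // e.g. (1, 0, 0, 0) writes to the red channel only

void main()
{
    // Only the sprite's alpha is used; its own colour is ignored.
    float alpha = texture2D(texture, gl_TexCoord[0].xy).a;
    gl_FragColor = vec4(channelMask.rgb, alpha);
}
)";

struct Rgba { float r, g, b, a; };

// CPU reference: what the shader above outputs for one pixel.
Rgba maskFragment(const Rgba& channelMask, float textureAlpha)
{
    assert(textureAlpha >= 0.0f && textureAlpha <= 1.0f);
    return {channelMask.r, channelMask.g, channelMask.b, textureAlpha};
}

// CPU reference for additive blending weighted by source alpha
// (dst.rgb += src.rgb * src.a, dst.a += src.a), clamped to [0, 1].
Rgba blendAdd(const Rgba& dst, const Rgba& src)
{
    auto clamp01 = [](float v) { return std::min(1.0f, std::max(0.0f, v)); };
    return {clamp01(dst.r + src.r * src.a),
            clamp01(dst.g + src.g * src.a),
            clamp01(dst.b + src.b * src.a),
            clamp01(dst.a + src.a)};
}
```

Starting from a mask cleared to black, a half-transparent pixel drawn with the red mask ends up as 0.5 in the red channel and leaves green and blue untouched, so all three channels can share one render texture.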

Quote from: Fewes on May 05, 2015, 03:58:20 am

(I started looking at a custom blend mode but didn't manage to create anything that worked)

We recently made the blending modes fully customizable and thus quite powerful. If they support your way of combining colors/channels, that might indeed be the fastest way.

Fewes · « **Reply #3 on:** May 05, 2015, 01:50:04 pm »

Quote from: eXpl0it3r on May 05, 2015, 08:11:09 am

What kind of performance issue are you talking about? Run a profiler to see where the bottleneck actually is.

Right. I have tested the performance of the code quite a lot and I am pretty convinced the slowest part of it is where I draw up the mask, which is the code in my original post. Adding just 100 sprites to that vector brings my frame rate down severely even though I have a decent system.

Quote from: eXpl0it3r on May 05, 2015, 08:11:09 am

What is the reason again for having separate color channels?

In this case it's because I am using the different channels to mask out different effects inside the shader, but really couldn't it be anything? Having more data available opens up for more possibilities inside the shader itself and since the texture passed has four channels I might as well use them. That said if I could have a single channel with better performance over this I'd take that solution any day.

Quote from: Nexus on May 05, 2015, 10:18:08 am

Why don't you draw them directly to the same render texture? You can use a fragment shader that draws only one color, and just change a uniform to switch colors.

I did try this already but your post made me look into it again and I stumbled upon how render states work. With a custom shader coupled with the sf::BlendAdd blend mode I managed to make the code much cleaner looking:

        sf::RenderStates rs_red(&shader_redChannel);
        rs_red.blendMode = sf::BlendAdd;
        sf::RenderStates rs_green(&shader_greenChannel);
        rs_green.blendMode = sf::BlendAdd;
        sf::RenderStates rs_blue(&shader_blueChannel);
        rs_blue.blendMode = sf::BlendAdd;

        // Clear mask channels
        rt_sceneBuffer.clear(sf::Color(30, 30, 30));
        rt_sceneMask_RGB.clear(sf::Color::Black);

        for (std::vector<LightDemo::MaskedSprite*>::const_iterator it = spriteVector.begin(); it < spriteVector.end(); it++) {
                // Get sprite pointer
                sf::Sprite* s_ptr = (*it)->getSprite();
                        
                // Masking
                if ((*it)->drawToRed())
                        rt_sceneMask_RGB.draw(*s_ptr, rs_red);
                if ((*it)->drawToGreen())
                        rt_sceneMask_RGB.draw(*s_ptr, rs_green);
                if ((*it)->drawToBlue())
                        rt_sceneMask_RGB.draw(*s_ptr, rs_blue);

                // Draw to scene buffer
                if ((*it)->drawToScene())
                        rt_sceneBuffer.draw(*s_ptr);
        }
        rt_sceneMask_RGB.display();
        rt_sceneBuffer.display();

It still runs about the same, but at least this seems less hacky, so thanks for setting me on that path again.

I guess the question now is if I can find a way to make this run any faster. I've seen there's a way to enable the depth buffer, but is there a way to specify the z depth when drawing things?

eXpl0it3r · « **Reply #4 on:** May 05, 2015, 03:04:26 pm »

Quote from: Fewes on May 05, 2015, 01:50:04 pm

Right. I have tested the performance of the code quite a lot and I am pretty convinced the slowest part of it is where I draw up the mask, which is the code in my original post. Adding just 100 sprites to that vector brings my frame rate down severely even though I have a decent system.

Personal "tests" and confident statements are not really interesting; use a profiler instead.

Also, what "frame rates" are we talking about? FPS is not linear, so "severe" breakdowns aren't very surprising.
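
To make the "not linear" point concrete, here is a small worked example. The function names are just for illustration; the only fact used is that frame time in milliseconds is 1000 divided by FPS.

```cpp
#include <cassert>

// Frame time in milliseconds for a given frame rate.
double frameTimeMs(double fps)
{
    assert(fps > 0.0);
    return 1000.0 / fps;
}

// Frame rate that results from adding extraMs of work to every frame.
double fpsAfterExtraWork(double fps, double extraMs)
{
    return 1000.0 / (frameTimeMs(fps) + extraMs);
}
```

Adding 1 ms of work to a frame that ran at 1000 FPS halves the frame rate to 500 FPS, while the same 1 ms added at 60 FPS only drops it to roughly 56.6 FPS. That is why frame times are a more useful number to report than FPS.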

Quote from: Fewes on May 05, 2015, 01:50:04 pm

In this case it's because I am using the different channels to mask out different effects inside the shader, but really couldn't it be anything? Having more data available opens up for more possibilities inside the shader itself and since the texture passed has four channels I might as well use them. That said if I could have a single channel with better performance over this I'd take that solution any day.

I guess, I simply don't know enough about lighting etc. to understand this.

Quote from: Fewes on May 05, 2015, 01:50:04 pm

I guess the question now is if I can find a way to make this run any faster. I've seen there's a way to enable the depth buffer, but is there a way to specify the z depth when drawing things?

It's really important to first find out exactly what is "slow", i.e. where the bottleneck is. Without that information all the trying to make things faster may not have any effect, since they never were the bottleneck.

Fewes · « **Reply #5 on:** May 05, 2015, 04:07:23 pm »

Quote from: eXpl0it3r on May 05, 2015, 03:04:26 pm

Personal "tests" and confident statements are not really interesting; use a profiler instead.

Also, what "frame rates" are we talking about? FPS is not linear, so "severe" breakdowns aren't very surprising.

Buuuut I'll have to learn new things, can't I just do couts every other line of code

Fair enough though, I will try to do this! I guess FPS wouldn't be a great measure, but I'm measuring the time the masking pass takes as well (can be seen in the picture).
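
For reference, timing a section by hand can be sketched roughly like this with `std::chrono`. The helper names `measureMs` and `averageMs` are invented for this example and this is not the timer from my project. It is also no replacement for a profiler: OpenGL calls can return before the GPU has finished the work, so a CPU-side timer around draw calls mostly captures CPU and driver cost.

```cpp
#include <cassert>
#include <chrono>

// Runs work() once and returns the elapsed wall-clock time in milliseconds.
template <typename Work>
double measureMs(Work&& work)
{
    const auto start = std::chrono::steady_clock::now();
    work();
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double, std::milli>(stop - start).count();
}

// Averages several runs to smooth out measuring noise.
template <typename Work>
double averageMs(Work&& work, int runs)
{
    assert(runs > 0);
    double total = 0.0;
    for (int i = 0; i < runs; ++i)
        total += measureMs(work);
    return total / runs;
}
```

The masking pass would go inside the callable, for example `averageMs([&] { /* draw the mask */ }, 100)`, and the result is already in milliseconds.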

Quote from: eXpl0it3r on May 05, 2015, 03:04:26 pm

I guess, I simply don't know enough about lighting etc. to understand this.

Basically with every mask I can do a lot of new things. I'd have a mask for every object if I could as GLSL shaders are way fast! Having a depth buffer would be the ideal situation as I could rework the shader to light based on any depth instead of working off a single mask and just having sprites be "in front" or "behind".

Without having profiled yet I am fairly sure it just comes down to many Draw() calls, as every object I want to put through the shader is drawn twice at best and four times at worst. I guess that is hardly a revelation but I was hoping since I'm writing to just one channel I could maybe pass 1/4th of the data

Nexus · « **Reply #6 on:** May 05, 2015, 04:21:49 pm »

Quote from: Fewes on May 05, 2015, 01:50:04 pm

I did try this already but your post made me look into it again and I stumbled upon how render states work. With a custom shader coupled with the sf::BlendAdd blend mode I managed to make the code much cleaner looking

You don't need three shaders. One shader is enough, just set a uniform variable that contains the color. Setting a single uniform is probably cheaper than rebinding the whole shader, but more importantly, you have no code duplication.

By the way, when talking about clean looking code:
1. I would either use the constructor to set all arguments or none, but not mix constructor and member assignment.

sf::RenderStates rs(&shader);
rs.blendMode = sf::BlendAdd;
// ->
sf::RenderStates rs;
rs.shader    = &shader;
rs.blendMode = sf::BlendAdd;

2. Use range-based for loops for iteration. Then you don't have that ugly double dereferencing

for (std::vector<LightDemo::MaskedSprite*>::const_iterator it = spriteVector.begin(); it < spriteVector.end(); it++) {
    sf::Sprite* s_ptr = (*it)->getSprite();
    ...
}
// ->
for (LightDemo::MaskedSprite* sprite : spriteVector) {
    sf::Sprite* s_ptr = sprite->getSprite();
    ...
}

I'd also avoid identifiers such as "s_ptr" because they contain zero useful information. Identifiers should contain the variable's purpose. The type ("ptr") is not so important and is reminiscent of outdated Hungarian Notation.

And this here:

        // Masking
        if ((*it)->drawToRed())
            rt_sceneMask_RGB.draw(*s_ptr, rs_red);
        if ((*it)->drawToGreen())
            rt_sceneMask_RGB.draw(*s_ptr, rs_green);
        if ((*it)->drawToBlue())
            rt_sceneMask_RGB.draw(*s_ptr, rs_blue);

can also be handled in the shader. Set a uniform to tell the shader which color channels to mask, then draw once, not three times. This can be handled super-fast by float multiplication, you don't even need if statements in your GLSL code.
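
A rough sketch of how the three flags could collapse into one uniform value. The helper names are hypothetical, and the second function only mirrors on the CPU the multiplication a fragment shader would do per pixel.

```cpp
#include <array>
#include <cassert>

// Builds the RGBA multiplier to pass to the shader as a vec4 uniform:
// 1.0 for every channel the sprite should be written to, 0.0 otherwise.
std::array<float, 4> channelMask(bool red, bool green, bool blue)
{
    return {{red ? 1.0f : 0.0f, green ? 1.0f : 0.0f, blue ? 1.0f : 0.0f, 0.0f}};
}

// CPU mirror of the per-pixel shader work: mask * alpha, no branching.
std::array<float, 4> maskedOutput(const std::array<float, 4>& mask, float alpha)
{
    assert(alpha >= 0.0f && alpha <= 1.0f);
    return {{mask[0] * alpha, mask[1] * alpha, mask[2] * alpha, alpha}};
}
```

A sprite with drawToRed() and drawToBlue() set would get the mask (1, 0, 1, 0) and be drawn once instead of twice; the green channel stays at zero because it is multiplied by 0.0.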

And for further optimizations, if they are necessary, we really need concrete numbers (i.e. time measurements). sf::VertexArray would be an option, for example, but it's pointless to complicate the whole code if the bottleneck lies somewhere else.
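
If batching does turn out to be worth it, the core of it is plain bookkeeping: four vertices per sprite appended to one array, which is then submitted in a single draw call. The sketch below uses stand-in `Vertex` and `SpriteRect` structs rather than SFML types, so it only illustrates the layout (in SFML 2 this would map to an sf::VertexArray of quads sharing one texture).

```cpp
#include <cassert>
#include <vector>

struct Vertex { float x, y, u, v; };                     // position + texture coords
struct SpriteRect { float left, top, width, height; };  // stand-in rectangle type

// Appends one quad (four corners, clockwise from top-left) to the batch.
// dst is where the sprite appears on screen, src is its area in the texture.
void appendQuad(std::vector<Vertex>& batch, const SpriteRect& dst, const SpriteRect& src)
{
    assert(dst.width >= 0.0f && dst.height >= 0.0f);
    batch.push_back({dst.left, dst.top, src.left, src.top});
    batch.push_back({dst.left + dst.width, dst.top, src.left + src.width, src.top});
    batch.push_back({dst.left + dst.width, dst.top + dst.height,
                     src.left + src.width, src.top + src.height});
    batch.push_back({dst.left, dst.top + dst.height, src.left, src.top + src.height});
}
```

This only works for sprites that share a texture and render states, which is why it complicates the code and should wait for measurements.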

Fewes · « **Reply #7 on:** May 05, 2015, 04:58:32 pm »

Quote from: Nexus on May 05, 2015, 04:21:49 pm

You don't need three shaders. One shader is enough, just set a uniform variable that contains the color. Setting a single uniform is probably cheaper than rebinding the whole shader, but more importantly, you have no code duplication.

While I agree the code looks much nicer with just one shader (I did this at first) it also seems to be a tad bit slower even with just setting a single vec4 uniform for each object. When drawing 80 sprites to the buffer the 3 shader solution finished in 24ms while the single shader one does so in 25ms. Might be worth it just to avoid code duplication like you said though as it's not a huge loss. This is only the case if an object is written to only a single mask/channel however so you're right about that. I should have mentioned most objects in the vector are

Quote from: Nexus on May 05, 2015, 04:21:49 pm

1. I would either use the constructor to set all arguments or none, but not mix constructor and member assignment.

2. Use range-based for loops for iteration. Then you don't have that ugly double dereferencing

I'd also avoid identifiers such as "s_ptr" because they contain zero useful information. Identifiers should contain the variable's purpose. The type ("ptr") is not so important and is reminiscent of outdated Hungarian Notation.

All great tips, thanks! I can't believe I didn't know you could iterate through vectors like that. Sure looks a lot better

I've run the built-in profiler in VS2013 however it only seems to take CPU processing into account which is to be expected I suppose. It also shows calls going to SFML as simply 'sfml-graphics-2.dll' and the like. Does this mean I'll have to link the actual code instead of the binaries?

Nexus · « **Reply #8 on:** May 05, 2015, 05:03:13 pm »

Quote from: Fewes on May 05, 2015, 04:58:32 pm

When drawing 80 sprites to the buffer the 3 shader solution finished in 24ms while the single shader one does so in 25ms.

Such small differences are really meaningless, you can't even say one is faster, because a measuring artefact is much more likely than an actual performance difference. And 80 sprites are nothing, you'd need to draw many objects to see something.

Have you read the last paragraph in my last post?

Fewes · « **Reply #9 on:** May 05, 2015, 05:28:02 pm »

Quote from: Nexus on May 05, 2015, 05:03:13 pm

Such small differences are really meaningless, you can't even say one is faster, because a measuring artefact is much more likely than an actual performance difference. And 80 sprites are nothing, you'd need to draw many objects to see something.

Have you read the last paragraph in my last post?

Well considering if I'm aiming for 60+ fps flat I only have just under 17ms to work with it might make a difference! And here I thought 80 sprites were many...

I read it, yes. I was considering whether I should try to get some profiler data first, but if you trust my frame timer, here are some timings:

(Might be worth noting that the FPS is averaged out over a few frames hence why it doesn't match up with the current frame time)

Frame is the total frame time (excluding rendering the text). Mask & Draw is the code I've posted (although with your improvements), where all sprites are drawn to a RenderTexture (the scene buffer, if you will). Lighting is the lighting shader doing its work once per light on the mask provided. Post is two draw calls with screen-sized targets (one for an SSAO effect and the other for combining everything).

That was with 12 sprites being rendered to the buffer and 8 lights in the scene. Here's with 92 sprites and the same amount of lights:

Again, I think it might simply be a high amount of Draw() calls hogging performance so maybe there isn't much more that can be done about it, although I'm going to look into automatically creating vertex arrays for multiple instances of the same sprites if I get further with the project.

Although, looking into it a bit more, something seems to be off with the way I am drawing to the mask. If I leave out the code drawing to the mask and only draw to my scene buffer, I can have a huge amount of sprites with almost no impact at all. That is many, many more than twice the amount, and a factor of two is the impact I would expect from drawing each object one extra time.

Fewes · « **Reply #10 on:** May 05, 2015, 05:53:14 pm »

After some further testing I've realized drawing the same sprite more than once per frame is incredibly slow, even if I don't change any of its properties. I can easily work around this by storing an identical sprite in my wrapper and use that for the mask rendering instead. I find this quite interesting though, is there some sort of low level shenanigans going on behind the scenes that causes this slowdown?

Alright, never mind all that, I just solved it! I simply separated the two draw calls into their own iteration loops and bam, instantly got the expected performance. I'm guessing calling the Draw function does a bunch of work if you change the target in between, which I was doing twice per object before but now only have to do once per frame.
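
A toy model of that guess, assuming each change of render target carries some fixed cost (I have not verified what SFML does internally). It only counts how often consecutive draws go to a different target; the names are made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

enum class Target { Mask, Scene };

// Counts how often a draw goes to a different target than the previous one.
// The very first draw counts as one switch.
int countTargetSwitches(const std::vector<Target>& draws)
{
    int switches = 0;
    for (std::size_t i = 0; i < draws.size(); ++i)
        if (i == 0 || draws[i] != draws[i - 1])
            ++switches;
    return switches;
}

// Old loop: each sprite is drawn to the mask, then to the scene buffer.
std::vector<Target> interleavedOrder(int spriteCount)
{
    std::vector<Target> draws;
    for (int i = 0; i < spriteCount; ++i) {
        draws.push_back(Target::Mask);
        draws.push_back(Target::Scene);
    }
    return draws;
}

// New loops: all mask draws first, then all scene draws.
std::vector<Target> groupedOrder(int spriteCount)
{
    std::vector<Target> draws(spriteCount, Target::Mask);
    draws.insert(draws.end(), spriteCount, Target::Scene);
    return draws;
}
```

With 92 sprites the interleaved order switches targets 184 times per frame, while the grouped order switches twice, for the same number of draw calls.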

Thanks a lot for putting me on the right track and helping me with cleaning up my code, hopefully I can make something out of this and share it if it turns out well enough

Nexus · « **Reply #11 on:** May 05, 2015, 07:10:37 pm »

Quote from: Fewes on May 05, 2015, 05:28:02 pm

Again, I think it might simply be a high amount of Draw() calls hogging performance

Modern 3D games draw thousands, if not millions, of polygons every frame. I can assure you that 80 draw calls themselves are not the problem.

It depends of course if you perform a lot of operations in your shader (shaders are not free), and if you change other OpenGL states whenever you draw. But the call per se is not relevant at such a low number.

But good that you could solve it! I didn't realize you drew to two different targets, I focused too much on the other parts
