Author Topic: Better Caching Possible? (Read 3920 times)

FRex · « **on:** September 30, 2017, 09:28:23 pm »

Continued from: https://github.com/SFML/SFML/pull/1297

As I said there, I think there is a way to cache these pointers a bit better..?

For something like a loop that draws the same circle over and over (with a different position, color, or transform, but the same vertices pointer), the code right now rebinds everything on each draw. If you did the same with a sprite, it would bind only on the first draw, because the sprite uses the cache.

I think we should keep track of the pointer that was last bound and only bind it if it changes. It'd also make the code simpler and more edge-case proof, for example against the edge case that started the above PR.
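
A minimal sketch of the "remember the last bound pointer" idea, assuming a hypothetical `PointerTracker` class and a stand-in `Vertex` struct (neither is SFML's actual code, and the real bind would be the gl*Pointer calls rather than a counter):

```cpp
#include <cstddef>

// Hypothetical stand-in for sf::Vertex; only its address matters here.
struct Vertex { float x, y; };

// Hypothetical tracker, not SFML's actual RenderTarget implementation.
class PointerTracker
{
public:
    // Returns true if a (simulated) bind was performed.
    bool bind(const Vertex* vertices)
    {
        if (vertices == m_lastBound)
            return false; // same pointer as the previous draw: skip the rebind

        // Real code would call the gl*Pointer functions here.
        m_lastBound = vertices;
        ++m_bindCount;
        return true;
    }

    // Forget the tracked pointer, e.g. after GL states are reset externally.
    void reset() { m_lastBound = nullptr; }

    std::size_t bindCount() const { return m_bindCount; }

private:
    const Vertex* m_lastBound = nullptr;
    std::size_t m_bindCount = 0;
};

// Simulates drawing one vertex array `draws` times, then a different one once,
// and reports how many binds were actually issued.
std::size_t countBindsForRepeatedDraws(std::size_t draws)
{
    const Vertex circle[3] = {{0.f, 0.f}, {1.f, 0.f}, {0.f, 1.f}};
    const Vertex other[3] = {{2.f, 2.f}, {3.f, 2.f}, {2.f, 3.f}};

    PointerTracker tracker;
    for (std::size_t i = 0; i < draws; ++i)
        tracker.bind(circle);
    tracker.bind(other);
    return tracker.bindCount();
}
```

In this sketch, any number of consecutive draws of the same array costs one bind, and the second array costs one more. A real version would presumably also need to track whether the texture coordinate pointer was set, since that was the edge case behind the PR.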

binary1248 · « **Reply #1 on:** September 30, 2017, 10:29:12 pm »

If you want the cache to be used for polygons with more than 4 vertices, all you have to do is increase the constant defined here. Just so that people understand, when caching, it is not the vertex storage of the Drawable that is kept bound, but the address to the cache storage. The cache doesn't save on any transformations that have to be applied to the vertices, merely the re-binding of the pointers to the cache storage.
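
As a rough illustration of that distinction, here is a sketch in which small draws are transformed on the CPU into fixed cache storage, and only the pointer to that storage stays bound. The names `VertexCache`, `Transform`, `Vec2` and `kCacheSize` are hypothetical, and the transform is simplified to scale plus translation; this is not SFML's actual implementation.

```cpp
#include <array>
#include <cstddef>

struct Vec2 { float x, y; };

// Hypothetical simplified transform: scale, then translate.
struct Transform
{
    float sx, sy, tx, ty;
    Vec2 apply(Vec2 v) const { return {v.x * sx + tx, v.y * sy + ty}; }
};

// Assumed to mirror the 4-vertex limit mentioned in the thread.
constexpr std::size_t kCacheSize = 4;

struct VertexCache
{
    std::array<Vec2, kCacheSize> storage{}; // address that stays bound
    bool pointerBound = false;              // is `storage` currently bound?
    std::size_t bindCount = 0;              // simulated gl*Pointer calls

    // Returns true if the draw went through the cache path.
    bool draw(const Vec2* vertices, std::size_t count, const Transform& t)
    {
        if (count > kCacheSize)
        {
            // Too large for the cache: the caller's own pointer gets bound,
            // so the cache storage is no longer the bound address.
            pointerBound = false;
            return false;
        }

        // The transformation still happens on every draw; nothing is saved here.
        for (std::size_t i = 0; i < count; ++i)
            storage[i] = t.apply(vertices[i]);

        // Only the re-binding of the pointer is saved.
        if (!pointerBound)
        {
            ++bindCount;
            pointerBound = true;
        }
        return true;
    }
};

// Draws the same quad twice through the cache and returns the bind count.
std::size_t demoCacheBinds()
{
    const Vec2 quad[4] = {{0.f, 0.f}, {1.f, 0.f}, {1.f, 1.f}, {0.f, 1.f}};
    VertexCache cache;
    cache.draw(quad, 4, Transform{2.f, 2.f, 10.f, 20.f});
    cache.draw(quad, 4, Transform{1.f, 1.f, 0.f, 0.f});
    return cache.bindCount;
}
```

Two cached draws in a row transform the vertices twice but bind the cache storage once.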

FRex · « **Reply #2 on:** September 30, 2017, 10:45:11 pm »

I know, I'm familiar with the code, I just took a look at it.

I don't mean making the cache itself (as in - do the transform on the CPU, store the result in those few internal vertices, and draw with the GL transform being an identity one) work with longer vertex arrays. I mean that uncached draws with the same pointer shouldn't bind the same pointer again, as they do now. It'd also make the code simpler than it is now or in your fix; the only downside is ABI breakage due to the extra fields.

See example: https://github.com/FRex/SFML/tree/betterptrcache

binary1248 · « **Reply #3 on:** September 30, 2017, 11:51:44 pm »

A while back, there was a discussion on whether the vertex cache actually made a measurable difference. My measurements didn't show any difference which is why I even suggested removing it at that point. Because it did make a slight difference for certain users in certain cases, it was left in. I have a weird feeling that your proposal takes it one step closer to the extreme (storing a complete copy of OpenGL state) and I am not sure if an improvement in performance (if there even is one) will warrant the increased complexity in the RenderTarget implementation. One mustn't forget that 1. what you are doing probably isn't too common of a use case and 2. there are more efficient ways of attaining the same effects without having to modify SFML.

FRex · « **Reply #4 on:** October 01, 2017, 12:18:52 am »

I'd say drawing a bunch of circles in a loop and just changing their size or position is not that uncommon. Yes, the outline would make these savings go away, but still.

My code makes this simpler. The hoops around not binding the pointer to the cached vertices again already caused a bug in an edge case, so simpler code is better. (That skipped bind is the saving from the cache; I had thought the saving came from transforming on the CPU. So either these calls are cheap, we never need to worry, and we should remove this strange code, or they are measurable in an edge case and we should try our best.)

The code in there is already sort of remembering some GL state related to these pointers, in a roundabout, indirect way, through the nested ifs for the vertex cache case. My code is simpler because I store that state explicitly. And the cache already remembers things such as the texture, shader, etc. that were set, so as not to set them again, because that's costly.

I tried to see if these changes are even measurable:

50k circles with my code from here: ~23 FPS.
50k circles with current master (with the bug still): ~21 FPS.
That's on the integrated Intel, on GTX 950M it's 28 vs 25.
So it's barely anything, but measurable I guess. I'd still do it for simplicity's sake, and even if these calls are heavy on some bad/old drivers or cards, they will never be lighter than my few extra ifs.

Curiously enough, that 4-vertex cache also has quite an impact when running untextured (like 27 vs. 50 FPS), but for textured draws not really at all... and those were quite a bit slower on their own too, down to a few FPS.

This is a modern 'gaming' laptop with fully up to date Windows 10 though so for low spec systems it might come out differently.

binary1248 · « **Reply #5 on:** October 01, 2017, 01:58:07 am »

I don't think adding additional state to track makes things simpler. The only reason why nested ifs are necessary now is because there is no other way short of writing out a chain of else ifs to realize a table of possible scenarios and what to do in each. If you write the possibilities down in a table you will notice there aren't really that many, they just all need to be accounted for, and the current fix takes care of a case in the table that wasn't. It was simply an oversight by the implementer of the optimization, but I don't think that the fix makes the code any more complex than it was without it.

Also, your test scenario is unrealistic. Even the latest AAA games tend to keep their total draw calls in the low 1000s. With 50k draw calls you are making the API call overhead look worse than it would actually be in a real world scenario. Sure the driver might internally batch them together somehow, but every optimization has its limits.

When people say "minimize state changes" in regards to OpenGL, what some might tend to forget is that they are referring to pipeline state changes e.g. texture binding, buffer bindings, program bindings etc. All these gl*Pointer calls are modifying client i.e. CPU state. The GPU never sees any of this stuff. For all we know, calling glVertexPointer might just be a simple write to some memory address, which would mean the same overhead as any other call into an external library e.g. operator new. We mustn't forget that SFML is still running around legacy-land, so the whole premise that we are trying to optimize something that is by nature sub-optimal is already a bit absurd.

We need real world performance metrics to see if an optimization is worth it. It is easy to construct an example that is designed to put concentrated stress on to some aspect of a system, be it the driver itself or in our case SFML as a library. The possibility that people can use/do something wrong shouldn't make an optimization necessary. That's essentially what driver developers have been stuck doing the last 20 years, making up for crappy/hastily tossed together game code, which is why Vulkan/DX12 is going to make a big difference. For the first time in a long time, the fault will be solely on the side of the developers if their code runs horribly. They will have to optimize their own code first before putting the blame on the GPU vendors. The same should apply in our case. Only once the user has exhausted their optimization possibilities should the library be forced to go out of its way to lend a helping hand.

FRex · « **Reply #6 on:** October 01, 2017, 02:16:06 am »

Yes, yes. I know this is all legacy* and that glDrawArrays transfers data on each call, so yes, the pointer is totally CPU state. But then why care about it in the case of the cache? And I had to make the 'overhead' show up on the clock somehow, so I just did 50k draw calls.

I mean, either these calls are important, and then my code is an improvement, or they're not important and we should throw all the checks out to simplify the code. These if chains, not resetting the pointer to the internal cache, and trying to skip the texture pointer call already caused this bug with the stale texture pointer, and the code is hard-ish to follow because of this extra logic.

* I don't mind that at all either. If my laptop hadn't broken I'd still be stuck with a GL 2.1 capable GPU. I like that SFML can target low spec machines outside of first world countries.

binary1248 · « **Reply #7 on:** October 01, 2017, 02:21:15 am »

Like I said... I was never really a fan of the cache, let alone this additional optimization which even slipped in a bug. But as in all things, I am not the only one who decides on what gets in or not. As already mentioned, I could never measure any noticeable difference with or without any caching strategy, so having less code is always the better option for me. I'm just here to fix bugs without investing too much effort into the wrong direction.

FRex · « **Reply #8 on:** October 01, 2017, 02:30:06 am »

I'm not commenting on the cache itself here, just on the weird point that only when the draw is 'cached' does it try to save on these pointer bind calls; in the other case it just rebinds them to the same thing, only without the texture coords pointer if it's not needed. Either this stuff is important and we should always care, or it isn't and we should never care.
