Author Topic: Would anybody benefit from having something like sf::VertexBuffer? (Read 9620 times)

binary1248 · « **on:** October 01, 2017, 08:18:38 pm »

After running a few of the current project highlights through AMD GPU PerfStudio, I noticed something really interesting with the way Cendric causes OpenGL operations to be submitted to the driver.

Basically, for really huge vertex arrays, a disproportionately large amount of the time spent calling OpenGL is spent in the actual draw calls themselves. This is in stark contrast to what many might believe is the source of all the time spent using OpenGL.

The time spent in those draw calls isn't actually the time the GPU needs to render anything. Commands are still queued and submitted to the GPU to be executed asynchronously. This also isn't the time that "just has to be spent" doing what any draw call would do, otherwise the draw calls that process only 4 vertices should take just as much time.

Keeping the definitions of the gl*Pointer and glDrawArrays functions in mind, the time spent almost certainly has to come from copying the vertex data from the address space of the application into the address space of the driver for asynchronous DMA transfer to the GPU.

In this case, 19600 sf::Vertex worth of data had to be copied on every draw call (this is assuming the driver is smart and combines copying the 3 memory blocks specified by the pointers into a single copy). Since each sf::Vertex is 20 bytes big, this memory block is 392 KB. For reasons the author will know, it is submitted 8 times per frame, for a total of 3136 KB per frame. Assuming we want to run at a minimum of 60 FPS, that comes to 188.16 MB per second.

Yes... 188.16 MB/s is still a long way from the GB/s memory bandwidth that we have within system RAM and between CPU and GPU, but still... If you consider that in this example, 50% of the CPU time the application needs per frame is spent merely copying data around it makes you wonder if there are any better alternatives.

(Again, this was all assuming the driver isn't stupid and making 3 copies per draw which would make it 564.48 MB/s)
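The arithmetic above can be checked with a small sketch. The constant and function names are made up for illustration, and the units are decimal (1 KB = 1000 bytes), which is what the figures in this post imply.

```cpp
#include <cstddef>

// Figures taken from the measurements described above.
constexpr std::size_t kVertexCount     = 19600; // vertices per draw call
constexpr std::size_t kBytesPerVertex  = 20;    // size of one sf::Vertex
constexpr std::size_t kDrawsPerFrame   = 8;     // submissions per frame
constexpr std::size_t kFramesPerSecond = 60;    // target frame rate

// copiesPerDraw is 1 if the driver merges the 3 pointer blocks into one copy,
// 3 if it copies the whole block once per pointer.
constexpr std::size_t bytesPerDraw(std::size_t copiesPerDraw) {
    return kVertexCount * kBytesPerVertex * copiesPerDraw;
}

constexpr std::size_t bytesPerFrame(std::size_t copiesPerDraw) {
    return bytesPerDraw(copiesPerDraw) * kDrawsPerFrame;
}

constexpr std::size_t bytesPerSecond(std::size_t copiesPerDraw) {
    return bytesPerFrame(copiesPerDraw) * kFramesPerSecond;
}

static_assert(bytesPerDraw(1)   == 392000,    "392 KB per draw");
static_assert(bytesPerFrame(1)  == 3136000,   "3136 KB per frame");
static_assert(bytesPerSecond(1) == 188160000, "188.16 MB/s with one copy per draw");
static_assert(bytesPerSecond(3) == 564480000, "564.48 MB/s with three copies per draw");
```

Since everything is constexpr, the checks run at compile time; changing the draw count or frame rate shows how quickly the copied volume scales.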

I wouldn't call this a bandwidth bottleneck as is. The FPS in this game is still probably bottlenecked more by some GPU-specific factors e.g. fillrate etc. This is about letting the CPU do more useful things at the same time the GPU is eating through those commands.

Which leads me to the question: Would it make sense to introduce something like sf::VertexBuffer?

sf::VertexBuffer would be something in between sf::VertexArray and sf::Texture. Just like sf::Texture, sf::VertexBuffer would live in the GPU while it is alive, and like sf::VertexArray, it would contain an array of sf::Vertex data. Unlike sf::VertexArray, reading from an sf::VertexBuffer would be just as expensive as reading from an sf::Texture since it would incur a GPU-CPU readback. However, considering it is a very common use case to only submit data without having to read it back this wouldn't be that big of a problem.

The main advantage of sf::VertexBuffer over sf::VertexArray would be that because it lives on the GPU, it won't have to be copied every draw call. Again, this isn't just about saving memory bandwidth. In all games I measured it was never a bottleneck, although one must consider I have a pretty high end system so it might become a bottleneck on crappier systems. What is guaranteed is that the drawing thread (in Cendric's case there is only a single thread) is free to do other useful things more often during a single frame instead of spending a lot of time waiting for memory to be copied. This can be things like AI, physics, sound etc. Even if the final FPS of the current games would not increase, it would leave the authors with more room to do other interesting CPU-side things that might not have been possible because the game became CPU-bound. In the case that one day games do become memory bandwidth-bound, keeping as much data in GPU memory as possible will also help to reduce the bottleneck.
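To make the difference concrete, here is a rough CPU-only model of the two paths. It does not touch OpenGL or SFML: DriverModel, drawClientArray, uploadBuffer and drawBuffer are hypothetical names, and the model simply assumes that a client-side array is copied in full on every draw while a buffer is copied only when it is uploaded.

```cpp
#include <cstddef>
#include <vector>

// Stand-in for sf::Vertex: position, color, texture coordinates (20 bytes).
struct Vertex {
    float x, y;
    unsigned char r, g, b, a;
    float u, v;
};
static_assert(sizeof(Vertex) == 20, "model assumes a 20-byte vertex");

// Hypothetical driver model: counts bytes copied out of application memory.
struct DriverModel {
    std::size_t bytesCopied = 0;
    std::size_t residentVertexCount = 0; // vertices already held on the "GPU" side
};

// Client-side array path: assumed to copy the whole array on every draw.
void drawClientArray(DriverModel& driver, const std::vector<Vertex>& vertices) {
    driver.bytesCopied += vertices.size() * sizeof(Vertex);
}

// Buffer path: the copy happens on upload only.
void uploadBuffer(DriverModel& driver, const std::vector<Vertex>& vertices) {
    driver.bytesCopied += vertices.size() * sizeof(Vertex);
    driver.residentVertexCount = vertices.size();
}

// Drawing from the buffer references resident data, so nothing is copied.
void drawBuffer(DriverModel& driver) {
    (void)driver;
}

// Bytes copied in one frame when drawing a client-side array `draws` times.
std::size_t copiedPerFrameClientArray(std::size_t vertexCount, int draws) {
    const std::vector<Vertex> vertices(vertexCount);
    DriverModel driver;
    for (int i = 0; i < draws; ++i) drawClientArray(driver, vertices);
    return driver.bytesCopied;
}

// Bytes copied in one frame when drawing a buffer `draws` times,
// re-uploading first only if the data changed this frame.
std::size_t copiedPerFrameBuffer(std::size_t vertexCount, int draws, bool dataChanged) {
    const std::vector<Vertex> vertices(vertexCount);
    DriverModel driver;
    if (dataChanged) uploadBuffer(driver, vertices);
    for (int i = 0; i < draws; ++i) drawBuffer(driver);
    return driver.bytesCopied;
}
```

Under these assumptions, copiedPerFrameClientArray(19600, 8) gives 3136000 bytes, while copiedPerFrameBuffer(19600, 8, false) gives 0 for unchanged data and copiedPerFrameBuffer(19600, 8, true) gives 392000 when the whole buffer is replaced that frame. A real driver may behave differently, so this only illustrates the reasoning and is not a measurement.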

So, what I want to ask is: Is there anybody out there who would actually, at the current time, benefit from keeping vertex data on the GPU using an sf::VertexBuffer? I have a feeling that Cendric would, but until there is an implementation to test with, it is just a theory.

Laurent · « **Reply #1 on:** October 01, 2017, 10:31:00 pm »

I have a question: in the worst case (i.e. we have to replace the entire vertex data at every frame), would sf::VertexBuffer perform better/the same/worse than sf::VertexArray?

If not worse, I think it could be ok to modify sf::VertexArray itself (to store data on the GPU) rather than adding a similar class -- but that would of course be for SFML 3, because the API can't stay the same. Both the design and implementation of the graphics module would have to change in SFML 3 anyway (in my opinion), but maybe you'd like some change there before that distant milestone.

binary1248 · « **Reply #2 on:** October 01, 2017, 10:36:04 pm »

Internally, the driver will have to get the data from our address space into the GPU somehow. Considering that buffer objects were designed to be the "one true way" of managing vertex data going forward, I wouldn't be surprised if the old client-side arrays are emulated using streaming buffer objects in newer driver implementations. If this is really the case, using streaming VBOs cannot be worse than client-side arrays, and is very likely to perform better in every use case.

Laurent · « **Reply #3 on:** October 01, 2017, 11:19:02 pm »

Well, then if you think we could design and implement a better replacement for sf::VertexArray, we can try to introduce it in SFML 2.x and deprecate the former.

binary1248 · « **Reply #4 on:** October 01, 2017, 11:23:23 pm »

Like I said, the one thing you won't be able to do with a sf::VertexBuffer is use it as random access storage for your sf::Vertex data. operator[] wouldn't make much sense since it is used both for reading and writing. I don't know if there are users who are really using sf::VertexArray to store their vertex data for future lookup but if there are then it is better to add a new distinct class for the other usage pattern. It will come with its caveats, but when used as intended it can lead to performance gains.

Laurent · « **Reply #5 on:** October 02, 2017, 08:44:10 am »

Quote

Like I said, the one thing you won't be able to do with a sf::VertexBuffer is use it as random access storage for your sf::Vertex data

I know, and I think we should stop providing the ability to do so. Just like for textures and pixels, if people need to access vertex data, they should store it themselves when they build/update their vertex buffer. sf::Texture still has a way to download the pixel data from GPU memory, but that's because it can be created there (render-texture, update from window content, ...). Vertex data, on the other hand, will always be defined by the user on the CPU side first, so let's just assume that they can keep that data around if they need it later.

dabbertorres · « **Reply #6 on:** October 02, 2017, 09:44:26 am »

I know I'd appreciate having this in SFML!

Hapax · « **Reply #7 on:** October 03, 2017, 01:12:35 pm »

Another step towards modern GL within v2 (therefore without breakage) can only be a good thing.

My suggestion would be to provide an sf::VertexBuffer in addition to sf::VertexArray (so the buffer is publicly available to use instead of the array), with the vertex buffer working the way Binary described and the vertex array working as it did previously. This gives people the option to use either: the array directly, or a buffer updated manually when required. Of course, in v3 the vertex array might not be sticking around (it's mostly redundant anyway), so both wouldn't be needed.

I suppose the point I was making there is that this feature should be added for manual control rather than modifying the existing class to update automatically. Basically, update when needed, not necessarily when drawn.
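A minimal sketch of that usage pattern, assuming a write-only buffer: the application keeps its own CPU-side copy for reads and pushes it to the buffer only after it has changed. WriteOnlyBuffer and Mesh are hypothetical stand-ins, not SFML classes, and the vertex type is simplified to a position.

```cpp
#include <cstddef>
#include <vector>

// Simplified vertex for illustration (position only).
struct Vertex {
    float x, y;
};

// Hypothetical stand-in for a GPU-side buffer: it can be written, never read.
class WriteOnlyBuffer {
public:
    void update(const Vertex* data, std::size_t count) {
        (void)data;       // a real buffer would transfer the data here
        size_ = count;
        ++uploads_;
    }
    std::size_t size() const { return size_; }
    std::size_t uploadCount() const { return uploads_; }

private:
    std::size_t size_ = 0;
    std::size_t uploads_ = 0;
};

// The application owns the readable copy and decides when to upload.
class Mesh {
public:
    explicit Mesh(std::size_t vertexCount) : vertices_(vertexCount), dirty_(true) {}

    // Reads come from the CPU-side copy, so no readback is ever needed.
    const Vertex& get(std::size_t index) const { return vertices_[index]; }

    // Writes touch only the CPU-side copy and mark it as changed.
    void set(std::size_t index, const Vertex& vertex) {
        vertices_[index] = vertex;
        dirty_ = true;
    }

    // Upload when needed, not on every draw: does nothing if the data is unchanged.
    void sync(WriteOnlyBuffer& buffer) {
        if (!dirty_) return;
        buffer.update(vertices_.data(), vertices_.size());
        dirty_ = false;
    }

private:
    std::vector<Vertex> vertices_;
    bool dirty_;
};
```

Calling sync() once per frame before drawing costs nothing when the data is unchanged; after a set(), the next sync() performs exactly one upload. Uploading the whole array on any change is the simplest policy; tracking a dirty range would be a possible refinement.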

Hiura · « **Reply #8 on:** October 04, 2017, 10:35:44 am »

I think this is an interesting feature and having it in v2 already would enable us to get some feedback. I vote in favour!
