Author Topic: Support drawing multiple arrays of same OpenGl mode within one call (Read 4992 times)

ssawa · « **on:** April 07, 2015, 07:56:51 pm »

Hey guys! I'm not sure what the general protocol is here, but this isn't so much a "request" as bringing a possible feature up for discussion. I've been messing around with a little SVG renderer using SFML and, as tends to happen with mindless hobby projects, am wasting a lot of time on premature optimization. Mainly, I was interested in drawing multiple unconnected paths within the same OpenGL call: basically, one TrianglesStrip vertex array that holds multiple OpenGL primitives.

Now there are a couple of ways to go about this in OpenGL, and I've been spending some time trying to plug them into RenderTarget's draw method. I started off rigging up something using glPrimitiveRestartIndex. However, it's a relatively new call (introduced in 3.1, if I recall correctly), and while I got it working on my Windows PC, the call was completely unsupported on my MacBook Pro (and I have no clue how it would work with OpenGL ES, which appears to be slowly becoming one of SFML's targets). So it looked like glPrimitiveRestartIndex wouldn't work for SFML's cross-platform purposes.
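For readers unfamiliar with primitive restart, here is a minimal sketch of the index bookkeeping it needs. The helper name `buildRestartIndices` and the sentinel constant are made up for illustration and are not part of SFML; the OpenGL calls in the trailing comment assume a 3.1+ context.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Sentinel index that tells OpenGL to end the current strip and begin a new one.
constexpr std::uint32_t kRestartIndex = 0xFFFFFFFFu;

// Hypothetical helper: the strips are assumed to be stored back to back in one
// vertex buffer. Given the vertex count of each strip, build one index list
// with a restart sentinel between consecutive strips.
std::vector<std::uint32_t> buildRestartIndices(const std::vector<std::uint32_t>& stripSizes)
{
    std::vector<std::uint32_t> indices;
    std::uint32_t next = 0; // index of the next unused vertex in the shared buffer
    for (std::size_t s = 0; s < stripSizes.size(); ++s)
    {
        if (s > 0)
            indices.push_back(kRestartIndex); // cut the connection to the previous strip
        for (std::uint32_t i = 0; i < stripSizes[s]; ++i)
            indices.push_back(next++);
    }
    return indices;
}

// With an OpenGL 3.1+ context the draw would then look roughly like this
// (the last argument is a client-side pointer here; with a bound element
// buffer it would be a byte offset instead):
//   glEnable(GL_PRIMITIVE_RESTART);
//   glPrimitiveRestartIndex(kRestartIndex);
//   glDrawElements(GL_TRIANGLE_STRIP, static_cast<GLsizei>(indices.size()),
//                  GL_UNSIGNED_INT, indices.data());
```

Two strips of 4 and 3 vertices produce the indices 0, 1, 2, 3, sentinel, 4, 5, 6, so the vertex data itself is never duplicated.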

More recently I've moved on to implementing an alternative, glMultiDrawArrays, with some degree of success (purely hacky proof-of-concept stuff, nothing close to being generally usable yet). glMultiDrawArrays appears to have much wider support and looks like it might play nicer than primitive restart with how RenderTarget is structured. The trade-off, from what I can tell, is that it might not be as performant as primitive restart; it basically comes down to how particular GPUs and drivers handle it.
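To make the glMultiDrawArrays idea concrete, here is a rough sketch of the extra bookkeeping it needs: one start offset and one vertex count per primitive. `MultiDrawRanges` is a hypothetical name, not the code from my proof of concept and not an SFML type; plain `int` stands in for `GLint`/`GLsizei` so the snippet compiles without OpenGL headers.

```cpp
#include <cassert>
#include <vector>

// Hypothetical bookkeeping for several primitives packed into one vertex buffer.
struct MultiDrawRanges
{
    std::vector<int> first; // start vertex of each primitive (GLint in OpenGL)
    std::vector<int> count; // vertex count of each primitive (GLsizei in OpenGL)

    // Record a primitive of vertexCount vertices stored right after the previous one.
    void append(int vertexCount)
    {
        assert(vertexCount > 0);
        const int start = first.empty() ? 0 : first.back() + count.back();
        first.push_back(start);
        count.push_back(vertexCount);
    }
};

// Once the vertex pointers are set up, every fan is drawn with a single call:
//   glMultiDrawArrays(GL_TRIANGLE_FAN, ranges.first.data(), ranges.count.data(),
//                     static_cast<GLsizei>(ranges.first.size()));
```

Appending fans of 5, 4 and 6 vertices gives `first = {0, 5, 9}` and `count = {5, 4, 6}`; the vertex array itself stays untouched.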

So I guess this has all been a long-winded way of putting a feature up for review and sharing the preliminary research I've done on it. Do you think that being able to draw multiple primitives of the same mode within a single call is a valuable enough feature to be worth trying to implement? Or do you think it's something that would be unnecessary and not very rewarding for the common user?

eXpl0it3r · « **Reply #1 on:** April 07, 2015, 10:24:38 pm »

Not exactly sure what you're asking here...

You can easily put multiple primitives of the same type into an sf::VertexArray (or std::vector<sf::Vertex>) and draw it in one call. That's one of the main advantages of using vertex arrays over arrays of sprites.

Are you asking for something else then?

ssawa · « **Reply #2 on:** April 08, 2015, 12:55:37 am »

Quote from: eXpl0it3r on April 07, 2015, 10:24:38 pm

Not exactly sure what you're asking here...

You can easily put multiple primitives of the same type into an sf::VertexArray (or std::vector<sf::Vertex>) and draw it in one call. That's one of the main advantages of using vertex arrays over arrays of sprites.

Are you asking for something else then?

For primitive modes such as points, lines, triangles, and quads you are absolutely correct. I'm talking about the others: line strip, triangle strip, and triangle fan, where each new vertex is related to the last. For instance, to draw a triangle fan I specify the first vertex as the center point and the subsequent vertices as points radiating from that center. In the current implementation, to the best of my knowledge, there is no way to define a *new* center point completely unrelated to the original triangle fan and draw both fans with one call. In the same way, it is not possible to start a new triangle strip at a location different from the last vertex. The methods I outlined would allow this.

Nexus · « **Reply #3 on:** April 08, 2015, 01:49:00 am »

To make it clear, we're really only talking about performance here, right? Functionality-wise, everything can be achieved with sf::VertexArray already, either through multiple draw calls or by using the unconnected primitives (e.g. sf::Triangles instead of sf::TrianglesStrip). Do you have a specific use case in mind that would highly benefit from this performance gain?
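A minimal sketch of the "unconnected primitives" route, assuming a stand-in `Vec2` type in place of sf::Vertex and a hypothetical helper name: each fan is expanded into plain triangles, so any number of fans can share one sf::Triangles batch at the cost of repeated vertices.

```cpp
#include <cstddef>
#include <vector>

struct Vec2 { float x, y; }; // stand-in for sf::Vertex, for illustration only

// Expand one triangle fan (center first, then the rim points) into an
// unconnected triangle list and append it to 'out'.
// A fan of n vertices becomes n - 2 triangles, i.e. 3 * (n - 2) vertices.
void appendFanAsTriangles(const std::vector<Vec2>& fan, std::vector<Vec2>& out)
{
    for (std::size_t i = 1; i + 1 < fan.size(); ++i)
    {
        out.push_back(fan[0]);     // shared center
        out.push_back(fan[i]);     // current rim point
        out.push_back(fan[i + 1]); // next rim point
    }
}
```

Calling the helper once per fan on the same output vector yields a single batch that can be drawn in one call. This is the vertex overhead being weighed against extra draw calls in this thread.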

Even if the implementation itself is not too difficult, you have to keep two things in mind. First, maintenance (i.e. fixing all the bugs on all possible drivers and configurations). Second, and even before that, a new API would be necessary, either by extending sf::VertexArray or by coming up with something new.

binary1248 · « **Reply #4 on:** April 08, 2015, 02:34:30 am »

Triangle Strip: Use degenerate triangles.
Triangle Fan: Just don't use too many of these. There is only so much you can do with them, and it's not like you can't emulate them with a textured quad instead...
Line Strip: Just use lines instead. You are "wasting" merely 1 extra vertex per primitive.
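The strip suggestions above can be sketched as follows. `Vec2` is a stand-in for sf::Vertex and both helper names are hypothetical. Joining strips this way adds two vertices per seam; the extra triangles have zero area, so they are not rasterized.

```cpp
#include <cstddef>
#include <vector>

struct Vec2 { float x, y; }; // stand-in for sf::Vertex, for illustration only

// Append 'strip' to 'batch' so that both can be drawn as ONE triangle strip.
// The seam repeats the last vertex of the previous strip and the first vertex
// of the new one, which only produces degenerate (zero-area) triangles.
void appendStrip(std::vector<Vec2>& batch, const std::vector<Vec2>& strip)
{
    if (strip.empty())
        return;
    if (!batch.empty())
    {
        const Vec2 last = batch.back(); // copy before the vector may reallocate
        batch.push_back(last);
        batch.push_back(strip.front());
    }
    batch.insert(batch.end(), strip.begin(), strip.end());
}

// Expand a line strip into independent segments (two vertices per segment).
void appendLineStripAsLines(const std::vector<Vec2>& strip, std::vector<Vec2>& out)
{
    for (std::size_t i = 0; i + 1 < strip.size(); ++i)
    {
        out.push_back(strip[i]);
        out.push_back(strip[i + 1]);
    }
}
```

One caveat with this simple seam: when the number of vertices before a joined strip is odd, that strip's winding order flips. That only matters if back-face culling is enabled, in which case one more duplicate vertex at the seam restores the parity.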

You have to remember, OpenGL is a pipeline. Pipelines don't like to stop and if you force them to they will annoy you with bad performance. You will get much better performance by shaping your geometry into "pipeline friendly" geometry instead of thinking of how to save draw calls through secondary data structures. Saving draw calls only benefits you if you are driver-bottlenecked, which to be honest almost never happens in a typical SFML application. When dealing with large amounts of geometry in SFML, it will probably be the memory bandwidth that kills performance, since all the data has to be transferred again and again every single frame.

I just don't see how your suggestion will change anything for the better. It won't increase performance in comparison to an optimized draw sequence. It won't make anything easier for the developer to do, since they will have a bigger API to deal with. In really pessimistic scenarios, misuse can even lead to decreases in performance.

If you are not convinced that batching everything into a single call is possible, look at this screen capture of a Super Fidelity GUI:

That whole GUI is rendered using...

And if all else fails, and you really need that raw performance, remember that SFML doesn't go out of its way to prevent you from using raw OpenGL. It might be annoying at times, but it is possible, and the annoyances are currently being addressed one by one.

ssawa · « **Reply #5 on:** April 08, 2015, 04:55:47 am »

Yeah, just to be clear, I was speaking purely from a performance angle; it wouldn't necessarily add any new functionality. I also agree that for a typical 2D-game-focused user (which may be who SFML is currently focusing on, I don't know), the program probably wouldn't need much more than several disconnected quads with textures mapped to them, in which case this work could easily be seen as overkill. Again, what originally led me down this path was SVG, and 2D vector rendering in general, where groups of disparate paths made of Bézier curves and other complex geometry could result in either a very large number of OpenGL calls or a large number of unnecessary vertices under the current implementation.

I totally understand if you don't feel that the large majority of users would benefit from this kind of minutia; I might just be approaching it from an atypical perspective. I was mostly curious whether you guys thought the work I've done for my particular case would be helpful to the community at large. I'm completely fine with withdrawing my motion from the table, however.

ssawa · « **Reply #6 on:** April 08, 2015, 05:16:57 am »

Quote from: binary1248 on April 08, 2015, 02:34:30 am

You have to remember, OpenGL is a pipeline. Pipelines don't like to stop and if you force them to they will annoy you with bad performance. You will get much better performance by shaping your geometry into "pipeline friendly" geometry instead of thinking of how to save draw calls through secondary data structures. Saving draw calls only benefits you if you are driver-bottlenecked, which to be honest almost never happens in a typical SFML application. When dealing with large amounts of geometry in SFML, it will probably be the memory bandwidth that kills performance, since all the data has to be transferred again and again every single frame.

Those are all really good points worth considering. I don't necessarily see why the two functions I referenced would particularly interrupt the OpenGL pipeline but I don't have a tremendous amount of experience with the inner workings of OpenGL so I'm more than willing to concede that point. Memory constraints are also obviously a big concern; however, for my particular needs in this project, which again may be strictly specific to me, my solution seemed to make stricter use of memory compared to the basic implementation.

Mainly, I found that with the way things currently are I would be forced either to add a large number of unnecessary vertices to describe more generic primitives, rather than use OpenGL's abstractions such as triangle strip, or to require additional instances of VertexArray, which as a C++ class with several functions and members such as std::vector, does come with its own overhead that could add up depending on how many calls need to be made. In my particular case, this solution lets me make the most of my memory and use only one instance of a (modified) VertexArray while still using only as many vertices as are needed to describe my geometry (plus some additional bookkeeping variables to make the new OpenGL calls work, but those are trivial in comparison).

binary1248 · « **Reply #7 on:** April 08, 2015, 05:52:16 am »

Quote from: ssawa on April 08, 2015, 05:16:57 am

I don't necessarily see why the two functions I referenced would particularly interrupt the OpenGL pipeline but I don't have a tremendous amount of experience with the inner workings of OpenGL so I'm more than willing to concede that point.

Modern GPUs are becoming eerily similar to the "rest of the machine" that they reside in. They have their own memory, fixed-function processors and programmable shaders. Just like any CPU, the programmable shaders execute instructions based on a program that is loaded into their instruction memory. And just like any CPU, they perform load/store operations, meaning reads from and writes to memory (i.e. non-register storage). This takes time, a lot of time, which is why shader units, just like CPUs, have their own caches as well. They try to retain "hot data" so that it doesn't have to be fetched or written too often, thus minimizing stalls. I don't know how intelligent the cache and prediction subsystems have become in GPUs, but many people have experienced that a lot of non-linear memory accesses can have a measurable hit on performance compared to linear memory accesses. This means that "jumping back and forth" within a potentially large data set such as yours can lead to a high number of cache misses and a potential performance drop.

This is all assuming that those draw commands are actually sent to the GPU verbatim, such as when using glMultiDrawElementsIndirect. You should never underestimate how much work the driver actually ends up doing for you. It might very well be the case that specific implementations even go ahead and split your single multidraw call up into multiple batches because they estimate the GPU will execute them faster.

Quote from: ssawa on April 08, 2015, 05:16:57 am

Memory constraints are also obviously a big concern however for my particular needs in this project, which again may be strictly specific to me, my solution seemed to make stricter use of memory compared to the basic implementation.

Did you measure this and determine that this is a real problem? Do you have real-world numbers that speak for themselves? And I'm not talking about a single isolated primitive and its usage, I'm talking about your software/library put to use in a... real scenario.

Quote from: ssawa on April 08, 2015, 05:16:57 am

additional instances of VertexArray, which as a C++ class with several functions and members such as std::vector, does come with its own overhead that could add up depending on how many calls need to be made

Functions don't take up memory, and each sf::VertexArray consumes, depending on the system, somewhere between 12 and 32 bytes. Even with 1,000,000 of them, that's 12 to 32 MB of memory "overhead", which should be barely noticeable compared to the actual data. Sending an sf::VertexArray off to sf::RenderTarget to be rendered does take time, but if your batches aren't too small (which should usually be the case in a typical scenario when even considering using sf::VertexArray), then this time is amortized by the rest of the time spent doing the actual work.
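The arithmetic can be checked with a small sketch. The 20 bytes per vertex assume the SFML 2.x sf::Vertex layout (two floats for position, four bytes of color, two floats for texture coordinates); the figure of 100 vertices per array in the usage note is an assumption chosen for illustration, not a number from this thread.

```cpp
#include <cstdint>

// Assumed size of one sf::Vertex in SFML 2.x: 8 (position) + 4 (color) + 8 (texCoords).
constexpr std::uint64_t kVertexBytes = 20;

// Total per-object overhead of 'arrays' vertex array objects.
constexpr std::uint64_t containerOverheadBytes(std::uint64_t arrays, std::uint64_t bytesPerArray)
{
    return arrays * bytesPerArray;
}

// Total size of the vertex data those arrays hold.
constexpr std::uint64_t vertexDataBytes(std::uint64_t arrays, std::uint64_t verticesPerArray)
{
    return arrays * verticesPerArray * kVertexBytes;
}
```

With 1,000,000 arrays, `containerOverheadBytes` gives 12,000,000 to 32,000,000 bytes for 12 to 32 bytes per object, while `vertexDataBytes(1000000, 100)` gives 2,000,000,000 bytes, so under these assumptions the object overhead is below 2% of the vertex data.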

Did you try benchmarking/profiling/analyzing memory usage with and without your optimization? Because if the difference is less than 5% on average in a non-synthetic scenario, you should really consider whether the time you put into it was actually worth it.
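A minimal sketch of such a comparison, with hypothetical helper names. It measures CPU-side wall time only; OpenGL commands execute asynchronously, so a meaningful measurement of rendering has to include presenting the frame (or an explicit glFinish) inside the timed work.

```cpp
#include <chrono>

// Run 'work' a number of times and return the average wall time per run in milliseconds.
template <typename Fn>
double averageMilliseconds(Fn&& work, int iterations)
{
    const auto start = std::chrono::steady_clock::now();
    for (int i = 0; i < iterations; ++i)
        work();
    const auto stop = std::chrono::steady_clock::now();
    const std::chrono::duration<double, std::milli> elapsed = stop - start;
    return elapsed.count() / iterations;
}

// Fractional improvement of 'optimized' over 'baseline' (0.05 means 5% faster).
double relativeGain(double baseline, double optimized)
{
    return (baseline - optimized) / baseline;
}
```

Timing the same real scene once with the stock draw path and once with the modified one, then feeding both averages to `relativeGain`, gives a number to hold against the 5% threshold mentioned above.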
