Major Update

Vertex Array Objects

In an attempt to reduce the OpenGL API call overhead, I implemented a way for RenderTargets to batch attribute bindings into vertex array objects when they are supported and the requirements are met. Because of the way SFML handles its contexts, I couldn't make this its own class: VAOs cannot be shared between contexts, so each one is tied to the context it was created in. This means that the RenderTarget has to manage creation and destruction of VAOs on its own using a dynamic aging system, which was also a bit annoying to implement. Right now, each RenderTarget (and its context) can track up to 10000 VAOs, meaning you can draw 10000 distinct drawables per frame before VAOs start being continuously created and destroyed, which will likely kill performance. Hopefully 10000 distinct drawables is enough. This is a limitation of SFML's context architecture and might change in the future.
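
The aging scheme described above could look roughly like the following. This is a minimal sketch and not the code from the branch: `VaoCache`, `acquire`, `endFrame` and the `maxAge` threshold are hypothetical names, the integer key is a stand-in for whatever identifies a drawable, and the real `glGenVertexArrays` / `glDeleteVertexArrays` calls are replaced by plain bookkeeping so the example runs without a context.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <unordered_map>

// Hypothetical per-context cache: one instance would live in each RenderTarget,
// because VAOs cannot be shared between contexts.
class VaoCache
{
public:
    VaoCache(std::size_t capacity, unsigned int maxAge) :
    m_capacity(capacity),
    m_maxAge(maxAge)
    {
        assert(capacity > 0);
    }

    // Returns the handle for the given drawable key, creating one on a miss.
    // 'created' reports whether a new VAO had to be set up.
    std::uint32_t acquire(std::uint64_t key, bool& created)
    {
        auto it = m_entries.find(key);
        if (it != m_entries.end())
        {
            it->second.age = 0u; // used this frame, so it is fresh again
            created = false;
            return it->second.handle;
        }

        if (m_entries.size() >= m_capacity)
            evictOldest();

        // Stand-in for glGenVertexArrays plus the attribute setup
        Entry entry{m_nextHandle++, 0u};
        m_entries.emplace(key, entry);
        created = true;
        return entry.handle;
    }

    // Called once per frame: ages every entry and destroys the stale ones
    void endFrame()
    {
        for (auto it = m_entries.begin(); it != m_entries.end();)
        {
            if (++it->second.age > m_maxAge)
                it = m_entries.erase(it); // stand-in for glDeleteVertexArrays
            else
                ++it;
        }
    }

    std::size_t size() const { return m_entries.size(); }

private:
    struct Entry
    {
        std::uint32_t handle;
        unsigned int  age;
    };

    // Removes the entry that has gone unused for the longest time
    void evictOldest()
    {
        auto oldest = m_entries.begin();
        for (auto it = m_entries.begin(); it != m_entries.end(); ++it)
        {
            if (it->second.age > oldest->second.age)
                oldest = it;
        }
        m_entries.erase(oldest);
    }

    std::size_t   m_capacity;
    unsigned int  m_maxAge;
    std::uint32_t m_nextHandle = 1;
    std::unordered_map<std::uint64_t, Entry> m_entries;
};
```

With a capacity smaller than the number of distinct drawables per frame, every `acquire` turns into a miss followed by an eviction, which is the create/destroy churn mentioned above.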
Geometry Shaders

Added support for loading geometry shaders. Geometry shaders are optional, so Shader objects can still be used without them. To keep the API clean, I chose to support only the core specification of geometry shaders: it doesn't require any new GL functions, because all the needed information is encapsulated in the shader source.
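
To illustrate why the core variant gets by without extra GL entry points: in core GLSL (version 1.50 and up), the input primitive type, output primitive type and maximum vertex count are declared with `layout` qualifiers in the source, whereas the older extension-based variants had them set on the program object from the application side. The pass-through shader below is only an illustrative sketch. The constant name `passThroughGeometryShader` is made up, and handing the source to the Shader class is not shown because the loading API isn't spelled out here.

```cpp
#include <string>

// Hypothetical name; a minimal pass-through geometry shader in core GLSL 1.50
const std::string passThroughGeometryShader = R"glsl(
#version 150

// Primitive types and the vertex limit are part of the source itself
layout(triangles) in;
layout(triangle_strip, max_vertices = 3) out;

void main()
{
    // Re-emit each incoming triangle unchanged
    for (int i = 0; i < 3; ++i)
    {
        gl_Position = gl_in[i].gl_Position;
        EmitVertex();
    }
    EndPrimitive();
}
)glsl";
```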
Uniform Buffer Objects

Buffer objects (sf::VertexBuffer) can now be used as storage for uniform data in shaders. sf::VertexBuffer already had the option of storing generic (non-vertex) data, and if you bind it to a uniform block in a shader, you can upload truly huge chunks of data in a single call just by modifying the contents of the buffer and rebinding it. Accessing the data in the shader is as simple as accessing any other array type.
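
As a rough illustration of what "modifying the contents of the buffer" involves on the C++ side: the bytes written into the buffer have to match the layout the shader expects for the uniform block. The sketch below assumes a block declared with `layout(std140)` and mirrors it with a plain struct. The block name `Lights`, the types `LightData` and `LightsBlock`, the function `packLights` and the limit of 8 lights are all invented for the example, and the actual upload through sf::VertexBuffer is left out because that API isn't shown here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Assumed GLSL side (hypothetical block):
//   struct LightData { vec4 position; vec4 color; vec4 attenuation; };
//   layout(std140) uniform Lights
//   {
//       vec4      ambient;
//       int       lightCount;
//       LightData lights[8];
//   };

constexpr std::size_t maxLights = 8;

struct Vec4 { float x, y, z, w; };

struct LightData
{
    Vec4 position;
    Vec4 color;
    Vec4 attenuation;
};

struct LightsBlock
{
    Vec4         ambient;           // offset 0
    std::int32_t lightCount;        // offset 16
    std::int32_t padding[3];        // std140 aligns the following array to 16 bytes
    LightData    lights[maxLights]; // offset 32, stride 48
};

// Compile-time checks that the C++ layout matches the std140 offsets
static_assert(offsetof(LightsBlock, lightCount) == 16, "lightCount offset");
static_assert(offsetof(LightsBlock, lights) == 32, "lights offset");
static_assert(sizeof(LightData) == 48, "array stride");
static_assert(sizeof(LightsBlock) == 416, "block size");

// Builds the byte range that a single buffer update would upload
std::vector<unsigned char> packLights(const Vec4& ambient, const std::vector<LightData>& lights)
{
    assert(lights.size() <= maxLights);

    LightsBlock block{};
    block.ambient = ambient;
    block.lightCount = static_cast<std::int32_t>(lights.size());
    for (std::size_t i = 0; i < lights.size(); ++i)
        block.lights[i] = lights[i];

    std::vector<unsigned char> bytes(sizeof(LightsBlock));
    std::memcpy(bytes.data(), &block, sizeof(LightsBlock));
    return bytes;
}
```

The whole array of lights ends up in one contiguous range, so it can go to the GPU in one buffer update instead of one glUniform call per field.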
Light Optimizations

I moved the light component colour computations out of the shader and into the Light class, so they only have to be computed once and can be used for every fragment without taxing the shader needlessly. The attenuation values are also packed into a vec4 now, so that the shader compiler has an easier time vectorizing the attenuation computations performed inside the shader, and the number of glUniform calls using the non-buffer method is reduced.
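
A sketch of the idea, under a few assumptions: the light has a base colour plus ambient, diffuse and specular intensities, and the attenuation uses the usual constant/linear/quadratic terms. The class layout, the accessor names and `attenuationFactor` are hypothetical and only mirror on the CPU what the shader would otherwise evaluate per fragment.

```cpp
#include <cassert>

struct Vec4 { float x, y, z, w; };

// Hypothetical light: the colour * intensity products are computed once here
// instead of once per fragment in the shader.
class Light
{
public:
    Light(float r, float g, float b, float ambient, float diffuse, float specular) :
    m_ambient    {r * ambient,  g * ambient,  b * ambient,  1.f},
    m_diffuse    {r * diffuse,  g * diffuse,  b * diffuse,  1.f},
    m_specular   {r * specular, g * specular, b * specular, 1.f},
    m_attenuation{1.f, 0.f, 0.f, 0.f}
    {
    }

    // Packs the three factors into one vec4 (w unused), so they can be sent
    // with a single 4-component uniform upload instead of three scalar ones.
    void setAttenuation(float constant, float linear, float quadratic)
    {
        assert(constant + linear + quadratic > 0.f);
        m_attenuation = Vec4{constant, linear, quadratic, 0.f};
    }

    const Vec4& ambientColor() const  { return m_ambient; }
    const Vec4& diffuseColor() const  { return m_diffuse; }
    const Vec4& specularColor() const { return m_specular; }
    const Vec4& attenuation() const   { return m_attenuation; }

private:
    Vec4 m_ambient;
    Vec4 m_diffuse;
    Vec4 m_specular;
    Vec4 m_attenuation;
};

// CPU mirror of the per-fragment part that stays in the shader; in GLSL this
// can be written as a single dot product: dot(att.xyz, vec3(1.0, d, d * d))
float attenuationFactor(const Vec4& att, float distance)
{
    return 1.f / (att.x + att.y * distance + att.z * distance * distance);
}
```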
General Optimizations

I also implemented many other optimizations that reduce the number of redundant operations performed as well as the number of OpenGL calls per frame. The vanilla shader API wasn't designed for frequent (multiple times per frame) setting of parameters and, as one can see below, the call count exploded with the initial implementation.
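
One common way to cut redundant calls of this kind is to remember the last value sent for each uniform and skip the GL call when nothing changed. The sketch below shows that general technique and is an assumption, not a description of the actual changes: `UniformCache` and `setFloat` are hypothetical, and a counter stands in for the real `glUniform1f` call so the example runs without a context.

```cpp
#include <cassert>
#include <string>
#include <unordered_map>

// Hypothetical filter in front of glUniform1f
class UniformCache
{
public:
    // Forwards the value only if it differs from what was last sent.
    // Returns true when a (simulated) GL call was made.
    bool setFloat(const std::string& name, float value)
    {
        assert(!name.empty());

        auto it = m_values.find(name);
        if (it != m_values.end() && it->second == value)
            return false; // redundant update, skipped

        m_values[name] = value;
        ++m_callCount; // stand-in for glUniform1f(location, value)
        return true;
    }

    unsigned int callCount() const { return m_callCount; }

private:
    std::unordered_map<std::string, float> m_values;
    unsigned int m_callCount = 0;
};
```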
For comparison, here are the per-frame function call lists from running the 3D example, captured using CodeXL:
Legacy rendering using minimally modified SFML:
Initial non-legacy rendering:
Optimized non-legacy rendering:
It might look like the legacy rendering is still advantageous over the optimized non-legacy rendering. However, keep in mind that glDrawArrays using client-side arrays was transferring huge amounts of data every frame and had to reset the attribute source in every draw call, which might lead to pipeline stalls on the GPU in some cases. In addition, the implicit data transfers that take place as part of the fixed-function behaviour would also have an effect but not show up on the list. The performance difference might not be noticeable in this scenario, but the larger the amount of geometry gets and the more data the GPU has to process each frame, the more the new implementation will gain over the others.
I think the ARB was right when they said that the non-legacy API is designed to be much cleaner. This can easily be seen in the reduced number of distinct functions that are called when going from legacy to non-legacy.
The changes have all been pushed to GitHub.