Topic: New strategy for the internal rendering code in SFML 2

Laurent · « **on:** January 19, 2010, 09:46:04 pm »

Hi

The automatic batching system was great, but after using it for a while and collecting feedback, I realized that it was creating new problems that were very tricky to solve.

I've totally rewritten this part, and there's no batching or delayed rendering anymore. There's now just a render states cache, which gives even better performance than before.

The only drawback is that user code that uses OpenGL directly must explicitly take care of the current states, with the new SaveGLStates() and RestoreGLStates() functions. By giving explicit control to the user, I'm able to perform many optimizations in the internal code. You can have a look at the OpenGL sample to see how to use these new functions.
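A minimal sketch of how the two calls might be paired, assuming they bracket the SFML drawing so that the user's own OpenGL states survive it. `FakeTarget`, its `Draw` method and `GLStateGuard` are hypothetical stand-ins invented for this illustration; only the names SaveGLStates() and RestoreGLStates() come from the post, and the OpenGL sample remains the reference.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for a render target: it only records the call order.
struct FakeTarget {
    std::vector<std::string> log;
    void SaveGLStates()    { log.push_back("save"); }
    void RestoreGLStates() { log.push_back("restore"); }
    void Draw(const std::string& what) { log.push_back("sfml:" + what); }
};

// Hypothetical RAII helper (not part of SFML): saves on construction,
// restores on destruction, so the two calls can never get out of balance.
template <typename Target>
class GLStateGuard {
public:
    explicit GLStateGuard(Target& target) : target_(target) { target_.SaveGLStates(); }
    ~GLStateGuard() { target_.RestoreGLStates(); }
    GLStateGuard(const GLStateGuard&) = delete;
    GLStateGuard& operator=(const GLStateGuard&) = delete;
private:
    Target& target_;
};

// One frame mixing raw OpenGL (simulated by log entries) and SFML drawing.
std::vector<std::string> render_frame(FakeTarget& window) {
    window.log.push_back("gl:draw cube");      // user's own OpenGL code
    {
        GLStateGuard<FakeTarget> guard(window); // SaveGLStates()
        window.Draw("sprite");
        window.Draw("text");
    }                                           // RestoreGLStates()
    window.log.push_back("gl:draw more");       // user's states are intact again
    return window.log;
}
```

The guard is only a convenience; calling the two functions by hand around the SFML drawing code gives the same call order.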

I'm waiting for your feedback, about both performance and bugs.

OniLinkPlus · « **Reply #1 on:** January 20, 2010, 02:00:03 am »

Could you provide a benchmark of pre-batching, batching, and the current technique so we can see the optimizations?
Also, what effect does this have on the size of the graphics library? I have an unnatural hatred towards sizes over 1MB. :S

Laurent · « **Reply #2 on:** January 20, 2010, 11:46:47 am »

Quote

Could you provide a benchmark of pre-batching, batching, and the current technique so we can see the optimizations?

That's exactly what I don't want

I'll get much more interesting results if users test it on their own games/programs, which represent a wide range of different situations.

Quote

Also, what effect does this have on the size of the graphics library? I have an unnatural hatred towards sizes over 1MB. :S

The code is smaller (I removed a lot of classes), but I don't really know what impact it has on the size of the library.

Laurent · « **Reply #3 on:** January 20, 2010, 11:29:28 pm »

Well, the new SaveGLStates / RestoreGLStates functions are not harder to use than any other SFML function.

Boogiwoogie · « **Reply #4 on:** January 21, 2010, 05:41:43 pm »

Quote from: "Laurent"

I've totally rewritten this part, and there's no batching or delayed rendering anymore. There's now just a render states cache, which gives even better performance than before.

wow, i didn't expect to hear this today. so this is the end of the batch rendering show? will there ever be any kind of va/dl/vbo support on sfml drawing functions?
boogi

Laurent · « **Reply #5 on:** January 21, 2010, 06:04:27 pm »

Quote

will there ever be any kind of va/dl/vbo support on sfml drawing functions?

If I find something faster than the current IM-based implementation, yes. But so far, everything else I tried was slower.

The problem is that VA/VBO are good for static/compiled/batched geometry, not for a sequence of separate quads.

However, the main problem was the huge number of driver calls, and the latest modification removed enough of them to get very good performance.

Boogiwoogie · « **Reply #6 on:** January 23, 2010, 06:23:16 pm »

i checked out the sfml2 from svn and i am wondering whether to use it for a new project or to stick to 1.5? when will there be some "feature freeze" or so that locks the api to what it is now?

Laurent · « **Reply #7 on:** January 23, 2010, 06:35:20 pm »

Quote

when will there be some "feature freeze" or so that locks the api to what it is now?

If that's important to you then you should definitely use SFML 1.5 (soon 1.6). SFML 2 is still a big WIP.

Boogiwoogie · « **Reply #8 on:** January 23, 2010, 06:42:40 pm »

and 1.6 will be a bug fix release, right? a couple of weeks ago i read some thread where you wrote about text rendering in sfml2 and someone agreed that it has improved quite a bit. i actually don't wanna miss out on that

Laurent · « **Reply #9 on:** January 23, 2010, 06:44:45 pm »

Quote

and 1.6 will be a bug fix release, right?

Absolutely.

nullsquared · « **Reply #10 on:** January 24, 2010, 06:57:05 pm »

... how is this any different from what I was talking about before? You went out of your way to prove me wrong, and now you go back to IM? :roll:

Anyways, why are you transforming the vertices on the CPU? I have a hard time believing that's faster than just using glLoadMatrix() for what it's meant for :roll: ... For example, you can load the View matrix as the projection matrix and every object's world matrix as the modelview matrix, and tada, free GPU matrix transformations.

That, and I still believe that using a VBO per-object would be faster. I remember reading somewhere that OpenGL draw calls weren't slow like D3D's DrawPrimitive()

Laurent · « **Reply #11 on:** January 24, 2010, 07:57:43 pm »

Quote

... how is this any different from what I was talking about before? You went out of your way to prove me wrong, and now you go back to IM?

IM vs VBO is not my main concern, it's not really hard to switch between them to find the most efficient. My whole point is about reducing the state changes, which is the only thing that's killing performance in SFML. I first believed that batching would be the only solution, but I recently found that it was not necessary.
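To illustrate the idea of cutting state changes without batching, here is a minimal sketch of a render states cache. It is not SFML's internal code: `RenderStateCache`, `BlendMode` and `draw_tiles` are hypothetical names, and the driver calls are simulated with a counter where real code would call something like glBindTexture or glBlendFunc.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical blend modes, for illustration only.
enum class BlendMode { Alpha, Add, Multiply };

// Remembers the last state sent to the driver and skips redundant changes.
class RenderStateCache {
public:
    // Returns true if a (simulated) driver call was issued.
    bool bindTexture(unsigned int id) {
        if (textureValid_ && id == texture_) return false;  // already bound
        texture_ = id;
        textureValid_ = true;
        ++driverCalls_;            // real code: glBindTexture(GL_TEXTURE_2D, id)
        return true;
    }

    bool setBlendMode(BlendMode mode) {
        if (blendValid_ && mode == blend_) return false;    // already set
        blend_ = mode;
        blendValid_ = true;
        ++driverCalls_;            // real code: glBlendFunc(...)
        return true;
    }

    // Forget everything, e.g. after user code touched OpenGL directly
    // (an assumption about when a cache like this must be reset).
    void invalidate() { textureValid_ = false; blendValid_ = false; }

    std::size_t driverCalls() const { return driverCalls_; }

private:
    unsigned int texture_ = 0;
    BlendMode blend_ = BlendMode::Alpha;
    bool textureValid_ = false;
    bool blendValid_ = false;
    std::size_t driverCalls_ = 0;
};

// Draws `count` sprites that share one texture and one blend mode and
// returns how many state-changing driver calls were actually made.
std::size_t draw_tiles(RenderStateCache& cache, std::size_t count) {
    for (std::size_t i = 0; i < count; ++i) {
        cache.bindTexture(7);
        cache.setBlendMode(BlendMode::Alpha);
        // ... the quad itself would be emitted here ...
    }
    return cache.driverCalls();
}
```

With a cache like this, a thousand sprites sharing the same texture and blend mode cost two state changes instead of two thousand, while each sprite is still drawn immediately.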

Quote

Anyways, why are you transforming the vertices on the CPU? I have a hard time believing that's faster than just using glLoadMatrix() for what it's meant for :roll: ... For example, you can load the View matrix as the projection matrix and every object's world matrix as the modelview matrix, and tada, free GPU matrix transformations.

It's actually a huge improvement to transform vertices on the CPU. Changing the modelview matrix for every quad is just insane, whereas transforming it on the CPU only requires a few * and +. Again, don't forget that we are not in the usual situation where you have big static buffers of geometry; rendering individual quads with their own states is very different.
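As a rough illustration of "a few * and +", here is a sketch of transforming the four corners of a quad on the CPU with a 2D affine matrix. The types `Vec2` and `Transform2D` and the helpers `makeTransform` and `transformQuad` are hypothetical names for this example, not SFML's internal code; the transform order (scale, then rotate, then translate) is also an assumption.

```cpp
#include <array>
#include <cassert>
#include <cmath>

struct Vec2 { float x, y; };

// 2D affine transform stored as the top two rows of a 3x3 matrix:
//   | a  b  tx |
//   | c  d  ty |
struct Transform2D {
    float a, b, tx;
    float c, d, ty;

    // Two multiplications and two additions per coordinate.
    Vec2 apply(Vec2 p) const {
        return { a * p.x + b * p.y + tx,
                 c * p.x + d * p.y + ty };
    }
};

// Builds translate * rotate * scale for one object.
Transform2D makeTransform(Vec2 position, float angleRadians, Vec2 scale) {
    const float cs = std::cos(angleRadians);
    const float sn = std::sin(angleRadians);
    return { scale.x * cs, -scale.y * sn, position.x,
             scale.x * sn,  scale.y * cs, position.y };
}

// Transforms the corners of a w x h quad whose local origin is its top-left corner.
std::array<Vec2, 4> transformQuad(const Transform2D& t, float w, float h) {
    return {{ t.apply({0.0f, 0.0f}),
              t.apply({w,    0.0f}),
              t.apply({w,    h}),
              t.apply({0.0f, h}) }};
}
```

The transformed corners can then be sent to OpenGL directly, so the modelview matrix does not need to change between quads.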

Quote

That, and I still believe that using a VBO per-object would be faster

Not in this particular case.

Well, I'm not saying that I know everything about OpenGL, I just do a lot of tests and choose the best solution. Simple. So please, before debating again and again, write some code and prove to me that you have something better.

Boogiwoogie · « **Reply #12 on:** January 25, 2010, 01:19:19 pm »

Quote from: "Laurent"

So please, before debating again and again, write some code and prove to me that you have something better.

Can you give some specs about a benchmark that would convince you? I think drawing a background of 32x32 pixel tiles on a 1920x1080 resolution wouldn't convince you despite outperforming the IM, because it's not the "usual case". So what should a Move-Laurent-Back-To-Batching-Benchmark look like?

Laurent · « **Reply #13 on:** January 25, 2010, 02:01:25 pm »

Quote

Can you give some specs about a benchmark that would convince you?

Any benchmark can convince me. I personally test a huge number of sprites, texts and shapes with several configurations (changing the number of different images/fonts, size, etc).

Quote

I think drawing a background of 32x32 pixel tiles on a 1920x1080 resolution wouldn't convince you despite outperforming the IM, because it's not the "usual case"

That's the test that I use the most.
I'll be glad to see something outperforming IM for this situation

Of course you can't change the public API to create some big-static-precompiled-optimized geometry class/interface... I know that this kind of class would outperform the current implementation, but it doesn't fit the current public API. However, if the latest optimizations are not enough, I'll think about adding such features to SFML, but first I'd like to see if anyone really needs it. That's the purpose of this post, I really need many people to test the new implementation with many different situations and configurations.

Quote

So what should a Move-Laurent-Back-To-Batching-Benchmark look like?

So you would like me to revert to batching? What are your reasons? The new implementation is faster, simpler and removes a lot of bugs.

Boogiwoogie · « **Reply #14 on:** January 25, 2010, 02:54:53 pm »

Quote from: "Laurent"

So you would like me to revert to batching? What are your reasons? The new implementation is faster, simpler and removes a lot of bugs.

the reason is that i like your style. the library is really easy to use and this is what oop is all about - hiding away all the dirty stuff. and it would just feel right to have at least vertex arrays behind the scenes, because it's an optimization that was introduced ages ago (wasn't it opengl 1.1?)
however, i fully agree with you that if it would make the use of the library more complicated, it's not worth implementing it.
i can't tell yet what my solution would be for sfml, but as far as i can remember from former projects, vertex arrays boosted the frame rates quite a bit, even if i filled up the array buffers every frame. writing this, i feel like i should do some sfml optimization research...
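For anyone who wants to try the tile background test discussed above with a vertex array refilled every frame, a possible starting point could look like the sketch below. `Vertex` and `fillTileBatch` are hypothetical names, the texture coordinates are placeholders, and the actual submission is only hinted at in a comment. No performance claim is implied; the sketch only builds the data.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Position plus texture coordinates for one corner of a tile.
struct Vertex { float x, y; float u, v; };

// Rebuilds the whole batch from scratch, as would happen every frame.
void fillTileBatch(std::vector<Vertex>& batch, int screenW, int screenH, int tile) {
    batch.clear();  // keeps the capacity, so later frames do not reallocate

    // Round up so partially visible tiles at the edges are included.
    const int cols = (screenW + tile - 1) / tile;
    const int rows = (screenH + tile - 1) / tile;
    batch.reserve(static_cast<std::size_t>(cols) * static_cast<std::size_t>(rows) * 4);

    for (int r = 0; r < rows; ++r) {
        for (int c = 0; c < cols; ++c) {
            const float x0 = static_cast<float>(c * tile);
            const float y0 = static_cast<float>(r * tile);
            const float x1 = x0 + static_cast<float>(tile);
            const float y1 = y0 + static_cast<float>(tile);
            batch.push_back({x0, y0, 0.0f, 0.0f});
            batch.push_back({x1, y0, 1.0f, 0.0f});
            batch.push_back({x1, y1, 1.0f, 1.0f});
            batch.push_back({x0, y1, 0.0f, 1.0f});
        }
    }
    // Real code would now point OpenGL at batch.data() (glVertexPointer,
    // glTexCoordPointer) and submit everything with a single glDrawArrays call.
}
```

For 32x32 tiles at 1920x1080 this gives 60 columns by 34 rows, i.e. 2040 quads and 8160 vertices per frame, which is the workload any alternative would have to beat against the IM path.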