Author Topic: New graphics API ready (Read 118454 times)

Laurent · « **Reply #120 on:** December 23, 2011, 11:23:40 am »

Quote

Wouldn't it be better to keep ints at the user interface and to convert them to floats internally?

I can't, vertex arrays are sent directly to the graphics card, in one single call where I give the pointer to the first vertex and the number of elements.
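
The constraint can be sketched roughly as follows. The `Vertex` layout and the `submit` function are assumptions made for illustration only; `submit` merely stands in for a graphics call that receives a pointer to the first vertex, a stride, and an element count, and it reads the memory as raw floats the way a driver would.

```cpp
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical vertex layout: coordinates are stored as floats because the
// array is handed over as raw memory, not converted element by element.
struct Vertex {
    float x, y;   // position
    float u, v;   // texture coordinates
};

// Stand-in for a graphics call that receives a pointer to the first vertex
// and the number of elements. It sums the x positions by reading the raw
// bytes with the declared stride, the way a driver would interpret them.
float submit(const void* first, std::size_t stride, std::size_t count) {
    const unsigned char* bytes = static_cast<const unsigned char*>(first);
    float sum = 0.f;
    for (std::size_t i = 0; i < count; ++i) {
        float x;
        std::memcpy(&x, bytes + i * stride + offsetof(Vertex, x), sizeof x);
        sum += x;
    }
    return sum;
}

// One call for the whole array: the data must already be stored in the
// format the receiving side expects.
float drawAll(const std::vector<Vertex>& vertices) {
    return submit(vertices.data(), sizeof(Vertex), vertices.size());
}
```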

Quote

Floats in the API have the disadvantage that it's not immediately clear whether the coordinates are in the interval [0, size] or [0, 1] and whether it makes sense to use non-integral coordinates.

Yes... I wish I could keep integers but it doesn't seem realistic to ignore all those ATI cards.
But it sometimes makes sense to use decimal coordinates (I might need them in the future if I tweak pixel-perfect rendering again). Some people might even need to use normalized coordinates, if they use their own vertex shader. So in the end it's not as bad as it seems.
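
The two conventions differ only by a division, as the small sketch below illustrates. `Vec2f` and `toNormalized` are hypothetical names, not part of SFML, and the sketch assumes the input is given in pixels in the interval [0, size].

```cpp
#include <stdexcept>

// Hypothetical 2D float vector, standing in for a library type.
struct Vec2f {
    float x;
    float y;
};

// Convert pixel coordinates in [0, size] to normalized coordinates in [0, 1].
// Non-integral inputs are valid, e.g. (0.5, 0.5) is the center of the first
// texel in the usual OpenGL convention.
Vec2f toNormalized(Vec2f pixel, unsigned width, unsigned height) {
    if (width == 0 || height == 0) {
        throw std::invalid_argument("texture size must be non-zero");
    }
    return Vec2f{pixel.x / static_cast<float>(width),
                 pixel.y / static_cast<float>(height)};
}
```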

Tank · « **Reply #121 on:** December 23, 2011, 08:44:33 pm »

One question regarding the new graphics stuff: You're already using vertex buffers, why aren't you using vertex buffer objects (VBO)?

Laurent · « **Reply #122 on:** December 24, 2011, 10:48:29 am »

VBOs would make the implementation more complicated, and performance worse in most cases.

Groogy · « **Reply #123 on:** December 24, 2011, 11:05:33 am »

Oh yeah, I was wondering: will Textures support the sRGB format? Or is that feature too modern to be added?

I hate working in those color spaces manually, so I would love to skip that part xD

Tank · « **Reply #124 on:** December 24, 2011, 11:12:41 am »

Quote from: "Laurent"

VBOs would make the implementation more complicated, and performance worse in most cases.

Why is it more complicated? You already have the VertexBuffer class with data that only has to be uploaded to VRAM, that's it. Instead of transferring all the data every rendering cycle, you'd do it once per manipulation and just call the buffer object when rendering. I don't think performance gets worse than client-side buffers.

Can you point out what exactly will be impacting performance and what's actually complicated? Can't really think of real show-stoppers.

Groogy · « **Reply #125 on:** December 24, 2011, 11:22:40 am »

VBOs are not God's solution to graphics card performance. Whether they are suitable depends on the situation.

VBOs only improve performance on the CPU side, by minimizing the number of OpenGL calls. And even then, when Klaim was creating ICE3 he tested VBOs and found that in some cases he actually got worse performance.

Now, the main task of VBOs is to improve performance when a lot of data has to be sent to the GPU many times. Does SFML have that? No. I have already suggested things like this, to let SFML support instancing, but the answer was no because SFML does not have a lot of vertices to send (the common case is 4, so performance would certainly decline), and the real improvement in performance would only come at around 5,000-10,000 draw calls, which would be insane in a 2D application. And long before then we'd be GPU bound instead of CPU bound.

http://stackoverflow.com/questions/430555/when-are-vbos-faster-than-simple-opengl-primitives-glbegin

Here you can see that the developer who is trying to measure the difference but gets similar results is GPU bound: no matter what he does on the CPU side, his frame rate stays the same. Now, I don't know if that guy did it correctly, or maybe he didn't use a static VBO or something like that which we could utilize. But if Laurent says that he would get worse performance, I would understand why ^^

Found this little document which might be interesting if you like this sort of thing: http://developer.amd.com/media/gpu_assets/PerformanceTuning.pdf

Laurent · « **Reply #126 on:** December 24, 2011, 11:48:13 am »

Quote

Oh yeah, I was wondering: will Textures support the sRGB format?

Before asking for a specific format, you should first ask if SFML will ever provide support for multiple formats

Quote

Why is it more complicated? You already have the VertexBuffer class with data that only has to be uploaded to VRAM, that's it. Instead of transferring all the data every rendering cycle, you'd do it once per manipulation and just call the buffer object when rendering. I don't think performance gets worse than client-side buffers.

Can you point out what exactly will be impacting performance and what's actually complicated? Can't really think of real show-stoppers.

VBO data is stored in graphics card memory. SFML needs it to be in system memory, because it may change frequently before being rendered (SFML is not high-level enough to ensure an optimal vertex data flow) and, most importantly, it requires random read access.
With vertex arrays I'm already slower than immediate mode when drawing small vertex arrays (sprites); VBOs would just make things unusable.
Most people think that VBOs are always better than anything else, in any situation. But that's far from true; things are more complicated.

Tank · « **Reply #127 on:** December 24, 2011, 12:35:53 pm »

I'm not claiming that VBOs will solve every FPS problem, but there are indeed a lot of cases where you at least want to have the possibility to optimize.

A good example is SFGUI. By throwing in some classes for pure optimizations we were able to increase FPS by 500-1000% (depending on the system). It was GPU-bound before, and after we've done some simple(!) optimizations one of the slowest operations was dereferencing smart pointers, just to give you an idea (i.e. CPU-bound after).

We implemented display lists into which Draw() calls get compiled. Besides that, culling has also been applied, which decreased frame time further (this is not a general case, however; it depends on the application).

Quote

because it may change frequently before being rendered

Theoretically, yes. But seriously, when do you need to modify shapes after creating them? Sprites for example don't need to be changed at all except texture coordinates. In most cases you prepare your geometry and render it. And this isn't limited to 3D applications.

If SFML offered the choice of enabling buffer objects, I assume it would indeed be used in many cases. It could be done similarly to how sf::Image was split into sf::Image and sf::Texture. For fixed geometry (which is the majority in my opinion), put the stuff into a buffer object, done. For dynamic geometry, either upload the buffer every rendering cycle OR update the buffer object only when it has actually been modified.
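
The "update only when modified" option could be sketched as below. Everything here is hypothetical: `GeometryBuffer`, its methods, and the `uploads` counter are illustrative names, and the transfer to the graphics card is simulated by a plain copy so the sketch runs without OpenGL.

```cpp
#include <cstddef>
#include <utility>
#include <vector>

// Hypothetical geometry holder that re-uploads only when modified.
class GeometryBuffer {
public:
    // Replace the CPU-side data and mark the GPU-side copy as stale.
    void setVertices(std::vector<float> data) {
        cpu_ = std::move(data);
        dirty_ = true;
    }

    // Called once per frame. Uploads only if the data changed since the
    // last draw, so fixed geometry is transferred a single time.
    void draw() {
        if (dirty_) {
            gpu_ = cpu_;   // simulated upload
            ++uploads_;
            dirty_ = false;
        }
        // A real implementation would issue the draw call here.
    }

    std::size_t uploads() const { return uploads_; }

private:
    std::vector<float> cpu_;   // system-memory copy, always readable
    std::vector<float> gpu_;   // stands in for graphics card memory
    bool dirty_ = false;
    std::size_t uploads_ = 0;
};
```

The sketch keeps a system-memory copy so the data stays readable at any time; whether the saved transfers outweigh the extra bookkeeping for very small entities is a separate question.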

Laurent · « **Reply #128 on:** December 24, 2011, 02:41:03 pm »

But you forget one very important thing: what needs to be optimized in SFML is the drawing of many small entities (where vertex arrays and VBOs are not good at all). Drawing one big vertex array is lightning fast, there's no need to optimize this use case at all.

Most use cases that were too slow with the previous API will be ok with the new one, just by grouping similar primitives into vertex arrays. I'm not worried at all about that. What causes me troubles are sprites and shapes, and I probably won't get them faster without some kind of batching.
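
Grouping similar primitives might look like the sketch below. The `Vertex`, `Rect`, and `buildBatch` names are hypothetical and not SFML's API; the idea is only that many quads end up in one contiguous array that can be drawn with a single call.

```cpp
#include <vector>

struct Vertex { float x, y; };                     // hypothetical, position only
struct Rect { float left, top, width, height; };   // hypothetical quad bounds

// Append the four corners of every rectangle to one array, so that all
// quads can be submitted together instead of one draw call per entity.
std::vector<Vertex> buildBatch(const std::vector<Rect>& rects) {
    std::vector<Vertex> batch;
    batch.reserve(rects.size() * 4);
    for (const Rect& r : rects) {
        batch.push_back({r.left,           r.top});
        batch.push_back({r.left + r.width, r.top});
        batch.push_back({r.left + r.width, r.top + r.height});
        batch.push_back({r.left,           r.top + r.height});
    }
    return batch;
}
```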

Before trying to optimize SFML, one needs to understand how it works, and where the bottlenecks are. I'm open to all ideas, but if they are not applied to SFML's specific use cases they won't help much.

luiscubal · « **Reply #129 on:** December 24, 2011, 03:01:33 pm »

Personally, I'm fine with whatever Laurent picks, provided that:

1. It's fast enough
2. It's simple enough to learn and use
3. It's widely supported
4. Does not rely on deprecated OpenGL functionality (e.g. features not available in Core Profile and OpenGL ES 2.0)

And option 4 does not even have to happen immediately for SFML 2.0 - it can happen gradually over the next few versions.

Tank · « **Reply #130 on:** December 24, 2011, 04:40:26 pm »

Quote

Before trying to optimize SFML, one needs to understand how it works, and where the bottlenecks are. I'm open to all ideas, but if they are not applied to SFML's specific use cases they won't help much.

We'll see how it works out especially with SFGUI and the new API. I'll try to port it as fast as possible, but I'm still convinced there'll be much more potential (we're also NOT using VBOs, because it doesn't make sense; just to show I'm no "VBO cures everything" guy). Sure, when the FPS counter says "1000", it's fast, but if it can show "1500", it's even faster. You once said such high FPS are not important; I think they are, it shows potential and places where things can be optimized. Hopefully I'm wrong.

Quote

1. It's fast enough

Compared to what?

As said above, if something's running at 800 FPS and your vsync rate is 60 Hz, then yes, you may think it's fast. But if you can get even more, which also means you can spend the time (which is scarce) on other things, wouldn't it be worth it?

Quote

2. It's simple enough to learn and use

I agree with that; however, simple shouldn't mean that everything must be obvious at first glance. "RTFM" is always required.

Laurent · « **Reply #131 on:** December 24, 2011, 05:07:42 pm »

Quote

We'll see how it works out especially with SFGUI and the new API

That's a very interesting use case, let me know when it's done

Quote

Sure, when the FPS counter says "1000", it's fast, but if it can show "1500", it's even faster. You once said such high FPS are not important; I think they are, it shows potential and places where things can be optimized.

I have a slightly different point of view. If you can optimize further without making the code more complicated, and without touching the public API, then yes, optimize. But if it requires complicated tricks that make the code less maintainable, or adding dedicated functions to the public API, then is it worth the extra FPS?
2D is not as intensive as 3D; it doesn't require a lot of work to get very good performance. So, my plan is to see if everyone is happy with the new API, and optimize it only if it is required, after gathering enough feedback and use cases.

Groogy · « **Reply #132 on:** December 24, 2011, 10:39:10 pm »

If you really want to go there and achieve the highest possible FPS, you would want to implement pseudo-instancing (since SFML will be limited to OpenGL 2.0, I think it was; otherwise we could do real instancing), which would fit perfectly for objects that are repeated a lot with small to no changes between draw calls (sprites). The problem is what Laurent says he wants to avoid: making the code less maintainable. Everything would have to be more or less constructed around that as its core.

This would require SFML to batch internally in order to keep the code simple enough. But as I said, you won't be able to get better FPS than that. And Laurent has already put his foot down there and said no.

Not going to argue with that.

Also, Merry Christmas!
Swedes celebrate it on the 24th.

Tank · « **Reply #133 on:** December 25, 2011, 02:42:42 am »

Quote

That's a very interesting use case, let me know when it's done

Will do.

Quote

I have a slightly different point of view. If you can optimize further without making the code more complicated, and without touching the public API, then yes, optimize.

I also like "Don't do premature optimization" and "Design over performance", but if it's possible without "tricks", one should do it. We're not talking about dirty hacks here; we're talking about established, well-known and widely used OpenGL features.

Quote

But if it requires complicated tricks that make the code less maintainable, or adding dedicated functions to the public API, then is it worth the extra FPS?

I'm absolutely sure those optimizations are anything but complicated. I already mentioned it: by using a simple OpenGL display list we were able to boost FPS a lot. a) It's not complicated. b) It can live next to the regular public API, i.e. the user is not forced "to use the optimizations".

Quote

2D is not as intensive as 3D, it doesn't require a lot of work to get very good performance.

2D can be very intensive, too. Simple example: imagine you've got a tilemap with 16*16 pixels per tile. For a resolution of 1024*768, you've got 64*48 tiles, i.e. 3,072 quads = 12,288 vertices. With SFML you HAVE TO send those 12,288 vertices through the bus EVERY rendering cycle. It will still be "fast" if you compare FPS to a minimum value, but you could indeed go A LOT faster, leaving more resources for other stuff (logic, physics, etc.). And vertex arrays don't help you out here either: you still have to send the array through the bus. Buffer objects, however, can save a lot of time (did it myself, HUGE benefit).
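
The arithmetic can be checked with a few lines. The 20-byte vertex size used in the note below (two floats for position, four bytes of color, two floats for texture coordinates) is an assumption for illustration, not a figure from this thread, and `tilemapCost` is a hypothetical helper.

```cpp
#include <cstddef>

struct TilemapCost {
    std::size_t tiles;
    std::size_t vertices;
    std::size_t bytesPerFrame;
};

// Count the quads needed to cover a screen with square tiles, and the
// amount of vertex data resent each frame when nothing is cached.
// Assumes the screen dimensions are multiples of the tile size.
constexpr TilemapCost tilemapCost(std::size_t screenW, std::size_t screenH,
                                  std::size_t tileSize,
                                  std::size_t bytesPerVertex) {
    return TilemapCost{
        (screenW / tileSize) * (screenH / tileSize),
        (screenW / tileSize) * (screenH / tileSize) * 4,
        (screenW / tileSize) * (screenH / tileSize) * 4 * bytesPerVertex};
}

// 1024x768 with 16x16 tiles: 64 * 48 = 3,072 quads = 12,288 vertices.
static_assert(tilemapCost(1024, 768, 16, 20).tiles == 3072, "tile count");
static_assert(tilemapCost(1024, 768, 16, 20).vertices == 12288, "vertex count");
```

Under the assumed 20 bytes per vertex, that array is 245,760 bytes, i.e. 240 KiB per frame.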

Again, we do it like that in SFGUI: in the beginning we had ~500 FPS with a good number of widgets. And you could have stopped by saying "500 is good!". Now there are 1200+ FPS. That makes a bunch of extra frame time available to the developer using SFGUI, so the library gets in his way less than before, which is good (especially for libraries!).

Quote

So, my plan is to see if everyone is happy with the new API, and optimize it only if it is required, after gathering enough feedback and use cases.

I really did it like you with SFGUI. Then binary1248 joined the party and optimized the hell out of it; seriously, I'm thankful for that: It showed what's possible with rather simple code.

In my humble opinion, one should always try to get the best results rather than just making the "average user" happy.

Quote

If you really want to go there and achieve the highest possible FPS, you would want to implement pseudo-instancing (since SFML will be limited to OpenGL 2.0, I think it was; otherwise we could do real instancing)

Instancing is another side of the coin. However, saying SFML is limited to OpenGL 2.0 is wrong. It's not limited to anything. You can add fallbacks. Hardware supporting instancing can use it, the rest will use fallbacks. It's done like that in so many games/applications.

Quote

The problem is what Laurent says he wants to avoid: making the code less maintainable. Everything would have to be more or less constructed around that as its core.

The real problem is that you don't really have the choice. In SFGUI we already had to work around SFML's rendering pipeline "a bit" to make some optimizations possible. And I don't see a clear reason why adding conditional features to a library makes it less maintainable. If those things were hacks, then yes, I'd fully agree. But we don't have hacks, we have normal features. You could just as well argue that adding vertex buffers is unmaintainable because you already have OpenGL immediate mode, so it's useless.

Quote

This would require SFML to batch internally in order to keep the code simple enough. But as I said, you won't be able to get better FPS than that. And Laurent has already put his foot down there and said no. Not going to argue with that.

Well, why does everything have to be "simple"? If you don't need optimizations and "pro" features, don't use them. If someone doesn't need batching because his Pacman game has great performance, everything's alright. But why not allow the user to have more control over the whole pipeline? I think this is perfectly possible without stripping the "S" out of "SFML". Simple is relative, too.

And just because Laurent says "No" it doesn't mean I'll say "Alright".

I think such discussions are welcome, and everybody should be able to think about it and honestly admit to being wrong if that's the case. If there's nothing left in SFML to be optimized without butchering the whole API, then I'll back-pedal and officially state a "Sorry, you were right".

Quote

Also, Merry Christmas!
Swedes celebrate it on the 24th.

Merry Christmas to you guys too. Same in Germany.

Laurent · « **Reply #134 on:** December 25, 2011, 10:36:07 am »

Quote

I'm absolutely sure those optimizations are anything but complicated. I already mentioned it: by using a simple OpenGL display list we were able to boost FPS a lot. a) It's not complicated. b) It can live next to the regular public API, i.e. the user is not forced "to use the optimizations".

VBOs are not complicated, but using them efficiently in SFML can be tricky. For example, creating a 4-vertex VBO in each sprite would be very inefficient. It would also make sprites heavier than they are supposed to be (each copy would create a duplicate VBO on the graphics card). And how do you provide read access to the vertices? You have to add lock/unlock functions to every class in which you want to do it.
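
To make the lock/unlock concern concrete, here is a rough sketch of what such an interface might look like. `GpuSprite` and its methods are hypothetical and not an SFML design; graphics card memory is simulated with a `std::vector` so the sketch runs without OpenGL.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

struct Vertex { float x, y; };   // hypothetical, position only

// Hypothetical entity whose four vertices live in (simulated) graphics memory.
class GpuSprite {
public:
    GpuSprite() : gpu_(4), locked_(false) {}

    // Map the buffer into system memory; required before any read or write.
    Vertex* lock() {
        if (locked_) throw std::logic_error("already locked");
        locked_ = true;
        return gpu_.data();
    }

    // Hand the buffer back to the graphics card.
    void unlock() {
        if (!locked_) throw std::logic_error("not locked");
        locked_ = false;
    }

    bool isLocked() const { return locked_; }

private:
    std::vector<Vertex> gpu_;   // stands in for memory owned by the driver
    bool locked_;
};
```

Copying a `GpuSprite` also copies its buffer, which mirrors the duplicate-VBO cost mentioned above.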

Quote

2D can be very intensive, too. Simple example: imagine you've got a tilemap with 16*16 pixels per tile. For a resolution of 1024*768, you've got 64*48 tiles, i.e. 3,072 quads = 12,288 vertices. With SFML you HAVE TO send those 12,288 vertices through the bus EVERY rendering cycle. It will still be "fast" if you compare FPS to a minimum value, but you could indeed go A LOT faster, leaving more resources for other stuff (logic, physics, etc.). And vertex arrays don't help you out here either: you still have to send the array through the bus. Buffer objects, however, can save a lot of time (did it myself, HUGE benefit).

I get 500 FPS with a 100x100 tilemap (40000 vertices). I would be glad to get 1000 FPS, yes, but I don't think that it can be done without making the API ugly.

Quote

Again, we do it like that in SFGUI: in the beginning we had ~500 FPS with a good number of widgets. And you could have stopped by saying "500 is good!". Now there are 1200+ FPS. That makes a bunch of extra frame time available to the developer using SFGUI, so the library gets in his way less than before, which is good (especially for libraries!).

It's barely more than a millisecond (2 ms per frame at 500 FPS versus about 0.83 ms at 1200 FPS). Are you sure that the benefit would be the same at 50 FPS (x2.4 -> 120 FPS), and that it's not a fixed improvement (about 1.2 ms saved -> ~53 FPS)? Like I already told you before, you should do your benchmarks with more things to draw and lower FPS.
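
The difference between a proportional speed-up and a fixed per-frame saving is easier to see when FPS is converted to frame time. The helper names below are hypothetical and the sketch is only a way to redo the arithmetic.

```cpp
// Frame time in milliseconds for a given frame rate (fps must be > 0).
constexpr double frameTimeMs(double fps) { return 1000.0 / fps; }

// Frame rate after shaving a fixed number of milliseconds off each frame.
// savedMs must be smaller than the current frame time.
constexpr double fpsAfterFixedSaving(double fps, double savedMs) {
    return 1000.0 / (frameTimeMs(fps) - savedMs);
}

// Frame rate after a proportional speed-up.
constexpr double fpsAfterSpeedup(double fps, double factor) {
    return fps * factor;
}
```

For instance, the time saved going from 500 to 1200 FPS is `frameTimeMs(500) - frameTimeMs(1200)`; feeding that saving into `fpsAfterFixedSaving(50, ...)` shows what the same optimization would yield on a heavier scene if the gain were fixed rather than proportional.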

Quote

I really did it like you with SFGUI. Then binary1248 joined the party and optimized the hell out of it; seriously, I'm thankful for that: It showed what's possible with rather simple code.

If someone sends me a patch that improves the overall performance of SFML without making either the internal code or the public API ugly, I'll be happy too.

Quote

And just because Laurent says "No" it doesn't mean I'll say "Alright". I think such discussions are welcome and everybody should be able to think about it and honestly admit to be wrong if that's the case.

Yes. Please don't stop discussing just because I said "no", I'm sometimes (often?) wrong