
Author Topic: SFGUI (0.4.0 released)  (Read 391537 times)


binary1248

  • SFML Team
  • Hero Member
  • *****
  • Posts: 1405
  • I am awesome.
    • View Profile
    • The server that really shouldn't be running
SFGUI
« Reply #255 on: January 28, 2012, 07:17:19 pm »
Quote from: "Laurent"
I was saying that using such high numbers was irrelevant for showing optimization results.


Well, regarding optimization results, the more the better, right? So if you have a library that runs at 2000 FPS instead of 1000 FPS, it is sure to run faster on a slower system too, say going from 100 FPS to 200 FPS. Until you are sure that your GPU and CPU are both fully loaded, you can always optimize, no matter what the FPS value is. Optimization is not about maximizing the absolute FPS you get on one system, but about eliminating bottlenecks so that the code runs faster (meaning more FPS) on all systems, rather than reaching a given FPS value on one given system.

Therefore high numbers are indeed relevant: they show that a certain optimization does have a positive effect. And that effect will carry over to all systems, not only my own. Not everybody has a modern GPU (from the last 2 years), and Tank's +2400 FPS might translate to e.g. +100 FPS from 50 FPS for them, which is very desirable.

You have to open up for a wider range of hardware if you really want to support OpenGL ES ;)
SFGUI # SFNUL # GLS # Wyrm <- Why do I waste my time on such a useless project? Because I am awesome (first meaning).

Laurent

  • Administrator
  • Hero Member
  • *****
  • Posts: 32498
    • View Profile
    • SFML's website
    • Email
SFGUI
« Reply #256 on: January 28, 2012, 07:38:28 pm »
You don't get it. Let me explain again what I mean.

At 3600 FPS, one frame takes less than 300 microseconds to render. At such low durations, anything takes a significant part of the result, even event/window handling -- and basically, everything that happens only once per frame and that you don't want to see in the result. Even running a music player in the background could make a difference.

In one frame you draw many widgets, but you also do a lot of small things that are irrelevant to what you want to optimize. If you want to focus on the widgets themselves, you must draw many more of them so that the rest becomes really negligible. In my opinion, you must be below 100 FPS if you want to be credible.

Or... you should just say "overall performance improvement of ~200%" and not give too many details ;)
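
To put numbers on this argument, here is a minimal C++ sketch of how a fixed once-per-frame cost distorts results at different frame rates. The function names and the 50 microsecond overhead used below are assumptions chosen for illustration, not measurements from SFML or SFGUI.

```cpp
#include <cassert>

// Frame time in microseconds for a given frame rate.
double frameTimeUs(double fps) {
    assert(fps > 0.0);
    return 1e6 / fps;
}

// Frame rate that results when a fixed per-frame cost (in microseconds),
// such as event handling, is added on top of an existing frame rate.
double fpsWithOverhead(double fps, double overheadUs) {
    return 1e6 / (frameTimeUs(fps) + overheadUs);
}
```

Under this model, fpsWithOverhead(3600.0, 50.0) gives roughly 3050 FPS, a drop of about 15%, while fpsWithOverhead(100.0, 50.0) gives about 99.5 FPS, a drop of about 0.5%. The same disturbance is far more visible at the high frame rate.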
Laurent Gomila - SFML developer

binary1248

SFGUI
« Reply #257 on: January 28, 2012, 08:48:22 pm »
Yeah, I know that anything running on my computer could skew the results. I actually get much lower FPS when running other CPU- or GPU-intensive applications while testing. But when I test the real performance of the library, I make sure that my test environment is as clean as possible, i.e. CPU and GPU load below 5%, which is a decent margin of error.

Browsing the web or using "typical" everyday applications involves the same window management and event handling as anything else running on my computer, and that doesn't even load my CPU past 5%, which means the same OS overhead would apply to an SFML application too. That leaves the other 95% purely to the "essential" part of the app (drawing and whatnot).

We could just shove 1000 buttons into our ScrolledWindow and turn off culling to purposely get the FPS down to 100 FPS if that's what it takes to get reliable test results. But why do so if we can see the difference at much higher FPS values?

Being a hobby physicist I just have to throw in an example:

Consider you want to measure the speed of light. And you would do so by measuring the duration it takes for a laser pulse to travel a certain distance.

You could measure it by sending a pulse through a 10 km long fiber optic cable and timing it. Or you could take a 10 m long fiber optic cable and time the pulse with a high-precision measuring device. You would normally resort to the first method because it seems less susceptible to interference and to the margin of error of the timing device. But because we made sure the test environment was clean and the same in both cases, we can also resort to the second method.

The key when testing is making sure you can reproduce your results, which in turn means the environment is fully understood and taken into account. The FPS values stated here are reproducible between system restarts, and the relative FPS gain among the testers is also consistent, which means that the FPS values themselves are in fact a reliable method of measuring performance, even at such high values.

Also worth reading: Amdahl's law
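
For readers who have not met it, Amdahl's law can be sketched in a few lines of C++. The function name is made up for this example; p is the fraction of the frame time that an optimization touches and s is how much faster that fraction becomes.

```cpp
#include <cassert>

// Amdahl's law: overall speedup when a fraction p of the total time
// (0 <= p <= 1) is made s times faster and the rest stays unchanged.
double amdahlSpeedup(double p, double s) {
    assert(p >= 0.0 && p <= 1.0 && s > 0.0);
    return 1.0 / ((1.0 - p) + p / s);
}
```

As an illustration with assumed values: if rendering is made 3 times faster but only accounts for half of the frame (p = 0.5), the overall speedup is 1.5. If rendering accounts for 95% of the frame (p = 0.95), the overall speedup is about 2.73. How large p is in a given test is exactly what the discussion above is about.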

Tank

  • SFML Team
  • Hero Member
  • *****
  • Posts: 1486
    • View Profile
    • Blog
    • Email
SFGUI
« Reply #258 on: January 28, 2012, 09:25:16 pm »
Quote
So you had time to test the new API before switching to OpenGL. Was it better (performance and usage) than the old one? I'm interested to see how it performs for GUI systems.

Yep, we also took the time to see how it performs. Performance was definitely better. Before the new graphics API, the highest FPS we could get was ~400 (SFGUI test application); with our custom culling and display lists we could get it up to ~1,600 FPS.

The new graphics API put out 1,200 FPS, without custom culling and without display lists. So it was still slower than the old API together with our optimizations, but faster without them.

Quote
By the way, since a GUI library needs to draw many small entities, it's very close to what I need to optimize in SFML. So if you feel like there are optimizations that I could apply to SFML, don't hesitate to share.

Basically: VBO. The whole GUI is stored in one single VBO, together with a texture atlas for one single texture (this may change in the future as there're limits regarding the maximum texture dimensions). Then there're a lot of matrix operations and other OGL calls that we can save because we're only calling what's indeed needed.

I think (binary1248 can give better explanations as he did the renderer) the biggest benefit is sparing the bus to the GPU by not sending the buffers (vertices, texture coordinates and colors) every frame.

The optimizations are actually shared; SFGUI is open source. ;) It's quite easy to see how the renderer works (check Renderer.cpp and Primitive.cpp at first, most important files regarding the rendering).
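
As a rough illustration of the "one buffer plus one texture atlas" idea, here is a small CPU-side C++ sketch that packs textured quads into a single vertex list. The types and the appendQuad helper are hypothetical and are not taken from SFGUI's Renderer.cpp or Primitive.cpp; the resulting list is what would be uploaded once as a single VBO.

```cpp
#include <cassert>
#include <vector>

struct Vertex {
    float x, y;  // position in pixels
    float u, v;  // normalized texture-atlas coordinates
};

struct Rect { float left, top, width, height; };

// Appends one textured quad as two triangles (6 vertices) to a shared
// vertex list that would later be uploaded as a single buffer.
void appendQuad(std::vector<Vertex>& out, const Rect& pos, const Rect& tex) {
    // Atlas coordinates are expected to be normalized to [0, 1].
    assert(tex.left >= 0.f && tex.left + tex.width <= 1.f);
    assert(tex.top >= 0.f && tex.top + tex.height <= 1.f);
    const float x0 = pos.left, x1 = pos.left + pos.width;
    const float y0 = pos.top,  y1 = pos.top + pos.height;
    const float u0 = tex.left, u1 = tex.left + tex.width;
    const float v0 = tex.top,  v1 = tex.top + tex.height;
    const Vertex tl{x0, y0, u0, v0}, tr{x1, y0, u1, v0};
    const Vertex bl{x0, y1, u0, v1}, br{x1, y1, u1, v1};
    out.insert(out.end(), {tl, tr, bl, tr, br, bl});
}
```

Because every widget samples from the same atlas, all quads in the list can share one texture bind and, if nothing changed, the list does not need to be sent again the next frame.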

Elgan

  • Jr. Member
  • **
  • Posts: 77
    • View Profile
SFGUI
« Reply #259 on: January 28, 2012, 09:31:18 pm »
This is very fun reading... there is software that will measure performance.


Maybe it would be fun to make a benchmark thingie of sorts for SFML apps... not sure how it would work right now.

FPS, externally measured memory and... hm, whatever else.

binary1248

SFGUI
« Reply #260 on: January 28, 2012, 10:12:12 pm »
Good idea... Laurent can decide what he deems worthy to benchmark in every version of SFML ^^. Then just write a spec and that will be the standard of testing. Such areas could be e.g. text rendering, sprite rendering, shape rendering etc.

Laurent

SFGUI
« Reply #261 on: January 28, 2012, 11:53:49 pm »
Quote
The key when testing is making sure you can reproduce your results, which in turn means the environment is fully understood and taken into account.

I'm pretty sure that you don't fully understand how SFML can impact your performance. For example, there's a bug in event handling on Windows that will slow down some applications randomly. However, it happens once per frame, so if your test application is really loaded, it will hardly make a difference to the final result.

I'm not saying that your tests are flawed, I'm even sure that they are strictly executed and interpreted. But that's not the most efficient way of testing things, and some other people might not trust your results.

Quote
Performance was definitely better. Before the new graphics API, the highest FPS we could get was ~400 (SFGUI test application); with our custom culling and display lists we could get it up to ~1,600 FPS.

The new graphics API put out 1,200 FPS, without custom culling and without display lists. So it was still slower than the old API together with our optimizations, but faster without them.

Thanks for the feedback.

Quote
Basically: VBO.

I was afraid you would say that :D
A GUI is a static thing, so I guess that VBOs are perfect. In SFML things are more complicated: I cannot assume any particular usage. For example, many people use a single dynamic sprite to draw everything in their game. I must design and implement things as if every property of every entity could change every frame.
At least, with the new API, people who know a little about graphics programming can write efficient code with SFML. They're no longer stuck with slow sprites.

Quote
The optimizations are actually shared; SFGUI is open source.

True :)

binary1248

SFGUI
« Reply #262 on: January 29, 2012, 12:41:00 am »
Quote
For example, there's a bug in event handling on Windows that will slow down some applications randomly.


Interesting... another bug I didn't know about.

Quote
I must design and implement things as if every property of every entity could change every frame.


That's exactly what the GL_STREAM_DRAW usage hint was designed for. It tells the GPU that it can expect buffer data to change between every draw call (even multiple times). The difference is that with a VBO which you update completely every frame, the data is in a single buffer on the card. Think of it like calling new int; 1000 times versus calling new int[1000]; once. The second variant would probably complete faster for much the same reasons. And if you "prepare" the data to be as GPU-friendly as possible, it will reward you appropriately.
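
The intuition behind the new int analogy can be written down as a toy cost model in C++. Everything here is an assumption made for illustration (the function, the fixed cost per transfer and the cost per byte); it is not a measurement of any driver or bus.

```cpp
#include <cassert>
#include <cstddef>

// Toy cost model (an assumption, not a measurement): every transfer pays a
// fixed setup cost plus a cost proportional to the number of bytes sent.
double transferCostUs(std::size_t transfers, std::size_t totalBytes,
                      double fixedCostUs, double costPerByteUs) {
    assert(fixedCostUs >= 0.0 && costPerByteUs >= 0.0);
    return static_cast<double>(transfers) * fixedCostUs
         + static_cast<double>(totalBytes) * costPerByteUs;
}
```

With an assumed fixed cost of 1 microsecond and 0.001 microseconds per byte, sending 4000 bytes as 1000 separate transfers costs 1004 microseconds in this model, while sending them as one transfer costs 5 microseconds. The payload is the same; only the number of fixed costs differs.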

"Prepare" would mean things like:
    1. Converting your geometry data to draw using one primitive type (internally I'm sure GPUs convert almost all primitives to triangles anyway).

    2. Reducing state changes (fewer texture binds, etc.) by batching and reordering the draws on CPU without influencing the final outcome of the frame.

    3. Reducing the "useless" processing on the GPU when you know that what it does will have no effect on the frame output (this is more of a problem on older hardware without unified shaders).

    4. Trying to make the GPU/Driver go through optimized paths when it can make some assumptions about the data being drawn (e.g. using identity matrices probably skips the whole matrix multiplication step altogether)

What one needs to look for are values which the GPU probably has to calculate every frame but which stay exactly the same. They can be calculated on the CPU when needed and passed to the GPU "prepared", so it can save a lot of effort putting those pixels on the screen.
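
Point 2 of the list above can be sketched in C++ in its most conservative form: merging only neighbouring draws that use the same texture, so the draw order is untouched. The DrawCommand type and mergeConsecutive function are hypothetical names for this example, not part of SFGUI or SFML.

```cpp
#include <cstddef>
#include <vector>

struct DrawCommand {
    unsigned int texture;     // hypothetical texture id
    std::size_t vertexCount;  // vertices contributed by this draw
};

// Merges runs of consecutive commands that share a texture. Only neighbours
// are merged, so the submission order (and thus the final image) is kept.
std::vector<DrawCommand> mergeConsecutive(const std::vector<DrawCommand>& in) {
    std::vector<DrawCommand> out;
    for (const DrawCommand& cmd : in) {
        if (!out.empty() && out.back().texture == cmd.texture)
            out.back().vertexCount += cmd.vertexCount;  // extend current batch
        else
            out.push_back(cmd);                         // start a new batch
    }
    return out;
}
```

Five draws using textures 1, 1, 2, 1, 1 collapse into three batches here. Reordering draws to merge the two groups that use texture 1 would save even more, but it is only safe when the reordered geometry does not overlap, which this sketch does not check.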

If you want an extreme (and still quite buggy) example of how I prepare data for the GPU, have a look at the texture preblending we do in the new Renderer. It offloads the blending from the GPU to the CPU under the assumption that the blended pixel values stay the same over all frames. I tried it out because 1. GPUPerfStudio was telling me that the GPU was stalling on buffer operations and 2. I was crazy enough and had too much time. It seemed to yield more performance and can work under the right circumstances.

The key is really to make a library so intelligent it knows how it can optimize the user's data by itself in every situation.

For those who are curious, during the implementation of the new renderer I used gDebugger, GPUPerfStudio, valgrind and of course faithful gprof.

Laurent

SFGUI
« Reply #263 on: January 29, 2012, 09:38:02 am »
Quote
That's exactly what the GL_STREAM_DRAW usage hint was designed for

My tests showed that locking/updating/unlocking a GL_STREAM_DRAW VBO is still slower than a vertex array, which is already slower than immediate mode in such a context.

Quote
The difference is that with a VBO which you update completely every frame, the data is in a single buffer on the card. Think of it like calling new int; 1000 times versus calling new int[1000]; once.

Does it mean that you create one single big VBO and then "allocate" your widgets' geometries inside it with a custom algorithm? I never succeeded in writing such an implementation, because writing an efficient allocator is really complex.
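
To make the question concrete, here is a deliberately simple C++ sketch of what "allocating" geometry inside one big buffer could look like: a first-fit allocator over vertex slots that merges freed neighbours. The RangeAllocator class is hypothetical, makes no claim to efficiency, and is not how SFGUI manages its buffer.

```cpp
#include <cstddef>
#include <iterator>
#include <map>

// First-fit allocator over a fixed range of vertex slots, e.g. one big VBO.
class RangeAllocator {
public:
    explicit RangeAllocator(std::size_t capacity) {
        if (capacity > 0) free_[0] = capacity;
    }

    // Finds the first free block with at least `count` slots. On success,
    // stores its first slot in `offset` and returns true.
    bool allocate(std::size_t count, std::size_t& offset) {
        if (count == 0) return false;
        for (auto it = free_.begin(); it != free_.end(); ++it) {
            if (it->second < count) continue;
            offset = it->first;
            const std::size_t remaining = it->second - count;
            free_.erase(it);
            if (remaining > 0) free_[offset + count] = remaining;
            return true;
        }
        return false;
    }

    // Returns a block to the free list and merges it with free neighbours.
    void release(std::size_t offset, std::size_t count) {
        auto it = free_.emplace(offset, count).first;
        auto next = std::next(it);
        if (next != free_.end() && it->first + it->second == next->first) {
            it->second += next->second;
            free_.erase(next);
        }
        if (it != free_.begin()) {
            auto prev = std::prev(it);
            if (prev->first + prev->second == it->first) {
                prev->second += it->second;
                free_.erase(it);
            }
        }
    }

private:
    std::map<std::size_t, std::size_t> free_;  // offset -> length of free block
};
```

The linear search and the fragmentation that first-fit produces over time are the kind of issues that make an efficient version hard to write.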

Quote
Converting your geometry data to draw using one primitive type

SFML is not high-level enough: it explicitly allows the user to choose the primitive type.

Quote
Reducing state changes (fewer texture binds, etc.) by batching and reordering the draws on CPU without influencing the final outcome of the frame.

Again, SFML is not high-level enough, it must do immediate drawing so no batching is possible. I optimize state changes and pre-transform small entities on the GPU but I feel like this is the maximum that I can do.

With a GUI, the order is defined by the parent-child relationship, you can easily have a scenegraph behind the scenes and benefit from all the nice optimizations that such a data structure allows. This is hardly applicable to SFML.

Quote
The key is really to make a library so intelligent it knows how it can optimize the user's data by itself in every situation.

Is it your feeling about SFML too, knowing that it provides low-level primitives and doesn't know what the user will do with them?

Laurent

SFGUI
« Reply #264 on: January 29, 2012, 10:56:38 am »
I've had a look at Renderer.cpp, and now I understand your rendering strategy (you can ignore my related question above). It is definitely not applicable to SFML because I can't batch everything and delay all the rendering until the end of the frame.

I've seen some really nice ugly hacks and the even nicer comments about SFML associated with them. If you want to talk about these issues I'm here ;)

binary1248

SFGUI
« Reply #265 on: January 29, 2012, 01:09:34 pm »
Like I said, a library has to recognize opportunities to optimize and do it the best it can. Of course you can't optimize in exactly the same way for every single use case there is.

For example, some people might not make use of VertexArrays or custom primitive types for whatever reason. Then you can assume that whatever they draw every frame (sprites, text, etc.) can be broken down into triangles.

You could also, for example, batch text draws together. Say the user draws multiple sf::Texts after each other (very common from what I've seen) with the same sf::Font (same face and size); you can batch those together too. It saves you from stopping to check what's next to draw only to find out that it's exactly the same kind of data that you previously drew, even using the same texture.

Quote
My tests showed that locking/updating/unlocking a GL_STREAM_DRAW VBO is still slower than a vertex array, which is already slower than immediate mode in such a context.
...
It is definitely not applicable to SFML because I can't batch everything and delay all the rendering until the end of the frame.


Well, correct me if I'm wrong, but the user won't see anything on the screen until they call Display on their window anyway. So whether the drawing takes place right where it is called or is saved and performed in the same order right before the buffers are swapped, I don't see the difference. The big difference, though, is that you would transfer your data in bigger chunks, which is where VBOs start to shine. As long as data moves around within host memory or within GPU memory, it stays fast. When it has to cross the PCIe bus, and do so many times per frame, that becomes the bottleneck, which is probably what you are seeing in your comparison between Vertex Arrays and VBOs.
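
The "record now, draw right before the swap" idea can be sketched in plain C++ as follows. DeferredQueue and its member names are made up for this example and do not correspond to any SFML class; the commands are opaque callables so the sketch stays independent of OpenGL.

```cpp
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Records draw requests and replays them, in submission order, on display().
class DeferredQueue {
public:
    // Stores the command instead of executing it immediately.
    void draw(std::function<void()> command) {
        pending_.push_back(std::move(command));
    }

    // Executes every recorded command in order, then clears the queue.
    // Returns how many commands were executed.
    std::size_t display() {
        const std::size_t executed = pending_.size();
        for (auto& command : pending_) command();
        pending_.clear();
        return executed;
    }

private:
    std::vector<std::function<void()>> pending_;
};
```

Since nothing runs until display(), the queue sees the whole frame at once, which is what would give a renderer the chance to merge data into bigger uploads before executing anything.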

VBOs also don't perform too well if they are too small. So to use them properly you would have to serialize a lot of that primitive data together and draw multiple times using that 1 buffer. I also don't have to stress that VBOs don't have to be drawn completely in 1 pass. They can hold data for different primitive types. Heck they can hold completely different data sets together one after the other. Whatever you can draw using multiple Vertex Arrays you can draw using 1 VBO and multiple draw calls.
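
A CPU-side C++ sketch of "one buffer, several draw calls" might look like this. The SharedBuffer and DrawRange types are hypothetical; the first and count fields are meant to mirror the arguments of the same name in glDrawArrays, which would be called once per range after the vertex data has been uploaded a single time.

```cpp
#include <cstddef>
#include <vector>

enum class Primitive { Triangles, Lines, Points };

struct Vertex { float x, y; };

// One draw call's worth of data inside the shared buffer.
struct DrawRange {
    Primitive type;     // primitive type used for this range
    std::size_t first;  // index of the first vertex of the range
    std::size_t count;  // number of vertices in the range
};

struct SharedBuffer {
    std::vector<Vertex> vertices;   // would be uploaded once as a single VBO
    std::vector<DrawRange> ranges;  // one entry per later draw call

    // Appends a data set and remembers where it lives in the buffer.
    void append(Primitive type, const std::vector<Vertex>& data) {
        ranges.push_back({type, vertices.size(), data.size()});
        vertices.insert(vertices.end(), data.begin(), data.end());
    }
};
```

Different primitive types simply end up as different ranges of the same buffer, so mixing triangles and lines costs extra draw calls but no extra uploads.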

Laurent

SFGUI
« Reply #266 on: January 29, 2012, 07:02:28 pm »
Quote
You could also, for example, batch text draws together. Say the user draws multiple sf::Texts after each other (very common from what I've seen) with the same sf::Font (same face and size); you can batch those together too. It saves you from stopping to check what's next to draw only to find out that it's exactly the same kind of data that you previously drew, even using the same texture.

I already have a state cache, I only set the states that changed between two Draw calls.
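
For readers unfamiliar with the term, a state cache in its simplest form could be sketched in C++ like this. It is an illustration only, limited to the bound texture, and is not SFML's actual implementation; the class and member names are assumptions.

```cpp
#include <cstddef>

// Remembers the last bound texture and counts how many state changes
// would actually have been forwarded to the driver.
class StateCache {
public:
    void setTexture(unsigned int id) {
        if (hasTexture_ && texture_ == id) return;  // redundant request, skip
        texture_ = id;
        hasTexture_ = true;
        ++changes_;  // a real implementation would bind the texture here
    }

    std::size_t changes() const { return changes_; }

private:
    unsigned int texture_ = 0;
    bool hasTexture_ = false;
    std::size_t changes_ = 0;
};
```

Requesting textures 1, 1, 1, 2, 2, 1 results in three real changes instead of six. Note that the cache removes redundant state changes but still leaves one draw call per object.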

Quote
Well, correct me if I'm wrong, but the user won't see anything on the screen until they call Display on their window anyway. So whether the drawing takes place right where it is called or is saved and performed in the same order right before the buffers are swapped, I don't see the difference.

Good point. That reminded me of something, so I checked and found that two years ago I already tried to implement batching in SFML 2.
Here is what I said on 19/01/2010:
Quote
The automatic batching system was great, but after using it for a while and collecting feedback, I realized that it was creating new problems that were very tricky to solve.

Unfortunately I don't remember what these problems were.

binary1248

SFGUI
« Reply #267 on: January 29, 2012, 09:00:44 pm »
Quote from: "Laurent"

I already have a state cache, I only set the states that changed between two Draw calls.


State changes aren't the only thing you can save on, although they make up a big part of the time it takes to draw a frame. Draw calls are almost as expensive overhead-wise as state changes. Drawing 10 Sprites/Rectangle shapes, for example, requires vertex data for just 40 vertices but causes more than 40 OpenGL calls to be made.
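
One way to draw those 10 quads with a single call is to keep the 40 vertices and add an index list that describes them as triangles, which could then be submitted with one indexed draw call such as glDrawElements. The C++ sketch below only builds the indices; the function name and corner order are assumptions made for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Builds triangle indices for `quadCount` quads whose 4 corners are stored
// consecutively (top-left, top-right, bottom-right, bottom-left).
std::vector<std::uint16_t> buildQuadIndices(std::size_t quadCount) {
    assert(quadCount <= 16384);  // largest index must fit into 16 bits
    static const std::uint16_t corners[6] = {0, 1, 2, 0, 2, 3};
    std::vector<std::uint16_t> indices;
    indices.reserve(quadCount * 6);
    for (std::size_t q = 0; q < quadCount; ++q) {
        const std::size_t base = q * 4;  // first vertex of this quad
        for (std::uint16_t corner : corners)
            indices.push_back(static_cast<std::uint16_t>(base + corner));
    }
    return indices;
}
```

For 10 quads this yields 60 indices over the same 40 vertices, so the vertex data does not grow while the number of draw calls drops to one, provided all quads share the same states.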

Quote from: "Laurent"

Good point. That reminded me of something, so I checked and found that two years ago I already tried to implement batching in SFML 2.
Here is what I said on 19/01/2010:
Quote
The automatic batching system was great, but after using it for a while and collecting feedback, I realized that it was creating new problems that were very tricky to solve.

Unfortunately I don't remember what these problems were.


Well... you did change the drawing routines completely and don't use glBegin() glEnd() anymore. So maybe those problems won't carry over to the new drawable API. A link to that thread would be nice.

Laurent

SFGUI
« Reply #268 on: January 29, 2012, 10:25:31 pm »
Quote
Well... you did change the drawing routines completely and don't use glBegin() glEnd() anymore. So maybe those problems won't carry over to the new drawable API.

This code had implementations for VBO, VA and IM.

Thread:
http://www.sfml-dev.org/forum/viewtopic.php?t=2063
(not very helpful because that's where I say that I removed the batching stuff)

More useful, the last revision where it was used:
https://github.com/SFML/SFML/tree/8ba9495c02f95dbff8aee44121a13f999234fb2f

binary1248

SFGUI
« Reply #269 on: January 30, 2012, 01:43:12 am »
From what I can tell reading through the source, it would have been bottlenecked by the CPU instead of the GPU. Thus whether you used VAs, VBOs or IMs it probably wouldn't make any significant difference. Your usage of the word "Batch" to describe the class containing the data for a single drawable is also kind of misleading. They weren't really batched data and so could not profit from batching at all.

Your idea of uploading data into a single buffer and drawing it all at once at the end was good. HOWEVER, if you only draw the data one object at a time, it will be, as you saw, hardly any better than VAs or even IM.

It would probably have made a big difference if you had stored more relevant data inside the Renderer object and let it manage drawing the objects itself when the time came. That way it would have been able to truly batch multiple objects together if it saw the possibility, saving not only a little GPU time but a massive amount of CPU time. Contrary to what people think, most state changes and matrix ops take place on the CPU, in the driver, and the data gets sent in its raw form to the GPU. Thus, if the CPU is already busy going through all the batches every frame, the FPS will be hurt even more by redundant state changes, which were abundant in that version.

Because you changed SFML a lot since then and cache states more effectively now and even use VAs as the primary drawing method, it would be nice to see how that old concept would fare in the current implementation.

And I'm curious, were these problems you speak of bugs/glitches or the flexibility/limitation kind of problems? I couldn't find any reports of problems related to the old drawing method while searching through those old threads.

Quote
I've seen some really nice ugly hacks and the even nicer comments about SFML associated with them. If you want to talk about these issues I'm here ;)


Wishlist (among other things to make SFGUI less "hacky"):
    1. Some way to identify/compare fonts among each other (name of the face or something).
    2. Some way to tell SFML to wipe its vertex cache.
    3. Since you allow asking for a depth/stencil buffer, it would be nice to be able to clear those with RenderTarget::Clear() too instead of just the color. Now users have to resort to calling glClear() themselves which is horribly expensive.
    4*. A standard benchmark spec that encompasses all areas of SFML to use as a performance measurement tool while trying to optimize SFML. This is one of those crucial things that I and others would need to know to experiment with making SFML faster. In SFGUI we have our lovely test app (which as some might notice contains all widgets currently available).
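
As a starting point for item 4, the core of such a benchmark could be as small as the following C++ sketch. It is an assumption about what a harness might look like, not an existing SFML or SFGUI tool; renderFrame stands for whatever area is being measured (text, sprites, shapes and so on).

```cpp
#include <chrono>
#include <cstddef>
#include <functional>

struct BenchmarkResult {
    std::size_t frames;     // number of frames that were rendered
    double averageFrameUs;  // mean time per frame in microseconds
};

// Runs `renderFrame` `frames` times and reports the mean time per call.
BenchmarkResult runBenchmark(const std::function<void()>& renderFrame,
                             std::size_t frames) {
    using Clock = std::chrono::steady_clock;
    const auto start = Clock::now();
    for (std::size_t i = 0; i < frames; ++i) renderFrame();
    const std::chrono::duration<double, std::micro> elapsed =
        Clock::now() - start;
    const double average =
        frames > 0 ? elapsed.count() / static_cast<double>(frames) : 0.0;
    return {frames, average};
}
```

Reporting the mean frame time rather than FPS makes results from differently loaded scenes easier to compare, since frame times add up linearly while FPS values do not.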