Author Topic: Optimize memory layout of SFML classes (Read 7593 times)

Mr. X · « **on:** April 17, 2016, 10:23:31 pm »

There are several classes in SFML with a non-optimal memory layout, for example sf::Transformable or sf::RenderTarget::StatesCache. If the members of sf::Transformable are reordered so that the boolean variables are next to each other, 4 bytes can be saved (with MSVC14; virtually every other compiler will show a size reduction too, although the exact amount may vary). The reason is that compilers align class members for performance (and, on some platforms, correctness) reasons, so padding bytes have to be inserted between members of different sizes. The compiler cannot minimize this padding on its own, because it must place the members in memory in the same order in which they are declared (at least among members with the same access control). What do you think about implementing this?

Laurent · « **Reply #1 on:** April 17, 2016, 10:48:41 pm »

I think nobody cares about saving 4 bytes per sf::Transformable instance, really... I personally think that clean, consistent code is much more important than that. Admittedly, reordering members is not a big deal, but I'd prefer not to start caring about this kind of optimization.

But again, this is just my personal point of view.

eXpl0it3r · « **Reply #2 on:** April 17, 2016, 11:03:53 pm »

While it's quite a minimal optimization, I don't really have anything against reordering some members, as long as readability prevails. But I wouldn't want it to become a general rule; that would prevent us from iterating quickly on code, because we'd first have to work out whether the memory alignment makes perfect sense.

I think we could do it once and do follow-ups whenever it seems appropriate.

Mr. X · « **Reply #3 on:** April 23, 2016, 08:57:24 pm »

I have pushed this optimization to my fork now: https://github.com/PKEuS/SFML/commit/2a2e4e73cc04e53ccd27ebaeecd02472320ad5a8

Feel free to cherry-pick this commit (by the way, there are some other commits on my fork that might be of interest to you, too).

DarkRoku12 · « **Reply #4 on:** April 24, 2016, 01:04:58 am »

IMO, worrying about alignment only matters when dealing with intrinsics, placement new, and very low-level code that will run on a very limited system where every byte counts.

Kojay · « **Reply #5 on:** April 24, 2016, 03:15:03 am »

Your caches are your very limited system. Thus, good alignment can have a significant impact on performance regardless of the overall memory on your system.

There are two major factors:

i) The smaller objects are, the more of them fit in cache.
ii) At the same time, the first bytes of an object are its hot zone, where the most frequently used members should be placed. (I cannot rigorously explain why. You can find a relevant SO question. Alexandrescu also mentions this hot zone several times.)

As to the topic at hand, whether it makes a meaningful difference for sf::Transformable can only be determined by measurement, though intuitively it may very well do so when operating on a range of such objects. Since SFML is a library, such 'micro' optimizations are worth it.
The optimal alignment itself may vary across architectures; that said, getting rid of unnecessary padding is unlikely to be wrong.

binary1248 · « **Reply #6 on:** April 24, 2016, 04:38:21 am »

Quote from: Kojay on April 24, 2016, 03:15:03 am

At the same time, the first bytes of an object are its hot zone, where the most used members should be placed. (I cannot rigorously explain why. You can find a relevant SO question. Alexandrescu also mentions this hot zone several times.)

Theoretically, you can define the hot zone to be anywhere within the object's layout. What's important is that there can/should only be one hot zone. The processor/memory subsystem neither cares about nor understands what the data at specific addresses represents. Once you compile your code into machine instructions, the code itself no longer knows anything about "objects"; there is no longer a "beginning of the object" or "end of the object". As such, making use of the fact that data gets loaded into cache in lines (fixed-size blocks of memory) really only involves keeping your frequently used data together somewhere within the object and keeping the less used data as far away from it as possible.

Now... there is something else to consider when picking a nice spot for your hot zone. You aren't the only one who gets to pick what kind of data represents your object and where that data is stored. If you start to make use of polymorphic objects, the compiler will throw in some virtual pointers depending on your inheritance hierarchy. These virtual pointers could theoretically also be placed anywhere within the object, though if you think about it, because of what is allowed in the language, the only safe place to put them is at the start of the object.

Since having different standards for different situations (i.e. polymorphic classes and non-polymorphic classes) really sucks, people just got used to defining the first x bytes to be the hot zone of the object, since it works for everything.

This is also the reason why people normally think about devirtualization before playing around with the memory layout of their objects. You don't want those indirect memory accesses to all those different virtual tables screwing up your cache hit rate. I (and probably the processor as well) would rather have a 128-byte block of "unordered" PODs within an object than 1 or 2 virtual pointers.

The suggestion to go and reorder all of SFML's member variables won't provide much gain. Firstly, SFML's classes are relatively small compared to some other libraries out there. Unless you are packing a lot of these objects into some really huge container, the few bytes saved here and there won't have any measurable impact on performance. Secondly, SFML does not make heavy use of polymorphism like many other C++ libraries do. Taking into consideration what I said above and the fact that SFML tends to have a rather flat class hierarchy, this means that the hot zone (if one even exists) does not have to be placed at the beginning of the class. Since new members are usually appended to the end of the member variable list, and the classes' core functionality was implemented before anything else, the hottest zone is probably already at the beginning of the object in the current layout. Reordering the members to reduce the object's total size might seem like an optimization, but diffusing the hot zone would outweigh any performance gains (if there are any).

As with anything else that can be considered a micro-optimization, there is no point in devising a solution to a problem that doesn't exist, especially if it leads to a situation that is worse than before. I've done these kinds of measurements on the libraries I used before (SFML, SFGUI, SFNUL, etc.). Like I said, SFML's objects are so small that the cache miss rate is negligible. Compare that to the size of SFGUI's widgets before I optimized them (>200 bytes) and you will see an order-of-magnitude difference. Optimizing objects in SFGUI that were that large actually led to fewer misses and a measurable performance gain (>10% if I remember correctly). I'm not trying to boast about SFML, but it is literally too good to benefit from this kind of optimization. The objects are simply too small. And considering the idea behind the library, I don't expect the objects to become big enough to warrant a reordering any time soon.
