SFML community forums
General => Feature requests => Topic started by: Bozemoto on September 18, 2011, 04:21:18 pm
-
Hi, a short while ago I was discussing my own color struct with a friend and my friend is a very meticulus person and thought my four GLUBytes were inefficient. He wanted me to use an int. Instead of doing what he said a did both
union Color{
struct ByteData{
GLUByte r, g, b, a;
}byte_data;
GLUint int_data;
}
this way when assigning one color to another I simply use
color1.int_data = color2.inte_data;
simple and effiecient.
I even overloaded the operators for it to make it more streamlined.
The reason I'm writing it here is because I'm still very much new to OpenGL and for the AI cource Im currently taking I'm using SFML.
I wanted to create an image with random pixels in it, and instead of calling random once for every byte in every color it would be nice if I could write
color = rand();
instead. I guess I could write
(*reinterpret_cast<int*>(&color)) = rand();
But thats so ugly.
Thoughts are welcome.
-
I think that making such ugly modifications to a class, just to allow a simpler syntax for a very specific task, is... ugly as well ;)
Just write a function that creates a random color and use it instead of std::rand().
sf::Color rand_color()
{
return sf::Color(std::rand() % 255, std::rand() % 255, std::rand() % 255);
}
color = rand_color();
By the way, assigning the result of std::rand() to a color would only fill 2 components ouf of 4, RAND_MAX is 65536 and therefore can only generate values for 16 bits.
-
sf::Color rand_color()
{
return sf::Color(std::rand() % 255, std::rand() % 255, std::rand() % 255);
}
Did you mean "% 256"?
By the way, assigning the result of std::rand() to a color would only fill 2 components ouf of 4, RAND_MAX is 65536 and therefore can only generate values for 16 bits.
Assuming it actually is 65536. The value of this constant depends on the implementation, it's usually 2^31-1 (2147483647) or 2^15-1 (32767), I haven't yet seen an implementation where the result of rand() is computed modulo 65537, the number is unwieldy and it would destroy distribution.
-
Well it is faster to.
color1.int_data =color2.int_data;
is faster or atleast as fast as
color1.r = color2.r;
color1.g = color2.g;
color1.b = color2.b;
color1.b = color2.a;
depending if the compiler spots the optimization it can do.
Not a very big optimization, granted, but an optimization non the less.
Or am i mistaken?
Also, the interface for the class would be identical would it not?
Everything is already hidden behind operator overloading and constructors. Only the rare cases when you want to access the data itself would it be "ugly" to use.
Lastly I would like to ask. Isn't making it a class excessive? I mean wouldn't a POD struct/union be enough. And then maybe use a function to create colors. like
CreateColor(GLUbyte r, GLUbyte r, GLUbyte r, GLUbyte a=255);
Making it a class creates a v_table for it doesn't it? Seems a bit unnecessary for something so simple as a color.
I hope I haven't come across as rude, I realy like your work. Just a few questions for my own personal education.
-
Did you mean "% 256"?
Yes, sorry.
Assuming it actually is 65536. The value of this constant depends on the implementation, it's usually 2^31-1 (2147483647) or 2^15-1 (32767), I haven't yet seen an implementation where the result of rand() is computed modulo 65537, the number is unwieldy and it would destroy distribution.
That was just an example. The point is that there's no guarantee that it can fill a 32-bits integer.
Well it is faster to.
Honestly, who cares?
Lastly I would like to ask. Isn't making it a class excessive?
Excessive in what? There's no extra overhead.
Making it a class creates a v_table for it doesn't it?
Nop, only classes with vritual functions have a v-table.
I hope I haven't come across as rude
Absolutely not ;)
-
Well it is faster to.
Honestly, who cares?
Well it is fun to optimize. It makes you ask questions and you learn lots by doing it.
Lastly I would like to ask. Isn't making it a class excessive?
Excessive in what? There's no extra overhead.
Making it a class creates a v_table for it doesn't it?
Nop, only classes with vritual functions have a v-table.
Read up on classes and how they implement member functions more closely. Everything makes more sense now, i just assumed that member functions were always handled with function pointers. Which is kinda stupid if I think about it. All of it should be resolvable at compile time.
Does this mean that a POD type can have member functions as long as it doesn't have any base class, constructor, destructor or virtual functions?
EDIT: It does mean that.
-
Well it is fun to optimize. It makes you ask questions and you learn lots by doing it.
I agree, but learning is not my goal for SFML ;)
Does this mean that a POD type can have member functions as long as it doesn't have any base class, constructor or destructor?
Yes, the definition of POD only forbids virtual functions and user-defined constructors/destructor/assignment operators. Regular member functions are ok.
-
I found a cleaner way of doing it just now.
union Color
{
struct
{ GLubyte r,g,b,a; };
GLuint data32;
};
Isn't this a bit less ugly?
It's alot cleaner to work with then my previous version atleast.
-
union Color
{
struct
{ GLubyte r,g,b,a; };
GLuint data32;
};
Correct me if I am wrong, but alignment could be an problem here:
The CPU can access single members faster, if their address is aligned (for example every address%4=0)
In this case &g - &r would be 4, so changing data32 would have no effect on g, b or a
I don't know whether there are compiler activating alignment without asking or whether the C++standart forbids alignment within unions.
-
By the way, if I remember correctly, the standard states that writing a member of a union and reading another is an undefined behaviour.
But anyway, yes, it's still ugly in my opinion. I was not talking about the code itself, but the whole idea.
-
Correct me if I am wrong, but alignment could be an problem here:
The CPU can access single members faster, if their address is aligned (for example every address%4=0)
In this case &g - &r would be 4, so changing data32 would have no effect on g, b or a
I don't know whether there are compiler activating alignment without asking or whether the C++standart forbids alignment within unions.
why is &g - &r? they are both one byte large. the size of 4 GLubyte is 4 bytes and the size of GLuint is also 4bytes. Why would the program pad each of the four bytes with three empty bytes? I have tried reading up a bit on alignment (good sources are scarce) and couldn't make sense of anything. Some testing shows that it works perfectly fine as it is. Though, the data is sort of scrambled. Dont know how it will fare on other platforms though.
0xAABBGGRR - the data is organized like this when writing to it like an int
Perhaps you know why? I'm a bit tired now and cant really think straight.
Also, Laurent. Perhaps you could share a bit more about why you think it's ugly. It seems like you have some kind of insight I don't and I think it will benefit me if I knew what it is that makes this idea so bad.
-
Why would the program pad each of the four bytes with three empty bytes?
Access to addresses aligned on 32 bit bounds may be faster, thus a compiler can perform this optimization. Only arrays guarantee a subsequent memory layout.
Also, Laurent. Perhaps you could share a bit more about why you think it's ugly. It seems like you have some kind of insight I don't and I think it will benefit me if I knew what it is that makes this idea so bad.
Your approach relies on behavior that is not covered by the C++ standard, thus any compiler is allowed to generate code far from the user's intention.
On the other side, you don't really win anything with the union approach. You can easily cover the same functionality in a less "hacky" way, and avoid potential trouble by doing so.
-
I agree that this should not be added to SFML.
I also remember hearing the same thing about using unions this way as being undefined behavior.
That said... you can easily work around this in your program. Just make your own color class that has union behavior and overload the sf::Color casting operator.
-
On the other side, you don't really win anything with the union approach. You can easily cover the same functionality in a less "hacky" way, and avoid potential trouble by doing so.
Why is there no gain?
Why does
color.r = 255;
color.g = 255;
color.b = 0;
color.a = 255;
have the same operating speed as
color.data32 = 0xFF00FFFF;
It atleast to to me isn't obvious.
Also if having a set of 4 GLubytes instead of an array is undefined bahaviour this is also the case in sfml. Al you do in OpenGL is pass a Glubyte pointer to the beginning of the data and it is read in direct succession. So I can be as sure as Laurent that four GLubyte is contained within a 4byte integer. If it wasn't the case neither of our implementations would work. Infact, I mey even be more sure then Laurent since I use a POD type, which guarantees a magic free implementation.
Lastly, please write out the less hacky way. I have obviously not thought about it and would appreciate if you could write it out instead of just saying it exists.
EDIT:
I guess memcpy would be a less "hacky" solution. and you could use it in an overloaded = operator function.
-
We're not talking about speed. In fact no one cares about the slight speed improvement that it brings, really. All we wee is that the class becomes more complex and incorrect (cf. undefined behaviour) for no good design reason.
Also if having a set of 4 GLubytes instead of an array is undefined bahaviour this is also the case in sfml. Al you do in OpenGL is pass a Glubyte pointer to the beginning of the data and it is read in direct succession. So I can be as sure as Laurent that four GLubyte is contained within a 4byte integer. If it wasn't the case neither of our implementations would work.
SFML doesn't use vertex arrays (yet), the color is used in a call to glColor4ub(r, g, b, a).
Infact, I mey even be more sure then Laurent since I use a POD type, which guarantees a magic free implementation.
POD types have no more guarantee than other types about padding.
Lastly, please write out the less hacky way. I have obviously not thought about it and would appreciate if you could write it out instead of just saying it exists.
I already gave a solution for the rand_color() problem. And if you want to assign any integer to a color you can do this:
sf::Color color(sf::Uint32 value)
{
return sf::Color(...bitwise operations to extract components...);
}
No one cares whether it takes 2 or 20 CPU cycles ;)
-
Also if having a set of 4 GLubytes instead of an array is undefined bahaviour this is also the case in sfml. Al you do in OpenGL is pass a Glubyte pointer to the beginning of the data and it is read in direct succession. So I can be as sure as Laurent that four GLubyte is contained within a 4byte integer. If it wasn't the case neither of our implementations would work.
SFML doesn't use vertex arrays (yet), the color is used in a call to glColor4ub(r, g, b, a).
what about GLCheck(glTexSubImage2D(GL_TEXTURE_2D, 0, 0, 0, myWidth, myHeight, GL_RGBA, GL_UNSIGNED_BYTE, &myPixels[0]));
Doesn't glTexSubImage2D read 32 bits at a time?
This is SFML1.6. Do you do it differently in 2.0?
-
Indeed, the small speed gain is neither relevant in the big picture nor worth all the trouble it brings. Sometimes this is also called premature optimization.
I think myPixels is an std::vector<sf::Uint8>, i.e. an array of bytes, so there is no problem. As already stated, arrays guarantee subsequent element addresses, in contrast to separate members.
-
Indeed, the small speed gain is neither relevant in the big picture nor worth all the trouble it brings. Sometimes this is also called premature optimization.
I think myPixels is an std::vector<sf::Uint8>, i.e. an array of bytes, so there is no problem. As already stated, arrays guarantee subsequent element addresses, in contrast to separate members.
No it's not
mutable std::vector<Color> myPixels;
So it only guarantees no padding between Color elements not in between elements inside Color.
Also, I have found no troubles with this method.
Please list them.
-
In SFML 2 it's an array of UInt8.
Also, I have found no troubles with this method.
Please list them.
If the compiler decides to insert padding between the members you're screwed up.
-
In SFML 2 it's an array of UInt8.
Also, I have found no troubles with this method.
Please list them.
If the compiler decides to insert padding between the members you're screwed up.
But then so is SFML 1.6, and it has worked for countless people. So it doesn't seem to happen that often.
I guess I should look into alignment more, does anyone have any good sources?
-
It's still very dangerous because it can change from compiler to compiler, and making a slight modification to the class may completely change the padding which is applied to it.
A few examples (with my compiler):
struct bop
{
char x1;
char x2;
char x3;
char x4;
};
sizeof(bop) == 4, no padding
struct bop
{
short s;
char x1;
char x2;
char x3;
char x4;
};
sizeof(bop) == 6, no padding
struct bop
{
char x1;
short s;
char x2;
char x3;
char x4;
};
sizeof(bop) == 8, padding!
-
I read about that yesterday. And for this specific case it's not very dangerous since we will only ever use 32bits. No more, no less.
Are there any compilers out there that would make
struct Color{ byte r,g,b,a; };
sizeof(Color) == 16 bytes??? or even 8bytes
It's difficult finding data on this.
In other cases with more data involved it's obviously a bad thing to mess around with.
-
I think we could have sizeof(sf::Color) == 16, yes, if the compiler decides to align each member on the size of int. Int is the most natural (and thus faster) type to manipulate for the processor, and padding often consists of aligning data at offsets multiple of sizeof(int). So the probability to have sizeof(sf::Color) == 16 is not negligible.
-
I concede that it's a possibility. But is it probable? Has it ever happened and if so with which compilers?
I mean even if it's possible doesn't mean it happens. I mean everyone uses unsigned char for allocating bytes of data, and there is no guarantee that it's 1byte large. Still people use it since there isn't a compiler that implements it differently.
Glubyte is a typedef for unsigned char. And so is BYTE under windows and so is alot of other macros by other companies.
-
I concede that it's a possibility. But is it probable? Has it ever happened and if so with which compilers?
I don't know, but anyway I wouldn't rely on it.
I mean even if it's possible doesn't mean it happens. I mean everyone uses unsigned char for allocating bytes of data, and there is no guarantee that it's 1byte large. Still people use it since there isn't a compiler that implements it differently.
Glubyte is a typedef for unsigned char. And so is BYTE under windows and so is alot of other macros by other companies.
Correct me if I'm wrong, but I think that the definition of byte is sizeof(char) (or rather the other way round: a char is defined by having a size of one byte). What can be different is the size of one byte in bits: it may not always be 8.
-
Are there any compilers out there that would make
struct Color{ byte r,g,b,a; };
sizeof(Color) == 16 bytes??? or even 8bytes
No. In fact, no practical compiler would even try do to that, that's a huge waste of memory. Since these are bytes, on all platforms I've ever heard of they have natural alignment of 1. So they're accessed equally fast no matter where they lie.
Nexus and Laurent are basically playing language lawyers here. And I actually agree with them on this one. Seriously, how much you'd gain? A thousand cycles in the whole run of the program? That's nothing on modern hardware. Heck, that's nothing even on very old hardware (and by very old I mean the things from '70s)! You're really wasting your time trying to optimize this. Furthermore, your code depends on undefined behavior. That's quite bad.
That was just an example. The point is that there's no guarantee that it can fill a 32-bits integer.
My point is that there's no guarantee that it couldn't fill a 32-bit integer, either ;)
Moreover, what's more important, someone could see that and begin to think that RAND_MAX is 65536 everywhere. It'd be safer to write something like "let's suppose RAND_MAX is 65536..." ;)
-
Moreover, what's more important, someone could see that and begin to think that RAND_MAX is 65536 everywhere. It'd be safer to write something like "let's suppose RAND_MAX is 65536..."
Of course. Sometimes I use shortcuts that are... too short ;)
-
Seriously, how much you'd gain? A thousand cycles in the whole run of the program? That's nothing on modern hardware.
True, it's not a significant gain. But it is fun to play around with things like this. Previous to this thread I had given little thought to data alignment, though I'm not saying I'll always do it in the future.
I think I'll read up on union behaviour a bit, I wont be returning to the thread anymore as I think there is little more to discuss. I appreciate all the feedback I've gotten and wish you all luck with your respective projekts.
Bye.
-
Correct me if I'm wrong, but I think that the definition of byte is sizeof(char) (or rather the other way round: a char is defined by having a size of one byte).
You are right; char, signed char and unsigned char are among the few types in C++ with a fixed size ;)
True, it's not a significant gain. But it is fun to play around with things like this.
That is true, you learn a lot about low-level mechanisms like memory layout and data alignment/padding. C++ is anyway a language in which you can endlessly experiment around and find unorthodox solutions. Just take a look at Boost ;)
In productive code however, I wouldn't take the risk of undefined behavior for no effective advantage (it is extremely unlikely that this is the performance bottleneck of your application).