Author Topic: Safe Casting of types (Read 6032 times)

Groogy · « **on:** December 31, 2011, 01:45:02 am »

Yeah, I'm wondering if it's safe to cast between RenderWindow and Window in the C binding?

Can I more or less do something like:

Code: [Select]

sfRenderWindow *window = sfRenderWindow_Create(/*foobar*/);
sfWindow_Display( window );

You know, where we have inheritance in C++, to use it similarly in C.

Nexus · « **Reply #1 on:** December 31, 2011, 01:48:17 am »

No, since there is no inheritance in C, the pointer types are not related. Therefore, access to the converted pointer results in undefined behavior.

Groogy · « **Reply #2 on:** December 31, 2011, 02:30:54 am »

But if I got it right, isn't the sfWindow struct just a place-holder type for the C++ type? So the pointer is still pointing to sf::Window/sf::RenderWindow memory?

So if I write the code above it should still be defined, right? It should be pretty safe to down-cast as long as it's the first parent in the inheritance list. If not, I have to offset the pointer, which is much more dangerous, so I'm just taking the simple case first.

Not part of the question, only an idea!
Also if the C structures are filled with something like this:

Code: [Select]

struct sfWindow
{
        char data[sizeof(sf::Window)];
};

then it's possible to access every part of the objects as you can easily do an offset to reach the render target:

Code: [Select]

sfRenderWindow *window = /* foobar */
sfRenderTarget *target = ((sfWindow *)window) + 1

This would make it possible to respect the inheritance in C as well: if you are just writing a small wrapper like I am, you can use the same function/code for all the different variants. This of course assumes the structures are, like I said, just empty placeholders for the pointer, which in reality points to the same place in memory.

So this would allow C developers to be able to do something like:

Code: [Select]

sfRenderWindow *window = /* Foo bar */
sfRenderTexture *texture = /* Foo bar */
RenderScene( sfRenderWindow_GetRenderTarget( window ) );
RenderScene( sfRenderTexture_GetRenderTarget( texture ) );

This of course means more work for Laurent but it would give more flexibility IMO.

Nexus · « **Reply #3 on:** December 31, 2011, 11:09:54 am »

Quote from: "Groogy"

But if I got it right, isn't the sfWindow struct just a place-holder type for the C++ type? So the pointer is still pointing to sf::Window/sf::RenderWindow memory?

Not necessarily. But even if it did, C couldn't correctly cast it, since it has no knowledge about class hierarchies. It would behave like reinterpret_cast in C++, which is here undefined behavior, if the resulting pointer is used.

Quote from: "Groogy"

then it's possible to access every part of the objects as you can easily do an offset to reach the render target:
Code: [Select]
sfRenderWindow *window = /* foobar */
sfRenderTarget *target = ((sfWindow *)window) + 1

Again, the cast to sfWindow* is undefined. Apart from that, no one guarantees that the sf::RenderTarget member begins sizeof(sfWindow) bytes after the beginning. You make assumptions about an implementation detail of both the compiler and the SFML library.
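To make the layout point concrete, here is a minimal sketch with made-up types (Small, Big and Both are illustrative only, not SFML classes). It compares where the compiler really put the second base with where the "first parent + 1" trick assumes it is. On common x86-64 ABIs the real offset is typically 8 while the guess is 1, but the exact numbers are an implementation detail, which is the whole problem.

```cpp
#include <cassert>
#include <cstddef>

// Made-up types for illustration; they are not SFML classes.
struct Small { char c; };
struct Big   { double d; };
struct Both : Small, Big {};

// Offset of the Big subobject as the compiler actually laid it out.
// static_cast knows the hierarchy and applies the correct adjustment.
std::ptrdiff_t realOffsetOfBig()
{
    Both object{};
    const char* start = reinterpret_cast<const char*>(&object);
    const char* big   = reinterpret_cast<const char*>(static_cast<const Big*>(&object));
    assert(big >= start && big < start + sizeof(Both));
    return big - start;
}

// Offset that the "(FirstParent*)ptr + 1" trick silently assumes.
std::ptrdiff_t guessedOffsetOfBig()
{
    return static_cast<std::ptrdiff_t>(sizeof(Small));
}
```

Whenever the two functions disagree, a pointer built from the guess does not point at a Big at all, so reading through it is not just undefined on paper but wrong in practice.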

Furthermore, the char array + placement new approach requires proper alignment, which is problematic when allocated on the stack.
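A minimal sketch of what "proper alignment" means for such a buffer, assuming C++11 and a made-up Widget type (nothing here is CSFML's actual implementation). A plain char array member only guarantees char alignment; alignas requests the alignment of the type that will be constructed inside.

```cpp
#include <cassert>
#include <cstdint>
#include <new>

// Made-up type standing in for a C++ class hidden behind a C handle.
struct Widget
{
    double value;
    explicit Widget(double v) : value(v) {}
};

// Raw storage that is big enough AND suitably aligned for a Widget.
struct WidgetStorage
{
    alignas(Widget) unsigned char bytes[sizeof(Widget)];
};

// Constructs a Widget inside the storage and returns the pointer
// that placement new hands back.
Widget* constructIn(WidgetStorage& storage, double v)
{
    Widget* widget = new (storage.bytes) Widget(v);
    assert(reinterpret_cast<std::uintptr_t>(widget) % alignof(Widget) == 0);
    return widget;
}

// Objects created with placement new need an explicit destructor call.
void destroy(Widget* widget)
{
    widget->~Widget();
}
```

The pointer returned by placement new is the one to keep using; casting the buffer address again later is where this kind of code usually goes wrong.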

Quote from: "Groogy"

This would give one the possibility to also respect the inheritance in C

No, it just opens a door for hard-to-track bugs, since the whole story is based on undefined behavior. Even if it might work for your specific use case, compiler and architecture, it is extremely unsafe. If you begin to think about dirty hacks in order to emulate features not available in C, you had better directly use C++.

One cannot even perform an explicit upcast since sfWindow and sfRenderWindow are completely unrelated. They both contain a value member (sf::Window and sf::RenderWindow, correspondingly), so one would have to copy the instance, which is not possible.

Groogy · « **Reply #4 on:** December 31, 2011, 04:47:59 pm »

Quote from: "Nexus"

Not necessarily. But even if it did, C couldn't correctly cast it, since it has no knowledge about class hierarchies. It would behave like reinterpret_cast in C++, which is here undefined behaviour, if the resulting pointer is used.

It doesn't have to know about the class hierarchies. I've worked at a low level with memory before and it has always worked like I described. I don't know if it's specified in the C++ standard, but every C++ compiler (VC++, GCC, DM) I've worked with places the first parent at offset ptr + 0, the following parents at ptr + sizeof(firstParent) + ..., and finally the current class at ptr + sizeof(firstParent) + ... + sizeof(nParent).

I've even created my own OO system in C which worked exactly like C++ with inheritance and I even succeeded with a vtable. I even added pre-parsing to it to create a proper language. And the inheritance was done by placing the parent first in the class. And this worked with up-casting as well. And this code was fully legitimate and C++ works the same way:

Code: [Select]

#include <assert.h>

struct AClass
{
        int foo;
};

struct BClass
{
        struct AClass _parent; /* first member, so it sits at offset 0 */
        int bar;
};

int main(void)
{
        struct BClass bObject;
        bObject.bar = 5;
        struct AClass *parent = (struct AClass *)&bObject;
        parent->foo = 10;

        assert( bObject._parent.foo == 10 ); // Will be true
        assert( bObject.bar == 5 ); // will be true
        return 0;
}

I'm not just talking out of my hat and making this up. I have verified all of this with extensive testing on all the compilers available to me. Both on C and C++. For heck's sake, I have even created a small GC experiment in C/C++, and when debugging that I really had to print out the exact memory layout in several different cases and check how to handle it. The only thing I don't handle in it is to find the end of the stack-frame.

Quote from: "Nexus"

Again, the cast to sfWindow* is undefined. Apart from that, no one guarantees that the sf::RenderTarget member begins sizeof(sfWindow) bytes after the beginning. You make assumptions about an implementation detail of both the compiler and the SFML library.

No, it is not really undefined; all it does is look at a piece of memory in a new way. I'm not making assumptions that big, but you are starting to in the next few lines. You are right that I have no guarantee that the next class is placed directly after sizeof(sf::Window), but it's the simplest way for compiler programmers to do it, so why wouldn't they? It seems to work for me.

You are also focusing too much on this part, which was not part of the question. It was just a quick idea that I know works. Yes, the casting is low-level and dirty, but I don't propose that the C developer does it himself; it should of course be hidden behind the interface. I think the possibilities are big enough for it to be implemented since the latest changes to the graphics interface, because there is now so much inheritance there that you would like to make use of. Hey, this would even allow C developers to create their custom Drawables, among other things. It would give C developers the full functionality of C++ SFML and also its ease of use. Instead of writing X functions, one for each kind of Transformable, they would just have to write a single one.

Quote from: "Nexus"

Furthermore, the char array + placement new approach requires proper alignment, which is problematic when allocated on the stack.

No, it is not? And now you are really starting to make assumptions. C doesn't allow the SFML objects to be allocated on the stack. And you are completely off here: it doesn't matter whether it's on the heap or the stack, the object looks the same when you just look at the memory. It differs in how it is allocated and how it's placed relative to other objects, but that is irrelevant for this question.

Quote from: "Nexus"

One cannot even perform an explicit upcast since sfWindow and sfRenderWindow are completely unrelated. They both contain a value member (sf::Window and sf::RenderWindow, correspondingly), so one would have to copy the instance, which is not possible.

Uh whut? I've done it several times. Either you're not making yourself very clear or you are living on another planet.

The definition is:

Code: [Select]

struct sfWindow
{
    sf::Window This;
};

Which means by the definitions of how C-style casting works and what I described before that this works just perfectly:

Code: [Select]


sfWindow *windowPtr = sfWindow_Create( /* foobar */ );
sf::Window *windowObj = (sf::Window *)windowPtr;

The only thing C has to do is to respect C++ memory layout which I've already defined. As long as it does that C++ will work as intended.

Groogy · « **Reply #5 on:** December 31, 2011, 05:00:15 pm »

Here's a simple test-case with a vtable as well that works perfectly.

Code: [Select]

#include <iostream>

class AClass
{
public:
	virtual ~AClass() { std::cout << "~AClass" << std::endl; };
	int foo;
};

class BClass
{
public:
	virtual ~BClass() { std::cout << "~BClass" << std::endl; };

	int bar;
};

class ChildClass : public AClass, public BClass
{
public:
	virtual ~ChildClass() { std::cout << "~ChildClass" << std::endl; };
};

int main()
{
	{
		ChildClass stackObject;
		stackObject.foo = 10;
		stackObject.bar = 15;

		AClass *aParent = (AClass *)&stackObject;
		BClass *bParent = (BClass *)(aParent + 1);
		std::cout << "stackObject.foo = " << aParent->foo << std::endl;
		std::cout << "stackObject.bar = " << bParent->bar << std::endl;
	}

	{
		ChildClass *heapObject = new ChildClass;
		heapObject->foo = 10;
		heapObject->bar = 15;

		AClass *aParent = (AClass *)heapObject;
		BClass *bParent = (BClass *)(aParent + 1);
		std::cout << "heapObject.foo = " << aParent->foo << std::endl;
		std::cout << "heapObject.bar = " << bParent->bar << std::endl;
		delete bParent; /* Changed to bParent from heapObject */
	}
}

Adding a function changes nothing unless it's virtual; that will change the vtable, but as I showed, that works as well.

Do I also have to show you that it works with wrapper structures?

I've even tried this with extreme cases, such as explicitly specifying alignment in order to use intrinsic functions and stuff like that. Still works like a charm. Why? Because as long as the memory layout is respected, it works.

Oh yeah, are you reading this, Laurent? Or do you not like it at all? Do you also think it's too low-level and dirty? I just think it would really help out with the new graphics interface and increase the flexibility of the C API, and future bindings based on C would be able to support the inheritance better. For heck's sake, do you know how much I struggled in C++ just to get inheritance to work properly in Ruby? And there I even have the entire hierarchy exposed to me!

Because you can actually implement dynamic_cast behind the C interface to take care of the casting if you are too afraid of what I am writing. What the C developer doesn't see won't hurt him anyway.
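A rough sketch of how that could look, with made-up classes and hypothetical demo* function names (none of this is the real CSFML API). The C side only ever holds an opaque demoDrawable handle; the type check happens in C++ and failure is reported as a return code instead of a bad pointer.

```cpp
#include <cassert>

// Made-up C++ hierarchy; not the real SFML classes.
class Drawable { public: virtual ~Drawable() {} };
class Sprite : public Drawable { public: int frame = 3; };
class Text   : public Drawable { public: int length = 0; };

// Handle as the C++ side could define it; a C header would only
// forward-declare it (hypothetical layout).
struct demoDrawable { Drawable* impl; };

extern "C"
{
    demoDrawable* demoSprite_CreateAsDrawable() { return new demoDrawable{ new Sprite }; }
    demoDrawable* demoText_CreateAsDrawable()   { return new demoDrawable{ new Text }; }

    void demoDrawable_Destroy(demoDrawable* drawable)
    {
        delete drawable->impl;
        delete drawable;
    }

    // Returns 1 and writes the frame if the handle really wraps a Sprite,
    // returns 0 and leaves *outFrame untouched otherwise.
    int demoDrawable_GetSpriteFrame(const demoDrawable* drawable, int* outFrame)
    {
        assert(drawable != nullptr && outFrame != nullptr);
        const Sprite* sprite = dynamic_cast<const Sprite*>(drawable->impl);
        if (sprite == nullptr)
            return 0;
        *outFrame = sprite->frame;
        return 1;
    }
}
```

The C developer never casts anything here, so there is no layout assumption to break.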

AND HAPPY NEW YEAR

EDIT: Changed the delete of the heap object to really show that using the vtable is intact. Just to be extra clear.

Silvah · « **Reply #6 on:** January 01, 2012, 02:46:04 pm »

Quote from: "Groogy"

Quote from: "Nexus"
Again, the cast to sfWindow* is undefined. Apart from that, no one guarantees that the sf::RenderTarget member begins sizeof(sfWindow) bytes after the beginning. You make assumptions about an implementation detail of both the compiler and the SFML library.
No it is not really undefined, all it does is look at a piece of memory in a new way.

It is, because the standard says so. It may seem to work, but it's undefined nonetheless.

Nexus · « **Reply #7 on:** January 01, 2012, 03:56:54 pm »

We say it is undefined behavior, and your only argument is "yes, but it works". The standard doesn't guarantee it; there's nothing to discuss. Even if it currently seems to work, maybe even on different compilers, it may always happen that a new compiler version comes up and breaks everything. Then you have a giant mess because you don't know where to start searching for bugs.

And this is not a hypothetical scenario; it has happened with iterators from VS 2008 to VS 2010. Every operation on singular iterators except the assignment of a valid iterator is undefined. In VS 2008, one could also assign singular iterators to each other; in VS 2010 this suddenly didn't work anymore, because the Dinkumware developers decided to make use of the undefined behavior, which is perfectly legitimate. Compiler and standard library writers are free to exploit the borders of the C++ standard. This is why so many operations are undefined: to leave freedom for specific implementations.

You however recommend Laurent to ignore the C++ standard and to introduce undefined behavior in CSFML only for the sake of using it a little more like C++ SFML? It is a completely different thing whether you do such things in your own project or in a library which is more and more widespread and which may be used by people on all kinds of compilers, architectures and operating systems. Laurent can't test every single combination -- especially not future ones -- and that's why I consider it careless to write clearly ill-formed code in the hope that "it works".

Don't forget that there is the possibility to follow some of your requests without undefined behavior, for example by explicit upcast functions in the official API. This has the additional advantage that Laurent isn't forced to expose and leave implementation details forever in order to keep your code operational.
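As a sketch of what explicit upcast functions could look like (made-up classes and hypothetical demo* names, not how CSFML is actually written): the handle stores ready-made base "views" that are filled in on the C++ side, where the derived-to-base conversion is well defined.

```cpp
#include <cassert>

// Made-up classes standing in for sf::Window, sf::RenderTarget and sf::RenderWindow.
class Window       { public: virtual ~Window() {}       int displayCount = 0; };
class RenderTarget { public: virtual ~RenderTarget() {} int drawCount = 0; };
class RenderWindow : public Window, public RenderTarget {};

// Handle types as the C++ side could define them (hypothetical layout);
// a C header would only forward-declare them.
struct demoWindow       { Window* impl; };
struct demoRenderTarget { RenderTarget* impl; };
struct demoRenderWindow
{
    RenderWindow     object;
    demoWindow       asWindow;
    demoRenderTarget asTarget;
};

extern "C"
{
    demoRenderWindow* demoRenderWindow_Create()
    {
        demoRenderWindow* handle = new demoRenderWindow;
        // Implicit derived-to-base conversions: the compiler applies the right offsets.
        handle->asWindow.impl = &handle->object;
        handle->asTarget.impl = &handle->object;
        return handle;
    }

    void demoRenderWindow_Destroy(demoRenderWindow* handle) { delete handle; }

    // The explicit upcasts: no pointer reinterpretation on the C side.
    demoWindow* demoRenderWindow_AsWindow(demoRenderWindow* handle)
    {
        assert(handle != nullptr);
        return &handle->asWindow;
    }

    demoRenderTarget* demoRenderWindow_AsRenderTarget(demoRenderWindow* handle)
    {
        assert(handle != nullptr);
        return &handle->asTarget;
    }

    // Functions written once against the base handles.
    void demoWindow_Display(demoWindow* window)       { ++window->impl->displayCount; }
    void demoRenderTarget_Draw(demoRenderTarget* target) { ++target->impl->drawCount; }
}
```

C code would then call something like demoWindow_Display(demoRenderWindow_AsWindow(window)), and the library stays free to change its internal layout.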

Happy new year

Groogy · « **Reply #8 on:** January 02, 2012, 12:32:48 am »

Are you 100% sure that it isn't defined for pointers? Because pointers, no matter how you cast them or what you do (except for arithmetic), will always point at the same memory in C. I feel that this should at least be defined for pointers since there isn't much to do there. There are a lot of low-level applications that depend on this behaviour. Take the Boehm-Demers-Weiser garbage collector for example: if this behaviour changed to something else, this GC wouldn't work, and neither would the GNU Java compiler, as it depends on this implementation. And Ruby also depends on this behaviour to the EXTREME. There are bigger projects than SFML that depend on this. So either it's defined, or you are saying that the rest of the world has gone insane except for you guys

Quote from: "Nexus"

You however recommend Laurent to ignore the C++ standard and to introduce undefined behavior in CSFML only for the sake of using it a little more like C++ SFML?

I did say you are focusing on this too much which was just a quick idea that came out of my head.

Quote from: "Nexus"

Don't forget that there is the possibility to follow some of your requests without undefined behavior, for example by explicit upcast functions in the official API. This has the additional advantage that Laurent isn't forced to expose and leave implementation details forever in order to keep your code operational.

Wasn't that what I suggested at the end of my post? Though it was down-cast instead.

Nexus · « **Reply #9 on:** January 02, 2012, 01:35:25 pm »

Quote from: "Groogy"

Because Pointers, no matter how you cast them or what you do(except for arithmetic) in C will always point at the same memory. I feel that this should at least be defined for pointers since there isn't much to do there.

Behavior of C casts with the syntax (type) expr:

Quote from: "C++ standard 2003, §5.4 Explicit type conversion (cast notation)"

5 The conversions performed by
— a const_cast (5.2.11),
— a static_cast (5.2.9),
— a static_cast followed by a const_cast,
— a reinterpret_cast (5.2.10), or
— a reinterpret_cast followed by a const_cast,
can be performed using the cast notation of explicit type conversion. The same semantic restrictions and behaviors apply.

6 The operand of a cast using the cast notation can be an rvalue of type “pointer to incomplete class type”. The destination type of a cast using the cast notation can be “pointer to incomplete class type”. In such cases, even if there is a inheritance relationship between the source and destination classes, whether the static_cast or reinterpret_cast interpretation is used is unspecified.

That is, C-casts are equivalent to one of the mentioned cast operators. For unrelated types, they are equivalent to reinterpret_cast, since neither static_cast nor const_cast can be applied. For related class types, we also have to assume reinterpret_cast is used. Then we can look up the semantics of reinterpret_cast:

Quote from: "§5.2.10 Reinterpret cast"

3 The mapping performed by reinterpret_cast is implementation-defined. [Note: it might, or might not, produce a representation different from the original value.]

7 A pointer to an object can be explicitly converted to a pointer to an object of different type. Except that converting an rvalue of type “pointer to T1” to the type “pointer to T2” (where T1 and T2 are object types and where the alignment requirements of T2 are no stricter than those of T1) and back to its original type yields the original pointer value, the result of such a pointer conversion is unspecified.
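The practical difference between the two interpretations can be sketched with made-up types (First, Second and Derived are illustrative only). With complete, related types the cast notation behaves like static_cast and adjusts the pointer; reinterpret_cast does no such adjustment. In the 2003 wording quoted above the reinterpret_cast result is unspecified; newer standard revisions describe it as a round trip through void*, and the compilers I know of keep the address unchanged, so treat the second function as an observation about typical implementations.

```cpp
#include <cassert>

// Made-up types for illustration.
struct First  { int a; };
struct Second { int b; };
struct Derived : First, Second {};

// With complete, related types the cast notation is a static_cast:
// both expressions point at the Second subobject.
bool cStyleCastMatchesStaticCast(Derived* d)
{
    assert(d != nullptr);
    return (Second*)d == static_cast<Second*>(d);
}

// reinterpret_cast applies no base-class adjustment, so the result still
// holds the address of the whole object. It is never dereferenced here.
bool reinterpretCastKeepsAddress(Derived* d)
{
    void* original      = d;
    void* reinterpreted = reinterpret_cast<Second*>(d);
    return original == reinterpreted;
}

// Only the static_cast result is safe to use as a Second.
bool staticCastReachesSecond(Derived* d)
{
    d->b = 42;
    return static_cast<Second*>(d)->b == 42;
}
```

When the types are incomplete, as opaque handle types are on the C side, the compiler cannot know the adjustment, which is why the reinterpret_cast reading has to be assumed.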

So yes, it's us against the world

Silvah · « **Reply #10 on:** January 02, 2012, 07:48:30 pm »

Also, since it's about casting in the C code:

After a cast like

Code: [Select]

sfRenderWindow *window = /* foobar */ 
sfRenderTarget *target = ((sfWindow *)window) + 1

you end up with a pointer that perhaps stores what you want, as the wording in the C standard is more vague: it does not explicitly say the result is implementation-defined or unspecified or undefined or anything:

Quote from: "ISO/IEC 9899:1999, §6.3.2.3/7"

A pointer to an object or incomplete type may be converted to a pointer to a different object or incomplete type. If the resulting pointer is not correctly aligned for the pointed-to type, the behavior is undefined. Otherwise, when converted back again, the result shall compare equal to the original pointer. When a pointer to an object is converted to a pointer to a character type, the result points to the lowest addressed byte of the object. Successive increments of the result, up to the size of the object, yield pointers to the remaining bytes of the object.

So far so good. However, you actually can't use the result for accessing the object:

Quote from: "ISO/IEC 9899:1999, §6.5/7"

An object shall have its stored value accessed only by an lvalue expression that has one of the following types:
a type compatible with the effective type of the object,
a qualified version of a type compatible with the effective type of the object,
a type that is the signed or unsigned type corresponding to the effective type of the object,
a type that is the signed or unsigned type corresponding to a qualified version of the effective type of the object,
an aggregate or union type that includes one of the aforementioned types among its members (including, recursively, a member of a subaggregate or contained union), or
a character type.

You're casting to a type which is not a compatible type, qualified or not, nor is it (obviously) the same type except signedness, nor is it a character type. And, since you aren't accessing it as a field (second-to-last bullet) either, if you're trying to access the object in any way - passing it to a third-party library that does the access still counts as an access - the behavior is undefined.
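For contrast, a small sketch of the accesses that the list above does permit, written in C++ (which has an analogous aliasing rule) and assuming a 32-bit float: copying the bytes with memcpy, or reading them through a character type. The function names are made up for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

static_assert(sizeof(float) == sizeof(std::uint32_t), "sketch assumes a 32-bit float");

// Copies the object representation instead of reading the float
// through an unrelated pointer type.
std::uint32_t bitsOf(float value)
{
    std::uint32_t bits;
    std::memcpy(&bits, &value, sizeof bits);
    return bits;
}

// The reverse direction, again via memcpy.
float floatFrom(std::uint32_t bits)
{
    float value;
    std::memcpy(&value, &bits, sizeof value);
    return value;
}

// Reading individual bytes through unsigned char is covered by the
// "character type" bullet.
unsigned char byteAt(const float& value, std::size_t index)
{
    assert(index < sizeof(float));
    return reinterpret_cast<const unsigned char*>(&value)[index];
}
```

Writing *(std::uint32_t*)&value instead would be exactly the kind of access through an incompatible type that the quoted paragraph rules out.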
