Also the reason why I believed passing by reference was faster was due to my assumption that a reference would be passed as a pointer (a 4-byte object). I'm apparently wrong and I appreciate it being pointed out.
Your assumption is correct (in most usual cases).
Nevertheless, you have to consider the other operations, too. I will try to explain them (leaving function inlining aside).
sf::Vector3f Left(3, 4, -7);
sf::Vector3f Right(-2, 1, 5);
sf::Vector3f Result; // The default constructor initializes 3 floats with 0
Dot(Left, Right, Result);
The called function may look like this:
void Dot(const sf::Vector3f& v1, const sf::Vector3f& v2, sf::Vector3f& vResult)
// note: 3 references copied, probably 12 bytes.
{
vResult.x = v1.y * v2.z - v1.z * v2.y;
vResult.y = v1.z * v2.x - v1.x * v2.z;
vResult.z = v1.x * v2.y - v1.y * v2.x;
// Without optimizations, there are 15 dereferences! (One each time you
// access a vector member through the dot operator.) They may be
// optimized away (for example by inlining), but keep this in mind
// nonetheless. Besides, three assignments to x, y and z are executed.
}
Now let's take a look at the other version.
sf::Vector3f Dot(const sf::Vector3f& v1, const sf::Vector3f& v2)
// 2 reference parameters copied, but we need the address of the target
// (explained later)
{
// 12 dereferences remain. By passing Left and Right by value, we
// could trade them for 6 float copies. That is not very wise if the
// elements are bigger than float.
// But on to the essential part:
return sf::Vector3f(
v1.y * v2.z - v1.z * v2.y,
v1.z * v2.x - v1.x * v2.z,
v1.x * v2.y - v1.y * v2.x); // creates a temporary object
} // copies the object for the return value
sf::Vector3f Destination = Dot(Left, Right); // = copies it again.
That looks like a lot of copies. But we've forgotten one thing: RVO. Modern compilers often apply return value optimization to elide unnecessary copies. But how does the compiler know that a copy is unnecessary? It cannot elide just any copy, because sometimes the original object must remain unchanged.
In this case, the first copy (the return value) is made from a temporary object, so nobody cares if we access it directly. The same applies at initialization time (the copy constructor called by = in the declaration outside the function): no one needs the source any more. Both copies can be omitted. What remains is the actual construction, which finally initializes our vector called Destination. It consists of 3 float initializations, and the object can be constructed in place outside the function (for this, the function needs the object's address). This way, there isn't even a temporary.
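To make the elided copies visible, here is a minimal sketch. CountedVec3 is a hypothetical stand-in for sf::Vector3f whose copy constructor increments a counter; it is not part of SFML. With copy elision applied, the counter should stay at zero. Since C++17 this elision is guaranteed for a returned temporary; under older standards it is only permitted, so the exact count depends on compiler and settings.

```cpp
#include <cassert>

// Hypothetical stand-in for sf::Vector3f that counts copy-constructor calls.
struct CountedVec3
{
    float x, y, z;
    static int copies; // number of copy-constructor calls so far

    CountedVec3() : x(0.f), y(0.f), z(0.f) {}
    CountedVec3(float x_, float y_, float z_) : x(x_), y(y_), z(z_) {}
    CountedVec3(const CountedVec3& other) : x(other.x), y(other.y), z(other.z)
    {
        ++copies;
    }
};

int CountedVec3::copies = 0;

// Return-by-value version: the temporary is a candidate for RVO.
CountedVec3 Cross(const CountedVec3& v1, const CountedVec3& v2)
{
    return CountedVec3(
        v1.y * v2.z - v1.z * v2.y,
        v1.z * v2.x - v1.x * v2.z,
        v1.x * v2.y - v1.y * v2.x);
}

// Returns how many copy-constructor calls the initialization needed.
int CountCopiesForReturnByValue()
{
    CountedVec3 left(3.f, 4.f, -7.f);
    CountedVec3 right(-2.f, 1.f, 5.f);

    CountedVec3::copies = 0;
    CountedVec3 destination = Cross(left, right);

    // The cross product of (3, 4, -7) and (-2, 1, 5) is (27, -1, 11).
    assert(destination.x == 27.f);
    assert(destination.y == -1.f);
    assert(destination.z == 11.f);

    return CountedVec3::copies;
}
```

Calling CountCopiesForReturnByValue() and printing the result is a quick way to check what your own compiler does with the two potential copies.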
So the first version performs the following operations to initialize Result:
- 3 zero-initialized floats (default constructor outside function)
- 3 additional dereferences inside the function (write access to vResult)
- 3 assignments to x, y, z inside the function
Provided the compiler applies RVO, which is likely, the second version is able to initialize the object directly:
- 3 floats initialized directly with their final values (parameterized constructor)
OK, this is float; how about bigger types? The operations remain the same.
However, the second version may pay off even more. The reason is that default construction plus assignment is almost always less efficient than direct construction from the final values.
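The difference in operations can be counted with a small sketch. Label, OpCounts and MakeLabel are hypothetical names invented for this illustration; Label wraps a std::string as an example of a type that is bigger than a float.

```cpp
#include <cassert>
#include <string>

// Counters for the operations performed on Label objects.
struct OpCounts
{
    int defaultCtor;
    int valueCtor;
    int assignments;
};

// Hypothetical type with a non-trivial member.
class Label
{
public:
    Label() : text_() { ++counts.defaultCtor; }
    explicit Label(const std::string& text) : text_(text) { ++counts.valueCtor; }

    Label& operator=(const std::string& text)
    {
        text_ = text;
        ++counts.assignments;
        return *this;
    }

    const std::string& text() const { return text_; }

    static OpCounts counts;

private:
    std::string text_;
};

OpCounts Label::counts = {0, 0, 0};

// Style of the first version: the caller default-constructs, we assign.
void MakeLabel(const std::string& name, Label& result)
{
    result = name;
}

// Style of the second version: construct directly from the final value.
Label MakeLabel(const std::string& name)
{
    return Label(name);
}

OpCounts CountOutParameterStyle()
{
    Label::counts = OpCounts(); // reset all counters to zero
    Label result;               // default construction
    MakeLabel("player", result); // plus assignment
    assert(result.text() == "player");
    return Label::counts;
}

OpCounts CountReturnByValueStyle()
{
    Label::counts = OpCounts();
    Label result = MakeLabel("player"); // one direct construction
    assert(result.text() == "player");
    return Label::counts;
}
```

In this sketch the out-parameter style performs one default construction and one assignment, while the return-by-value style performs a single construction from the final value. Copies are deliberately not counted here, so the numbers do not depend on whether RVO is applied.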
By the way: C++0x, the next C++ standard, extends this concept by introducing rvalue references, which support actual move semantics. Temporaries are then no longer merely a hint for applying RVO. In current C++, we can partially achieve move semantics with std::auto_ptr. Unfortunately, this approach is not always optimal because of the required heap allocation.
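For illustration, here is a minimal sketch of a move constructor using rvalue references as they were eventually standardized in C++11. Buffer is a hypothetical class invented for this example; it needs a compiler with C++11 support. Moving takes over the source's heap array instead of copying its elements.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <utility>

// Hypothetical class that owns a heap-allocated array of floats.
class Buffer
{
public:
    explicit Buffer(std::size_t size) : size_(size), data_(new float[size]()) {}

    // Copy constructor: allocates new storage and copies every element.
    Buffer(const Buffer& other) : size_(other.size_), data_(new float[other.size_])
    {
        std::copy(other.data_, other.data_ + size_, data_);
    }

    // Move constructor: takes over the source's storage, no element copies.
    Buffer(Buffer&& other) noexcept : size_(other.size_), data_(other.data_)
    {
        other.size_ = 0;
        other.data_ = nullptr;
    }

    ~Buffer() { delete[] data_; }

    Buffer& operator=(const Buffer&) = delete; // kept out of the sketch

    std::size_t size() const { return size_; }
    const float* data() const { return data_; }

private:
    std::size_t size_;
    float* data_;
};

// True if the moved-to object reuses the original allocation.
bool MoveReusesAllocation()
{
    Buffer source(1000);
    const float* original = source.data();

    Buffer target(std::move(source)); // explicitly treat source as an rvalue
    assert(target.size() == 1000);

    return target.data() == original
        && source.data() == nullptr
        && source.size() == 0;
}

// True if a copy gets its own, separate allocation.
bool CopyAllocatesNewStorage()
{
    Buffer source(1000);
    Buffer copy(source);
    return copy.data() != source.data() && copy.size() == source.size();
}
```

The move constructor is selected for temporaries automatically; std::move is only needed to move from a named object, as in MoveReusesAllocation().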
Still, one shouldn't rely on RVO in every case. If huge objects (e.g. a std::vector<std::string> with many elements) are returned by value, it doesn't hurt to be careful. By contrast, an sf::Vector3<T> instance is small, and even if RVO can't be applied, the consequences are not too tragic.
At the very least, being able to write the following is worth the risk:
AnyFunction(CrossProduct(Left, Right));
The alternative is not very pleasant (not least because passing the named object Result to AnyFunction might require a further copy, depending on the parameter type):
sf::Vector3f Result;
CrossProduct(Left, Right, Result);
AnyFunction(Result);
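The extra copy for the named object can be observed with a sketch like the following. CountedVec3 is a hypothetical copy-counting stand-in for sf::Vector3f, and AnyFunction is assumed here to take its parameter by value, which is the case where the difference shows up.

```cpp
#include <cassert>

// Hypothetical stand-in for sf::Vector3f that counts copy-constructor calls.
struct CountedVec3
{
    float x, y, z;
    static int copies;

    CountedVec3() : x(0.f), y(0.f), z(0.f) {}
    CountedVec3(float x_, float y_, float z_) : x(x_), y(y_), z(z_) {}
    CountedVec3(const CountedVec3& o) : x(o.x), y(o.y), z(o.z) { ++copies; }
};

int CountedVec3::copies = 0;

// Return-by-value version.
CountedVec3 CrossProduct(const CountedVec3& v1, const CountedVec3& v2)
{
    return CountedVec3(
        v1.y * v2.z - v1.z * v2.y,
        v1.z * v2.x - v1.x * v2.z,
        v1.x * v2.y - v1.y * v2.x);
}

// Out-parameter version.
void CrossProduct(const CountedVec3& v1, const CountedVec3& v2, CountedVec3& vResult)
{
    vResult.x = v1.y * v2.z - v1.z * v2.y;
    vResult.y = v1.z * v2.x - v1.x * v2.z;
    vResult.z = v1.x * v2.y - v1.y * v2.x;
}

// Assumption: the consumer takes its argument by value.
float AnyFunction(CountedVec3 v)
{
    return v.x + v.y + v.z;
}

int CopiesWithTemporary()
{
    CountedVec3 left(3.f, 4.f, -7.f);
    CountedVec3 right(-2.f, 1.f, 5.f);

    CountedVec3::copies = 0;
    float sum = AnyFunction(CrossProduct(left, right));
    assert(sum == 37.f); // 27 + (-1) + 11
    return CountedVec3::copies;
}

int CopiesWithNamedResult()
{
    CountedVec3 left(3.f, 4.f, -7.f);
    CountedVec3 right(-2.f, 1.f, 5.f);

    CountedVec3::copies = 0;
    CountedVec3 result;
    CrossProduct(left, right, result);
    float sum = AnyFunction(result); // result is an lvalue, so it is copied
    assert(sum == 37.f);
    return CountedVec3::copies;
}
```

Passing the named object always costs one copy in this sketch, whereas the temporary can typically be constructed straight into the parameter. If AnyFunction took a const reference instead, neither variant would copy.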