Author Topic: performance modulo versus nested for loop (Read 3247 times)

ka0s420 · « **on:** April 12, 2016, 12:19:47 am »

Hi there, this is more of a general c++ question, but figured someone on here might have the experience to advise me.

Right, so i have a 2d vector of int tile ids. The vector contains 4,225 ints. In my map editor, I originally used a nested for loop to check through them and change them if a mouse click collided with the space they represent. Here's an example:

bool foundTarg = false;
for (int i = 0; i < tileset.size(); i++)
{
        for (int j = 0; j < tileset[i].size(); j++)
        {
                if (sf::FloatRect(i * 32, j * 32, 32, 32).contains(mouse))
                {
                        tileset[i][j] = tileId;
                        foundTarg = true;
                        break;
                }

        }
        if (foundTarg)
                break;
}

However, I had the following idea, and that was that, at least when the click was deep into the array, that using modulo would save performance by cutting out the nested for loop. So here is an example of the function i used instead:

int x = static_cast<int>(pos.x);
int y = static_cast<int>(pos.y);
int remainderX = x % 32;
int remainderY = y % 32;
sf::Vector2f newPos((x - remainderX) / 32, (y - remainderY) / 32);
return newPos;
 

any thoughts on which would be more performant? Like I said, it seems obvious to me that the modulo method would be faster if the vector element was further in. But I'm sure its making my fps drop more than iterating through the vectors, which doesn't make sense to me.

Thanks for your time and patience

eXpl0it3r · « **Reply #1 on:** April 12, 2016, 12:32:30 am »

It all comes down to how smart your compiler is, so all you can do is write a benchmark.

Modulo can be implemented with a division and maybe a subtraction if the division doesn't already provide the remainder. So unless the compiler can optimize the code equally well, I'd say the modulo calls will be faster.

Also there are really better places like StackOverflow to ask general C++ questions. We try to stay focused on things evolving around SFML.

ka0s420 · « **Reply #2 on:** April 12, 2016, 01:09:43 am »

okay thanks for the reply. Sure i'll use SO next time, I just don't enjoy being condescended to that extreme. The condescension on this site is much more bearable and constructive. Im using visual c++ visual studio 2015. I guess ill try and make a benchmark then, something i've yet to do, but should be a worth while learning experience. Thanks again, Exploiter!

DarkRoku12 · « **Reply #3 on:** April 13, 2016, 01:06:25 am »

Modulo would be faster because:

1) The nested loop can't be unrolled because the size of vectors are not static. So, there will be a jmp (jump) instruction for the loop.
2) 32 is power of 2 . considering that we have 2 optimizations:

For division:

   x = y / 32 ;
 

is the same that:

  x = y >> 5;
 

Bit shifting if faster than division, any good compiler will use bit shifting when possible.

And for the module ( % ) operand:

   x = y % 32 ;
 

Could be optimized to:

   x = ( y & ( 32-1) ) ;
 

Look at this snippet:

unsigned int getModulo(unsigned int n, unsigned int d)
{
  return ( n & ( d-1 ) ) ;
}         
// Where "n" is any unsigned int and "d" is a power of two number. ( In this case 32 )
 

Take into account that bit shifting only admit integers so, you'll only get integers.

Nexus · « **Reply #4 on:** April 13, 2016, 10:10:44 pm »

Quote from: ka0s420 on April 12, 2016, 12:19:47 am

Like I said, it seems obvious to me that the modulo method would be faster if the vector element was further in. But I'm sure its making my fps drop more than iterating through the vectors, which doesn't make sense to me.

There's probably something else in your code. If you want to compare, extract and isolate the two code parts and as eXpl0it3r mentioned, write a meaningful benchmark (based on time, not FPS -- FPS are a nice for gamers, but not for serious measurement, as they're not linear). To analyze a specific performance bottleneck, profilers are also an option.

The points mentioned by DarkRoku are also really good ones. There's no need to obfuscate code with bitwise operations, because as he says, compilers already do that optimization. But even if an actual division/modulo would be required, the for loop is easily slower as soon as a few iterations are needed, as there's some work to do (sf::FloatRect::contains() is several additions and comparisons). Even more important here is the fact that your vectors are fragmented in memory, leading to cache misses.

In general, if you can do something in O(1) instead of O(x*y), do it. You need to forcefully construct cases where it's not faster (and simpler).

ka0s420 · « **Reply #5 on:** April 14, 2016, 02:34:27 am »

okay thanks guys, much appreciated. Great info!

Author Topic: performance modulo versus nested for loop (Read 3247 times)

ka0s420

performance modulo versus nested for loop

eXpl0it3r

Re: performance modulo versus nested for loop

ka0s420

Re: performance modulo versus nested for loop

DarkRoku12

Re: performance modulo versus nested for loop

Nexus

Re: performance modulo versus nested for loop

ka0s420

Re: performance modulo versus nested for loop