Welcome, Guest. Please login or register. Did you miss your activation email?

Author Topic: performance modulo versus nested for loop  (Read 3264 times)

0 Members and 1 Guest are viewing this topic.

ka0s420

  • Jr. Member
  • **
  • Posts: 56
    • View Profile
    • Email
performance modulo versus nested for loop
« on: April 12, 2016, 12:19:47 am »
Hi there, this is more of a general c++ question, but figured someone on here might have the experience to advise me.

Right, so i have a 2d vector of int tile ids. The vector contains 4,225 ints. In my map editor, I originally used a nested for loop to check through them and change them if a mouse click collided with the space they represent. Here's an example:


bool foundTarg = false;
for (int i = 0; i < tileset.size(); i++)
{
        for (int j = 0; j < tileset[i].size(); j++)
        {
                if (sf::FloatRect(i * 32, j * 32, 32, 32).contains(mouse))
                {
                        tileset[i][j] = tileId;
                        foundTarg = true;
                        break;
                }

        }
        if (foundTarg)
                break;
}

However, I had the following idea, and that was that, at least when the click was deep into the array, that using modulo would save performance by cutting out the nested for loop. So here is an example of the function i used instead:

int x = static_cast<int>(pos.x);
int y = static_cast<int>(pos.y);
int remainderX = x % 32;
int remainderY = y % 32;
sf::Vector2f newPos((x - remainderX) / 32, (y - remainderY) / 32);
return newPos;
 

any thoughts on which would be more performant? Like I said, it seems obvious to me that the modulo method would be faster if the vector element was further in. But I'm sure its making my fps drop more than iterating through the vectors, which doesn't make sense to me.

Thanks for your time and patience  ;)

eXpl0it3r

  • SFML Team
  • Hero Member
  • *****
  • Posts: 11034
    • View Profile
    • development blog
    • Email
Re: performance modulo versus nested for loop
« Reply #1 on: April 12, 2016, 12:32:30 am »
It all comes down to how smart your compiler is, so all you can do is write a benchmark.

Modulo can be implemented with a division and maybe a subtraction if the division doesn't already provide the remainder. So unless the compiler can optimize the code equally well, I'd say the modulo calls will be faster.

Also there are really better places like StackOverflow to ask general C++ questions. We try to stay focused on things evolving around SFML. ;)
Official FAQ: https://www.sfml-dev.org/faq.php
Official Discord Server: https://discord.gg/nr4X7Fh
——————————————————————
Dev Blog: https://duerrenberger.dev/blog/

ka0s420

  • Jr. Member
  • **
  • Posts: 56
    • View Profile
    • Email
Re: performance modulo versus nested for loop
« Reply #2 on: April 12, 2016, 01:09:43 am »
okay thanks for the reply. Sure i'll use SO next time, I just don't enjoy being condescended to that extreme. The condescension on this site is much more bearable and constructive. Im using visual c++ visual studio 2015. I guess ill try and make a benchmark then, something i've yet to do, but should be a worth while learning experience. Thanks again, Exploiter!

DarkRoku12

  • Full Member
  • ***
  • Posts: 203
  • Lua coder.
    • View Profile
    • Email
Re: performance modulo versus nested for loop
« Reply #3 on: April 13, 2016, 01:06:25 am »
Modulo would be faster because:

1) The nested loop can't be unrolled because the size of vectors are not static. So, there will be a jmp (jump) instruction for the loop.
2) 32 is power of 2 . considering that we have 2 optimizations:

For division:
   x = y / 32 ;
 

is the same that:
 
  x = y >> 5;
 

Bit shifting if faster than division, any good compiler will use bit shifting when possible.

And for the module ( % ) operand:

   x = y % 32 ;
 

Could be optimized to:

   x = ( y & ( 32-1) ) ;
 

Look at this snippet:

unsigned int getModulo(unsigned int n, unsigned int d)
{
  return ( n & ( d-1 ) ) ;
}        
// Where "n" is any unsigned int and "d" is a power of two number. ( In this case 32 )
 

Take into account that bit shifting only admit integers so, you'll only get integers.
« Last Edit: April 13, 2016, 01:09:49 am by DarkRoku »
I would like a spanish/latin community...
Problems building for Android? Look here

Nexus

  • SFML Team
  • Hero Member
  • *****
  • Posts: 6287
  • Thor Developer
    • View Profile
    • Bromeon
Re: performance modulo versus nested for loop
« Reply #4 on: April 13, 2016, 10:10:44 pm »
Like I said, it seems obvious to me that the modulo method would be faster if the vector element was further in. But I'm sure its making my fps drop more than iterating through the vectors, which doesn't make sense to me.
There's probably something else in your code. If you want to compare, extract and isolate the two code parts and as eXpl0it3r mentioned, write a meaningful benchmark (based on time, not FPS -- FPS are a nice for gamers, but not for serious measurement, as they're not linear). To analyze a specific performance bottleneck, profilers are also an option.

The points mentioned by DarkRoku are also really good ones. There's no need to obfuscate code with bitwise operations, because as he says, compilers already do that optimization. But even if an actual division/modulo would be required, the for loop is easily slower as soon as a few iterations are needed, as there's some work to do (sf::FloatRect::contains() is several additions and comparisons). Even more important here is the fact that your vectors are fragmented in memory, leading to cache misses.

In general, if you can do something in O(1) instead of O(x*y), do it. You need to forcefully construct cases where it's not faster (and simpler).
Zloxx II: action platformer
Thor Library: particle systems, animations, dot products, ...
SFML Game Development:

ka0s420

  • Jr. Member
  • **
  • Posts: 56
    • View Profile
    • Email
Re: performance modulo versus nested for loop
« Reply #5 on: April 14, 2016, 02:34:27 am »
okay thanks guys, much appreciated. Great info!  :)