jack: (Default)
[personal profile] jack
A little while ago, someone told me about a really simple algorithm brainteaser. Suppose you want to find both the minimum and maximum of an array. Instead of writing something like:
   for (int i=0;i<size;i+=2)
   {
      if (arr[i]<min) min = arr[i];
      if (arr[i+1]<min) min = arr[i+1];
      if (arr[i+1]>max) max = arr[i+1];
      if (arr[i]>max) max = arr[i];
   }

You can reduce the number of comparisons per two elements from 4 to 3 by doing something like:
      if (arr[i]<arr[i+1])
      {
         if (arr[i]<min) min = arr[i];
         if (arr[i+1]>max) max = arr[i+1];
      }
      else
      {
         if (arr[i+1]<min) min = arr[i+1];
         if (arr[i]>max) max = arr[i];
      }

I asked, does it make a difference if that pipelines less efficiently, and I didn't really get an answer, but I got the impression that wasn't a sensible question to ask.

But when I actually tried it, with some simple instrumentation code (using "clock()" from "time.h"), the second took about twice as long. On a windows PC, compiled with cl, using O2.

When I looked at the disassembly, each comparison looked to be something like:
   if (arr[i]<min) min = arr[i];
0040118B  mov         ecx,dword ptr [i] 
0040118E  mov         edx,dword ptr [arr] 
00401191  mov         eax,dword ptr [min] 
00401194  mov         ecx,dword ptr [edx+ecx*4] 
00401197  cmp         ecx,dword ptr [eax] 
00401199  jge         min_max_2+59h (4011A9h) 
0040119B  mov         edx,dword ptr [min] 
0040119E  mov         eax,dword ptr [i] 
004011A1  mov         ecx,dword ptr [arr] 
004011A4  mov         eax,dword ptr [ecx+eax*4] 
004011A7  mov         dword ptr [edx],eax

Which didn't seem great, but did seem like the number of instructions was proportional to the number of lines expected to be executed.

What have I missed?