A little while ago, someone told me about a really simple algorithm brainteaser. Suppose you want to find both the minimum and maximum of an array. Instead of writing something like:
You can reduce the number of comparisons per two elements from 4 to 3 by doing something like:
I asked, does it make a difference if that pipelines less efficiently, and I didn't really get an answer, but I got the impression that wasn't a sensible question to ask.
But when I actually tried it, with some simple instrumentation code (using "clock()" from "time.h"), the second took about twice as long. On a windows PC, compiled with cl, using O2.
When I looked at the disassembly, each comparison looked to be something like:
Which didn't seem great, but did seem like the number of instructions was proportional to the number of lines expected to be executed.
What have I missed?
for (int i=0;i<size;i+=2)
{
if (arr[i]<min) min = arr[i];
if (arr[i+1]<min) min = arr[i+1];
if (arr[i+1]>max) max = arr[i+1];
if (arr[i]>max) max = arr[i];
}
You can reduce the number of comparisons per two elements from 4 to 3 by doing something like:
if (arr[i]<arr[i+1])
{
if (arr[i]<min) min = arr[i];
if (arr[i+1]>max) max = arr[i+1];
}
else
{
if (arr[i+1]<min) min = arr[i+1];
if (arr[i]>max) max = arr[i];
}
I asked, does it make a difference if that pipelines less efficiently, and I didn't really get an answer, but I got the impression that wasn't a sensible question to ask.
But when I actually tried it, with some simple instrumentation code (using "clock()" from "time.h"), the second took about twice as long. On a windows PC, compiled with cl, using O2.
When I looked at the disassembly, each comparison looked to be something like:
if (arr[i]<min) min = arr[i]; 0040118B mov ecx,dword ptr [i] 0040118E mov edx,dword ptr [arr] 00401191 mov eax,dword ptr [min] 00401194 mov ecx,dword ptr [edx+ecx*4] 00401197 cmp ecx,dword ptr [eax] 00401199 jge min_max_2+59h (4011A9h) 0040119B mov edx,dword ptr [min] 0040119E mov eax,dword ptr [i] 004011A1 mov ecx,dword ptr [arr] 004011A4 mov eax,dword ptr [ecx+eax*4] 004011A7 mov dword ptr [edx],eax
Which didn't seem great, but did seem like the number of instructions was proportional to the number of lines expected to be executed.
What have I missed?