
The Art of Writing Efficient Programs

Our analysis of the CPU capabilities so far has shown that the processor can execute multiple operations at once as long as the operands are already in the registers: we can evaluate a fairly complex expression that depends on just two values in exactly as much time as it takes to add these values. The "depends on just two values" qualifier is, unfortunately, a very serious restriction. We now consider a more realistic code example, and we don't have to make many changes to our code:
for (size_t i = 0; i < N; ++i) {
    a1 += (p1[i] + p2[i])*(p1[i] - p2[i]);
}
Recall that the old code had the same loop with a simpler body: a1 += (p1[i] + p2[i]). Also, p1[i] is just an alias for the vector element v1[i]; the same goes for p2 and v2. Why is this code more complex? We have already seen that the processor can do addition, subtraction, and multiplication in a single cycle, and the expression still depends on just two values...