Branch Removal

BranchRemoval is an extreme optimization pattern written by JoeSeda that was workshopped at the ChicagoPatternsGroup earlier in 1997. Here is the gist of the pattern...

Branches caused by conditional statements, e.g. if() statements, slow CPU performance on pipelined hardware.


Replace conditional statements with bit masking & shifting from the sign bit. "A < B" becomes "A - B". Replace following assignments with assignment from array values. Remove nested branches leaving outer branches intact.

Huh??? Could you repeat that in English, please? Thanks.

Example: original version

  if (A < B)
    return 42;
    return -66;

"Optimized" version:

  int ar[] = {-66, 42};
  return ar[((A-B)>>BITS_PER_WORD)&1];

Isn't this what was called decision tables (with jump tables as an extension) in the 50s? It reminds me of my father. Nothing is new under the sun. --MarcGirod

A pretty good little mini-pattern which has been around for a while. As I've learned, when someone says, "nothing new", that's a good time to point out that patterns are indeed nothing to be afraid of. The example is given in C-like syntax, so it's reasonable to suppose that the code in question is in C and would be compiled directly to machine instructions. An example of writing assembly in C syntax. Useful if you can't drop to assembly directly, but I'm not sure why that would be the case. Perhaps for platform independence? But this sort of optimization seems highly platform-specific, presuming at least pipeline prediction in the CPU. Can the author clarify the context and forces please? --StevenNewton
Steven, consider the case where your code is running on multiple platforms (so portability counts), but all these platforms have extremely high branch miss costs (common, these days). Further say that you trust you compiler to give good code, but not to do structural magic.

Given a boolean value b,

  int foo = b ? 34 : 0;

Presumably, this will result in a branch being compiled (although you would want to check this, if you're bothering to optimize).

  int foo = !!b * 34;

or, equivalently,

  int foo = -!!b & 34;

Presumably, this will not (although, again, you'd want to check this).

The code is of similar complexity, and is obvious to a reader who has seen the idiom. However, it may be faster on most of the hardware under consideration, because branches have been removed. Make sense?


See also: PulseLogic, FunExerciseAnswer


View edit of October 6, 2005 or FindPage with title or text search