Quite surprisingly, the chained shift-and-or code (shl or shl or shl or ...) is actually faster than the assembler version! That's one clever function! Unfortunately, it still doesn't beat a brute-force lookup.

Also interesting: I tried cairnswm's algorithm both as a function returning a single integer and as a straight inline lookup. According to the graph below, the overhead incurred by the function call is around 640 millionths of a second (0.00064 s) per execution.
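
For anyone who wants to see the two approaches side by side, here is a rough sketch in C. It is not the code that was benchmarked. The task (reversing the bits of a byte) and the names reverse_shift_or, reverse_table and build_reverse_table are made up purely for illustration. It only shows the general shape of a shift-and-or chain next to a brute-force lookup table.

```c
#include <stdint.h>

/* Hypothetical task, chosen only for illustration: reverse the bits of a byte.
   This version is a chain of mask, shift and OR steps. */
static uint8_t reverse_shift_or(uint8_t b) {
    return (uint8_t)(((b & 0x01u) << 7) | ((b & 0x02u) << 5) |
                     ((b & 0x04u) << 3) | ((b & 0x08u) << 1) |
                     ((b & 0x10u) >> 1) | ((b & 0x20u) >> 3) |
                     ((b & 0x40u) >> 5) | ((b & 0x80u) >> 7));
}

/* Brute-force alternative: one precomputed answer per possible input byte. */
static uint8_t reverse_table[256];

/* Fill the table once, up front, using the shift-and-or version. */
static void build_reverse_table(void) {
    for (int i = 0; i < 256; ++i) {
        reverse_table[i] = reverse_shift_or((uint8_t)i);
    }
}
```

After calling build_reverse_table() once, reverse_table[x] written inline at the call site replaces the whole chain with a single array read, and it also avoids the call overhead of wrapping the lookup in a function. Whether it actually wins depends on the compiler and the cache, so time it on your own setup.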