First, there is no such thing as a signed shift left: shifting left produces the same bits whether the value is treated as signed or unsigned, so the function _SHL is completely redundant. _SHLU and _SHRU are just wrappers around the normal shl and shr operators. _SHR does an arithmetic shift by division, and that is awfully slow, which eliminates any advantage of performing a shift. You might as well use div in the source code. It will even do better, because the divisor can be a constant, so there is no need to compute "1 shl bits" to get it.
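To make the first two points concrete, here is a small sketch in Python that models a 32-bit two's-complement Integer. The helper names (`to_signed32`, `shl`, `sar`, `div_trunc`, `shr_by_division`) are hypothetical, and `shr_by_division` is only an assumed model of a _SHR that divides by "1 shl bits", since its actual body isn't shown here. It shows that a left shift yields identical bits for a signed and an unsigned input. It also shows a catch with the division approach: Pascal's div truncates toward zero, while a true arithmetic shift (x86 SAR) rounds toward negative infinity, so the two disagree for negative odd results.

```python
MASK32 = 0xFFFFFFFF

def to_signed32(x: int) -> int:
    """Reinterpret the low 32 bits of x as a signed two's-complement Integer."""
    x &= MASK32
    return x - (1 << 32) if x & 0x80000000 else x

def shl(x: int, bits: int) -> int:
    """32-bit shift left; returns the raw (unsigned) bit pattern."""
    return (x << bits) & MASK32

def sar(x: int, bits: int) -> int:
    """Arithmetic shift right (like x86 SAR): rounds toward negative infinity."""
    return to_signed32(x) >> bits  # Python's >> on negative ints is arithmetic

def div_trunc(x: int, d: int) -> int:
    """Pascal-style 'div': integer division that truncates toward zero."""
    q = abs(x) // abs(d)
    return q if (x < 0) == (d < 0) else -q

def shr_by_division(x: int, bits: int) -> int:
    """Hypothetical model of a _SHR implemented as x div (1 shl bits)."""
    return div_trunc(to_signed32(x), 1 << bits)

# Same bits out of a left shift whether the input is viewed as -5 or $FFFFFFFB
print(hex(shl(-5, 3)), hex(shl(0xFFFFFFFB, 3)))
# Division and SAR disagree on -3 shifted right by 1
print(sar(-3, 1), shr_by_division(-3, 1))
```

With these definitions, shifting -5 or $FFFFFFFB left by 3 gives the same pattern, while `sar(-3, 1)` is -2 and `shr_by_division(-3, 1)` is -1.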

I just found out that at least Delphi (I haven't tested FPC) emits SAR when you use the div operator with a power-of-2 divisor. That still doesn't solve the case where the number of bits to shift varies.
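Because div on a signed value must truncate toward zero and a bare SAR rounds toward negative infinity, a compiler that turns "x div 2^k" into SAR generally needs a small fix-up for negative x: add 2^k - 1 before shifting. Below is a Python sketch of that general technique, assuming a signed 32-bit input. The function name `div_pow2_via_sar` is hypothetical, and this is not a claim about Delphi's exact instruction sequence. Note that k is an ordinary parameter here, so the same expression also works for a shift count that varies at run time, provided the language gives you an arithmetic right shift to build it from.

```python
def div_pow2_via_sar(x: int, k: int) -> int:
    """Truncating division x div 2**k for a signed 32-bit x, built from shifts.

    sign = x >> 31 is 0 for x >= 0 and -1 (all ones) for x < 0. Masking it with
    2**k - 1 gives the bias that makes the final arithmetic shift round toward
    zero, matching div instead of flooring.
    """
    assert -(1 << 31) <= x < (1 << 31), "x must fit in a signed 32-bit Integer"
    sign = x >> 31                  # arithmetic shift: 0 or -1
    bias = sign & ((1 << k) - 1)    # 0 for non-negative x, 2**k - 1 for negative x
    return (x + bias) >> k          # the shift now rounds toward zero

print(div_pow2_via_sar(-3, 1), -3 >> 1)  # truncating result vs. plain SAR
```

Here `div_pow2_via_sar(-3, 1)` gives -1, the same as -3 div 2, while the plain arithmetic shift `-3 >> 1` gives -2.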