That's too much of a hassle for me then...

I also wouldn't trust a compiler to optimize a loop with SIMD tricks. I could, however, trust FPC if the RTL had SIMD-optimized functions, which would be easy to maintain while the interface stays completely transparent.