No problem, working with MMX/SSE stuff is (usually) fun. I don't know whether you will find any useful, commented SIMD examples out there, so this should serve as one. Feel free to ask any questions. The code can still be made a bit faster, so there's some work left on it, too.

Oh, and I believe you have a *lot* of FPC experience. Inline asm is cool: you can output the register contents to the console very easily, so it's easy to follow the operations on the data.