Quote Originally Posted by imcold
No problem, working with MMX/SSE stuff is (usually) fun. I don't know if you will find any useful, commented SIMD examples, so this should be one. Feel free to ask any questions. The code can still be made a bit faster, so there's still some work left on it, too.

Oh, and I believe you have a *lot* of FPC experience. Inline asm is cool: you can output the register contents to the console very easily, so it's easy to follow the operations on the data.
A follow-up:

Due to busy work (and the relevant projects being postponed a few months by the clients), I only got to real testing today.

The code crashed at first, but that was because the Pascal code uses register EBX for the loop counter, and the asm routine does not save it. For now I quickly wrapped the asm in a push/pop of EBX, but I will do the outer loop in asm in the near future too.
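
For anyone who hits the same crash, the workaround looks roughly like the sketch below. This is not the actual routine: `ProcessRow` and the loop bounds are made-up names for illustration, the `xor` only stands in for the real SIMD work, and it assumes a 32-bit x86 target with Intel asm syntax.

```pascal
program PreserveEbx;
{$mode objfpc}
{$asmmode intel}   // assumption: Intel syntax, 32-bit x86 target

// Hypothetical stand-in for the SIMD routine; the real code is not shown here.
procedure ProcessRow;
begin
  asm
    push ebx        // save EBX: the Pascal caller may keep its loop counter there
    xor  ebx, ebx   // placeholder for the real work, which overwrites EBX
    pop  ebx        // restore EBX before control returns to Pascal
  end;
end;

var
  row: LongInt;
begin
  // Outer loop still in Pascal; it survives because EBX is restored each call.
  for row := 0 to 3 do
    ProcessRow;
  WriteLn('done');
end.
```

As far as I know, FPC also accepts a register list after the block (`end ['EBX'];`) that tells the compiler which registers the asm modifies, which might be a cleaner option than the manual push/pop.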

I haven't really validated the data yet (that is, whether the image is processed correctly), since I don't have images to test with, but the speed is very promising: exactly 10 times faster!

So thanks again!