Hey, I also wrote a test program, and at least on my computer, plain Scanline is faster than anything else...

Download it at http://jaeger.xenoware.de/blttest.zip

And yes, XCESS's transparent blit routine is super-slow. It's actually a translation of a code snippet I found on MSDN a long time ago.
The transparent-blit routine used in my blt-test program is the same one used in XCESS.