I haven't worked with DirectX 7 for some time, but FlipToGDISurface should not be slow. You only need to do two things: attach a Clipper and call FlipToGDISurface. I don't think that's asking for a miracle, is it?
At least the DirectX 9 fullscreen sample I posted in another thread here does show good performance.