As far as I know, there is currently no publicly available text-to-speech engine that sounds anything like a human voice. I believe the one that comes closest to the real thing is AT&T's text-to-speech engine:
http://www.research.att.com/projects/tts/demo.html
(this can export WAV files, so it might be what you're looking for)

The engine is available for purchase here:
http://www.naturalvoices.att.com/

In general, I would discourage using text-to-speech in an adventure game. The output might pass for a computer voice, but definitely not for a human one.

If you can, use real pre-recorded voices. The lip-syncing can be driven by frequency analysis of the recordings. If you've watched Japanese anime, you'll know that you can get away with very basic mouth shapes. The most important part is making sure the mouth is shut whenever it's supposed to be shut, which you can do by checking whether the volume is below a certain level.
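To make the volume check concrete, here is a minimal sketch in Python. It assumes the recording has already been decoded into a list of float samples between -1.0 and 1.0. The function names, the frame rate and the threshold value are my own placeholders, not tuned values, so treat it as a starting point rather than a finished solution.

```python
import math

def rms(samples):
    """Root-mean-square level of a chunk of samples (0.0 for an empty chunk)."""
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))

def mouth_open_per_frame(samples, sample_rate, fps=12, threshold=0.05):
    """Return one bool per animation frame: True means 'mouth open'.

    fps and threshold are assumed values; tune them against your recordings.
    """
    chunk = max(1, sample_rate // fps)  # audio samples per animation frame
    return [rms(samples[i:i + chunk]) >= threshold
            for i in range(0, len(samples), chunk)]

# Synthetic demo: half a second of silence followed by half a second of tone.
sample_rate = 8000
silence = [0.0] * 4000
tone = [0.5 * math.sin(2 * math.pi * 220 * n / sample_rate) for n in range(4000)]
states = mouth_open_per_frame(silence + tone, sample_rate, fps=10)
```

With real speech you would probably want to smooth the result a little (for example, keep the mouth open for at least two frames) so it doesn't flicker on short pauses.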

Of course, you can get really fancy with this. You could search your speech files for certain patterns and link them to the different mouth shapes, so you'd actually get an O-mouth whenever an "o" is spoken.
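As a rough illustration of the fancier approach, the sketch below compares the energy in a low and a high frequency band and picks a mouth shape from that. This is a crude heuristic and not real vowel recognition: the band frequencies, the silence level and the shape names are all assumptions of mine, and it only relies on the general tendency of rounded vowels like "o" to carry more of their energy at lower frequencies than a wide "ee". The band energies are measured with the Goertzel algorithm, again on float samples between -1.0 and 1.0.

```python
import math

def goertzel_power(samples, sample_rate, freq):
    """Signal power near `freq` (Hz), measured with the Goertzel algorithm."""
    n = len(samples)
    k = round(n * freq / sample_rate)          # nearest DFT bin
    coeff = 2.0 * math.cos(2.0 * math.pi * k / n)
    s1 = s2 = 0.0
    for x in samples:
        s1, s2 = x + coeff * s1 - s2, s1
    return s1 * s1 + s2 * s2 - coeff * s1 * s2

LOW_BAND = (300, 500, 700)        # Hz, assumed probe frequencies
HIGH_BAND = (1800, 2200, 2600)    # Hz, assumed probe frequencies

def mouth_shape(chunk, sample_rate, silence_power=1e-4):
    """Pick 'closed', 'O' or 'wide' for one animation frame of audio."""
    if not chunk:
        return "closed"
    mean_power = sum(x * x for x in chunk) / len(chunk)
    if mean_power < silence_power:             # quiet frame: keep the mouth shut
        return "closed"
    low = sum(goertzel_power(chunk, sample_rate, f) for f in LOW_BAND)
    high = sum(goertzel_power(chunk, sample_rate, f) for f in HIGH_BAND)
    return "O" if low >= high else "wide"

# Synthetic demo frames (0.1 s each at 8 kHz).
sample_rate = 8000
tone_low = [0.5 * math.sin(2 * math.pi * 500 * n / sample_rate) for n in range(800)]
tone_high = [0.5 * math.sin(2 * math.pi * 2200 * n / sample_rate) for n in range(800)]
shapes = [mouth_shape(frame, sample_rate)
          for frame in ([0.0] * 800, tone_low, tone_high)]
```

Real voices are a lot messier than pure tones, so expect to experiment with the bands, or to mark the mouth shapes by hand for the lines where it really matters.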

Anyway, if you absolutely want to use text-to-speech, the Microsoft text-to-speech engine is probably the easiest solution, but it sounds pretty bad and the download is huge.