Well, from experience I would start by playing around with the raw DirectSound interface.

Essentially, you need to initialise the interface first, then create some cyclic (looping) buffers, fill the buffers with samples in the shape of the waveform you want, then play!

The hardest bit is actually working out what to write to the buffer, and managing buffer changes. You need to pick a frequency first, then think about the shape of the wave (e.g. sawtooth), then write the correct bytes (or words if 16-bit) for that shape. If you imagine a simple square wave alternating between $FFFF and $7FFF, simply write $FFFF for x samples, then $7FFF for x samples, and repeat; x is half the period, i.e. sample rate / (2 * frequency). With unsigned samples that gives a full-height to half-height square wave. Note that 16-bit PCM is normally signed, in which case $FFFF is -1, and you would alternate between $8000 and $7FFF for a full-swing wave.
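To make the "write x samples high, then x samples low" idea concrete, here is a minimal sketch in Python that only builds the bytes you would copy into the sound buffer; it does not touch DirectSound at all. The function name `square_wave` and its parameters are made up for illustration, and it assumes 16-bit signed, little-endian, mono PCM.

```python
import struct

def square_wave(freq_hz, sample_rate=44100, duration_s=1.0,
                high=0x7FFF, low=-0x8000):
    """Return 16-bit signed little-endian mono PCM bytes for a square wave.

    Hypothetical helper for illustration; the default levels give a
    full-swing wave for signed 16-bit samples.
    """
    n_samples = int(round(sample_rate * duration_s))
    # Number of samples in one half cycle (the "x" in the text above).
    half_period = sample_rate / (2.0 * freq_hz)
    samples = []
    for i in range(n_samples):
        # Even-numbered half cycles are high, odd-numbered ones are low.
        level = high if int(i / half_period) % 2 == 0 else low
        samples.append(level)
    return struct.pack("<%dh" % n_samples, *samples)

# Example: one second of a 100 Hz square wave at 8 kHz.
pcm = square_wave(100, sample_rate=8000, duration_s=1.0)
```

Swapping the `level` line for a ramp would give you a sawtooth instead; the rest stays the same.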


I did exactly this for my NES emulator, but I shelved it just after implementing some crude pAPU (audio) channels.

EDIT:

Once you get that far you can start playing around with merging WAV files etc. It's simply a case of manipulating the buffers: to mix two sounds you add their samples together and clamp the result so it doesn't overflow.
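As a rough sketch of what merging buffers can look like, the Python below sums two sample buffers and clamps to the 16-bit range. The name `mix_pcm16` is hypothetical, and it assumes both buffers are 16-bit signed, little-endian, mono PCM at the same sample rate (you would strip the WAV headers first).

```python
import struct

def mix_pcm16(buf_a, buf_b):
    """Mix two 16-bit signed little-endian mono PCM buffers.

    Hypothetical helper: sums sample pairs and clamps to the int16 range.
    The output is as long as the shorter input.
    """
    n = min(len(buf_a), len(buf_b)) // 2  # number of whole samples
    a = struct.unpack("<%dh" % n, buf_a[:n * 2])
    b = struct.unpack("<%dh" % n, buf_b[:n * 2])
    # Clamp so loud passages saturate instead of wrapping around.
    mixed = [max(-32768, min(32767, x + y)) for x, y in zip(a, b)]
    return struct.pack("<%dh" % n, *mixed)
```

Clamping is the crudest option; scaling each input down before adding is another common choice if you want to avoid clipping.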