I have used ID3DXMesh and it works like you described.

Lock the vertex buffer to get a pointer to the vertex data. That pointer refers to an array of vertex structs, one per vertex, laid out according to the mesh's vertex format. Then lock the index buffer and read the indices in groups of three: each group is one triangle (face), and each index refers to a vertex in the vertex buffer. Unlock both buffers when you're done. That's how it's done.
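Here's a rough sketch of that loop. It only assumes the raw pointers you get back from the two locks. To keep it compilable without the DirectX SDK, it takes those pointers as plain arguments. The real D3DX calls are shown in a comment. Some assumptions to be aware of:

- `Vertex` and `ForEachTriangle` are made-up names for illustration.
- The position is taken to be the first element of each vertex.
- Indices are assumed to be 16-bit. That is the default unless the mesh was created with `D3DXMESH_32BIT`, in which case you'd use `std::uint32_t` instead.
- The index buffer is treated as a plain triangle list.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical vertex layout: position only. A real mesh may carry normals,
// texcoords, etc. after the position, which is why we step by the stride
// reported by the mesh rather than by sizeof(Vertex).
struct Vertex { float x, y, z; };

// Visits every triangle of a triangle-list mesh.
//   vertexData     - pointer obtained from locking the vertex buffer
//   bytesPerVertex - vertex stride (e.g. ID3DXMesh::GetNumBytesPerVertex())
//   indices        - pointer obtained from locking the index buffer (16-bit assumed)
//   numFaces       - triangle count (e.g. ID3DXMesh::GetNumFaces())
//
// With a real mesh (Windows, d3dx9.h), usage is roughly:
//   void* v = nullptr; void* idx = nullptr;
//   mesh->LockVertexBuffer(D3DLOCK_READONLY, &v);
//   mesh->LockIndexBuffer(D3DLOCK_READONLY, &idx);
//   ForEachTriangle(v, mesh->GetNumBytesPerVertex(),
//                   static_cast<const WORD*>(idx), mesh->GetNumFaces(), visitor);
//   mesh->UnlockIndexBuffer();
//   mesh->UnlockVertexBuffer();
template <typename Visit>
void ForEachTriangle(const void* vertexData, std::uint32_t bytesPerVertex,
                     const std::uint16_t* indices, std::uint32_t numFaces,
                     Visit visit)
{
    assert(vertexData != nullptr && indices != nullptr);
    assert(bytesPerVertex >= sizeof(Vertex));

    const unsigned char* base = static_cast<const unsigned char*>(vertexData);
    for (std::uint32_t face = 0; face < numFaces; ++face) {
        const Vertex* corner[3];
        // Each group of three consecutive indices is one triangle;
        // each index selects a vertex in the vertex buffer.
        for (int k = 0; k < 3; ++k) {
            std::size_t i = indices[3 * face + k];
            corner[k] = reinterpret_cast<const Vertex*>(base + i * bytesPerVertex);
        }
        visit(face, *corner[0], *corner[1], *corner[2]);
    }
}
```

For example, a quad stored as 4 vertices with indices `0,1,2, 2,1,3` is visited as two triangles, and vertices 1 and 2 are shared between them. That sharing is the whole point of the index buffer.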