Okay... so... here's my attempt to get it working...

Code:
DWORD* indices32 = NULL;
WORD* indices16 = NULL;
char* vertices = NULL;

// Byte stride between consecutive vertices in the vertex buffer.
const DWORD stride = testMesh->GetNumBytesPerVertex();

testMesh->LockVertexBuffer(0, (void**)&vertices);
if (testMesh->GetOptions() & D3DXMESH_32BIT)
{
	// 32-bit index buffer: three DWORD indices per face.
	testMesh->LockIndexBuffer(0, (void**)&indices32);
	memcpy(&intersectTri->p1, &vertices[indices32[faceIndex * 3 + 0] * stride], sizeof(D3DXVECTOR3));
	memcpy(&intersectTri->p2, &vertices[indices32[faceIndex * 3 + 1] * stride], sizeof(D3DXVECTOR3));
	memcpy(&intersectTri->p3, &vertices[indices32[faceIndex * 3 + 2] * stride], sizeof(D3DXVECTOR3));
}
else
{
	// 16-bit index buffer: three WORD indices per face.
	testMesh->LockIndexBuffer(0, (void**)&indices16);
	memcpy(&intersectTri->p1, &vertices[indices16[faceIndex * 3 + 0] * stride], sizeof(D3DXVECTOR3));
	memcpy(&intersectTri->p2, &vertices[indices16[faceIndex * 3 + 1] * stride], sizeof(D3DXVECTOR3));
	memcpy(&intersectTri->p3, &vertices[indices16[faceIndex * 3 + 2] * stride], sizeof(D3DXVECTOR3));
}

testMesh->UnlockIndexBuffer();
testMesh->UnlockVertexBuffer();
I use a char* for the vertex buffer so I can compute each vertex's byte offset with GetNumBytesPerVertex(). Then I rely on the fact that the position comes first in each vertex to do a binary copy into the D3DXVECTOR3 fields. Models can have either 16-bit or 32-bit index buffers, hence the WORD and DWORD split. However, the indices returned are invalid (way outside the vertex buffer's range).
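
For anyone who wants to try the offset arithmetic without the DirectX SDK, here is a minimal standalone sketch of the same idea. Everything in it is a stand-in I made up for illustration: Vec3 plays the role of D3DXVECTOR3, Triangle mirrors the p1/p2/p3 fields, and readIndex, readPosition and fetchTriangle are hypothetical helpers, not D3DX calls. It assumes, like the code above, that the position is the first element of each vertex.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Stand-in for D3DXVECTOR3: three packed floats.
struct Vec3 { float x, y, z; };

// Hypothetical triangle type mirroring the p1/p2/p3 fields.
struct Triangle { Vec3 p1, p2, p3; };

// Reads entry i from an index buffer that holds either 16-bit or 32-bit indices.
inline std::uint32_t readIndex(const void* indices, bool is32Bit, std::size_t i)
{
    if (is32Bit)
        return static_cast<const std::uint32_t*>(indices)[i];
    return static_cast<const std::uint16_t*>(indices)[i];
}

// Copies the position out of vertex number vertexIndex.
// stride is the size of one whole vertex in bytes (GetNumBytesPerVertex() in D3DX).
inline Vec3 readPosition(const unsigned char* vertices, std::size_t stride,
                         std::uint32_t vertexIndex)
{
    assert(stride >= sizeof(Vec3)); // a vertex must at least hold a position
    Vec3 p;
    std::memcpy(&p, vertices + static_cast<std::size_t>(vertexIndex) * stride, sizeof(Vec3));
    return p;
}

// Gathers the three corner positions of face number faceIndex.
inline Triangle fetchTriangle(const unsigned char* vertices, std::size_t stride,
                              const void* indices, bool is32Bit, std::size_t faceIndex)
{
    Triangle t;
    t.p1 = readPosition(vertices, stride, readIndex(indices, is32Bit, faceIndex * 3 + 0));
    t.p2 = readPosition(vertices, stride, readIndex(indices, is32Bit, faceIndex * 3 + 1));
    t.p3 = readPosition(vertices, stride, readIndex(indices, is32Bit, faceIndex * 3 + 2));
    return t;
}
```

Routing both index widths through one readIndex helper keeps the three copies from being duplicated per branch, which is the main difference from the code above.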

Edit: So... it just returned a faceIndex that was GREATER than the number of vertices... that shouldn't be possible, right?
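
A rough way to narrow this kind of thing down is to check the two ranges separately. As far as I can tell, a face index is bounded by the number of faces, not the number of vertices, and a mesh can have more faces than vertices, so comparing faceIndex against the vertex count may not prove anything by itself. The sketch below is standalone: faceInRange is a hypothetical helper, and numFaces and numVertices are assumed to come from GetNumFaces() and GetNumVertices().

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical helper: returns true when faceIndex is a valid face and all
// three of its vertex indices point inside the vertex buffer.
// Index is std::uint16_t or std::uint32_t, matching the index buffer format.
template <typename Index>
bool faceInRange(const Index* indices, std::size_t numFaces,
                 std::size_t numVertices, std::size_t faceIndex)
{
    // The face index is checked against the face count.
    if (faceIndex >= numFaces)
        return false;

    // Each of the three corners is checked against the vertex count.
    for (std::size_t corner = 0; corner < 3; ++corner)
    {
        const std::size_t v = static_cast<std::size_t>(indices[faceIndex * 3 + corner]);
        if (v >= numVertices)
            return false;
    }
    return true;
}
```

If this returns false for a face index that is within the face count, the indices themselves are probably being read with the wrong width or from the wrong pointer.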

Edit: Fixed the code above to the working form, for reference by whoever wants it. LockIndexBuffer and LockVertexBuffer do their own memory management and hand back a pointer to the buffer's data, so allocating the buffers with malloc beforehand causes memory leaks.