I would like to have some advice on this.

I was transforming each vertex of my meshes myself in DirectX 9 before calling DrawPrimitive. I thought that was the common way to rotate, translate and scale meshes. Then I found that I could use SetTransform(D3DTS_WORLD... instead, which saves the work of transforming each vertex.

Now I'm thinking about an optimization: I could sort the meshes by texture, shader, etc., and then build one large vertex buffer holding several already-transformed meshes that share those elements. That way I could draw a whole batch of meshes with a single DrawPrimitive call. The problem is that I would have to go back to the old method of transforming each vertex myself, instead of using the world transform.
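
To make the batching idea concrete, here is a rough sketch of the pre-transform step as I picture it. It is plain C++ with no Direct3D calls. Vec3, Mat4, MeshInstance, makeTranslation, transformPoint and buildBatch are all hypothetical names made up for this post, standing in for whatever vertex and matrix types the real code uses. I am assuming position-only vertices and the row-vector convention (v' = v * M).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-ins for the real vertex and matrix types.
struct Vec3 { float x, y, z; };
struct Mat4 { float m[4][4]; };  // row-major, row-vector convention: v' = v * M

// Builds a translation matrix (translation stored in the last row).
Mat4 makeTranslation(float tx, float ty, float tz) {
    Mat4 r = {{{1.0f, 0.0f, 0.0f, 0.0f},
               {0.0f, 1.0f, 0.0f, 0.0f},
               {0.0f, 0.0f, 1.0f, 0.0f},
               {tx,   ty,   tz,   1.0f}}};
    return r;
}

// Transforms a point (implicit w = 1) by a world matrix on the CPU.
Vec3 transformPoint(const Vec3& v, const Mat4& w) {
    return {
        v.x * w.m[0][0] + v.y * w.m[1][0] + v.z * w.m[2][0] + w.m[3][0],
        v.x * w.m[0][1] + v.y * w.m[1][1] + v.z * w.m[2][1] + w.m[3][1],
        v.x * w.m[0][2] + v.y * w.m[1][2] + v.z * w.m[2][2] + w.m[3][2]
    };
}

// One mesh that shares texture/shader state with the others in its batch.
struct MeshInstance {
    std::vector<Vec3> localVertices;  // vertices in model space
    Mat4 world;                       // per-mesh world matrix
};

// Bakes every mesh's world matrix into its vertices and concatenates
// the results into one array, ready to be copied into a single buffer.
std::vector<Vec3> buildBatch(const std::vector<MeshInstance>& meshes) {
    std::size_t total = 0;
    for (const MeshInstance& mesh : meshes) {
        total += mesh.localVertices.size();
    }

    std::vector<Vec3> batch;
    batch.reserve(total);
    for (const MeshInstance& mesh : meshes) {
        for (const Vec3& v : mesh.localVertices) {
            batch.push_back(transformPoint(v, mesh.world));
        }
    }
    assert(batch.size() == total);
    return batch;
}
```

The result of buildBatch would be copied into the large vertex buffer and drawn with one DrawPrimitive call, with the world matrix left as identity. The other option keeps each mesh in its own buffer and calls SetTransform and DrawPrimitive once per mesh.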

So which is better: drawing several buffers, each transformed quickly with the world matrix, or drawing one large buffer whose vertices were transformed one by one?