The main reason the method that uses the WORLD transformation is faster is that more of the work is done on the GPU instead of the CPU. That is one thing you should always keep in mind when writing a rendering engine: do as much of the work as you can on the GPU!

Also, you should keep CPU/GPU traffic as low as possible (that is one of the reasons why indexed primitives are faster). It's faster to tell the GPU "render this vertex buffer" (assuming the vertex buffer was created in the DEFAULT or MANAGED pool) than "render this vertex buffer UP" (the DrawPrimitiveUP style of call), where all the vertex data has to be sent to the GPU again with every call.
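
To put a rough number on the indexed primitives point, here is a small sketch in plain C++ (no Direct3D calls) that compares how many bytes a triangle list needs with and without an index buffer. The Vertex layout, the 16-bit index size and the function names are my own assumptions for illustration, not something from your code.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical vertex layout: position, normal and one set of texture
// coordinates (8 floats, 32 bytes on typical compilers).
struct Vertex {
    float position[3];
    float normal[3];
    float uv[2];
};

// Non-indexed triangle list: every triangle stores 3 full vertices,
// so shared vertices are duplicated.
constexpr std::size_t nonIndexedBytes(std::size_t triangleCount) {
    return triangleCount * 3 * sizeof(Vertex);
}

// Indexed triangle list: each unique vertex is stored once, plus
// three 16-bit indices per triangle (assumed index format).
constexpr std::size_t indexedBytes(std::size_t uniqueVertexCount,
                                   std::size_t triangleCount) {
    return uniqueVertexCount * sizeof(Vertex)
         + triangleCount * 3 * sizeof(std::uint16_t);
}
```

For example, a flat grid of 32 x 32 quads has 33 * 33 = 1089 unique vertices and 2048 triangles, so with this layout nonIndexedBytes(2048) comes to 196608 bytes while indexedBytes(1089, 2048) comes to 47136 bytes. The real gain depends on how many vertices your mesh actually shares.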


They also use D3DTS_WORLDMATRIX(index), but I still don't understand what is it for...
It is used for skeletal animation. You first need to set up the bones in your mesh and define a weight map for each bone. After that, you set a world matrix for each bone with D3DTS_WORLDMATRIX(index) (up to 256 of them, so index goes from 0 to 255), and you let the GPU do the job.
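
If it helps to see what "the job" is, here is a CPU-side sketch in plain C++ of the blending math: each vertex is transformed by the world matrix of every bone that influences it, and the results are mixed by the bone weights. All the names here (Vec3, Matrix4, BoneInfluence, blendVertex) are made up for illustration, the weights are assumed to sum to 1, and the matrices use the Direct3D row-vector layout with the translation in the last row. It is not how the driver implements it, only the idea.

```cpp
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };

// 4x4 matrix in row-vector convention (v' = v * M), translation in row 3.
struct Matrix4 { float m[4][4]; };

// Builds a pure translation matrix (stands in for a bone's world matrix).
Matrix4 translation(float tx, float ty, float tz) {
    Matrix4 r = {};  // zero-initialize all elements
    r.m[0][0] = r.m[1][1] = r.m[2][2] = r.m[3][3] = 1.0f;
    r.m[3][0] = tx;
    r.m[3][1] = ty;
    r.m[3][2] = tz;
    return r;
}

// Transforms a point (w = 1) by a matrix.
Vec3 transformPoint(const Vec3& v, const Matrix4& mat) {
    return {
        v.x * mat.m[0][0] + v.y * mat.m[1][0] + v.z * mat.m[2][0] + mat.m[3][0],
        v.x * mat.m[0][1] + v.y * mat.m[1][1] + v.z * mat.m[2][1] + mat.m[3][1],
        v.x * mat.m[0][2] + v.y * mat.m[1][2] + v.z * mat.m[2][2] + mat.m[3][2]
    };
}

// One entry of a vertex's weight map: which bone, and how strongly.
struct BoneInfluence {
    std::size_t boneIndex;
    float weight;
};

// Blended position = sum over influences of weight * (v * boneMatrix).
Vec3 blendVertex(const Vec3& v,
                 const std::vector<Matrix4>& boneMatrices,
                 const std::vector<BoneInfluence>& influences) {
    Vec3 result = {0.0f, 0.0f, 0.0f};
    for (const BoneInfluence& inf : influences) {
        const Vec3 p = transformPoint(v, boneMatrices[inf.boneIndex]);
        result.x += inf.weight * p.x;
        result.y += inf.weight * p.y;
        result.z += inf.weight * p.z;
    }
    return result;
}
```

For example, with bone 0 left in place and bone 1 translated by 2 units along X, a vertex at (1, 0, 0) weighted half and half between the two ends up at (2, 0, 0), halfway between where each bone alone would put it. When you use the world matrices, this per-vertex work no longer has to be done by your own code on the CPU.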


And don't forget: test, test, test!

Best regards!