I found a weird problem. I made an application that loads a large number of vertices and tested it on two machines. Both have NVIDIA cards: an FX 5200 and a Ti 4200. When I set up the device in DirectX, I use these parameters:

D3DDEVTYPE_HAL and D3DCREATE_HARDWARE_VERTEXPROCESSING

These are supposed to give the best performance with that hardware. On the Ti card, that is the fastest configuration, but on the FX card it turns out to be the slower one. Then I tried these on the FX:

D3DDEVTYPE_HAL and D3DCREATE_SOFTWARE_VERTEXPROCESSING

And those turn out to be the fastest on that card. DirectX is 9.0c, and both machines have the same version and subversion. The NVIDIA drivers are also version 91.31 on both. There is no logical reason for software vertex processing to be faster than hardware. Any idea what the problem could be?