Are you testing this on only one machine or on several different machines? I learned a long time ago that there can be a huge difference in how the same code runs on different CPUs, especially if the code relies on extended CPU features. Why is that?
Well, most of these features were developed by one of the CPU makers. The company that designed a feature usually has an advantage over the others and so manages to get better performance out of it. But not always. In some cases a CPU maker cannot fully integrate such a feature due to licensing, and might be forced to enable it on their CPUs in what is sometimes called software mode. This results in much worse performance, but at least code that relies on the feature does not fail to work.

So you should do your testing on as many different devices as you can before coming to any conclusion as to which code is better.
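
If it helps, here is a minimal sketch of how you could collect comparable numbers from several machines. It is written in Python purely for illustration and is not tied to your code: `benchmark`, `variant_a` and `variant_b` are hypothetical names standing in for the two implementations you are comparing. The idea is simply to store the machine details next to each timing, so results gathered on different CPUs do not get mixed up.

```python
import platform
import timeit


def benchmark(label, func, repeat=5, number=1000):
    """Time func and tag the result with the machine it ran on."""
    # Take the best of several runs to reduce noise from other processes.
    best = min(timeit.repeat(func, repeat=repeat, number=number))
    return {
        "label": label,
        "machine": platform.machine(),      # e.g. architecture name
        "processor": platform.processor(),  # may be empty on some systems
        "system": platform.system(),
        "seconds_per_call": best / number,
    }


# Hypothetical stand-ins for the two pieces of code being compared.
def variant_a():
    return sum(i * i for i in range(1000))


def variant_b():
    total = 0
    for i in range(1000):
        total += i * i
    return total


if __name__ == "__main__":
    for name, fn in (("variant_a", variant_a), ("variant_b", variant_b)):
        print(benchmark(name, fn))
```

Run the same script on every machine you can get access to and compare the records side by side. If one variant wins on some CPUs and loses on others, that is exactly the effect described above.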