Your results are similar to my own: all calling conventions are equally fast.
However, most of your benchmark time is taken by the loop, not the call.
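
One way to check this is to time the same loop with and without the call and compare the two. Below is a minimal sketch in C, not your actual benchmark: `callee`, `time_empty_loop`, `time_call_loop`, `report` and the `NOINLINE` macro are hypothetical names I made up for illustration, and I am assuming a GCC, Clang or MSVC compiler for the no-inline attribute. Add the calling-convention attribute you want to test to `callee`.

```c
#include <stdint.h>
#include <stdio.h>
#include <time.h>

/* Assumption: GCC/Clang or MSVC. Keeps the compiler from inlining the call away. */
#if defined(_MSC_VER)
#define NOINLINE __declspec(noinline)
#else
#define NOINLINE __attribute__((noinline))
#endif

/* Hypothetical function under test; put the calling convention to measure here. */
NOINLINE uint32_t callee(uint32_t x) { return x + 1; }

/* Baseline: the loop alone, doing the same work without a call. */
double time_empty_loop(uint32_t iters, uint32_t *result) {
    volatile uint32_t acc = 0;   /* volatile so the loop is not optimized out */
    clock_t t0 = clock();
    for (uint32_t i = 0; i < iters; i++) {
        acc = acc + 1;
    }
    clock_t t1 = clock();
    *result = acc;
    return (double)(t1 - t0) / CLOCKS_PER_SEC;
}

/* Same loop, but the increment goes through a real function call. */
double time_call_loop(uint32_t iters, uint32_t *result) {
    volatile uint32_t acc = 0;
    clock_t t0 = clock();
    for (uint32_t i = 0; i < iters; i++) {
        acc = callee(acc);
    }
    clock_t t1 = clock();
    *result = acc;
    return (double)(t1 - t0) / CLOCKS_PER_SEC;
}

/* Prints both timings; the difference approximates the cost of the calls. */
void report(uint32_t iters) {
    uint32_t r = 0;
    double base = time_empty_loop(iters, &r);
    double with_call = time_call_loop(iters, &r);
    printf("loop only:   %.3f s\n", base);
    printf("loop + call: %.3f s\n", with_call);
    printf("difference:  %.3f s\n", with_call - base);
}
```

Call `report` from `main` with a large iteration count (something like 100000000) and repeat it a few times. If the "loop only" figure is close to the "loop + call" figure, the benchmark is mostly measuring the loop, and the difference between calling conventions will be lost in the noise. Note that `clock()` is coarse, so small differences can come out as zero or even negative.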