I know it might not work, or might not give identical results (I'm not good with C), but this should do what you're after. I tried to cut out as much of the extra work as possible to speed things up. Try this:
Code:
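function carmack_func(const x: single): single; inline;
// The $5f3759df trick works on the bit pattern of a 32-bit IEEE single,
// so this takes and returns Single rather than Double.
var halfx, y: single;
    carmack: longint;
begin
  halfx := x * 0.5; // Multiplying by 0.5 is usually faster than dividing by 2.
  y := x;
  carmack := $5f3759df - (PLongInt(@y)^ shr 1); // read the float's raw bits as an integer
  y := PSingle(@carmack)^;                      // turn the bits back into a float
  result := y * (1.5 - halfx * y * y);          // one Newton-Raphson refinement step
end;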
To speed that up further, you could try turning it into a procedure and passing the floating-point result back as an OUT parameter. That might speed things up a little; time it to be sure.
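Here's roughly what that OUT-parameter version might look like. It's only a sketch: it assumes x > 0 and single precision, since the $5f3759df constant only makes sense on the bit pattern of a 32-bit Single, and carmack_proc is just a made-up name. I left off `inline` so it compiles without extra compiler switches; add it back if your compiler supports it.

```pascal
// Hypothetical OUT-parameter variant of the fast inverse square root.
// Assumes x > 0 and single precision: the magic constant operates on
// the IEEE 754 bit pattern of a 32-bit Single.
procedure carmack_proc(const x: Single; out inv: Single);
var
  i: LongInt;
  y: Single;
begin
  y := x;
  i := PLongInt(@y)^;                 // read the float's raw bits as an integer
  i := $5f3759df - (i shr 1);         // magic-constant first guess at 1/sqrt(x)
  y := PSingle(@i)^;                  // turn the bits back into a float
  inv := y * (1.5 - 0.5 * x * y * y); // one Newton-Raphson refinement step
end;
```

Call it as `carmack_proc(x, r);` and `r` holds roughly 1/Sqrt(x), within about 0.2% after the single Newton step.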

I don't know about the second part. It's been years since I touched OpenGL by hand, and even then I wasn't particularly good at it. However, it might be faster if you pass things as a record ... I just don't know. You could try walking a pointer through the buffer for extra speed, but I don't know whether there would be any gain. Try this to see:
Code:
procedure pointerWalk;
var
  vBase, vBuffer: PVector3f;
  count: integer;
begin
  count := 2500;
  // Allocate room for 2500 vectors and keep the start address for FreeMem.
  GetMem(vBase, SizeOf(TVector3f) * count);
  try
    vBuffer := vBase;
    while count > 0 do
    begin
      vBuffer^ := Vector3fMake(0, 0, 0);
      Inc(vBuffer); // Inc on a typed pointer advances by SizeOf(TVector3f)
      Dec(count);
    end;
    // ... use the buffer here ...
  finally
    FreeMem(vBase);
  end;
end;
I honestly don't know that it'd make much of a difference, if any.
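If you want to find out whether the pointer walk actually buys anything, one way is to fill the same buffer both ways and time each call. This is a sketch: TVec3 is a hypothetical stand-in for whatever TVector3f your vector library defines, and the timing is left to you (wrap each call in your own timer). Both routines should leave identical data behind.

```pascal
// Hypothetical stand-in for the vector type from your GL/vector library.
type
  TVec3 = record
    X, Y, Z: Single;
  end;
  PVec3 = ^TVec3;
  TVec3Array = array of TVec3;

// Fill using plain array indexing.
procedure FillIndexed(var buf: TVec3Array; const v: TVec3);
var
  i: Integer;
begin
  for i := 0 to High(buf) do
    buf[i] := v;
end;

// Fill by walking a typed pointer; Inc advances it by SizeOf(TVec3).
procedure FillByPointer(var buf: TVec3Array; const v: TVec3);
var
  p: PVec3;
  count: Integer;
begin
  count := Length(buf);
  if count = 0 then
    Exit; // @buf[0] is not valid on an empty dynamic array
  p := @buf[0];
  while count > 0 do
  begin
    p^ := v;
    Inc(p);
    Dec(count);
  end;
end;
```

The pointer version checks for an empty array before taking `@buf[0]`, since there is no element 0 to point at in that case.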

Oh, and all this was written in ConText/Firefox, so no promises that it actually functions as-is. You might need to tweak it a little.

EDIT:

On second thought, if you're only making a blank array of vectors it'd be faster to do:
Code:
FillChar(vBuffer^, SizeOf(TVector3f) * 2500, 0); // instead of the loop; vBuffer must point at the start of the block
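That's about as fast as you'll get without dropping to assembler. It works because a Single (or Double) whose bits are all zero is exactly 0.0, so zeroing the memory gives you (0, 0, 0) vectors.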
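If the buffer is freshly allocated, you can skip the separate clear entirely: AllocMem hands back memory that is already zero-filled (depending on your Delphi version you may need SysUtils in the uses clause; in Free Pascal it's in System). Here's a small sketch of both cases. TVec3, ClearVectors and NewZeroedVectors are hypothetical names for illustration.

```pascal
// Hypothetical stand-in for the vector record from your vector library.
type
  TVec3 = record
    X, Y, Z: Single;
  end;
  PVec3 = ^TVec3;

const
  VecCount = 2500;

// Clear an existing block in place. FillChar's order is (buffer, byte count, value),
// and the buffer argument must be the dereferenced pointer, not the pointer itself.
procedure ClearVectors(buf: PVec3; count: Integer);
begin
  FillChar(buf^, count * SizeOf(TVec3), 0);
end;

// Allocate a fresh block; AllocMem zero-fills it, so no FillChar is needed.
// Release it later with FreeMem.
procedure NewZeroedVectors(out buf: PVec3; count: Integer);
begin
  buf := AllocMem(count * SizeOf(TVec3));
end;
```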