Quote, originally posted by SilverWarior:
If you go for integer-based algorithms, I recommend you stick to 32-bit integers, since most modern CPUs work with either 32-bit or 64-bit registers when doing math. This way you avoid converting from 16 to 32 bit and back all the time[...]
As far as my research shows, modern CPUs are well equipped to operate on 16-bit and 8-bit integers natively; there's no need to touch the FPU. Widening and narrowing are also very well supported in hardware: sign-extend the 16-bit values into 32-bit registers (MOVSX on x86), multiply as 32-bit, shift right by 16 bits, and store the result from the lower 16 bits.
x86-64, for example, lets you address r8w..r15w, the 16-bit lower parts of its extra 64-bit registers r8..r15, not to mention that the trusty old ax, bx, cx, dx, si and di of x86 aren't going anywhere.
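A minimal scalar sketch of that widen, multiply, shift, narrow sequence in C. The function name `fixmul16` and the saturation step are illustrative additions, not something the hardware does on its own; the `int32_t` casts are what a compiler typically turns into sign-extending loads. A shift of 8 gives an 8.8 × 8.8 → 8.8 multiply, while a shift of 16 keeps just the high half of the product.

```c
#include <stdint.h>

/* Multiply two signed 16-bit fixed-point numbers (hypothetical helper).
   frac_bits = how many fractional bits to drop from the 32-bit product:
   8 for 8.8 x 8.8 -> 8.8, 16 for "keep only the high half". */
static int16_t fixmul16(int16_t a, int16_t b, unsigned frac_bits) {
    int32_t wide = (int32_t)a * (int32_t)b;  /* sign-extend, multiply as 32-bit */
    wide >>= frac_bits;                      /* arithmetic shift on GCC/Clang/MSVC */
    if (wide > INT16_MAX) wide = INT16_MAX;  /* saturate instead of wrapping */
    if (wide < INT16_MIN) wide = INT16_MIN;
    return (int16_t)wide;                    /* narrow back to 16 bits */
}
```

For example, `fixmul16(384, 512, 8)` multiplies 1.5 by 2.0 in 8.8 and returns 768, which is 3.0 in 8.8.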

Serialization is not a problem: I use a library I made back in 2006..2008 that serializes into a binary format (it achieved 1 million instances per second back then, on a dual-core CPU slowed to 1 GHz by lowering the multiplier in the BIOS, with DDR2-400 RAM).
Normal calculations aren't a problem either: Free Pascal is all about reproducibility, as my tests proved. It's vector normalization and trigonometric functions like sincos that slow everything down horribly.
And it's not even physics so much as animation, which I want to be part of the physics, and which operates on lots and lots of bones.
Add to that the fact that my skinning is going to be calculated on the CPU only, for ease of coding and better compatibility. In memory-bound bottlenecks (all too easy to hit), 16-bit numbers can be twice as fast as 32-bit ones, since twice as many of them fit into the same bandwidth and cache.
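Since vector normalization is one of the slow spots, here is a rough sketch of doing it with integers only, in C. The Q1.14 output format (16384 = 1.0), the function names and the pre-scaling step are assumptions for illustration. The integer square root takes its first guess from the highest set bit of the input, which is the job BSR does on x86 (a portable loop stands in for it here), so Newton's method only needs a handful of steps. It still costs one integer division per component.

```c
#include <stdint.h>

#define UNIT_Q14 16384  /* 1.0 in Q1.14 (assumed output format) */

/* floor(sqrt(n)). Initial guess from the highest set bit (what BSR finds). */
static uint32_t isqrt32(uint32_t n) {
    if (n == 0) return 0;
    int msb = 31;
    while (!(n & (1u << msb))) msb--;          /* portable stand-in for BSR */
    uint32_t x = 1u << ((msb >> 1) + 1);       /* guaranteed >= sqrt(n) */
    for (;;) {
        uint32_t y = (x + n / x) >> 1;         /* Newton step, descending */
        if (y >= x) return x;
        x = y;
    }
}

/* Normalize an int16 3D vector to unit length in Q1.14.
   Returns 0 (and leaves out untouched) for the zero vector. */
static int normalize3_q14(const int16_t v[3], int16_t out[3]) {
    int32_t c[3], m = 0;
    for (int i = 0; i < 3; i++) {
        c[i] = v[i];
        int32_t a = c[i] < 0 ? -c[i] : c[i];
        if (a > m) m = a;
    }
    if (m == 0) return 0;
    while (m < 0x4000) {                       /* scale small vectors up so the */
        m *= 2;                                /* square root keeps ~15 bits    */
        for (int i = 0; i < 3; i++) c[i] *= 2;
    }
    uint32_t len2 = 0;                         /* at most 3 * 2^30, fits in 32 bits */
    for (int i = 0; i < 3; i++) len2 += (uint32_t)(c[i] * c[i]);
    int32_t len = (int32_t)isqrt32(len2);      /* len >= m, so |out| <= 1.0 */
    for (int i = 0; i < 3; i++)
        out[i] = (int16_t)((c[i] * UNIT_Q14) / len);
    return 1;
}
```

Normalizing (3, 4, 0) this way yields (9830, 13107, 0), i.e. roughly (0.6, 0.8, 0) in Q1.14.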

Quote, originally posted by Jonax:
Integer physics, such an interesting idea. Haven't tried that, but I hear it was common among early programmers. I, on the other hand, rely heavily on the square root for moving things and calculating positions. Haven't considered using a table. It's so easy to just use the square root command. For my modest needs that's mostly fast enough.
I wasn't even aware of the SSE/AVX thingie. Is it some fancy vector calculation hardcoded in the silicon?
Mind the cache: you don't want that table to be larger than a kilobyte or so, which only fits 512 16-bit values. Hence interpolation (and BSR trickery for the square root). But even so, a table would *shred* honest functions speed-wise if you need sin or cos.
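A minimal sketch of such a table in C, assuming a 16-bit angle where 65536 units make a full turn and results in Q1.14 (16384 = 1.0); both conventions, and all the names, are illustrative choices rather than anything from this thread. 512 `int16_t` entries take exactly 1 KB, and the 7 angle bits left over after indexing drive linear interpolation between neighbouring entries. The table is filled once with a short Taylor series so the sketch does not depend on libm.

```c
#include <stdint.h>

#define PI 3.14159265358979323846
#define SIN_TABLE_BITS 9                        /* 2^9 = 512 entries */
#define SIN_TABLE_SIZE (1 << SIN_TABLE_BITS)
#define SIN_FRAC_BITS (16 - SIN_TABLE_BITS)     /* 7 angle bits left for interpolation */
#define Q14_ONE 16384                           /* 1.0 in Q1.14 */

static int16_t sin_table[SIN_TABLE_SIZE];       /* 512 * 2 bytes = 1 KB */

/* Taylor series for sin(x), accurate for |x| <= pi; only used to fill the table. */
static double taylor_sin(double x) {
    double term = x, sum = x;
    for (int n = 1; n < 13; n++) {
        term *= -x * x / ((2.0 * n) * (2.0 * n + 1.0));
        sum += term;
    }
    return sum;
}

static void sin_table_init(void) {
    for (int i = 0; i < SIN_TABLE_SIZE; i++) {
        double x = 2.0 * PI * i / SIN_TABLE_SIZE;
        if (x > PI) x -= 2.0 * PI;              /* keep the series argument in [-pi, pi] */
        double s = taylor_sin(x) * Q14_ONE;
        sin_table[i] = (int16_t)(s >= 0 ? s + 0.5 : s - 0.5);  /* round to nearest */
    }
}

/* angle: 0..65535 is one full turn. Returns sin(angle) in Q1.14. */
static int16_t fixed_sin(uint16_t angle) {
    unsigned idx  = angle >> SIN_FRAC_BITS;
    int32_t  frac = angle & ((1 << SIN_FRAC_BITS) - 1);
    int32_t  a = sin_table[idx];
    int32_t  b = sin_table[(idx + 1) & (SIN_TABLE_SIZE - 1)];  /* wrap around at 2*pi */
    /* linear interpolation; relies on arithmetic right shift of negatives */
    return (int16_t)(a + (((b - a) * frac) >> SIN_FRAC_BITS));
}

/* cos(x) = sin(x + pi/2); a quarter turn is 16384 angle units. */
static int16_t fixed_cos(uint16_t angle) {
    return fixed_sin((uint16_t)(angle + 16384));
}
```

Call `sin_table_init()` once at startup; after that, `fixed_sin(16384)` (a quarter turn) returns 16384, i.e. 1.0.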

SSE3 (which I have declared as my minimum system requirement) works on 128-bit XMM registers, sixteen of them in 64-bit mode, for vector floating-point and integer calculations.
Its support dates back to single-core ancients: the Pentium 4 Prescott (launched in 2004) and the Athlon 64 from its 2005 revisions onward (the original 2003 Athlon 64 lacked it). By 2007, the year I am aiming at hardware-wise, it was old, tried-and-tested technology.

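SSE can also act like MMX on steroids, operating on 16-bit and 8-bit numbers. The PMULHW instruction in particular multiplies signed 16-bit integers and keeps only the highest 16 bits of each 32-bit product, which is the same as shifting the product right by 16. That makes it a ready-made fixed-point multiply as long as the operands' fractional bits add up to 16 plus the fractional bits you want in the result: two 4.12 numbers, for example, give an 8.8 result. And it does this on a vector of eight 16-bit numbers packed into a single 128-bit register.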
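For illustration, here is what PMULHW does per lane, next to the same operation through the SSE2 intrinsic `_mm_mulhi_epi16`, which maps to that instruction. The wrapper names are hypothetical; the scalar path is used when the compiler is not targeting SSE2.

```c
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Scalar reference: what PMULHW computes for each of the 8 lanes. */
static void mulhi8_scalar(const int16_t a[8], const int16_t b[8], int16_t out[8]) {
    for (int i = 0; i < 8; i++)
        out[i] = (int16_t)(((int32_t)a[i] * b[i]) >> 16);  /* high half of product */
}

/* Same thing with a single PMULHW when SSE2 is available. */
static void mulhi8(const int16_t a[8], const int16_t b[8], int16_t out[8]) {
#if defined(__SSE2__)
    __m128i va = _mm_loadu_si128((const __m128i *)a);
    __m128i vb = _mm_loadu_si128((const __m128i *)b);
    _mm_storeu_si128((__m128i *)out, _mm_mulhi_epi16(va, vb));
#else
    mulhi8_scalar(a, b, out);
#endif
}
```

With 4.12 inputs, 1.5 × 2.0 (6144 × 8192) comes back as 768, i.e. 3.0 in 8.8.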

AVX is quite old stuff as well: my laptop from 2012 (i5-2540M) has it. It extends the XMM registers to 256-bit YMM registers. Intel provides a code sample, I believe, for batch-normalizing vectors at a rate of one vector per clock cycle.
As far as I know, AVX2 mostly adds more instructions (chiefly extending integer operations to 256 bits), while AVX-512... take a guess.

Free Pascal, as far as I could tell, only supports AVX so far (though it's been a long time since I last tested).

But it boggles the mind how much raw muscle even a Core 2 Duo has.