
Thread: Cheb's project will be here.

  1. #1
    I definitely want to push Free Pascal to its limits and achieve the impossible.

    Here's the determinism check as a standalone project
    (note: make sure your browser doesn't auto-correct http into https; I still haven't fixed my server's Let's Encrypt setup, so the https certificate is invalid)
    pure source http://chentrah.chebmaster.com/downloads/determchk.zip (7Kb)
    with binaries compiled for x86 and x86-64 using both Free Pascal 3.2.2 and Free Pascal 2.6.4 : http://chentrah.chebmaster.com/downl...thbinaries.zip (199Kb)

    As you can see, the lion's share of processing time goes to calculating those MD5 sums.

    A reminder: determinism is required for my planned multiplayer code to work at all. If the checksums do not match between platforms, those platforms won't be able to play together and you'd need a separate server for each of them.

    My friend, who works in the game industry full time, had to deal with the lack of determinism in Unity: you cannot count on monsters behaving identically when presented with identical player actions. He had to improvise, adding a distributed server of sorts, where each client in a multiplayer game acted as a server for a fraction of the monsters and broadcast the behavior of those monsters to all the other clients.

    Full determinism, on the other hand, allows sending *only* the player inputs over the network. This is MMO-grade stuff: no matter how many monsters there are (even a million) or how massive the changes to the game world (I want the ability to reduce the whole map to a huge crater), the network traffic would remain zilch.

  2. #2
    Have you perhaps considered using some other hashing algorithm instead of MD5? The CRC32 algorithm is way faster, but might result in more clashes, where different inputs produce the same hash. On the other hand, many modern CPUs have hardware support for SHA-based hashing algorithms, which could mean they would be much faster than MD5, which, if my memory serves me correctly, is rarely hardware accelerated.

    Anyway, there is a good thread on Stack Overflow comparing various hashing algorithms: https://stackoverflow.com/questions/...st-performance
    Granted, the question poster was interested in performance differences in the .NET environment, but some of the people who provided answers have done their own testing in other programming languages, even Delphi.

  3. #3
    I just grabbed the one that was easiest to slap on and had a reasonably sized hash.
    This code is not going to be part of normal execution anyway; it will only be used for research during development (or, maybe, as an optional "check your CPU for compatibility" feature).

  4. #4
    I'm more and more tempted by the idea of 16-bit integer physics. 32768 is actually a lot, if you use it right. I have experience, after all - that game for MS-DOS used 16-bit physics.
    I have also learned a lot since: the problem of velocity discretization at low speeds is easily circumvented by defining speed not per tic but per interval of N tics, so slow objects would move slowly, one jump per hundreds of tics (and just be interpolated by any object interacting with them).
    SSE offers unique possibilities for speeding things up; PMULHW is tailor-made for such things, multiplying 8 numbers per clock cycle in the basic version and up to 32 in its AVX-512 incarnation.
    Also, sines, cosines and inverse square roots: all of these could be made using lookup tables with linear interpolation, maybe normalized using BSR, but in any case much faster than any floating-point counterparts.

  5. #5
    Quote Originally Posted by Chebmaster View Post
    I'm more and more tempted by the idea of 16-bit integer physics. 32768 is actually a lot, if you use it right. I have experience, after all - that game for MS-DOS used 16-bit physics.
    I have also learned a lot since: the problem of velocity discretization at low speeds is easily circumvented by defining speed not per tic but per interval of N tics, so slow objects would move slowly, one jump per hundreds of tics (and just be interpolated by any object interacting with them).
    SSE offers unique possibilities for speeding things up; PMULHW is tailor-made for such things, multiplying 8 numbers per clock cycle in the basic version and up to 32 in its AVX-512 incarnation.
    Also, sines, cosines and inverse square roots: all of these could be made using lookup tables with linear interpolation, maybe normalized using BSR, but in any case much faster than any floating-point counterparts.
    Integer physics, such an interesting idea! I haven't tried it, but I hear it was common among early programmers. I, on the other hand, rely heavily on the square root for moving things and calculating positions. I haven't considered using a table; it's so easy to just use the square root function. For my modest needs that's mostly fast enough.

    I wasn't even aware of the SSE/AVX thing. Is it some fancy vector calculation hardcoded in the silicon?
    It seems my oldest still-bootable PC (J1900) lacks SSE, but the newer machines got SSE(4.2). No mention of AVX.

    For moving things slowly I too let them advance at appropriate intervals. For what it's worth, I once made a game in 16-bit Delphi where I also let moving objects become pale/fuzzy when moving really fast. Works decently in my rather simple 2D games, I think.

    Thanks for updating us on your progress, though most of it is beyond me.

  6. #6
    Quote Originally Posted by Chebmaster View Post
    I'm more and more tempted by the idea of 16-bit integer physics. 32768 is actually a lot, if you use it right. I have experience, after all - that game for MS-DOS used 16-bit physics.
    If you go for integer-based arithmetic, I recommend you stick to 32-bit integers, since most modern CPUs work with either 32-bit or 64-bit registers when doing math. This way you avoid converting from 16 to 32 bits and back all the time. Not to mention that integer overflow flags will work as they should; I'm not sure they would on modern CPUs when using 16-bit integers, unless you mess with FPU parameters, which could lead to a host of other problems: on modern computers no application gets exclusive access to a specific core, so changing FPU parameters might affect other applications.

    Anyway, many games actually rely on integer-based physics, some even using 64-bit integers to achieve high enough precision. And there are whole libraries for doing integer-based math that you can find on the internet.

    Another big advantage of using integer math is that if you are serializing and deserializing your data to some text-based data structure like XML or JSON, you can be sure that the value stored in such a structure is the same as the one that was in memory.
    When working with floating point this cannot be guaranteed, since you are first converting from the floating-point to the decimal system, and not every floating-point value can be converted into an exact decimal value, or vice versa.

  7. #7
    Quote Originally Posted by SilverWarior View Post
    If you go for integer-based arithmetic, I recommend you stick to 32-bit integers, since most modern CPUs work with either 32-bit or 64-bit registers when doing math. This way you avoid converting from 16 to 32 bits and back all the time[...]
    As far as my research shows, modern CPUs are well equipped to operate on 16- and 8-bit integers natively, no need to touch the FPU. Widening and narrowing are also very well supported in hardware (clear a 32-bit register, load the 16-bit value into its lower part, multiply as 32-bit, shift right by 16 bits, store the result from the lower 16-bit part).
    x86-64, for example, allows addressing registers as r8w..r15w, meaning the 16-bit parts of its extra 64-bit registers r8..r15; not to mention the old trusty ax, bx, cx, dx, si and di of x86 aren't going anywhere.

    Serialization is not a problem: I use a library I made back in 2006..2008 that serializes into a binary format (it achieved 1 million instances per second back then, on a Dual Core slowed to 1 GHz by lowering the multiplier in the BIOS, with DDR2-400 RAM).
    Normal calculations aren't a problem either: Free Pascal is all about reproducibility, as my tests proved. It's vector normalization and trigonometric things like sincos that slow everything down horribly.
    Not even physics, but animation, which I want to be part of physics, and which operates on lots and lots of bones.
    There's also the fact that my skinning is going to be calculated on the CPU only, for ease of coding and better compatibility. 16-bit numbers are twice as fast in memory-bound bottlenecks (which are too easy to hit).

    Quote Originally Posted by Jonax View Post
    Integer physics, such an interesting idea . Haven't tried that but I hear it was common for early programmers. I on the other hand rely heavily on the square root for moving things and calculating positions. Haven't considered using a table. It's so easy to just use the square root command. For my modest needs that's mostly fast enough.
    I wasn't even aware of the SSE/AVX thingie. Is it some fancy vector calculation hardcoded in the silicon?
    Mind the cache: you don't want that table to be larger than one kilobyte or so, which only fits 512 16-bit values; thus, interpolation (and BSR trickery for the square root). But even like that, a table would *shred* honest functions speed-wise if you need sin or cos.

    SSE3 (which I declared as my minimum system requirement) provides sixteen 128-bit registers for vector floating-point and integer calculations.
    Its support dates back to single-core ancients: the Athlon 64 (launched in 2003) and the Pentium 4 Prescott (launched in 2004). By 2007, the year I am aiming at hardware-wise, it was old, tried technology.

    SSE can also act like MMX on steroids, operating on 16-bit and 8-bit numbers. The PMULHW instruction in particular lets you multiply 16-bit signed integers as if they were 8.8 fixed-point: it shifts the 32-bit result right by 16, keeping only the highest 16 bits, on a whole vector of eight 16-bit numbers stuffed into a 128-bit register.

    AVX is very old stuff as well (my laptop from 2012, an i5-2540M, has it); it extends the XMM registers to 256-bit YMM registers. Intel provides a code sample, I believe, that allows batch normalization of vectors at a rate of one vector per clock cycle.
    AVX2 only adds more instructions, as far as I know, while AVX-512... make a guess.

    Free Pascal, as far as I could tell, only supports up to AVX so far (it has been a long time since I last tested).

    But it boggles the mind how much raw muscle even a Core 2 Duo has.

  8. #8
    Correction: it seems I mixed things up. SSE has 8 registers, not 16; the sixteen came with a later extension. 8 is still a lot.
