Cheb's project will be here.

**Chebmaster** · 19-02-2023, 12:48 PM

I am currently in the midst of an immense overhaul that changes the very architecture. Hopefully it will be over by the end of this year (2023) and I can move on to creating my first game.

Previously:
My "killer feature", as envisioned back in 2005, was what has since shrunk to "developer mode": all relevant code resides in a DLL that could be re-compiled and re-loaded without re-starting the engine and re-loading assets.

I invested about 4 years total into my database engine (2006-2008) and the asset management (2012-2013) that linked game code with assets stored in the "mother executable".

The common parts of architecture that will remain as is:
- the mother executable has an API: a monolithic record of fields and functions in procedural variables that serves as the game code's gateway into the engine. It includes configs, the window manager, and a dual-layer wrapper that lets the DLL use the mother executable's streams as a TStream of its own.
- the database works on "save a snapshot to TStream" principle, with perfect reproduction of the logic on load.
- assets are identified by a unique hash (was 256 bits, reduced to 128), either randomly generated or an md5 of the file name.
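
As a minimal sketch of the hash-based asset identification described above, assuming a 128-bit identifier the size of an md5 digest: the names `TAssetHash`, `HashFromFileName` and `RandomHash` are hypothetical and not taken from the engine, and the file name is a placeholder. Only the Free Pascal `md5` unit calls (`MD5String`, `MD5Print`) are real library routines.

```pascal
program AssetHashSketch;
{$mode objfpc}{$H+}

uses
  md5; // Free Pascal hash package: TMD5Digest, MD5String, MD5Print

type
  // Hypothetical name: a 128-bit asset identifier, same size as an md5 digest.
  TAssetHash = TMD5Digest;

// Derive the identifier from the asset's file name.
function HashFromFileName(const FileName: AnsiString): TAssetHash;
begin
  Result := MD5String(FileName);
end;

// Alternative for assets without a file name: 16 random bytes.
function RandomHash: TAssetHash;
var
  i: Integer;
begin
  for i := Low(Result) to High(Result) do
    Result[i] := Random(256);
end;

begin
  Randomize;
  // 'textures/crate.png' is a made-up example path.
  WriteLn(MD5Print(HashFromFileName('textures/crate.png')));
  WriteLn(MD5Print(RandomHash));
end.
```

Both kinds of identifier have the same type, so a map keyed by hash does not need to know how a given hash was produced.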

The old architecture:
The DLL had a logic thread for the object database (single-threaded by design), with assets being classes of it. The DLL managed background tasks; the logic classes had access to the graphics API (OpenGL/GL ES) and had methods for rendering in the main thread. Locking was employed to prevent the database from crashing while the render routine was executing in a thread not of its own. On unloading, all assets were counted and packed into a separate mother stream, which for the mother executable was a banal TMemoryStream. On loading, the logic had to retrieve that list (which could be empty if it was the first start, or contain mismatching assets from a different run in case of switching the session). Each asset object then had to employ a convoluted algorithm of devouring its stored counterpart, absorbing its properties and OpenGL handles or discarding them. In the case of hierarchical multi-part assets like FBOs, this was turning into a nightmare.

It's no surprise that my development stalled and my phtagn asset manager was plagued by bugs that were very hard to catch (as everything was split into inter-dependent tasks running in background threads).

The new architecture (I'm cutting and cutting it down):
- it's not 2005 anymore, I am developing from a SSD.
- no more "universal" mother that can run any of the games/tools. There is one mother executable per game/tool (one release, one debug with assertions on).
- the DLL is only used in the "developer mode", which is only available for x86 Win32. In the normal mode of operation, and on all other platforms, the logic is built into the main executable. No more agony of building DLLs for Linux.
- the DLL runs in the logic thread created by the mother and that's all. The DLL never uses any other threads.
- my new rendering architecture Kakan: the logic fills a command list in its logic thread, abstracted from any APIs, then passes it for execution and forgets it. The rendering in the main thread is done by Kakan. The logic loses all access to OpenGL.
- assets are the mother executable's classes, accessible to the logic as untyped pointers. Any specific details are exposed via pointers to T<XXX>Innards records shared between the mother and the DLL. The mother API's ExposeInnards method returns an untyped pointer. The logic's job is to type-cast it to the correct P<XXX>Innards. Ugly, but I saw no other way to make it simple enough.
- logic has its own classes for linking to assets, derived from TAbstractAssetLink. All begin with a pair (pointer + hash), where pointer (Mother's asset class instance) is never saved with the snapshot, always nil after de-serialization, and hash duplicates mother asset class's hash.
- mother manages assets *and* their lifetime, organized in a specialized fcl-stl map addressed by hashes. Assets are reference-counted, all refcounts are reset to zero after the logic unloads.
- most assets' actualization is handled by the mother in the render phase, employing background tasks if necessary.
- mother owns background threads and can run background tasks, including cpu-side animation.
- the loading screen with its fancy progress indicator was dropped in its entirety. The logic remains frozen until first successful render but keeps sending render jobs. The render jobs fail with un-actualized assets, causing some assets to actualize each frame, and replacing themselves with a console render job. So "loading screen" is the console with, maybe, a low-res background image.
- the error recovery screens were dropped; the application displays the console with a BSOD background and "Press Esc or Back to exit".
- Kakan manages jobs opaquely to the logic. It sorts jobs by render targets automatically, calculating their order based on where that texture is a texture and where it is a target.
- Depth/stencil are managed by Kakan opaquely, targets could only be textures. Reason: targeting Mali 400 as the minimum, so depth/stencil do not actually exist and cannot be reused with another color attachment. Need a depth pass? Stuff its output into a RGBA8. Preferably in 128x72 resolution.
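
The asset-link rule from the list above (a pointer plus a hash, where only the hash goes into the snapshot and the pointer is always nil after loading) could look roughly like this. It is a sketch under assumptions: `TAssetLinkSketch`, `SaveTo` and `LoadFrom` are hypothetical names standing in for the real TAbstractAssetLink, whose actual interface is not shown in this thread.

```pascal
program AssetLinkSketch;
{$mode objfpc}{$H+}

uses
  Classes; // TStream, TMemoryStream

type
  TAssetHash = array[0..15] of Byte;

  // Hypothetical minimal stand-in for an asset link class.
  TAssetLinkSketch = class
  public
    Asset: Pointer;   // mother's asset instance; runtime-only, never serialized
    Hash: TAssetHash; // duplicates the mother asset's hash; this is what is saved
    procedure SaveTo(S: TStream);
    procedure LoadFrom(S: TStream);
  end;

procedure TAssetLinkSketch.SaveTo(S: TStream);
begin
  S.WriteBuffer(Hash, SizeOf(Hash)); // only the hash goes into the snapshot
end;

procedure TAssetLinkSketch.LoadFrom(S: TStream);
begin
  S.ReadBuffer(Hash, SizeOf(Hash));
  Asset := nil; // has to be re-resolved through the mother by hash
end;

var
  Snapshot: TMemoryStream;
  A, B: TAssetLinkSketch;
  i: Integer;
begin
  Snapshot := TMemoryStream.Create;
  A := TAssetLinkSketch.Create;
  B := TAssetLinkSketch.Create;
  try
    for i := 0 to 15 do
      A.Hash[i] := i * 3;
    A.Asset := Pointer(A); // pretend the link had been resolved before saving
    A.SaveTo(Snapshot);
    Snapshot.Position := 0;
    B.LoadFrom(Snapshot);
    WriteLn('hash restored:  ', CompareByte(A.Hash, B.Hash, SizeOf(TAssetHash)) = 0);
    WriteLn('pointer is nil: ', B.Asset = nil);
  finally
    A.Free;
    B.Free;
    Snapshot.Free;
  end;
end.
```

Keeping the pointer out of the stream is what lets a snapshot survive a DLL reload or a different run: the saved data never contains an address that could go stale.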

The design document for my first planned game has no English translation yet, and my websites are down due to an unsuccessful hardware upgrade (the venerable SATA controller dated 2006 finally gave up the ghost; it sees my Samsung HD204UI drives as "ASSMNU GDH02U4 I" glitches with randomly generated capacity).

P.S. See this nightmare:
SNIPERS: A Nightmare for Developers and Players https://www.youtube.com/watch?v=lOebGm_jMLY
- and that's why my planned game does not have hitscan weapons at all.
"Sniper" will be one of ninja's load-outs, heavily influenced by the TF2 "Lucksman" (sniper's bow that fires arrow projectiles).

P.P.S. When playing competitive first person shooters, no one wants "serious". What people want is slapstick rumble. So any foolish developers who try "serious" style soon give up under players' pressure, their artsy black ops noir degenerating into slapstick comedy. Compare to the wisdom of Valve who made TF2 slapstick from the start (and also reaped immense profit on cosmetics and taunts).
So, the further away from a mil-sim, the better. More. More distance. Make spells, not weapons. Use in-universe reason for player avatars being something like shadow clones, so that they dispel or unravel with zero blood.

P.P.P.S. My solution to the problem highlighted in the video above: make the snooper rifle shoot on release, like bows in Mount & Blade. Like a mini-game. The need to lead your vic-- ahem, target is already there. Combine that with firing in the appropriate time window... Otherwise suffering outrageous penalties to accuracy. So that a zero-time instant shot goes wide most of the time and holding LMB for too long adds increasing sway.
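
One possible reading of the shoot-on-release idea, as a sketch only: the spread is a function of how long the button was held, large for an instant shot, zero inside a timing window, and growing again when the shot is held too long. Every number and name below is made up for illustration and is not from the design document.

```pascal
program ReleaseShotSketch;
{$mode objfpc}

// All values are invented tuning numbers, for illustration only.
const
  WindowStart: Single = 0.6; // seconds of holding before the shot is accurate
  WindowEnd:   Single = 1.2; // after this, sway starts to build up
  MaxSpread:   Single = 8.0; // degrees of spread for a zero-time shot
  SwayPerSec:  Single = 3.0; // extra degrees per second of over-holding

// Spread in degrees applied to a shot released after HoldTime seconds.
function SpreadOnRelease(HoldTime: Single): Single;
begin
  if HoldTime < WindowStart then
    // released too early: the penalty shrinks linearly towards the window
    Result := MaxSpread * (Single(1.0) - HoldTime / WindowStart)
  else if HoldTime <= WindowEnd then
    Result := Single(0.0) // released inside the window: no penalty
  else
    Result := SwayPerSec * (HoldTime - WindowEnd); // held too long: growing sway
end;

begin
  WriteLn('instant shot: ', SpreadOnRelease(0.0):0:2, ' deg');
  WriteLn('in window:    ', SpreadOnRelease(0.9):0:2, ' deg');
  WriteLn('overheld:     ', SpreadOnRelease(2.0):0:2, ' deg');
end.
```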

**SilverWarior** · 20-02-2023, 06:44 PM

Originally Posted by Chebmaster

P.P.P.S. My solution to the problem highlighted in the video above: make the snooper rifle shoot on release, like bows in Mount & Blade. Like a mini-game. The need to lead your vic-- ahem, target is already there. Combine that with firing in the appropriate time window... Otherwise suffering outrageous penalties to accuracy. So that a zero-time instant shot goes wide most of the time and holding LMB for too long adds increasing sway.

Not a bad video, but it fails to identify where games fail to simulate rifles entirely. The biggest reason why rifles are not good for close quarters fighting in real life is their long barrel. Why is that?
Well, that barrel has some weight. And since you are holding it away from your body, which acts as the center of rotation, it has quite a lot of inertia. You need to apply a significant force to start turning the barrel toward your target, and then an equal force to stop it at the right time so you don't turn too much and overshoot. In a game you can flick the mouse to turn through a large angle and then stop almost immediately, pointing at your target; you would never be able to do that in real life. Not unless you are a super strong robot.
Bear in mind that it isn't just turning left and right where you are fighting against barrel inertia, it is also up and down. That is why special forces or military personnel walk a bit strangely, keeping their knees bent all the time when they have their weapon raised. During a normal walk people sway left and right to some degree and move up and down a bit when shifting from one foot to the other, so you would be constantly fighting against barrel inertia.

Another problem with a long barrel is that moving with it in tight spaces is quite cumbersome. Why? Because you now have a one-meter "stick" (the size of my BB rifle from its shoulder support to the end of the barrel) or longer sticking out from you. So you can no longer move as close to a wall without hitting it with the barrel.
Do you want to get a better idea of how cumbersome this can be, but have no actual rifle in your house? Take a broom, put the brush against your shoulder as if you were holding a rifle, and go walk around your house. A broom is pretty similar in size to a sniper rifle.
NOTE: I'm not taking responsibility for any damage you might cause in your house during this experiment

So one way of solving the problem of the sniper rifle being so overpowered is making sure that the bigger the movement you have made, the more time it takes for your aim to become steady, as it is in real life. And that is going to make a huge difference.

But of course there is another problem, and that is simplified hit detection. In most games you are basically detecting just a body shot (which causes the same damage whether it is in the chest or in a hand or finger) and a head shot (an instant kill in most games). But in real life this is much more complex. For instance, if you get shot in a vital organ you are pretty much a goner, even if it is just from a small pistol. But you can actually get shot with a sniper rifle in a non-vital part of your body and live, even if the bullet went straight through you.
So why do games treat sniper rifles as one-shot kills, then? Because throughout history most snipers were also expert marksmen who knew which body part to hit in order to be effective. So statistically their shot-to-kill ratio was very high, but that was due less to the sniper rifle than to their expert marksmanship.

**Chebmaster** · 21-02-2023, 02:55 PM

That broom is quite enlightening.
Yes, using long rifles and swinging zweihanders in tight passages... That's arcade, not sim. Who *ever* implements inability to turn around because your shillelagh is longer than the corridor's width...
AFAIR only Tribes: Vengeance even had a mechanic that visibly moved your gun back if you faced a wall (and also called their rocket launcher "spinfusor" which is seriously badass).

At the very least, firearms could be balanced along movement vs accuracy axis. If you are on the move or change your aim rapidly, you get atrocious random spread (which shotguns and smgs partly negate by having their own spread). If you want an accurate shot you switch to aiming stance, either by stopping and reducing your mouse movements, or by pressing a dedicated button that hampers your movement and zooms.
If a game doesn't have that, it's an arcade and should look long and hard at the Quake series.

Hmm... Maybe I should review my concept. Not forcing a movement penalty while a spell is being charged, but inflicting large random sway instead (and hiding the crosshair). Then "charging your enemy while charging your shot" becomes a valid strategy. Also, directing homing projectiles while sprinting (the controllable fireball from Dark Messiah of Might and Magic is my shining ideal).

Loadout opportunities arise.
If the spell that serves the role of a shotgun (120% damage total, scattered in a wide cone) could be pre-charged to fire instantly on release while sprinting, and its alt-fire works like the Q3 nailgun (a long-range shot with very little spatial but large velocity spread), penalized with a loud sound and having to stand still...
If the spell that mimics Q3 plasma, at the same time, has a sizable firing delay and no way to pre-charge it because its alt fire consists of controllable single shots for long-range harassing instead...
That gives depth to the rock-paper-scissors interplay between those two.

Pair with more class-specific spells, like a controllable fireball that has hefty mana cost, and you get seriously fun gameplay with very few actual "weapons".

**SilverWarior** · 21-02-2023, 04:26 PM

Originally Posted by Chebmaster

AFAIR only Tribes: Vengeance even had a mechanic that visibly moved your gun back if you faced a wall

Actually there are several games that have this mechanic. If my memory serves me correctly both Crysis and Far Cry 3 have this mechanic.

**Jonax** · 21-02-2023, 07:37 PM

Actually I never play that type of game anymore. The last game where I was running around shooting uglies was the great adventure game 'Legacy', from 1993 I think. It ran well on a brave 1 MB of RAM, a 386 processor and a VGA monitor. In fact I still have the game in DOSBox. Though I don't think there were any sniper rifles in that game.

Point is, as I often say: I'm happy to see activity in the Pascal crowd, but I can't really comment much on the current topic, the sniper rifle and its properties.

**Chebmaster** · 22-03-2023, 10:35 AM

Still *deep* in overhauling the very foundations.

Who could have thought that browsing Wikipedia about supercontinental cycles could give you ideas!

My former Logic, bloated to unsustainability and stifled by being the root managed object of the graph, split apart like Pangaea -- and things are becoming so, so much simpler!
Each of the resulting entities is quite manageable. I am in the process of stuffing them full of methods scavenged from my old TAbstractLogic and organizing their interactions.
Also, the root managed object of the graph that goes into a save is now a transient thing, created just before serialization and disposed of after deserialization. This decouples the save file structure from the actual data structure.

I would never have gotten anywhere with layered lag compensation had I not made this split.

It will also help me cleanly separate the GUI (a local client entity, non-existent on a dedicated server) from the game world.
I am positive I could present a lag-compensated multi-player rotating cube this autumn.

About first person shooters: with the exception of occasional delves into Brutal Doom, I prefer team multiplayer games of a run-and-gun variety. Namely, Jagex Ace of Spades (before it went down) and TF2. Unlike the mindless npc slaying of single-player shooters, those are tactical struggles against fellow humans, your equals in cunning, and working with your team to achieve set goals (usually capturing/holding control points, capture the flag or defense against the other team dragging a bomb towards your base).

When I finally release my design document for my planned game, you will see it's basically an AoS clone with ideas borrowed from TF2 and some of my own.
When I initially laid foundations for my engine, I wanted to make a 4X game -- maybe that, too, in time. Too ambitious, just like me struggling for years trying to one-up Unreal Engine instead of making a game.

TL;DR: snipers are the antithesis of run-and-gun. Like in Open Arena: you have a fun rocket duel, then along comes some killjoy with a railgun. Not on my watch. All my planned weapons are projectile-based.

**Chebmaster** · 24-04-2023, 10:56 AM

Google translate, I call upon you to let me bridge the language gap for free!
(from https://freepascal-ru.translate.goog..._x_tr_sch=http )

(my reply to discussion about reproducibility and how to achieve it)

Re: Cheb's Game Engine

Message Cheb » 02.03.2023 15:10:10
The trick is to:
a) use strictly 32-bit floats.
b) wrap *any* constant in the code in a typecast to a float. Any. Anytime and anywhere. a:= b * Single(2.0); Otherwise, Pascal tries to calculate in as wide a format as possible and does it in a platform-dependent way: doubles, extendeds, black magic ...
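
A small sketch of rule (b), for comparing a wrapped and a bare constant bitwise on whatever compiler and target you have. The `float` alias mirrors the one used in the code later in this thread; `BitsOf` is a hypothetical helper. No expected output is given on purpose, since the bare variant is exactly the part that may differ between targets.

```pascal
program ConstCastSketch;
{$mode objfpc}

type
  float = Single; // assumption: a 32-bit alias, as in the snippets in this thread

// Reinterpret the 32 bits of a Single as an integer for bitwise comparison.
function BitsOf(x: float): LongWord;
begin
  Move(x, Result, SizeOf(Result));
end;

var
  b, wrapped, bare: float;
begin
  b := float(3.0);
  wrapped := b * float(0.1); // constant is rounded to Single before the multiply
  bare := b * 0.1;           // constant may be kept at a wider, target-dependent precision
  WriteLn('wrapped: ', HexStr(BitsOf(wrapped), 8));
  WriteLn('bare:    ', HexStr(BitsOf(bare), 8));
end.
```

Running the same source on each target and diffing the printed hex values is a cheap way to see whether a given expression is affected.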

Added after 3 hours 54 minutes 43 seconds:
PS. I do not take anything for granted, I experiment: I have a built-in tester in the engine that calculates md5 over the entire 32-bit range (4 billion values in total).
Damn, this is when it's inconvenient that the engine currently does not build at all.
AFAIR, I compared x86, x86-64 and ARM on Raspberry Pis, and everywhere the sine matched down to the bit.

Added after 1 minute 16 seconds:
P.P.S. BUT! Back then I built with 2.6.4 for x86-64 and, AFAIR, 2.6.4 also for arm.

Added after 5 hours 37 minutes 26 seconds:
P.P.P.S. I started a separate test program consisting of a single source file, ripped from the engine, but I really don't know when it will be ready: there is no time at all, with a lot of things coming from all sides.
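
A rough idea of what such a checksum tester might look like, heavily cut down. This is not the engine's tester: `ChecksumOfRange`, `FunctionUnderTest` and the block size of 2048 are assumptions for the sketch, and it only walks a small range of positive normal floats starting at 1.0 (bit pattern $3F800000) rather than all 4 billion patterns, which also sidesteps NaNs and negative inputs.

```pascal
program ChecksumSketch;
{$mode objfpc}{$H+}

uses
  md5; // TMD5Context, MD5Init, MD5Update, MD5Final, MD5Print, MD5Match

const
  ChunkLen = 2048; // values per block; an assumed size for this sketch

type
  float = Single;

// The operation whose bitwise reproducibility is being checked.
function FunctionUnderTest(x: float): float;
begin
  Result := float(1.0) / Sqrt(x);
end;

// Walk consecutive 32-bit patterns starting at FirstBits, apply the function
// and feed the raw result bytes into a running md5.
function ChecksumOfRange(FirstBits: LongWord; Chunks: Integer): TMD5Digest;
var
  Ctx: TMD5Context;
  buf: array[0..ChunkLen - 1] of float;
  bits: LongWord;
  c, i: Integer;
begin
  MD5Init(Ctx);
  bits := FirstBits;
  for c := 1 to Chunks do begin
    for i := 0 to ChunkLen - 1 do begin
      Move(bits, buf[i], SizeOf(bits)); // reinterpret the pattern as a float
      buf[i] := FunctionUnderTest(buf[i]);
      Inc(bits);
    end;
    MD5Update(Ctx, buf, SizeOf(buf));
  end;
  MD5Final(Ctx, Result);
end;

begin
  // 16 blocks of floats just above 1.0; compare this line between targets.
  WriteLn('md5 checksum = ', MD5Print(ChecksumOfRange($3F800000, 16)));
end.
```

If two builds print the same checksum, every one of the tested results was identical bit for bit; a single differing bit anywhere changes the digest.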

Re: Cheb's Game Engine

Message Cheb » 04.03.2023 15:44:36
Oh, how many wonderful discoveries we have! :shock: :x :evil:

(note: if you looked at your processor's figure in the Intel Burn Test / Lintel and dreamed, prepare for dashed expectations. On a processor with a limit of 20 gigaflops, a Pascal program will deliver around 0.8. Because there are spherical cows coded in the most exalted AVX by special people, and then there are one-at-a-time calculations with guaranteed bitwise reproducibility)

1. Frac() is a monstrously slow function. Lowest of the low, on the same level as Sin(). If you were hoping to make an accelerated fake sine like

Code:

      function ebd_sin(a: float): float; inline;
      begin
        a:= frac(a * float(0.318309886183790671537767526745031));// 1 / 3.141592653589793));
        a:= (float(1.0) - a) * a;
        Result:= float (129600.0) * a / (float(40500.0) - a);
      end;

- forget it, it will wallow in the same ditch with the sine and they will be oinking head to head (sin() 0.04 gigaflops, ebd_sin() 0.05).
Which is 13 times slower than multiplication and one and a half times slower than 1/sqrt(x).

2. In 64-bit code, some things are much slower and some things are much faster, but the reproducibility is ideal. Checksums always match those from the 32-bit code. In order to get a mismatch, you need to climb into assembly language and stick your fingers into the electric socket of RSQRTPS (quick and dirty inverse square root). That one, yes, will have a different checksum on each CPU model, not just on each compile target.

AFAIR, on the Cortex A7 the checksums were exactly the same, although you would expect otherwise. I can't check right now: all my Raspberries and Oranges are gathering dust on the shelf. Even more so, I can't check arm64: I simply don't have one. I bought an Orange last year and even wondered why it was so cheap. It turned out that inside there is the same Cortex A7 in an embrace with a Mali 400. That is, the Orange Pi PC is a Chinese analogue of the Raspberry Pi 2B, not higher. And it is still being sold!

Anyway, on x86-64 (compared to x86):
- Frac() got exactly three times faster, making ebd_sin() outperform Sin() by 3.4 times, because Sin() slowed down *even more*, to 0.035 gigaflops. Do they have a special competition or wut?
- multiplication by a constant not wrapped in a typecast to float is 2.78 times slower than multiplication by a wrapped one. Moreover, the checksums of both options match their counterparts from the 32-bit code (and differ from each other).

More details (including the test source) will follow when I fix my server and there is somewhere to post them.

Added after 21 hours 10 minutes 8 seconds:
Furthering the topic of speed: SQRTPS + DIVPS with 1.0s preloaded into the registers are *exactly* four times faster than the standard 1/ sqrt(x). Obviously, the compiler uses exactly the same instructions - only scalar, not vector. Doing four operations at a time accelerates calculations by exactly four times. I have RCPPS commented out there - obviously, the checksum did not match, bitwise it turned out differently than honest 1 / x through DIVPS.

But just look at RSQRTPS going at it! (four and a half times faster than the reproducible sse and eighteen times faster than the regular 1/ sqrt (x)) - and it becomes obvious that this is not a bad compiler, this is a processor getting lost in thought when you require bitwise conformance to standards.

..checking 1/sqrt(x)
..................................
..ok, in 45 (pure 21.2) seconds (0.1 GFLOPS)
..md5 checksum = 7BA70F1439D5E2955151CC565477E924

..checking SSE SIMD4 1/sqrt(x)
.................................
..ok, in 29 (pure 5.31) seconds (0.401 GFLOPS)
..md5 checksum = 7BA70F1439D5E2955151CC565477E924

..checking SSE SIMD4 RSQRTPS (packed quick reverse square root)
.................................
..ok, in 25 (pure 1.18) seconds (1.81 GFLOPS)
..md5 checksum = F881C03FB2C6F5BBDFF57AE5532CFFFD

Let me remind you, this is on a CPU for which Lintel reports 20 gigaflops per core (and 30 for two, because both do not fit into TDP at full tilt making effectively a 1.5 core CPU).

Added after 3 minutes 45 seconds:

Code:

              dck_one_div_sqrt: begin
                for m:= 0 to (mm div 8) - 1  do begin
                  pointer(pv):= p + m * 8 * sizeof(float);
                  pv[0]:= 1/sqrt(pv[0]);
                  pv[1]:= 1/sqrt(pv[1]);
                  pv[2]:= 1/sqrt(pv[2]);
                  pv[3]:= 1/sqrt(pv[3]);
                  pv[4]:= 1/sqrt(pv[4]);
                  pv[5]:= 1/sqrt(pv[5]);
                  pv[6]:= 1/sqrt(pv[6]);
                  pv[7]:= 1/sqrt(pv[7]);
                end;
              end;
            {$if defined(cpu386)}
              dck_sse_one_div_sqrt: begin
                for m:= 0 to (mm div 8) - 1  do begin
                  pointer(pv):= p + m * 8 * sizeof(float);
                  asm
                    mov eax, [fourones]
                    MOVAPS xmm5, [eax]
                    mov eax, [pv]
                    MOVAPS xmm6, [eax]
                    SQRTPS xmm6, xmm6
                    MOVAPS xmm4, xmm5
                    DIVPS xmm4, xmm6 // exact 1.0/x; commented-out alternative: RCPPS xmm6, xmm6 (approximate packed reciprocal, checksum differs)
                    MOVAPS xmm7, [eax + 16]
                    SQRTPS xmm7, xmm7
                    MOVAPS [eax], xmm4
                    DIVPS xmm5, xmm7 // exact 1.0/x; commented-out alternative: RCPPS xmm7, xmm7
                    MOVAPS [eax + 16], xmm5
                  end['eax', 'xmm6', 'xmm7', 'xmm4', 'xmm5'];
                end;
              end;
              dck_sse_rsqrtps: begin
                for m:= 0 to (mm div 8) - 1  do begin
                  pointer(pv):= p + m * 8 * sizeof(float);
                  asm
                    mov eax, [pv]
                    MOVAPS xmm6, [eax]
                    RSQRTPS xmm6, xmm6
                    MOVAPS xmm7, [eax + 16]
                    RSQRTPS xmm7, xmm7
                    MOVAPS [eax], xmm6
                    MOVAPS [eax + 16], xmm7
                  end['eax', 'xmm6', 'xmm7'];
                end;
              end;
            {$endif}

, where mm in most cases = 2048

Re: Cheb's Game Engine

Message Cheb » 10.03.2023 22:53:15
Updated requirements, cleaned definitions in the code from unnecessary variability

Reason: my minimums include Athlon 64 X2 (2005, alas, I don't have it) and Pentium E2140 (2007, computer named Gray Goose). Both of these dual-core processors are 64-bit (alas, WinXP has no usable 64-bit version) and support SSE3.
Then what the (insert expletive here) was I doing basing my code on SSE2 instead of SSE3?
From now on, any code for x86 and x86-64, in any inline assembly, assumes that SSE3 is guaranteed to be available.

I am not going to consider SSE4 and higher, because if the E2140 with its two 1.6 GHz cores has enough horsepower, then any modern CPU would fly into orbit and there is simply no point in working myself hard over this. My good intentions towards AVX/AVX512 will likely remain intentions.
That's it, all done.

Further, for Linux SBCs my minimum is the Cortex A7. It has VFPv4-16, and I declare the same in my code as the only supported option, if I ever get to assembler under arm.
That's settled.

TL; DR: Free Pascal is optimized for *reproducibility*, bitwise matching results on all platforms. It seems it sacrifices lots of performance to reach that goal.
