
Thread: Cheb's project will be here.

  1. #41
    I'm more and more tempted by the idea of 16-bit integer physics. 32768 is actually a lot, if you use it right. I have experience, after all - that game for MS-DOS used 16-bit physics.
    I have also learned a lot since then: the problem of velocity discretization at low speeds is easily circumvented by defining speed not per tic but per interval of N tics, so that slow objects move one jump per hundreds of tics (and are simply interpolated by any object interacting with them).
    SSE offers unique possibilities for speeding things up; PMULHW is tailor-made for this, multiplying 8 numbers per clock cycle in the basic version and up to 32 in its AVX-512 incarnation.
    Also, sines, cosines and inverse square roots -- all of these could be done with lookup tables and linear interpolation, maybe normalized using BSR -- and in any case much faster than their floating-point counterparts.
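    A minimal sketch of that per-interval idea (untested; all names here are invented for illustration):

      type
        TSlowMover = record
          Pos: SmallInt;      // 16-bit world coordinate
          StepSize: SmallInt; // how far the object jumps when its interval expires
          Period: Word;       // interval length in tics; large Period = slow object
          Phase: Word;        // tics elapsed inside the current interval
        end;

      // Advance one tic; the object only actually moves once per Period tics.
      procedure Tick(var M: TSlowMover);
      begin
        Inc(M.Phase);
        if M.Phase >= M.Period then
        begin
          M.Phase := 0;
          Inc(M.Pos, M.StepSize);
        end;
      end;

      // Smooth position for rendering and for objects interacting with it:
      // linear interpolation between the previous jump and the next one.
      function InterpolatedPos(const M: TSlowMover): Single;
      begin
        Result := M.Pos + M.StepSize * (M.Phase / M.Period);
      end;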

  2. #42
    Quote Originally Posted by Chebmaster View Post
    I'm more and more tempted by the idea of 16-bit integer physics. 32768 is actually a lot, if you use it right. I have experience, after all - that game for MS-DOS used 16-bit physics.
    I have also learned a lot since then: the problem of velocity discretization at low speeds is easily circumvented by defining speed not per tic but per interval of N tics, so that slow objects move one jump per hundreds of tics (and are simply interpolated by any object interacting with them).
    SSE offers unique possibilities for speeding things up; PMULHW is tailor-made for this, multiplying 8 numbers per clock cycle in the basic version and up to 32 in its AVX-512 incarnation.
    Also, sines, cosines and inverse square roots -- all of these could be done with lookup tables and linear interpolation, maybe normalized using BSR -- and in any case much faster than their floating-point counterparts.
    Integer physics, such an interesting idea. Haven't tried that, but I hear it was common for early programmers. I, on the other hand, rely heavily on the square root for moving things and calculating positions. Haven't considered using a table; it's so easy to just use the square root function. For my modest needs that's mostly fast enough.

    I wasn't even aware of the SSE/AVX thingie. Is it some fancy vector calculation hardcoded in the silicon?
    It seems my oldest still-bootable PC (J1900) lacks SSE, but the newer machines got SSE(4.2). No mention of AVX.

    For moving things slowly, I too let them advance on the appropriate intervals. For what it's worth, I once made a game in 16-bit Delphi where I also let a moving object become pale/fuzzy when moving really fast. Works decently in my rather simple 2D games, I think.

    Thanks for updating us on your progress, though most of it is beyond me.

  3. #43
    Quote Originally Posted by Chebmaster View Post
    I'm more and more tempted by the idea of 16-bit integer physics. 32768 is actually a lot, if you use it right. I have experience, after all - that game for MS-DOS used 16-bit physics.
    If you go for integer-based arithmetic I recommend you stick to 32-bit integers, since most modern CPUs work with either 32-bit or 64-bit registers when doing math. This way you avoid converting from 16 to 32 bit and back all the time. Not to mention that integer overflow flags will work as they should. I'm not sure they would work on modern CPUs when using 16-bit integers unless you mess with FPU parameters, which could lead to a host of other problems, since on modern computers no application gets exclusive access to a specific core; changing FPU parameters might therefore affect other applications.

    Anyway, many games actually rely on integer-based physics, some even using 64-bit integers to achieve high enough precision. And there are whole libraries for doing integer-based math that you can find on the internet.

    Another big advantage of using integer math is that if you are serializing and deserializing your data to text-based formats like XML or JSON, you can be sure that the value stored in such a structure is the same one that was stored in memory.
    When working with floating point this cannot be guaranteed, since you first convert from the floating-point to the decimal system, and not every floating-point value can be converted into an exact decimal value, or vice versa.
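    A tiny illustration of the difference, just a sketch using standard SysUtils routines:

      program IntVsFloat;
      uses SysUtils;
      var
        I: Integer;
        S: Single;
      begin
        I := 123456789;
        // Integer -> text -> integer round-trips exactly.
        WriteLn(StrToInt(IntToStr(I)) = I);         // TRUE

        // 0.1 has no exact binary representation; what sits in memory is a
        // nearby value, and what you print depends on how many digits you ask for.
        S := 0.1;
        WriteLn(FormatFloat('0.000000000000', S));  // shows the rounding error
      end.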

  4. #44
    Quote Originally Posted by SilverWarior View Post
    If you go for integer-based arithmetic I recommend you stick to 32-bit integers, since most modern CPUs work with either 32-bit or 64-bit registers when doing math. This way you avoid converting from 16 to 32 bit and back all the time[...]
    As far as my research shows, modern CPUs are well equipped to operate on 16-bit and 8-bit integers natively, with no need to touch the FPU. Expanding and shrinking is also very well supported in hardware (clear 32-bit registers, load 16-bit values into their lower parts, multiply as 32-bit, shift right by 16 bits, store the result from the 16-bit lower half).
    x86-64, for example, allows addressing registers like r8w..r15w, meaning the 16-bit parts of its extra 64-bit registers r8..r15 -- not to mention that the trusty old ax, bx, cx, dx, si and di of x86 aren't going anywhere.
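    In plain Pascal, that expand-multiply-shrink sequence boils down to something like this (a scalar sketch; SarLongint is FPC's arithmetic right shift helper):

      // High 16 bits of the signed 32-bit product of two 16-bit values.
      function MulHigh16(a, b: SmallInt): SmallInt; inline;
      begin
        Result := SmallInt(SarLongint(LongInt(a) * LongInt(b), 16));
      end;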

    Serialization is not a problem: I use a library I made back in 2006..2008 that serializes into a binary format (it achieved 1 million instances per second back then, on a dual-core CPU slowed to 1 GHz by lowering the multiplier in the BIOS, with DDR2-400 RAM).
    Normal calculations aren't a problem either: Free Pascal is all about reproducibility, as my tests proved. It's vector normalization and trigonometric things like sincos that slow everything horribly.
    It is not even physics but animation -- which I want to be part of the physics, and which operates on lots and lots of bones.
    Also the fact that my skinning is going to be calculated on the CPU only, for ease of coding and better compatibility. 16-bit numbers are twice as fast in memory-bound bottlenecks (which are too easy to hit).

    Quote Originally Posted by Jonax View Post
    Integer physics, such an interesting idea. Haven't tried that, but I hear it was common for early programmers. I, on the other hand, rely heavily on the square root for moving things and calculating positions. Haven't considered using a table; it's so easy to just use the square root function. For my modest needs that's mostly fast enough.
    I wasn't even aware of the SSE/AVX thingie. Is it some fancy vector calculation hardcoded in the silicon?
    Mind the cache: you don't want that table to be larger than one kilobyte or so, which only fits 512 16-bit values -- hence the interpolation (and the BSR trickery for the square root). But even so, a table would *shred* honest functions speed-wise if you need sin or cos.
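    Something in this spirit (an untested sketch: the angle is a 16-bit value where 65536 is a full turn, and the table is 512 SmallInt entries plus one guard entry, roughly one kilobyte):

      const
        TableBits = 9;                 // 512 entries of SmallInt = 1 KiB
        TableSize = 1 shl TableBits;

      var
        SinTab: array[0..TableSize] of SmallInt; // one guard entry so idx+1 never wraps

      procedure InitSinTab;
      var
        i: Integer;
      begin
        for i := 0 to TableSize do
          SinTab[i] := Round(Sin(2 * Pi * i / TableSize) * 16384); // Q.14: 16384 = 1.0
      end;

      // Angle: 0..65535 maps to one full turn. Result in Q.14 (-16384..16384).
      function FastSin(Angle: Word): SmallInt;
      var
        idx, frac, a, b: LongInt;
      begin
        idx  := Angle shr (16 - TableBits);               // top 9 bits pick the entry
        frac := Angle and ((1 shl (16 - TableBits)) - 1); // low 7 bits interpolate
        a := SinTab[idx];
        b := SinTab[idx + 1];
        Result := SmallInt(a + SarLongint((b - a) * frac, 16 - TableBits));
      end;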

    SSE3 (which I had declared my minimum system requirement) provides sixteen 128-bit registers for vector floating-point and integer calculations.
    Its support dates back to single-core ancients, the Athlon 64 (launched in 2003) and the Pentium 4 Prescott (launched in 2004). By 2007, the year I am aiming at hardware-wise, it was old, tried technology.

    SSE can also act like MMX on steroids, operating on 16-bit and 8-bit numbers. The PMULHW instruction in particular lets you multiply 16-bit signed integers as if they were 8.8 fixed-point: it shifts the 32-bit product right by 16, leaving only the highest 16 bits -- on a vector of eight 16-bit numbers stuffed into a 128-bit register.

    AVX is very old stuff as well -- my laptop from 2012 (i5-2540M) has it. It extends the XMM registers to 256-bit YMM registers. Intel provides a code sample, I believe, that allows batch normalization of vectors at a rate of one vector per clock cycle.
    AVX2 only adds more instructions, as far as I know, while AVX-512... make a guess.

    Free Pascal, as far as I can tell, only supports AVX so far (it has been a long time since I last tested).

    But it boggles the mind how much raw muscle even a Core 2 Duo has.

  5. #45
    Correction: it seems I mixed things up. SSE has 8 registers, not 16 -- that came with a later extension. 8 is still a lot.

  6. #46
    After a BIG sidetrack into finishing up my favorite author's DooM mod without his permission (beware of a rabid fanboy and all that) -- a work that took literally months, from June to September --
    I am beginning to curve back to my own projects.
    Remembering what I was doing took some effort, as did switching back to Pascal from the horrible twisted hacks of ACS and DECORATE scripting.

    I have finished (in theory, mind you: my code still does not compile) several "required secondary powers" without which the basics of the layered architecture would not be possible.
    Did you know that the <= and >= operators require their own overloads when you do operator overloading? Not a surprise, really, once you think about it.
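    For the record, FPC's global operator syntax makes you spell each one out separately -- a sketch with a made-up record type:

      {$mode objfpc}
      type
        TFixed = record
          Raw: SmallInt; // some 16-bit fixed-point value
        end;

      operator <= (const A, B: TFixed) R: Boolean;
      begin
        R := A.Raw <= B.Raw;
      end;

      operator >= (const A, B: TFixed) R: Boolean;
      begin
        R := A.Raw >= B.Raw;
      end;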

    Then, after those parts of the foundation were maybe-ready, it was time to think with pen and paper in hand.
    Because my habit of keeping everything in my head would have made it explode in this case.

    Now, with this sketch in hand, I can plan finer details and class relationships.
    There are several consequences:

    1. There should be support for several completely self-contained worlds. In practice, two: the lobby and the map. So that the lobby does not reset on map change, for example. But I can also make each team's dressing room a separate universe -- OR make the lobby itself the dressing rooms, separated by a glass wall for taunting.
    2. The UI cannot just affect the worlds willy-nilly; it must be linked to an agent -- let's call it a "player character" -- even if that is a dumb spectator camera, or a placeholder waiting for the player to teleport from the lobby to the map. All effects are expressed as "player inputs" and must go through the multiplayer manager, which gathers all inputs from all players to drive a 100% reproducible layered world. All such inputs must be as lightweight as possible since they are serialized and transmitted over the network (a sketch of such an input follows after this list).
    3. When the UI reads the world, it reads the "Present surface" layer, which is the cherry on top of the lag compensation. So the UI must be prepared for radical changes to the world and the player character between frames, since full-world lag compensation could introduce or cancel the results of, say, a player nuking half the map.
    4. The UI cannot reliably link (for monitoring) to any object except the local player's character and objects created in the base layer, because, due to lag compensation, objects created in intermediate layers are transient and get replaced by "the same" object from the lower layer when that layer bubbles up -- actually a separate object unrelated to the previous one.
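    To make point 2 concrete, here is a hypothetical shape for such an input (not actual code from the project, just the size class being aimed for):

      type
        TInputKind = (ikMove, ikLook, ikFire, ikUse, ikChat);

        // One serialized player input: a handful of bytes, cheap to gather
        // from every player each tic and to transmit over the network.
        TPlayerInput = packed record
          Tic: LongWord;    // which world tic this input belongs to
          PlayerId: Word;
          Kind: TInputKind;
          X, Y: SmallInt;   // payload; meaning depends on Kind (move dir, aim delta...)
        end;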
    (Attached image: the architecture sketch referred to above.)

  7. #47
    Thinking with pen and paper is a good thing. I try that too, often. Though my doodles are not as well organized as yours.

  8. #48
    Well, some things are hellishly hard to develop without drawing signal diagrams on millimeter grid paper -- namely, thread-syncing algorithms. Luckily for me, I still have that *huge* roll of it, probably ten meters or so by one meter. It is quite yellowed and prone to cracking, though... How long ago did I buy that thing? Or was it bought by my *parents* for my school activities in the 1980s...?

    I wasn't going to necro this thread for the time being (until I have at least *some* progress to show), but since you did that anyway, I can say this:
    I am not going to achieve progress in the near future: projects at work demand my attention.
    All the free time I had the last year went into this: https://www.doomworld.com/forum/topi...-day-beta-001/
    On the plus side, I've finalized the architecture of my engine; the only thing left is coding and coding and coding it into reality.

    I nearly created another Lovecraftian monstrosity with this "carousel of per-tic graveyards". Luckily I ate my vitamins, came to my senses and realized that it is enough to have per-layer graveyards and have any layer use the *upper* layer's graveyard.
    The *sole* reason to keep instances after the current tic's end is the links to those instances that could lead from the upper layer via accelerated fields (another invention of mine, managed by the memory manager). But when the upper layer floats to the very top and vanishes, those links stop being a concern. So the natural decision was to just use that layer's graveyard. When a layer vanishes, all its instances vanish -- without calling destructors or even cleaning the graveyard, because I upgraded my memory manager to use different memory pools for different layers. Erasing a layer is as simple as dropping the pools associated with it.
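    The idea, boiled down to a toy sketch (this is not the real memory manager, just the shape of the trick):

      type
        // Every layer owns its own pools; instances born in a layer are
        // carved out of that layer's pools only.
        TLayer = class
        public
          Pools: array of Pointer;  // raw blocks obtained from the heap
          procedure DropAll;        // erase the layer: no destructors, no sweep
        end;

      procedure TLayer.DropAll;
      var
        i: Integer;
      begin
        for i := 0 to High(Pools) do
          FreeMem(Pools[i]);        // releasing the blocks is the whole cleanup
        Pools := nil;
      end;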

    My second big decision was making the architecture a bit more complex, which will involve some painful juggling and having several "worlds" in the base layer, each of which could be at a different tic and could be downloaded from the server independently. My main reason was the dream of having a lobby (with the chat as part of it) that does not disappear on map transition.
    It is an *old* annoyance when you want to type something profound like "Lol ez wusses" but the map changes and your words of wisdom are lost.
    So if the lobby (think TF2's dressing room) and the actual map are two universes inside the same server that the client connects to independently, then, since the lobby is featherweight and can be connected to nigh instantly:
    1. You start choosing your class and cosmetics, seeing other players and the in-game chat/voice communication right away, before the engine finishes the "download the snapshot + fast-forward the snapshot to the actual tic" combo on the main map.
    2. Reconnecting after losing connection would feel much less frustrating if you could spend most of that time in the lobby, aware of what is happening.
    Last edited by Chebmaster; 29-01-2024 at 03:43 PM.
