Results 1 to 9 of 9

Thread: PasMP - a parallel-processing/multi-processing library for Object Pascal

  1. #1

    PasMP - a parallel-processing/multi-processing library for Object Pascal

    PasMP

    PasMP is a cross-platform parallel-processing/multi-processing library for Object Pascal for Delphi and FreePascal

    GitHub project page:

    https://github.com/BeRo1985/pasmp

    License:

    zlib

    Why:

    Because System.Threading and OmniThreadLibrary are Delphi-only libraries and MTProcs has also his own problems (FreePascal-only, inefficient lock-based work-stealing, etc.) and at least System.Threading from Delphi XE7 has also a bit more inefficient work-stealing implementation.

    Features:

    • Low-level-based design with optional high-level-based constructs
    • Designed for fully-strict fork-join model in mind (because it's less error-prone to work with it than with terminally-strict fork-join for my taste), but it can be also abused for more flexible models, the only important thing is, that you're releasing the jobs again as soon as they are completed (the simplest weay would be by calling TPasMP.Reset per workload-frame), otherwise you'll have memory leaks, or just use the PasMPJobFlagReleaseOnFinish flag.
    • It can be used a job scheduler for multithreaded game engines and so on
    • Work-first lock-free work-stealing dynamic-sized Chase-Lev queue/deque (where only the resizing code part of it isn't lock-free, so that on the other hand it is garbage-collector-free)
    • Lock-free job memory allocator (al least lock-free on x86-32 and x86-64 targets)
    • Parallel-for pattern
    • Parallel intro sort (direct and indirect)
    • Parellel merge sort (direct and indirect)
    • Optional strict singleton usage option per global PasMPUseAsStrictSingletonInstance define (besides the option of usage of multiple PasMP instances)
    • Compatible with FreePascal >= 2.6.x and Delphi >= 7
    • Cross platform (Windows, Linux, etc.)



  2. #2
    Interesting. I will have to give this a try one day.
    Currently my programming is halted due to some life issues I'm having.

  3. #3
    PGD Staff / News Reporter phibermon's Avatar
    Join Date
    Sep 2009
    Location
    England
    Posts
    508
    You always seem to produce exactly what I need when I need it! I've been wanting to use OmniThreadLibrary for ages for lock free queue implementations but the lack of FPC support makes it useless to me.

    BeRo you are an absolute genius! thank you so so *so* much for sharing your work - this and all your other high quality gifts - I will make good use of it and will contribute back to the community in kind.

    I'd give my left arm to work with you

    (and I'm left handed)
    Last edited by phibermon; 10-02-2016 at 01:22 AM.
    When the moon hits your eye like a big pizza pie - that's an extinction level impact event.

  4. #4
    PGDCE Developer de_jean_7777's Avatar
    Join Date
    Nov 2006
    Location
    Bosnia and Herzegovina (Herzegovina)
    Posts
    242
    This is awesome. Not that much of my work requires parallelism currently, but I will need it in the future. Very nice of you.
    In defeat we learn.

  5. #5
    that's awesome , every one who's looking for speed would definitely love this so much , my next wish is that fpc will generate better vectorized code in the future .
    btw , you really amazed me on how you get all that incredible energy to write all those great stuff kudos

  6. #6
    PasMP updated: https://github.com/BeRo1985/pasmp

    New features:

    • Thread-safe single producer single consumer queue (untyped and typed, bounded-only, lock-free)
    • Thread-safe multiple producer multiple consumer stack (untyped and typed, bounded and unbounded, lock-free on x86-32/x86-64/ARM32, lock-based on another CPU targets)
    • Thread-safe multiple producer multiple consumer queue (untyped and typed, bounded and unbounded, lock-free on x86-32/x86-64/ARM32, lock-based on another CPU targets)
    • Thread-safe hash table (untyped and typed, hybrid-implementation of lock-free and fine-graded lock-based single-operation code parts)
    • Thread-safe dynamic-sized array (untyped and typed, fine-graded lock-based)
    • TPasMPMath class with class static useful (primary bit-twiddling) math function methods
    • TPasMPInterlocked class with class static atomic function methods
    • TPasMPMemoryBarrier class with class static memory barrier function methods
    • TPasMPMemory class with class static aligned memory allocation function methods
    • Synchronisation primitives:
      • TPasMPEvent
      • TPasMPSimpleEvent
      • TPasMPCriticalSection
      • TPasMPMutex
      • TPasMPConditionVariableLock
      • TPasMPConditionVariable
      • TPasMPSemaphore
      • TPasMPInvertedSemaphore
      • TPasMPMultipleReaderSingleWriterLock
      • TPasMPMultipleReaderSingleWriterSpinLock
      • TPasMPSlimReaderWriterLock
      • TPasMPSpinLock
      • TPasMPBarrier
    • and more...


    And now, I'll port the Vulkan headers to pascal.

  7. #7
    Cool! This is exactly what I lacked in FPC!
    Thank you for sharing this!
    Are there any tests for the library?
    Or how did you know that your lock-free collections implementation are really thread safe and correct?

  8. #8
    Quote Originally Posted by Mirage View Post
    Cool! This is exactly what I lacked in FPC!
    Thank you for sharing this!
    Are there any tests for the library?
    Or how did you know that your lock-free collections implementation are really thread safe and correct?
    For many reasons:

    • I'm using the SPSC bounded queue approach since many years in my audio software stuff for as sound ringbuffer between two on-different-CPU-cores-pinned threads without any issues. And the SPSC boundung queue is really simple, because only one thread can be a producer and only one another thread can be a consumer. And the C++ boost::lockfree ringbuffer class seems using the same (or a very similar) SPSC bounded queue approach like I myself. Only the critical part is that the memory barriers are set correctly on the correct code positions, especially on CPU targets with weak memory models (for example ARM, PowerPC, etc.). x86 CPUs have comparatively a strong memory model. For more about weak vs. strong memory models, see: http://preshing.com/20120930/weak-vs...memory-models/
    • The lock-free MPMC queue stuff is based on the http://people.csail.mit.edu/edya/pub...ue-journal.pdf paper, which contains a "Correctness proof" text section.
    • The lock-free MPMC stack stuff is is based on the idea behind the concept of the internal workings of the "Interlocked Singly Linked Lists" Windows API, just stripped by the Depth stuff, which (the core concept behind it) is also well tested at the Microsoft Windows Operating System.
    • The lock-free job work stealing is based on http://www.di.ens.fr/~zappa/readings/ppopp13.pdf, which contains also a "Correctness proof" text section.


    And you have to consider the problem with the CPU cache lines for to avoid the false-sharing performance-degrade issues (https://en.wikipedia.org/wiki/False_sharing). Therefore, PasMP is more optimized for performance than for the memory usage, because almost everything in PasMP is CPU cache line aligned, also every item in your TPasMPDynamicArray, TPasMPBoundedStack, TPasMPBoundedQueue, etc. And a CPU cache line is on the most current x86 CPUs 64 bytes long.
    Last edited by BeRo; 21-02-2016 at 09:23 PM.

  9. #9
    Thanks for the answer.
    My question was more about implementation rather than about algorithms.
    When working with multithreading it's very easy to make a mistake and very hard to find it. Even tests don't guarantee anything. But them increasing probability to find errors.
    I think I'll write some test for your library when I'll integrate it to my projects.

Tags for this Thread

Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •