Multithreading Efficiency [Archive] - Pascal Game Development

code_glitch

26-10-2011, 10:21 AM

While revising the concept code of a new server application for my home server (yes, you're reading that right, I'll explain later), I've stumbled across an efficiency question that is void in theory, but I thought I would ask anyway. After all, someone's great signature said something like:
'In theory nothing works and everyone knows why; in practice everything works and no-one knows why; combine the two and nothing works and no-one knows why.'
That reminded me that theory is usually the opposite of what happens.

The question: which is better for a hyperthreaded dual-core system (an Atom, nothing fancy): lots of really little threads, or a few big threads? The same amount of work needs to be done, so the same number of cycles, but which makes better use of 4 logical cores? 4 big threads or 6 little threads?

For those wanting to know what the server program is for: it's modelled on the original telnet/BBS systems, managing the tty, files, news (RSS) and logs from one place, with the capability of having multiple clients connected. I know there is probably an app like that already, but since I can't be bothered sudoing my way around config files, and I could be bothered to get some more networking practice in with the Sockets unit, off the project went :D

26-10-2011, 03:30 PM

This issue is rather subjective and depends both on the system and on your application, specifically the code in each thread. Hyperthreaded Atoms are optimized for running many threads, but if you access a lot of data, you'll be thrashing the CPU's cache more often, and that cache is quite small anyway.

In an article we submitted to a conference a few months back, we ran performance tests on an application doing 3D scene illumination in software (we ran it on both CPU and GPU), using 64 threads. The Intel Atom N280 (1 physical core, 2 logical) and N570 (2 physical cores, 4 logical) both performed much better with threads than with a single thread: the N570 ran the code 3.5x faster, while the N280 ran it about 1.8x faster. With only 16 threads, the results were significantly worse and CPU usage did not reach 100%, while 128 threads gave results quite similar to 64 threads.

By the way, out of curiosity, the same code executed on an iPad 2 also benefited from using 64 threads: it ran roughly 2 times faster (the iPad 2 uses a dual-core CPU).

It is common for a thread's code to be waiting on many things, such as RAM accesses, FPU results and so on, so a multithreaded CPU is used more efficiently with more threads. In your case, since you are writing a server application, I'd recommend at least 32 threads; 4 or 6 threads won't use the CPU efficiently.

User137

26-10-2011, 04:05 PM

Hmm? Just 2 threads should use the CPU efficiently on a dual-core processor, if both of them have an equal amount of work to do. However, I would probably try to use many more than 6 threads if I had 4 cores.

If you have 1 thread which divides the workload, you probably don't want to wait for one of the existing "thread slots" to finish, but instead create a new thread at that moment. That gives you a dynamic, unrestricted number of threads.

As far as the GPU goes, you can only use 1 CPU core to do all the rendering.

code_glitch

26-10-2011, 04:12 PM

It's just that my approach may result in a maximum of 65536 threads... A very effective DoS attack could then be carried out. Basically, my logic was that if more threads = better performance, then just dedicate 1 thread to each client ;) There should be no more than a handful of clients (around 5 or 6) at any one time, and thus no more than 5 or 6 threads plus the initial thread...

However, attempting to connect to every port could result in a LOT (60K+) of threads, which I believe could cause trouble for the Atom chip. Although I guess I could limit the number of connections, it just feels like there could be a better way of optimizing the approach than just throwing in a constant simultaneous connection limit.

The chip in question is the D510 (it comes embedded on a D510MO mini-ITX board for around £55-60 with 2x SATA ports, perfect for a small server. No VT though...). It has 2 physical cores, 4 logical cores thanks to hyperthreading, and I've given it 2.5 GB of RAM, running Linux Mint with LXDE. I wanted to go with Arch, but the setup attempt became tedious, and I broke the pacman and repo lists. Twice. :)

No matter what though, each thread will spend ~85% of its time waiting to send/receive data before parsing it and then executing the required procedures/functions...

26-10-2011, 04:45 PM

"However, attempting to connect to every port could result in a LOT (60K+) of threads, which I believe could cause trouble for the Atom chip. Although I guess I could limit the number of connections, it just feels like there could be a better way of optimizing the approach than just throwing in a constant simultaneous connection limit."
You can limit the number of connections to 100 or so: use a critical section to protect a variable that holds the number of active connections, and update it when each thread finishes.
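
A minimal sketch of that connection cap, assuming Free Pascal with the SyncObjs unit. The names MaxConnections, TryAcquireSlot and ReleaseSlot are made up for illustration, and the limit of 100 is just the figure mentioned above. The loop in the main block only simulates 150 connection attempts; no sockets or handler threads are involved.

```pascal
program ConnLimitSketch;
{$mode objfpc}{$H+}

uses
  {$ifdef unix}cthreads,{$endif}  // thread support on Unix-like systems
  SyncObjs;

const
  MaxConnections = 100;  // hypothetical cap, taken from the suggestion above

var
  ConnLock: TCriticalSection;
  ActiveConnections: Integer = 0;

// Returns True and counts the connection if a slot is free.
function TryAcquireSlot: Boolean;
begin
  ConnLock.Acquire;
  try
    Result := ActiveConnections < MaxConnections;
    if Result then
      Inc(ActiveConnections);
  finally
    ConnLock.Release;
  end;
end;

// Called by a handler when its client is gone.
procedure ReleaseSlot;
begin
  ConnLock.Acquire;
  try
    Dec(ActiveConnections);
  finally
    ConnLock.Release;
  end;
end;

var
  I, Accepted: Integer;
begin
  ConnLock := TCriticalSection.Create;
  try
    Accepted := 0;
    // Simulate 150 connection attempts; only MaxConnections get a slot.
    for I := 1 to 150 do
      if TryAcquireSlot then
        Inc(Accepted);
    WriteLn('accepted: ', Accepted);

    // One client leaves, so the next attempt succeeds again.
    ReleaseSlot;
    WriteLn('slot free again: ', TryAcquireSlot);
  finally
    ConnLock.Free;
  end;
end.
```

In a real server the listener would call TryAcquireSlot before starting a handler thread and refuse the connection when it returns False, and each handler would call ReleaseSlot as its last step.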

"The chip in question is the D510 (it comes embedded on a D510MO mini-ITX board for around £55-60 with 2x SATA ports, perfect for a small server...)"
This is actually a relatively powerful yet highly power-efficient chip. I don't think it'll have problems handling hundreds of threads. You can also use low-latency RAM to tune up the performance.

"No matter what though, each thread will spend ~85% of its time waiting to send/receive data before parsing it and then executing the required procedures/functions..."
This was exactly my point. Since the majority of threads will be waiting, you can use as many of them as you like and they still might not end up using 100% of the CPU.
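
As a rough back-of-the-envelope check of that point (a rule of thumb, not a measurement): if each thread is only runnable about 15% of the time, it takes roughly 1 / 0.15 threads to keep one logical core busy. The sketch below plugs in the figures from this thread, 4 logical cores and ~85% waiting, and arrives at about 27 threads. It ignores scheduling overhead, cache effects and uneven load, so treat the result as a lower bound for saturating the CPU rather than a target.

```pascal
program ThreadEstimate;
{$mode objfpc}{$H+}

uses
  Math;

const
  LogicalCores = 4;     // D510: 2 physical cores with hyperthreading
  WaitFraction = 0.85;  // share of time a handler spends blocked on I/O

var
  Threads: Integer;
begin
  // Each thread keeps a core busy only (1 - WaitFraction) of the time,
  // so about 1 / (1 - WaitFraction) threads are needed per logical core.
  Threads := Ceil(LogicalCores / (1.0 - WaitFraction));
  WriteLn('Threads needed to keep all cores busy: about ', Threads);
end.
```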

code_glitch

26-10-2011, 05:02 PM

I guess the only limit is how much RAM and CPU time the Linux kernel spends managing a lot of threads... That was my main fear, but as you pointed out, this does not seem to be a major issue.

Aside from all that, does anyone know how to properly handle a disconnection event? All I get are crashes and runtime errors...

User137

26-10-2011, 11:43 PM

Not exactly sure what kind of connection you have, but I use Synapse for TCP and UDP.

Basically, the Synapse server thread listens on the socket for a given time. It stops on a WSA error, which my class may translate as a disconnection, and then stops the thread. In some very rare cases I still get a runtime error when I exit the application, which I'm still a bit clueless about. Perhaps the debugger just wants to close, but a thread still running separately from the main application causes some conflict. I mean, it may not shut down immediately because of the read timeout period: the main application calls ServerThread.Terminate on app close, but the thread handles it only after the current read period ends. It's also possible to wait for all threads to stop with a sleep() loop.
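
A minimal sketch of that shutdown order using the standard TThread class from the Classes unit. TServerThread here is a hypothetical stand-in, not Synapse code: the Sleep(200) call plays the role of a socket read with a 200 ms timeout. The point is that Terminate only sets a flag, so the main program calls WaitFor before freeing the thread and exiting, instead of polling with a sleep() loop.

```pascal
program ThreadShutdownSketch;
{$mode objfpc}{$H+}

uses
  {$ifdef unix}cthreads,{$endif}  // thread support on Unix-like systems
  Classes, SysUtils;

type
  // Hypothetical stand-in for a server thread with a read timeout.
  TServerThread = class(TThread)
  protected
    procedure Execute; override;
  end;

procedure TServerThread.Execute;
begin
  // The Terminated flag is only checked between "reads".
  while not Terminated do
    Sleep(200);  // stands in for a blocking read with a 200 ms timeout
end;

var
  ServerThread: TServerThread;
begin
  ServerThread := TServerThread.Create(False);  // start immediately
  Sleep(500);              // let it run for a moment
  ServerThread.Terminate;  // only sets the Terminated flag
  ServerThread.WaitFor;    // block until Execute has actually returned
  ServerThread.Free;       // safe now, the thread is no longer running
  WriteLn('server thread stopped cleanly');
end.
```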

code_glitch

27-10-2011, 12:48 PM

@User137

I hope to reuse a lot of the code for another project - and I want it free of any 3rd party dependencies that require me to package any extra .so files (dlls in winspeak) - the most logical solution I found when I started writing the code was to use the Sockets unit. I now have a system that works as follows:

The main thread listens on the protocol port.
Any client that connects to said port is given a port number, and a handler thread is booted for the client.
The handler thread waits for a connection and then performs a username and password exchange in an iteration of a time-sensitive, randomized 100-byte XOR string exchange. If this fails, the client is given the boot; otherwise it is given access to the server's script interpreter.
As soon as the handler thread begins, the main thread returns to listening on the protocol port for any new connections.

And as the handler threads are in a dynamic array, more can be added by simply using the SetLength procedure and a few other configuration procedures at runtime :) The default is for a maximum of 25 simultaneous connections.

THE problem I face now is probably the most annoying - if the client is closed, the server gives a runtime error and crashes. If the server is closed, the client produces a runtime error and crashes. Does anyone have any idea of how to resolve this issue with the Sockets unit? I have no idea what call is crashing it, if any, and my guess is that one would need to handle some kind of connection dropped event - but I cannot find anything based on the docs over at http://www.freepascal.org/docs-html/rtl/sockets/index-5.html

Otherwise all is well, a little bit more implementation in the interpreter and handshake and all should be well... For the curious ones among you, it's all TCP traffic ;)
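
On the disconnection question, one common pattern with the Sockets unit is to check the return value of every receive call: a result of 0 means the peer closed the connection in an orderly way, and a negative result means an error, with the code available from SocketError. The sketch below is a single-client, thread-free illustration of that check, not the design described in this post. The port number 5000 is an arbitrary assumption, and the program blocks in fpAccept until a client connects.

```pascal
program DisconnectSketch;
{$mode objfpc}{$H+}

uses
  Sockets;

const
  ListenPort = 5000;  // hypothetical port, chosen for illustration

var
  ListenSock, ClientSock: LongInt;
  Addr: TInetSockAddr;
  Buf: array[0..1023] of Byte;
  Received: PtrInt;
begin
  ListenSock := fpSocket(AF_INET, SOCK_STREAM, 0);
  if ListenSock < 0 then
  begin
    WriteLn('socket failed, error ', SocketError);
    Halt(1);
  end;

  FillChar(Addr, SizeOf(Addr), 0);
  Addr.sin_family := AF_INET;
  Addr.sin_port := htons(ListenPort);
  Addr.sin_addr.s_addr := 0;  // listen on all interfaces

  if (fpBind(ListenSock, PSockAddr(@Addr), SizeOf(Addr)) <> 0) or
     (fpListen(ListenSock, 5) <> 0) then
  begin
    WriteLn('bind/listen failed, error ', SocketError);
    Halt(1);
  end;

  ClientSock := fpAccept(ListenSock, nil, nil);  // blocks until a client connects
  if ClientSock < 0 then
  begin
    WriteLn('accept failed, error ', SocketError);
    Halt(1);
  end;

  repeat
    Received := fpRecv(ClientSock, @Buf, SizeOf(Buf), 0);
    if Received > 0 then
      WriteLn('got ', Received, ' bytes')
    else if Received = 0 then
      WriteLn('client closed the connection')         // orderly disconnect
    else
      WriteLn('receive failed, error ', SocketError); // e.g. connection reset
  until Received <= 0;

  // Leave the loop and clean up instead of touching the dead socket again.
  CloseSocket(ClientSock);
  CloseSocket(ListenSock);
end.
```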

User137

27-10-2011, 01:04 PM

Quick note: Synapse does not use any external files such as DLLs, so the final application just needs its executable file to run.

code_glitch

27-10-2011, 01:43 PM

Ah, now that is interesting. Any links? Google turns up a lot of stuff, just no SF page or anything... What is the license on it, and resource usage?

Edit: Scratch that, 'ararat synapse' ;)

Are there any examples around? That looks like a nice lib with support for a lot of new(er) stuff: SSL, SMTP... Might just be worth rewriting the connection-specific stuff from the Sockets unit to Synapse after all :) Just depends on the license by the looks of it. But it does have a lot of source files. IMHO I may just stay with Sockets: it's versatile, does what I want it to and has no extra source files. Nice find though, and if I don't reuse the code from this server I might just end up using Synapse instead.

Edit 2:

Does anyone know anything about socket errors 111 and 141? My best guess is SIGPIPE, but I have no idea how to control it in Pascal :(
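
A hedged guess at those two numbers: on Linux, errno 111 is ECONNREFUSED (nothing is listening on the port the client tried), and 141 looks like 128 + 13, which is how a shell usually reports a process killed by signal 13, SIGPIPE. Writing to a connection whose other end has gone away raises SIGPIPE, and its default action is to terminate the process. A minimal sketch of one way to deal with it, assuming Free Pascal on Linux with the BaseUnix unit: ignore SIGPIPE once at startup, after which the failing write returns -1 with errno set to EPIPE. A pipe stands in for the socket here so the example runs on its own.

```pascal
program IgnoreSigPipe;
{$mode objfpc}{$H+}

uses
  BaseUnix;

var
  Fds: TFilDes;
  Msg: AnsiString;
  Written: TsSize;
begin
  // Ignore SIGPIPE process-wide: writes to a closed peer now fail
  // with EPIPE instead of killing the program.
  fpSignal(SIGPIPE, SignalHandler(SIG_IGN));

  // A pipe stands in for the client connection in this sketch.
  if fpPipe(Fds) <> 0 then
  begin
    WriteLn('pipe failed, errno ', fpGetErrno);
    Halt(1);
  end;

  fpClose(Fds[0]);  // close the reading end, like a client that disconnected

  Msg := 'hello';
  Written := fpWrite(Fds[1], PChar(Msg), Length(Msg));
  if Written < 0 then
  begin
    if fpGetErrno = ESysEPIPE then
      WriteLn('peer is gone (EPIPE), drop this connection')
    else
      WriteLn('write failed, errno ', fpGetErrno);
  end
  else
    WriteLn('wrote ', Written, ' bytes');

  fpClose(Fds[1]);
end.
```

With real sockets the same idea applies to the send call: check its result, and on a negative value read SocketError and close that client's socket rather than letting the error propagate.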