PDA

View Full Version : Some gravity particles

User137
12-06-2013, 09:04 PM

Screenshot of earlier version: http://img32.imageshack.us/img32/2327/particles.png

Each particle is now drawn with size 5 GL_POINTS. I used total 3000 particles, and any more would make it lag more horribly than it currently is. It's not about rendering though, as seen in the screenshot. I initialized it with 200000 particles and it wasn't even breaking a sweat at the vsync'd 63 fps with 0ms render time.

So do you know how i can optimize this gravity calculation?

function Hypot3f_2(const x, y, z: single): single; inline;
begin
result:=sqrt(x*x+y*y+z*z);
end;

function VectorDist_2(const a, b: PVector): single; inline;
begin
result:=hypot3f_2(a^.x-b^.x, a^.y-b^.y, a^.z-b^.z);
end;

procedure TGame.GameLoop;
var i, j: integer; d, f: single; vec: TVector;
start: cardinal;
begin
start:=nx.GetTick;
... // Quick interpolation of camera, zoom etc...

if not paused then begin
// Update forces
for i:=0 to count-2 do
with star[i] do
for j:=i+1 to count-1 do begin
d:=vectordist_2(@v, @star[j].v);
// If particles exactly overlap, rather keep current velocity this frame
if d>0 then begin
f:=0.00001/(d*d);
d:=f/d;
if star[j].positive<>positive then d:=-d;
vec.x:=(star[j].v.x-v.x)*d;
vec.y:=(star[j].v.y-v.y)*d;
vec.z:=(star[j].v.z-v.z)*d;
movement.x:=movement.x+vec.x;
movement.y:=movement.y+vec.y;
movement.z:=movement.z+vec.z;
star[j].movement.x:=star[j].movement.x-vec.x;
star[j].movement.y:=star[j].movement.y-vec.y;
star[j].movement.z:=star[j].movement.z-vec.z;
end;
end;
// Move particles
for i:=0 to count-1 do
with star[i] do begin
v.x:=v.x+movement.x;
v.y:=v.y+movement.y;
v.z:=v.z+movement.z;
end;
end;
looptime:=nx.GetTick-start;
end;

SilverWarior
12-06-2013, 10:45 PM
While I'm not good at math I can't give you advices on improving the way you calculate your Physics.
But as a programer I would definitly advise you to start using multithreading for this.

User137
12-06-2013, 11:37 PM
I could precalculate the frames, store like every 20th, and then playback it with interpolation. That way threading wouldn't be a problem at least. Also my CPU only has 2 cores so i would only double the performance, but that's something i guess.

*Sees a vision in mind of splined curves of the whole particle animation and drools a little...*

SilverWarior
13-06-2013, 02:13 AM
That way threading wouldn't be a problem at least. Also my CPU only has 2 cores so i would only double the performance, but that's something i guess.

Why would threading be a problem? All you have to do is make sure that you don't actually move any of the particle until you have calculated all future position to every other particle.
For this you would actually need double-buffering of some of the particle information (position, movment heading). And if you have particles saved as objects (classes) you can use Getter and Setter methods of class properties for easily handling of double-buffering.

User137
13-06-2013, 02:33 AM
Problem is in that each thread will access to same particle data at the same time. I cannot split them to isolated index groups, because each particle is compared to every other. So many threads could apply forces to same particle at the same time. But it's just a simple sum operation, the calculation does not depend on value of the movement vector, so it might not matter. I do not know. I could do the "// Move particles" part in main thread after the forces are updated, cause i think that's the fastest part of it. I would also put main thread waiting for those to finish before continuing. It will go to same 1-4 fps while counting it anyway, it's hardly controllable at all.

#version 120
attribute vec3 in_position;
attribute vec4 in_color;
varying vec4 color;
uniform mat4 pmv;
void main() {
color = in_color;
gl_Position = pmv * vec4(in_position, 1.0);
}

#version 120
varying vec4 color;
void main() {
gl_FragColor = color;
}

edit: Using 4 physics threads now on the time-critical part, and CPU use went up to 90% or so. I don't at least see any differences in the visual outcome, sure it counts it faster.

Dan
13-06-2013, 07:38 AM
you should not approach this task with a brute force of the CPU, because you will surely get stuck.

I can make a couple of suggestions to improve the algorithm:

1 - use GPU to do your calculations.
2 - cluster your space (each cluster can contain hundreds of particles). calculate the total mass and the center of gravity of each cluster. attract individual particles with each other within a cluster (or you might create sub-clusters if you have too many particles). attract individual particles within a cluster to the center of mass of the other clusters. this will significantly decrease the time complexity of your calculations (at the moment you have O(N) = N * N).

User137
13-06-2013, 03:35 PM
Can i do clustering, if start situation is static mass that is mix of both gravity types? I think at beginning there would then be as many clusters as there are particles. I'm also not very convinced yet, because cluster's gravity center location is a living thing that depends on each of its children. I guess it would be more of a heuristic simulation.

Another funny sidenote about this simulator, is that currently the radius of the particle mass is 10. But if i reduce it down to 0.001 or so - all particles in very tiny spot next to eachother, they literally explode in a "Big Bang". (But i've seen that before, didn't come as a surprise...)

I think the clustering might work, but with different start situation. Still it would be very rough estimation. Say 50-100 clusters with random mass (mass being amount of assigned particles). Then each particle only orbiting the 1 cluster, and each cluster orbiting all the other clusters. Problem becomes of deciding when 1 particle switches to orbit another cluster, and i'm not sure i could solve it. Firstly the borderline between 2 clusters would be calculated with Newton's equations, spot between them where gravity force is equal. Wouldn't i be doing that calculation for each particle per each cluster all the time? Or each cluster would count and store its nearest neighbours every once in a while.

Anyway that's too much work for simple test.

Dan
13-06-2013, 04:27 PM
there might be one problem with using clusters, and that is the diminishing gravity over distance. I am not sure if it is possible to simulate this effect. I might give it a try myself.

SilverWarior
13-06-2013, 08:16 PM

Unles your threads are only reading the data and not changing it there should be no problem. Problem would only ocur if one of the threads would be changing some values of one particle while other tread would be reading it (data corruption).

I could do the "// Move particles" part in main thread after the forces are updated, cause i think that's the fastest part of it. I would also put main thread waiting for those to finish before continuing.

First of all never and I mean NEVER put the min thread on hold for other threads to finish. Why? Doing so you prevent your application to process Windows messages or System messages if you are using some other platform. And if you do so then your application would be considered as being hanged application by the OS and your OS might simply go and forcefully kill it. You definitly don't want this.
If you use double-buffering approach you actually dont even need main thread to do any moving.

Let me try to explain you how would I approach to this problem:
First I would save information for each particle in class object with double-buffering so I that I have acces to the variables using properties. Why using properties?
Becouse with getter and setter methods of properties you can easily implement double-buffering.
The class code for particle wouls look something like this:

type
TParticle = class(TObject)
private
PSecondBuffer: ^Boolean;
FPosition1: T3DPoint;
FPosition2: T3DPoint;
FForce1: T3DVector;
FForce2: T3DVector;
protected
function GetPosition: T3DPosition;
procedure SetPosition(AValue: T3DPoint);
function GetForce: T3DVector;
procedure SetForce (AValue: T3DVector);
public
constructor Create(AOwner: TObject; var SecondBuffer: Boolean); override;
property Position: T3DPoint read GetPosition write SetPosition;
property Force: T3DVector read GetForce write SetForce;
end;

//We use this global variable to determine whether we will read from primary or secondary buffer
var SecondBuffer: Boolean;

implementation

{ TParticle }
constructor TParticle.Create(AOwner: TObject; var SecondBuffer: Boolean);
begin
inherited;
//We assing pointer to external boalean variable for controling whether
//we read from primary or secondary buffer
FSecondBuffer := @SecondBuffer;
end;

function TParticle.GetForce: T3DVector;
begin
//If the second buffer is true then we read from second force variable
//else we read from first force variable
if FSecondBuffer = True then Result := FForce2 //Secondarry buffer
else Result := FForce1; //Primary buffer
end;

function TParticle.GetPosition: T3DPosition;
begin
//If the second buffer is true then we read from second postiion variable
//else we read from first position variable
if FSecondBuffer = True then Result := FPosition2 //Secondarry buffer
else Result := FPosition1; //Primary buffer
end;

procedure TParticle.SetForce(AValue: T3DVector);
begin
//With the difference of getter methods we now if second buffer is true
//write data to first force variable to avoid changing the data that might
//We never change the same variable as it is available for reading
if FSecondBuffer = True then FForce1 := AValue //Primary buffer
esle FForce2 := AValue; //Secondarry buffer
end;

procedure TParticle.SetPosition(AValue: T3DPoint);
begin
//We handle position variable changing same as above.
if FSecondBuffer = ture then FPosition1 := AValue //Primary buffer
else FPosition2 := AValue; //Secondarry buffer
end;

As you see each class has two internal variables for every property (value) that might be changed by other threads during curent cycle while still preserving acces to current data without it being corupted by other threads.
For contgrolling this we are actually using external boolean variable. This means that with a single call we can change, which internal variable would be used for reading and which for writing, for even a few thousands of class objects if we need it. This is probably the best solution for controlling class double-buffering.
NOTE1: If you need per class double-bufering control you need as many global variables as you have distict classes.
NOTE2: You could also save this information into the class itself but then you will have to update this information fora each class which could result in several thousands calls.

Darkhog
13-06-2013, 08:50 PM
Cool demo. Could you share sources and exe file so other people can play with it?

User137
13-06-2013, 09:51 PM
This is the only part where thread collisions can happen. It is possible that some force vectors are not added to movement, but is that likely?

movement.x:=movement.x+vec.x;
movement.y:=movement.y+vec.y;
movement.z:=movement.z+vec.z;
star[j].movement.x:=star[j].movement.x-vec.x;
star[j].movement.y:=star[j].movement.y-vec.y;
star[j].movement.z:=star[j].movement.z-vec.z;

Also the mainloop's part for threads is like:

i2:=count div 4;
i3:=i2*2;
i4:=i2*3;
sleep(1); // It takes usually over 140ms, so sleep here is not unnecessary
Application.ProcessMessages;
end;
...
constructor TPhysicsThread.Create(parent: TGame; const first, last: cardinal);
...
...
for i:=FFirst to FLast do
with star[i] do
for j:=i+1 to count-1 do begin
So it will not stop responding. I know it could be done better though, even so that fps wouldn't be affected. I also realize now that all the threads aren't getting equal workload at all. Especially the first thread gets biggest work, and 4th one the smallest.

Win32-binaries and source code here: https://docs.google.com/file/d/0B7FII3MhcAHJRGkwQzhCMTlncmM/edit?usp=sharing
Requires Lazarus and nxpascal to compile, possibly SVN version. All the required files are here:

SilverWarior
14-06-2013, 10:53 AM
Also the mainloop's part for threads is like:

i2:=count div 4;
i3:=i2*2;
i4:=i2*3;
sleep(1); // It takes usually over 140ms, so sleep here is not unnecessary
Application.ProcessMessages;
end;
...
constructor TPhysicsThread.Create(parent: TGame; const first, last: cardinal);
...
...
for i:=FFirst to FLast do
with star[i] do
for j:=i+1 to count-1 do begin
So it will not stop responding. I know it could be done better though, even so that fps wouldn't be affected. I also realize now that all the threads aren't getting equal workload at all. Especially the first thread gets biggest work, and 4th one the smallest.

First of all get that Application.ProcessMessages out of that loop. It is not good to call Application.Process messages to often. And the reason why your main thread has so much load is becouse of Application.ProcessMessavges. It is quite resource consuming call. And all that processing time that is used by Application.ProcessMessages is taken away from one of your other threads making everything slower. And since you sad that you only have dual core the impact on overal speed can be quite big.
Wanna see how much Application.ProcessMessages affect your application. Start the simulation with setting sthat will cause low FPS, rotate the view and then pause the simulation. You will se that after you pause the simulation the overal view will quickly rotate to proper position while when being at low FPS it can be falling beind a litle.

Darkhog
14-06-2013, 12:17 PM
I disagree that taking away ProcessMessages is solution. If you do that, application will "hang" (won't respond to paint request, button clicks, etc.). Though I agree it shouldn't be called every iteration. Maybe every tenth iteration or so.

SilverWarior
14-06-2013, 12:37 PM
No the application still works OK but the performance gain is much smaler than I imagined it would be.
Yes I'm actually testing the source code to see where there may be major bottlenecks.

SilverWarior
14-06-2013, 01:03 PM
@User137
Did I understand corectly that you intend to calculate force efect on every particle from every other particle? If that is ture than your current code is compleetly wrong as you are only calculating the force effects to current particle of all those who are positioned in the list after the current one.
This would cause compleetly wrong simulation.
But on the other hand it would mean that proper code would work even slower as your would need to do even nore calculations ???

User137
14-06-2013, 03:56 PM
First loop i runs from 0 to count-2, and is divided for different threads. Second loop j runs from i+1 to count-1. Overall effect will be that each unique pair is calculated once, and once only. Index 0 vs 1 is counted, but index 1 vs 0 is not, and so on for all of them. They have same potential force towards eachother, because i assume they all have same mass, and it is applied to both of them at the same time. ... Well, i could try do that 1 fix where physics is calculated separately from main thread, to get 60 fps all the time possibly.

You may have also noticed that i modified form.onClose event: if threads are running when click it, it will call for threads to stop and then you have to click it again. Otherwise crash would happen on exit, due to accessing game data after threads finish, but data being freed already. It didn't help if i put a waiting loop in the onClose. That was 1 "fix the wheel with bubblegum"... I'll see if i could fix it aswell.

User137
14-06-2013, 04:57 PM
Updated the file twice, second time after a little code cleaning.

imcold
14-06-2013, 05:15 PM
Nice demo! Few notes:
-too many ops per one star, some reduction helps, for example

d := G_FORCE/(d*d*d); - division is one of the slowest ops you can do (just this one line change makes it 20% faster)
-no use of more than 2 calculation threads on dual-core, more threads than cores makes it slower actually
-Application.Processmessages is an invitation for reentrancy problems, and at least use interlocked decrement on the thread counter (thread joining would be better, sleep really isn't a synchronization primitive)

User137
14-06-2013, 05:45 PM
Nice notice (fixed it into my version)

d := G_FORCE/(d*d*d);
It does mean same thing as:

f:=G_FORCE/(d*d);
d:=f/d;

Also there is no Application.ProcessMessages in that version (you can remove "forms" from gameunit uses list). It was replaced by

in the physics main thread. It still doesn't hang the application because it's a separate thread. Spiking in framerate is simply because of heavy utilization of CPU i think. Trying with this seems to work too, but i feel like it is actually consuming CPU resources more with loop like that

while Threads>1 do ; // <- Don't do this
So with this i see an increase in physics loop time.

It should be ok to have 4 threads even on dualcore, remember that each thread gets different workload. Meaning that other CPU could be without work for longer time, if i only had 2 threads. At the same time people with quadcores can test it too. It might even be faster with 8 threads.

Also works better if i comment out SetFrameSkipping(false); Showing more realistic framerate (~40) numbers too, when it's not pushed to draw as much as possible.

imcold
14-06-2013, 06:19 PM
Empty while loop is worse, for sure: use TThread.WaitFor. You probably want something like this in your TPhysicsMainThread:

...

end;

User137
14-06-2013, 07:21 PM
Wow, this became quite a practise for threads :D You are right again. But also with this change i could test how 32 threads would effect, and as expected, it seemed just as fast as 4 or 8 threads. And minor difference in the creation, to support odd particle counts aswell

i*tcount, (i+1)*tcount-1);
New version is also uploaded, and there is extra test in rendering loop commented out. I tried what it looks like when full screen clear would be replaced with slow fading to black. Particles can leave kind of trails when they move. I'm not sure if it looks better than original though, so that's why it's commented.

imcold
14-06-2013, 07:29 PM
Also (a bit akin what Dan suggested) you can play with the condition affecting when to apply the force to particle:

if (d > 0) and (d < DISTANCE_TO_IGNORE)
With distance set to 15 or so gives a nice speedup in later stages, when the particles spread out.
Smaller distances (try 5) will give you more speedup in early stages but have a more noticeable effect on particle behavior (less tight clusters).
Larger distances (100) cause slowdown in early stages
Dynamically tuning the "ignore distance" would probably lead to best results (2xfaster execution shouldn't be a problem).

As per more calculation threads than cores: if threads with similar workload start competing for resources, you'll get slowdowns from overheads associated with threads, scheduling, context switching, cache trashing and so on.

Edit: you have an error within the last statement, (MAX_THREADS-2) should be (MAX_THREADS-1), otherwise the last thread will get more stars than it should (the same ones as in the prev. thread, not to mention it won't work with one thread ;)

User137
14-06-2013, 08:52 PM
Edit: you have an error within the last statement, (MAX_THREADS-2) should be (MAX_THREADS-1), otherwise the last thread will get more stars than it should (the same ones as in the prev. thread, not to mention it won't work with one thread ;)
Just noticed that myself. I was testing with randseed:=1 just to see if i get same results with 1 thread or many. It crashed the moment i tried to set it on 1 thread ::) Also 1 thread didn't result same kind of universe than 2 or more threads.

But that's fixed now. I added more debug info, like how many physics ticks are calculated, print out how many threads are used. Changed the visuals slightly, so that blue particles aren't add-blended, and there is the slow fade-to-black only when it is unpaused.

imcold
14-06-2013, 10:00 PM
The visuals are really cool :) Final note for the threads - the load isn't distributed evenly between them, given the same amount of stars, the one with lower starting index does more cycle iterations. That's most likely why you see a speedup with more threads - the bigger the discrepancy, the more difference thread count does. Also, physicsthread.WaitFor on formClose; no need for thread counter at all.

User137
20-06-2013, 03:02 PM
I realized today that while 3D is great, is not necessary to simulate the gravity particle effect, or should you call it "anti-gravity". This can be done in 2D, with big gain in performance, and better results to look at.

Now after testing it a little, it didn't seem to work somehow, or maybe i'm missing something. I also think that i'm able to rid of Sqrt() completely.

for i:=FFirst to FLast do
with star[i] do
for j:=i+1 to count-1 do begin
xx:=v.x-star[j].v.x;
yy:=v.y-star[j].v.y;
d:=xx*xx+yy*yy;
if d>0 then begin
d:=(G_FORCE*G_FORCE)/(d*d*d); // Compiler should optimize the constant multiplication to 1 value
// If not, just remove multiplication and change G_FORCE value
if star[j].positive<>positive then d:=-d;
vec.x:=(star[j].v.x-v.x)*d;
vec.y:=(star[j].v.y-v.y)*d;
movement.x:=movement.x+vec.x;
movement.y:=movement.y+vec.y;
star[j].movement.x:=star[j].movement.x-vec.x;
star[j].movement.y:=star[j].movement.y-vec.y;
end;
end;
The particles scatter endlessly, with no clear groupings done as was in 3D. I have tried with many different parameters. I wouldn't want to make boundaries to the "universe", where particles would bounce backwards, but i might have to. I also tried with every particle being same polarity, and they still did not group. It's not because of removal of sqrt(), that was almost last step in my tests, improvement to calculation time i'd say.

SilverWarior
20-06-2013, 04:06 PM
Based on the code it seems you are only applying force to [j] particle and not to [i] particle at the same time.
Haven't you been calculating force between two particle once and then applying that force to both of your particles at once?

imcold
20-06-2013, 04:26 PM
The modified "d" calculation doesn't produce the same values as the previous equation if you take out the sqrt, so the effect has to be different.

dist = sqrt(v)
d = G_FORCE / (dist*dist*dist) = G_FORCE / (sqrt(v) * sqrt(v) * sqrt(v))

power:

d^2 = G_FORCE*G_FORCE / (v * v * v)
d = sqrt(d^2)

User137
20-06-2013, 05:04 PM
Based on the code it seems you are only applying force to [j] particle and not to [i] particle at the same time.
Haven't you been calculating force between two particle once and then applying that force to both of your particles at once?
If i set NUM_TOTAL = 2; and 2 opposite particles, they start pushing eachother away in opposite direction, as should. Same code as before, i just commented Z-related lines. "with star[i] do" on second quoted line refers to "i" particle.

I didn't do the math before, i was thinking it might still be relatively same effect.

sqrt(4)*sqrt(4)*sqrt(4)
= 2*2*2 = 8

= sqrt(4*4*4)
= sqrt(64) = 8

X/Sqrt(Y)
= (X*X)/Y , if Y>=0

imcold
20-06-2013, 05:21 PM
X/Sqrt(Y)
= (X*X)/Y , if Y>=0

Um, no. Trivially proven by X=1, Y=2.

User137
20-06-2013, 05:43 PM
Oh crap... this is how it actually goes:
X / Sqrt(Y)
= (X*Sqrt(Y)) / Y
which puts us back to square 1... on why doesn't it work, or works like that even when using sqrt? I have tried decreasing, but also increasing the G_FORCE to compensate lack of particle density in depth direction. But at the same time there is a more uniform force pulling every direction, from the overall system.

imcold
21-06-2013, 05:52 AM
If you scratch one dimension, you have to reduce the divisor - scratch one multiplication. Then it could be reduced further, this should work nicely:
d = G_FORCE / d

User137
21-06-2013, 07:09 AM
Remember there were originally 2 multiplications in optimized version, so now it becomes 1 and that looks like working well

d:=G_FORCE/(d*d);
I still want to give boundaries to the world though, and see what it'd be like. Kind of like how subsection would settle, and i expect a little plasma-like effect. Some days later perhaps.

edit: Decided to code them straight away. (No plasma, but something more boring 8) )

// Rectangular collision
{if v.x<-10 then movement.x:=abs(movement.x)
else if v.x>10 then movement.x:=-abs(movement.x);
if v.y<-10 then movement.y:=abs(movement.y)
else if v.y>10 then movement.y:=-abs(movement.y); }
// Circular collision
if v.x*v.x+v.y*v.y>100 then
movement:=reflect(movement, vector2f(-v.x*0.1,-v.y*0.1));

SilverWarior
21-06-2013, 11:28 AM
Here is interesting proposal for simulation scenario:
After creating particles in the first place give them initial velocity as if tey would be orbiting some point in the center of your simulation world.
I think it could make some more interesting results since particles won't simply split into positive and negative group so fast. And if you give particles which are closer to the center a bit more speed you might even get some swirl effects from particle movments.

User137
17-07-2013, 06:10 PM
Went and bought Universe sandbox on Steam sales, now that it's under 4 euros.
1179
It is getting a bit laggy with 50000 asteroids orbiting the star in the screenshot. But these asteroids are most likely not orbiting eachother. Either way, lots of things this app can do, with creative mind.

edit: Please add the maximum limits for attached images. You can't really see anything from this small blurred image.

SilverWarior
17-07-2013, 07:48 PM
But these asteroids are most likely not orbiting eachother. Either way, lots of things this app can do, with creative mind.

I tested Universe Sandbox a wile ago while it was still free alpha and it is quite good.
As for the asteroids I don't think Universe Sandbox calculates atraction between them.
Even in reality asteroids usually have so low gravity that they can't affect each other until they get realy close to each other.