Nextgen-software rendering



herrcoolness
28-06-2010, 01:52 PM
Hey everyone...
First of all, as I wrote on other forums, my English is bad, but I think it's understandable. ;)

OK, now to the point.
I am writing a software renderer in FPC. My plan is to create a 3D engine with an integrated software rendering system. Rendering will go through 3 deferred rendering stages:
1. rendering depth and triangle ID... good for early occlusion of pixels and objects, for reducing texture accesses and pixel shading calculations, and for antialiasing,
2. gathering information from the rendered triangle pixels: reading the material textures and the depth buffer, and writing material attributes and 3D pixel positions to a G-buffer,
3. lighting and shadowing
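
To make stage 1 concrete, here is a minimal sketch in C (not the actual Phenomenon code, which is FPC plus assembler). It assumes that a smaller depth value means nearer and that depths lie in [0, 1]; `VisPixel`, `vis_clear` and `vis_write` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NO_TRIANGLE 0xFFFFFFFFu  /* marks a pixel that no triangle covers yet */

/* One entry of the stage-1 buffer (hypothetical layout). */
typedef struct {
    float    depth;   /* nearest depth seen so far (assumption: smaller = nearer) */
    uint32_t tri_id;  /* id of the triangle that currently owns the pixel */
} VisPixel;

/* Reset every pixel to "far away, owned by nobody". */
static void vis_clear(VisPixel *buf, size_t count)
{
    for (size_t i = 0; i < count; ++i) {
        buf[i].depth  = 1.0f;
        buf[i].tri_id = NO_TRIANGLE;
    }
}

/* Stage 1: keep only the nearest triangle per pixel.
   Returns 1 if the pixel was updated, 0 if the new fragment is occluded. */
static int vis_write(VisPixel *px, float depth, uint32_t tri_id)
{
    assert(px != NULL);
    if (depth >= px->depth)
        return 0;
    px->depth  = depth;
    px->tri_id = tri_id;
    return 1;
}
```

With a buffer like this, stage 2 only has to fetch textures for the one triangle id left in each pixel, which is where the saving in texture accesses and shading work would come from.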


For now I have:
- a triangle rasterizer which draws the triangle on a per-tile basis (tile drawing increases cache hits).. I think the rasterizer's inner loop is nicely optimized, but the tile drawing is not... because I am still in the prototyping stage ;D..
(The idea is based on Nicolas Capens' "Advanced Rasterization" article, but my algorithm is much faster, for example:
***I don't use the rectangle traversal algorithm, but an algorithm better suited to the tile scanlines of the triangle,
***fast trivial accept and reject routines for tile vs. triangle, where the idea is based on Intel's document about Larrabee rasterization)
- a hierarchical structure of tiles for z-buffer occlusion (zmin-zmax per tile),
later I will use a similar structure of tiles (but in 3D) for the G-buffer, for faster rejecting and accepting of tiles against light-volume primitives in the lighting stages (cube-sphere and cube-cone checks)
- texture reading procedure prototypes
- SSE4 instructions are used, but FPC doesn't support them, so I need to convert some instructions to x86 byte code .. thanks to NASM
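
The trivial accept and reject tests mentioned in the list can be sketched as follows. This is a plain C illustration of the general Larrabee-style idea, not the author's routine: it assumes integer edge functions E(x, y) = a*x + b*y + c with "inside" meaning E >= 0, and square tiles; `Edge`, `TileClass` and `classify_tile` are hypothetical names.

```c
#include <assert.h>

/* Edge function E(x, y) = a*x + b*y + c; a point is inside when E >= 0. */
typedef struct { int a, b, c; } Edge;

typedef enum { TILE_OUTSIDE, TILE_PARTIAL, TILE_INSIDE } TileClass;

static long long edge_eval(const Edge *e, int x, int y)
{
    return (long long)e->a * x + (long long)e->b * y + e->c;
}

/* Classify a size x size tile whose top-left pixel is (x0, y0). */
static TileClass classify_tile(const Edge edges[3], int x0, int y0, int size)
{
    assert(size > 0);
    int x1 = x0 + size - 1, y1 = y0 + size - 1;
    int all_inside = 1;

    for (int i = 0; i < 3; ++i) {
        const Edge *e = &edges[i];

        /* Trivial reject: test the corner where E is largest.
           If even that corner is outside, the whole tile is outside. */
        int max_x = (e->a >= 0) ? x1 : x0;
        int max_y = (e->b >= 0) ? y1 : y0;
        if (edge_eval(e, max_x, max_y) < 0)
            return TILE_OUTSIDE;

        /* Trivial accept: test the corner where E is smallest.
           If that corner is inside, the whole tile is inside this edge. */
        int min_x = (e->a >= 0) ? x0 : x1;
        int min_y = (e->b >= 0) ? y0 : y1;
        if (edge_eval(e, min_x, min_y) < 0)
            all_inside = 0;
    }
    return all_inside ? TILE_INSIDE : TILE_PARTIAL;
}
```

The reject test is conservative: a tile just past a triangle corner can survive all three edge tests and come back as TILE_PARTIAL, in which case the per-pixel loop simply finds nothing to draw.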


I am playing with the idea of using NASM as an ASM JIT compiler. When you write the required raw instructions with registers (or memory pointers) into a file, without any header or additional info, NASM converts them to x86 byte code and writes that to the output file. So in Pascal I build a string with the assembler program, write it to a file, execute NASM with this input file, read the output file into memory, set up the required call and variable pointers, and voilà... a runtime-compiled procedure... 8)
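
A rough sketch of the file round trip in C, under some assumptions: `nasm` is on the PATH, the file names are made up, and the helper names (`write_asm_source`, `build_nasm_command`, `read_code_bytes`) are hypothetical. As far as I know, NASM's flat binary output (`-f bin`) defaults to 16-bit mode, so the sketch writes a `BITS 32` line in front of the instructions.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Write the instruction text to `path`, preceded by a BITS 32 line.
   Returns 0 on success, -1 on failure. */
static int write_asm_source(const char *path, const char *body)
{
    FILE *f = fopen(path, "w");
    if (!f)
        return -1;
    int ok = fprintf(f, "BITS 32\n%s\n", body) > 0;
    return (fclose(f) == 0 && ok) ? 0 : -1;
}

/* Build the NASM command line for flat binary output.
   Returns 0 on success, -1 if the command does not fit in `out`. */
static int build_nasm_command(char *out, size_t out_size,
                              const char *asm_path, const char *bin_path)
{
    assert(out != NULL && out_size > 0);
    int n = snprintf(out, out_size, "nasm -f bin \"%s\" -o \"%s\"",
                     asm_path, bin_path);
    return (n > 0 && (size_t)n < out_size) ? 0 : -1;
}

/* Read the assembled bytes into `dst`; returns the number of bytes read
   (0 if the file could not be opened). */
static size_t read_code_bytes(const char *path, unsigned char *dst, size_t cap)
{
    FILE *f = fopen(path, "rb");
    if (!f)
        return 0;
    size_t n = fread(dst, 1, cap, f);
    fclose(f);
    return n;
}
```

The command string would be passed to something like `system()`, and the bytes then have to be copied into memory that is allowed to execute (on Windows, for example, memory from `VirtualAlloc` with `PAGE_EXECUTE_READWRITE`) before they can be called. Those steps are left out here because they are platform-specific.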

The page of my project http://sourceforge.net/projects/phenomenon/

JSoftware
28-06-2010, 02:48 PM
Just add the instructions to fpc :)

Take a peek at x86ins.dat in the compiler/x86 dir. To use them in the internal assembler, you also need to add the code for it, but there are probably lots of examples you could look at there too

As for the engine, I get an access violation when I try to run the test

herrcoolness
28-06-2010, 04:44 PM
The demo runs at 1680x1050 resolution in windowed mode, using the DirectDraw interface.
What system do you have?
Does your CPU support SSE4?
The demo is just a rotating quad with a bilinear-filtered texture :).. just a small test :)
As for the tip... I will check it.
.. I think.. next time, when I upload the source with the demo... I will reduce the resolution :)

arthurprs
28-06-2010, 09:59 PM
I got an access violation, screenshot:

http://i49.tinypic.com/11m432e.jpg


My CPU is an Athlon II M300, with the SSE4A instruction set

Galfar
29-06-2010, 02:47 PM
Test demo runs at ~24 FPS on my 2.5 GHz Core 2 Duo.

What about using SSE2 instead of SSE4, at least for the demo (so people with older and/or AMD CPUs can test it too)?

czar
29-06-2010, 07:35 PM
Didn't work for me either. When I double-click the test, the screen flickers for a second, I can see a box, and then it stops. I am using a quad-core Intel with Windows XP at 1680x1050.

herrcoolness
30-06-2010, 08:01 AM
Thanks arthurprs ;), the backtrace in the image helped me a lot to understand where the problem is. There is just one place where I used SSE4, and it's in the function "texturesampler_bilinear" in the file "texture_calc_addr.inc". I read that there is a small incompatibility in the SSE4 instruction set between AMD and Intel. I use an Intel quad-core. So for now the demo will run only on Intel processors, but I will change that.



Galfar, by the time the renderer is complete, there will be better processors, with SSE5 or the Larrabee processors. It's a long path which takes a long time. Backwards compatibility can slow down the rendering, because when I use 5 old instructions instead of 1 new one to do the same thing.. it's more work for the CPU. You know it. ;) But I will drop to SSE3 :( because of the incompatibility in the SSE4 (4.1, 4.2, 4a) instruction sets between Intel and AMD, or I will use only those instructions that exist on both processors.




Czar, there are 2 windows: one for the graphical output and one (the console) for the text output. Look in the console to see whether an exception is raised, like in arthurprs's case. But the strange thing is that you have an Intel quad-core like I have, your card supports 1680x1050, and I have XP too.. ??? If you can, please upload the output of the console window like arthurprs did.

Guys, I learned a little bit more.. Thanks for the feedback.. ;)

herrcoolness
30-06-2010, 10:29 AM
.. and Galfar... nice project, your image library.
I am planning to use your image library instead of DevIL, with your permission. :)

JSoftware
30-06-2010, 10:32 AM
Are you sure esi+tEXTURE_Header.addrcalc is 16-byte aligned? That could probably be the source of the problem

Mind you, I get precisely the same backtrace as arthurprs on a 32-bit Core 2 Duo

herrcoolness
30-06-2010, 10:47 AM
JSoftware, I am using special sse_getmem and sse_freemem procedures which align the pointer to a 16-byte boundary.

The texture header consists of "clampmin,clampmax,size:fvec4;wrapmask,addrcalc:ivec4;". All these variables are 16 bytes wide, and the compiler places all global variables on 16-byte boundaries. See the file "compilecfg.inc": it contains the compiler directive "{$CODEALIGN varmin=16}", which tells the compiler that all global variables must be aligned to a 16-byte boundary. So a qword, dword or byte variable also starts on a 16-byte boundary. That way all variables can easily be loaded with movaps and movdqa without an error, and without being mixed with other small (8, 4, 2 or 1 byte wide) variables :)

.... aaand are you sure that your CPU supports SSE4? Because some Intel dual-cores don't support it.
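
For readers wondering what an aligned allocator of this kind usually looks like, here is a sketch in C. It is not the project's sse_getmem/sse_freemem (those are FPC procedures whose source is not shown in this thread); it uses the common over-allocate-and-round-up trick, and the names `sse_getmem_sketch`, `sse_freemem_sketch` and `is_sse_aligned` are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define SSE_ALIGN 16u

/* Over-allocate, round the address up to the next multiple of 16, and
   store the raw malloc pointer just in front of the aligned block so
   it can be freed later. Returns NULL on allocation failure. */
static void *sse_getmem_sketch(size_t size)
{
    void *raw = malloc(size + SSE_ALIGN + sizeof(void *));
    if (!raw)
        return NULL;
    uintptr_t addr    = (uintptr_t)raw + sizeof(void *);
    uintptr_t aligned = (addr + (SSE_ALIGN - 1)) & ~(uintptr_t)(SSE_ALIGN - 1);
    ((void **)aligned)[-1] = raw;
    return (void *)aligned;
}

/* Free a block obtained from sse_getmem_sketch (NULL is allowed). */
static void sse_freemem_sketch(void *p)
{
    if (p)
        free(((void **)p)[-1]);
}

/* True when p can be used with aligned loads such as movaps/movdqa. */
static int is_sse_aligned(const void *p)
{
    return ((uintptr_t)p & (SSE_ALIGN - 1)) == 0;
}
```

An address like esi+offset is only safe for movaps/movdqa when both the base and the field offset are multiples of 16. If the five header fields are laid out in declaration order without padding (an assumption), addrcalc would sit at offset 64, which satisfies that as long as the header itself is aligned.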

BeRo
01-07-2010, 01:02 AM
Hm, I already did something like this a few years ago:

http://vserver.rosseaux.net/stuff/BeRoSoftRender.zip

This is my old software renderer with many antialiasing variants (SSAA, QUINCUNX and so on), with multicore CPU support, with an Early-Z spanwise test, with an "already ready and fully functional" Realtime Runtime JIT x86-32 Assembler (like SoftWire, and with my own CubeMan 4k Intro included as example source), and so on. It's dual-licensed (my own license and AGPLv3 as second fallback license). If you want, you can use it as a reference or as a source of inspiration.

herrcoolness
01-07-2010, 01:53 PM
BeRo wrote here? :o The famous BeRo, demo creator and author of such things as DLLtools (the from-memory DLL loader), BeroTracker, BeroXM, PAPPE :o.. Wait... need to take a breath.... 30 min later... OK.. I can take it.. Man, your work inspired me early on. ;) I am planning to write a physics engine (like your PAPPE) and a sound library with Ogg, MP3 and module formats (IT, S3M, XM ..) using OpenAL. Man, you do great work for the Pascal community, and programmers who work in other languages can see what can be programmed with Pascal & assembler.

BeRo, I have some questions for you:
1. Will there be a BeroTracker source release, or just a release of the playing routines' source?
2. Is the phone version of BeroXM a better version of the old public source on your page 0ok.de?
3. Will there be any new version of PAPPE, documentation, or an SDK?

Why these questions? There aren't many module playing libraries or 3D physics engines written in Pascal. A release of the source code or of improvements would help the community a damn lot. We need to show the C programmers :butt: :D that Pascal (Delphi) has the same power to do the same things as C. :yes: I think Pascal suits the human brain better, because it is easy to read & understand. C is more symbol-like. But both have the same power. 8)

And BeRo, my antivirus is going crazy :D from your bero-packer :D.... I need to protect all your downloaded programs & source code archives from the antivirus quarantine :D

OK BeRo, I will study your source. Don't worry, I am not one of those programmers who copy & paste code. Better to study & understand the technique. That is the more intelligent way. And thank god.. no .. better .. thanks to you, BeRo, and all the other open-source programmers and testers, for sharing your ideas and sources ;)

herrcoolness
01-07-2010, 02:34 PM
BeRo... I saw the source... the rasterizer is based on the Chris Hecker release :).. nicely extended with scanline early-z reject.. :) But the JIT asm.. damn.. we can say ... this is SoftWire in a Delphi version. :) BeRo, one word ... !!IMPRESSIVE!!.. like all your work :)

In the renderer I am planning to use a shader programming style similar to the Unreal shader editor, because of the renderer's deferred architecture. Some shaders will be divided into smaller shaders, such as the light shader in the lighting stage, the texture shader in the texture gathering stage, the opacity shader in the early z-culling stage, the transparency and refraction shaders in the post stage, and fullscreen effects in the image shader. To compile the shaders to x86 byte code I need just SSE and MMX instructions and some i386. Or I could just use code macros (code chunks), but then a register optimization problem will arise. So that's the rough plan. :)

BeRo, if you have a little bit of time, you could write some articles with chunks of Pascal code or so. :) I want to write an article about the rasterizer too, but my English is bad :(

JSoftware
01-07-2010, 04:45 PM
Woah, BeRo, you never sleep? :o

Impressive renderer

herrcoolness
01-07-2010, 04:55 PM
I think .. he can program and sleep at the same time ;)..

herrcoolness
03-07-2010, 09:09 AM
OK guys. I removed the SSE4 instructions from the procedure "texturesampler_bilinear", because, as we know, they caused small problems on AMD processors, and replaced them with a faster lookup table which calculates the addresses within a tile for bilinear sample fetching. The frame rate jumped from 26 to 32 FPS.. with point sampling it is about 36-37 FPS.. so it's a nice speedup. :)
I compared the "tile bilinear sampler" against the "linear bilinear sampler" (linear being the standard in-memory image representation on PC), and the speed stayed almost the same... the linear representation of the texture was a bit slower, because it is not cache-friendly. A tiled texture is good for big textures: when the texture is in high resolution, the speed doesn't drop as quickly as with the linear (standard) representation. Of course the linear calculation of the sample address from the texture coordinates is much simpler, but the cache pollution is much bigger and causes much bigger slowdowns. ;)
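
To show the difference between the two layouts, here is a sketch in C of the index calculations. The tile size is an assumption (8x8 texels; the thread does not say what the renderer uses), the texture width is assumed to be a multiple of the tile size, and `linear_index` and `tiled_index` are hypothetical names. It computes the address directly rather than through the lookup table mentioned above.

```c
#include <assert.h>
#include <stdint.h>

#define TILE_SHIFT 3u                     /* assumption: 8x8 texel tiles */
#define TILE_SIZE  (1u << TILE_SHIFT)
#define TILE_MASK  (TILE_SIZE - 1u)

/* Standard row-major layout: vertical neighbours are `width` texels apart. */
static uint32_t linear_index(uint32_t x, uint32_t y, uint32_t width)
{
    return y * width + x;
}

/* Tiled layout: each 8x8 tile is stored contiguously, so vertical
   neighbours inside a tile are only TILE_SIZE texels apart. */
static uint32_t tiled_index(uint32_t x, uint32_t y, uint32_t width)
{
    assert(width % TILE_SIZE == 0);
    uint32_t tiles_per_row = width >> TILE_SHIFT;
    uint32_t tile   = (y >> TILE_SHIFT) * tiles_per_row + (x >> TILE_SHIFT);
    uint32_t within = ((y & TILE_MASK) << TILE_SHIFT) | (x & TILE_MASK);
    return (tile << (2u * TILE_SHIFT)) | within;
}
```

For a 1024-texel-wide texture, the texel below (0,0) sits at index 1024 in the linear layout but at index 8 in the tiled one, so the four texels of a bilinear fetch are far more likely to share a cache line.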

https://sourceforge.net/projects/phenomenon/

czar
04-07-2010, 07:31 PM
The new one works for me. 28 FPS.

Note your file says 3-6-2010. I believe you meant 3-7-2010, as in July and not June. It threw me for a while until I noticed that you had put the file up 2 days ago and so figured you had made a mistake.

herrcoolness
05-07-2010, 03:02 PM
Yes, yes. I corrected it. Thanks for the tip, czar :). It was deep in the night when I uploaded the file :). Good to hear that the demo works now :).

arthurprs
05-07-2010, 09:14 PM
Not sure why, but it doesn't work for me. I can see the console window printing the FPS (25 on my notebook with an HD4200 GPU), but not the renderer window :no:

herrcoolness
06-07-2010, 08:20 PM
Sometimes, when the rendering window loses focus (you hide the window, or click on another Direct3D application), the program can't restore the window. I don't handle this in my program yet, but I will ;). With OpenGL or GDI this can't happen. I am planning to program these 2 outputs too. I used the GDI output at first and thought it was slow, but it wasn't. It was almost as fast as DDraw, just a little bit slower at page flipping, but GDI can't handle vertical sync. OpenGL can do both: like GDI, it has no problem with restoring HW surfaces, and OpenGL has a procedure for VSync. And OpenGL is good for cross-platform programming. I think using DDraw is prehistoric. Nowadays it may cause some incompatibility on modern cards, but I am not sure. I think ATI drivers may have a problem with this. I don't know. But I am working on it. As you can see, I have a plan for every problem. But I must know what the problem is. So .. thanks for the tip :).

herrcoolness
22-07-2010, 11:53 PM
New update. I added per-pixel mip-mapping. To see how it works I created 2 demos: one where we can see the mip-map levels, the second with normal drawing. There is a noisy pattern at the mip-map level boundaries. The reason is that I use the "RCPPS" SSE instruction, which is not as precise as "DIVPS". With "DIVPS" I get sharp edges at the mip-map boundaries, but this instruction is slower than "RCPPS". When the mip-map levels are not colored, the noisy pattern is not visible. See the no-mip-map-colored demo. ;) Waiting for your feedback, guys ;)
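
One commonly used middle ground between RCPPS and DIVPS is to refine the approximate reciprocal with a single Newton-Raphson step, x1 = x0 * (2 - d * x0), which roughly doubles the number of correct bits. The sketch below shows the arithmetic in portable C. It is not what the demo does, and `coarse_rcp` is only a stand-in for RCPPS: it computes an exact reciprocal and then throws away the low 11 mantissa bits to imitate roughly 12 bits of precision. All names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for RCPPS (assumption): exact 1/d with the low 11 mantissa
   bits cleared, which leaves about 12 bits of precision. */
static float coarse_rcp(float d)
{
    assert(d != 0.0f);
    float r = 1.0f / d;
    uint32_t bits;
    memcpy(&bits, &r, sizeof bits);
    bits &= ~(uint32_t)0x7FF;
    memcpy(&r, &bits, sizeof r);
    return r;
}

/* One Newton-Raphson step on the coarse estimate. */
static float refined_rcp(float d)
{
    float x0 = coarse_rcp(d);
    return x0 * (2.0f - d * x0);
}

/* Relative error of an approximation of 1/d, i.e. |approx * d - 1|. */
static float rel_error(float approx, float d)
{
    float e = approx * d - 1.0f;
    return e < 0.0f ? -e : e;
}
```

In SSE terms the refinement costs two multiplies and one subtract per vector, so whether it actually beats DIVPS on a given CPU is something only a benchmark can answer.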

https://sourceforge.net/projects/phenomenon/

de_jean_7777
23-07-2010, 09:41 AM
This is what I get when I run the MipMap test:


An unhandled exception occurred at $00414078 :
EAccessViolation : Access violation
$00414078 TEXTURESAMPLER_BILINEAR, line 207 of gs_textures.inc
$004019E8 DRAWQUAD, line 107 of test.pas
$00401BCB TIMERPRC1, line 268 of test.pas
$004116E2 FE_DOMAINLOOP, line 124 of fenomenon_appwindow.pas
$00401D91 main, line 329 of test.pas

And this when I run the MipMap Colored test:


An unhandled exception occurred at $004141A8 :
EAccessViolation : Access violation
$004141A8 TEXTURESAMPLER_BILINEAR, line 207 of gs_textures.inc
$004019E8 DRAWQUAD, line 107 of test.pas
$00401BCB TIMERPRC1, line 268 of test.pas
$004116E2 FE_DOMAINLOOP, line 124 of fenomenon_appwindow.pas
$00401D91 main, line 329 of test.pas

The rendering window shows for a sec, then disappears even though it's still displayed as present in taskbar, the whole thing holds for a few seconds and then crashes.

I'm using a Pentium 4 1.9 GHz with SSE2, 512 MiB DDR400 and an integrated Intel 82845G/GL/GE/PE/GV graphics card.

herrcoolness
23-07-2010, 05:36 PM
It's easy to solve the problem. I used some SSE3 instructions too, and you have just an SSE2 processor. But good news for you: I think I will stay within the SSE2 part of the SSE instruction set. The reason is that the SSE3 and SSE4 instructions are way slower. I don't know why, but it is so. And SSE4 is problematic for compatibility between AMD and Intel processors. I am doing some speed tests now, and if SSE2 is faster than SSE3, I will upload the same demos using just the SSE2 instruction set :)

de_jean_7777
23-07-2010, 09:16 PM
I tried it on a processor that supports SSE3, an Intel Pentium E2200 (Dual Core). It also has an integrated Intel card, the 82945G. This time it worked, but nothing showed. The framerate was showing in the console, and there seemed to be a window present in the tab-switch dialog (ALT+TAB), but I could not switch to it or show it. It's the same thing as with arthurprs, but I don't think the window loses focus. It simply disappears moments after being created.

I'll also try this when I get home on some more powerful hardware.

herrcoolness
24-07-2010, 10:26 AM
But how can the window disappear? For some people it worked well. For others the window is a problem. Is there a bug in my window-creation procedure? But every time I change an attribute of the window (size, position) I refresh it. Damn.. I don't know... PLS HELP !!! :(

herrcoolness
24-07-2010, 11:15 AM
I changed the procedure SetWindowPos to MoveWindow in the fenomenon_appwindow unit. We will see if it helps. For me it works the same as before the change.

herrcoolness
24-07-2010, 12:51 PM
OK guys. I fully removed the HADDPS (SSE3) and HSUBPS (SSE3) instructions from the sampler routine and replaced them with "standard" ADDPS, SUBPS and PSHUFD, and the result is like... WTF !!! :o The demo runs at 32.89 FPS with the SSE3 instructions and now at 33.60 FPS with the older SSE instructions. Logically the SSE3 instructions should be faster, but :o ... they are slower. So yes... it's true.. Better to use SSE2 and lower versions of the SSE instructions than the instructions from the higher versions. But it's still good to run some benchmarks. I am surprised by the result. But I had read similar reports about the speed of the new SSE instructions on other forums, so the surprise was not so big. 8) I am not sure about the speed on newer processors, though. As I said... better to use a benchmark. :yes:

de_jean_7777
24-07-2010, 03:52 PM
Ok. I've tried it on my personal computer at home, which is an Athlon II X2 240 (2.8 GHz, supports up to SSE4) with a Radeon HD4850. I downloaded your tests again, and tried to run them. Same thing happens, they run but nothing is shown. Since I use Windows 7 I noticed that the rendering shows in the preview when I hold my mouse over your program in the taskbar. The window is there, and can be focused. However, it is not shown on screen for some reason. Also, trying to minimize everything (WIN+M) makes the program crash. I've tried disabling Aero and closing every program I have, but with no success.

arthurprs
24-07-2010, 07:27 PM
same here

herrcoolness
25-07-2010, 05:24 AM
arthurprs, are you using Windows 7 too, like de_jean_7777? If yes, then I think Win 7 is incompatible with DirectDraw. Guys, can you test some DirectDraw applications on Win 7? If they work, then the problem is in my program. If they don't work, then the problem is, as I said, the incompatibility of Win 7 with DirectDraw (then I will begin to use GDI or OpenGL as the draw-surface flippers). I think there is a new API for 2D drawing on Win 7.. Direct2D or so, I am not sure. Check whether Win 7 has compatibility options like WinXP has for Win95, Win98... Run my program in some compatibility mode. Please, guys, make these tests and give me the results. Thanks in advance. :) Oh... I almost forgot.. I use WinXP.

Closing the draw window results in the loss of the DirectDraw surface, so my program can't write to this surface. This is the reason why the program crashes. And the long startup is caused by the badly optimized mip-map creator ::)

de_jean_7777
25-07-2010, 06:08 AM
I also tested on Windows XP (should have mentioned this). All the PCs I tested were running XP, except my home computer where I tested both XP and Win 7.

Note that under Win 7 I could see the window rendering in the preview on the taskbar, but the actual window does not show. Due to this I conclude it renders OK, and DirectDraw works, but for some strange reason the actual window is not visible. Also, the fact that the same problem is exhibited on XP means it's probably not Win 7 related.

AthenaOfDelphi
25-07-2010, 08:35 AM
I've just tested the two programs included in the RAR on my Toshiba Laptop. It's a Satellite Pro L500 Pro with a 2.1GHz Intel Core2 Duo T6570. It has on board graphics and 3GB of RAM.

It was reporting somewhere in the region of 31FPS. The rendering windows weren't visible until I maximised them. If I tried to move them, the cursor shot off to the bottom right hand corner of my screen as though they were visible but had been positioned by the software to be off the desktop area.

When they were visible, they didn't look right for some reason. Certainly the colour version appeared to be having its texture corrupted by misaligned RGB values (if that makes sense). The colour version of the program also appeared to terminate when I restored the window to its original size using the Restore option from the Windows system menu.

Wizard
25-07-2010, 09:19 AM
I also just tested the two exe's on my Sony Vaio Laptop (2.00 gigahertz Intel Core 2 Duo, 2 gig Ram, NVIDIA GeForce Go 7400 ). Vista Ultimate.

Upon execution the app ran at around 37FPS and the rendering window was minimized. When I clicked on the rendering window on the task bar windows immediately reported that the app has stopped working. However, when I right clicked on the rendering window on the taskbar and chose maximize it showed the window in full screen with the app running fine. The color does seem a bit off with mip-map-colored-sse2.exe...

Hope this helps ???

de_jean_7777
25-07-2010, 09:31 AM
I reran your tests and encountered the same problems as Athena and Wizard. Your window positioning code is likely the culprit for this problem.

herrcoolness
25-07-2010, 03:24 PM
Guys, big thanks for the big help. The texture isn't gray, it's a normal RGB texture. "Colored" means that I XORed every mip-map level with a random color to make the different mip-map levels visible, because in non-colored (normal) mode you can't see where a mip-map level starts and where it ends. ;)

OK, now to the window problem... hm..
The program starts like this:
1. creates the window class
2. creates a 320x200 window at position 0,0
3. if it is a fullscreen application, it removes the window border, sets the needed screen resolution, and sets the window size to the screen resolution at position 0,0 of the screen
if the application is not in fullscreen mode, it sets up the window border, sets the required window resolution and centers the window on the screen.
4. sends the window handle to the gesource initialization routines
5. sets up the mouse procedures for the window events (I don't use any "setmousepos" procedure, so why the mouse shot off-screen is a question ??? )
6. sets up the keyboard procedures for the window events
7. loads the texture and creates the mip-maps
8. starts rendering.. that's all ???

If someone has a little bit of time, please check the units GS_utils & fenomenon_appwindow for a bug somewhere, because I can't find anything wrong. :(

I will try the GDI output and remove the console window, and we will see. :(

As I mentioned ... thanks for your help and support, guys... ;)

AthenaOfDelphi
25-07-2010, 04:10 PM
The mouse cursor doesn't shoot off screen through anything your application does. It stays where it was when the application starts.

It shoots off screen when I right click the button in the task bar and select Move. Normally when you select move, windows stuffs the cursor into the center of the window title bar. So, when I select move for your application, the cursor is shifted to the center of the title bar which is off screen. Of course Windows won't allow the cursor to move beyond the desktop boundary, so in my case I end up with the cursor stuffed in the bottom right hand corner of my screen.

Thanks for clarifying the colouring :-)

herrcoolness
25-07-2010, 04:59 PM
Wait a second. You said that you saw the output of the render window after you maximized it? ??? ... please write what your OS is...

AthenaOfDelphi
25-07-2010, 05:08 PM
I'm running Windows XP Pro (Service pack 3).

herrcoolness
25-07-2010, 05:26 PM
Same system as mine. OK, the maximize-minimize thing was nothing. Searching for another solution.
I checked the procedure "fe_refresh_window" ... I added "SetForegroundWindow(fe_app_window)". I will now upload it to SourceForge as "window patch", and if you or someone else can, please check it, OK? :)

Wizard
26-07-2010, 10:48 AM
I tested your Window Patch but it behaves in the same way as mentioned before.

herrcoolness
28-07-2010, 01:47 PM
OK guys, I uploaded new test versions to SourceForge: "no console" & "no console + low res". Test them and tell me whether the drawing window is showing or not. And give me a precise description of your CPU & OS (like WinXP Pro, WinXP Home Edition and so on). For example, some programs like PowerCinema work on WinXP Home Edition but not on the Pro edition. The output of my program is still DDraw.

And for faster interaction with you, the testers of my engine, I added my ICQ number to my profile, so don't wait, write to me as fast as you can.

https://sourceforge.net/projects/phenomenon/files/


It's not possible that I have such a trivial problem as manipulating a damn window. It's like a bad joke. >:(

de_jean_7777
28-07-2010, 03:02 PM
I tried the "window patch-1024x768 - no console - test.rar" test and it worked ok. I tested it on Windows XP Professional SP3, with a Pentium E2220. It's the same PC your previous test did not work on.

However, the other test "window patch - no console - test.rar" did not work properly. It behaved similar to the old tests.

herrcoolness
29-07-2010, 05:23 AM
So if I understand correctly.. it doesn't matter whether the console is on or off. What matters is the resolution of the window. The other test uses a window of size 1680x1050. So this can be the reason why it doesn't work (or works but shows nothing). If your desktop resolution is smaller than 1680x1050, then we have the explanation for why the window is not showing: the frame of the window is outside the screen, and DirectDraw can't blit to the window because its bottom-right & top-left points are off-screen. What is the resolution of your desktop, de_jean_7777?

de_jean_7777
29-07-2010, 11:59 AM
My resolution is either 1024x768 or 1152x854. I never tested on any different resolution.

herrcoolness
29-07-2010, 05:18 PM
How is the resolution of your desktop de_jean_7777 ?


My resolution is either 1024x768 or 1152x854. I never tested on any different resolution.


Thanks de_jean_7777. You helped me solve the problem. So I will release the later demos with an 800x600 window. Later on I will implement resolution switching and fullscreen-desktop-in-window switching.

De_jean_7777 .. good work. 8)

herrcoolness
13-08-2010, 02:29 PM
News-news-news guys. :) I implemented the hierarchical z-buffer with 3 basic functions: fast tile skip, standard per-pixel z comparison, and fast z writing without comparing against the old z values in the z-buffer.

I uploaded 2 demos: one with colored debug info and one without the coloring, to see how it works normally. Both now run at just 800x600 to avoid the old window problems (in the old demos the rendering window is 1680x1050, which caused nothing to be drawn on computers with a lower desktop resolution) ::):
*black tiles - skipped tiles of the hidden small quad
*green tiles - tiles drawn with the fast write function (no z comparison) and not compared against the triangle edges
*cyan tiles - tiles drawn with the fast write function (no z comparison) but compared against the triangle edges
*gray tiles - tiles drawn with the function that compares the z-values against the z-buffer, and compared against the triangle edges
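
As a rough illustration of how the three code paths could be chosen per tile, here is a minimal Python sketch. It is not taken from the engine source: the names `TileDepth` and `classify_tile` are hypothetical, and it assumes that a smaller z means closer to the camera and that each tile stores the nearest and farthest depth written so far.

```python
from dataclasses import dataclass


@dataclass
class TileDepth:
    z_near: float  # smallest depth currently stored in the tile (assumption: smaller = closer)
    z_far: float   # largest depth currently stored in the tile


def classify_tile(tri_z_near: float, tri_z_far: float, tile: TileDepth) -> str:
    """Pick one of the three paths for one triangle/tile pair."""
    if tri_z_near > tile.z_far:
        # The triangle is behind everything already in the tile.
        return "skip"
    if tri_z_far < tile.z_near:
        # The triangle is in front of everything: write z without comparing.
        return "fast_write"
    # Depth ranges overlap: every pixel needs its own z comparison.
    return "compare"
```

A real implementation would also update the tile's stored depth range after writing; that bookkeeping is left out here.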


Next stop ...clipping and transform pipeline ... and first rotated cube? ;)

Luuk van Venrooij
17-08-2010, 08:47 AM
Cool stuff! Wrote my own software renderer a few months back. It wasn't really built for performance but more of a learning project. This seems a bit faster

herrcoolness
18-08-2010, 06:37 AM
Cool stuff! Wrote my own software renderer a few months back. It wasn't really built for performance but more of a learning project. This seems a bit faster


Luuk ... aaaand it will be even faster. Right now Free Pascal compiles my vector operations as ordinary procedures with a calling convention (push-call-ret instructions) rather than inlining them, because I use assembler in the overloaded operator procedures. So I need to rewrite every vector operation in the triangle procedure in SSE assembler to gain more speed. Compare vector addition, for example: 3 SSE instructions in assembler against Free Pascal's generated overloaded-operator procedure with 10 or more x86 and SSE instructions. The procedure also uses the stack, so in the end there are more instructions and it hurts the cache. Many more optimizations are possible, but I will do them later. :)

herrcoolness
29-08-2010, 02:56 PM
Ok guys. What's new?
Now the triangle input coordinates are in NDC (normalized device coordinates), so the x and y positions need to be in the -1..+1 interval. Why? Because this is what graphics cards use, and it helped me solve the problem of changing the size of the window. Now the size of the triangles changes too and stays proportional to the rendering window.
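
For readers new to NDC, the mapping to window pixels can be sketched like this. The function name `ndc_to_screen` is made up for the example, and the y flip (NDC +1 at the top of the window) is an assumption, not something confirmed by the engine source.

```python
def ndc_to_screen(x_ndc: float, y_ndc: float, width: int, height: int):
    """Map NDC in [-1, +1] to pixel coordinates of a width x height window."""
    sx = (x_ndc + 1.0) * 0.5 * width
    # Assumption: NDC y points up while screen y points down, so flip it.
    sy = (1.0 - y_ndc) * 0.5 * height
    return sx, sy
```

Because only `width` and `height` appear in the mapping, the same NDC triangle covers the same fraction of the window at 800x600 and at 1680x1050.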

Aaand I added a third texture filtering method for low-end PCs. It is almost as fast as nearest filtering (because it needs just 1 texture fetch) but looks almost like bilinear. Yes-yes, you saw this method in Unreal. I found a description of this technique in the old flipcode archive on the net (http://www.flipcode.com/archives/Texturing_As_In_Unreal.shtml)
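
The idea of the dithered filter can be sketched as follows: before the single nearest-texel fetch, u and v are shifted by a small offset that depends on whether the screen x and y are odd or even. This is only an illustration, not the engine code. The offset table is written down from memory of the flipcode article and should be treated as an assumption, and `sample_dithered` is a hypothetical name.

```python
# Offsets (du, dv) in texel units, indexed as DITHER[screen_y & 1][screen_x & 1].
# Assumption: these values follow the kernel described in the flipcode article.
DITHER = (
    ((0.25, 0.00), (0.50, 0.75)),
    ((0.75, 0.50), (0.00, 0.25)),
)


def sample_dithered(texture, u: float, v: float, px: int, py: int):
    """One nearest-texel fetch with a per-pixel dither offset.

    texture is a list of rows; u and v are non-negative texel coordinates.
    """
    du, dv = DITHER[py & 1][px & 1]
    height = len(texture)
    width = len(texture[0])
    tx = int(u + du) % width   # wrap addressing, chosen only for the sketch
    ty = int(v + dv) % height
    return texture[ty][tx]
```

Neighbouring screen pixels with the same (u, v) can land on different texels, which the eye blends into something close to bilinear filtering.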

There are 2 demos:
-one static, to see how fast all 3 techniques are (press 1,2,3 to change the filtering technique)
-one dynamic, to see the dithered-bilinear technique in action (press 1,2,3 to change the filtering technique)

herrcoolness
19-09-2010, 06:51 AM
News-news-news !! I added a full vertex transformation pipeline and homogeneous clipping of triangles, based on the Direct3D documentation (orientation of model and camera in world space, perspective matrix calculation and so on). I added backface culling of triangles against the camera in model space, so some triangles don't go further down the pipeline and only the vertices that are really needed (those of visible triangles) are transformed. I created a small demo where you can move the camera and see a big cube with a 2048x2048 texture.
About the rasterizer: I reimplemented Nick's rasterizer with fixed-point math because of its numerical stability near the edges of the drawing rectangle. Sometimes, after clipping and homogeneous division, the points of the triangle ended up outside the screen, which caused an error in the triangle rasterizer.
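
Backface culling in model space can be sketched like this: instead of transforming every vertex to world space, the camera position is moved into model space once, and each triangle is tested there. The helper names are hypothetical, and the sketch assumes a rigid model transform (rotation plus translation) and counter-clockwise front faces.

```python
def sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def eye_to_model_space(eye_world, rotation, translation):
    """Inverse of a rigid transform: R^T * (eye - t), with rotation given as 3 rows."""
    d = sub(eye_world, translation)
    return tuple(sum(rotation[i][j] * d[i] for i in range(3)) for j in range(3))


def is_backfacing(v0, v1, v2, eye_model):
    """True if the triangle faces away from the eye (assumes CCW front faces)."""
    normal = cross(sub(v1, v0), sub(v2, v0))
    return dot(normal, sub(eye_model, v0)) <= 0.0
```

Only triangles for which `is_backfacing` returns False need their vertices transformed, which is where the saving comes from.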
About the demo;
q,e - moving in y direction
a,d - moving in x direction
w,s - moving in z direction
1,2,3- filtering method
9,0 - vsync on-off

ps: the camera is spawning in the cube ;-)

https://sourceforge.net/projects/phenomenon/

herrcoolness
08-04-2011, 06:47 PM
Ok folks. I have a new nick on SourceForge and the name of the project is a little different, because 2 bad things happened:
1. Some idiot or idiots attacked the SourceForge servers and the SF team reset the passwords of all projects for security reasons
2. I used the e-mail recovery, but my e-mail provider failed horribly: I couldn't get any new mail and sometimes the page was down... There were many problems.

So I changed the name of the project and my nick. From now on I will use SVN to update the source code.

About the project ... So what is new? Per-tile operations are more optimized, I am now using 2D homogeneous rasterization, clipping is done in 4D homogeneous space, the vertex transformation path and backface culling are more optimized, texture reading is done in a post-processing pass, and the indexing of pixels to triangles is optimized with bit masks and linked lists (it was slow before because instead of a combination of a 64-bit mask and one 32-bit pointer I used 64 x 32-bit pointers, so every pixel had its own 32-bit value, i.e. a pointer to the triangle)
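
The mask-plus-pointer idea can be illustrated with a small sketch. It is not the engine's data structure: a plain Python list stands in for the linked list, a triangle id stands in for the pointer, and the class name `TileTriangleList` is hypothetical. Each 8x8 tile keeps a few (coverage mask, triangle) entries instead of 64 separate per-pixel pointers, and the mask passed to `write` is assumed to contain only pixels that already passed the depth test.

```python
class TileTriangleList:
    """Which triangle owns which pixel of one 8x8 tile (bit y*8+x per pixel)."""

    def __init__(self):
        self.entries = []  # list of (mask, triangle_id)

    def write(self, mask: int, triangle_id: int) -> None:
        """Give the pixels in mask to triangle_id, taking them from older entries."""
        kept = []
        for old_mask, old_id in self.entries:
            old_mask &= ~mask          # older triangles lose the overwritten pixels
            if old_mask:               # drop entries that own no pixel any more
                kept.append((old_mask, old_id))
        kept.append((mask, triangle_id))
        self.entries = kept

    def triangle_at(self, x: int, y: int):
        bit = 1 << (y * 8 + x)
        for mask, triangle_id in self.entries:
            if mask & bit:
                return triangle_id
        return None
```

A tile fully covered by one triangle then costs one entry rather than 64 pointers.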

In the demo (cube field - 2000 objects):
q,w,e,a,s,d - move in all 6 directions
arrows - rotation of the camera

Next step:
-per-tile mip-mapping
-bilinear texture filtering with one texture fetch - with one "movaps" or "movdqa"

cya ;)

https://sourceforge.net/projects/phenomenonngsw/

The demo needs a desktop in 32-bit color mode and SSE2 instruction support.

code_glitch
08-04-2011, 08:38 PM
Nice stuff, I'll be sure to check this one out (SVN pun) and read through the code... I have to admit, the SF password change was a right pain for me too. Please SF, don't get hacked again. SVN failed quite badly on that day (I was committing when it went funny and golly it was 'fun')

herrcoolness
08-04-2011, 10:36 PM
Nice stuff, I'll be sure to check this one out (SVN pun) and read through the code... I have to admit, the SF password change was a right pain for me too. Please SF, don't get hacked again. SVN failed quite badly on that day (I was committing when it went funny and golly it was 'fun')

Thanks dude :) . I really don't understand people who attack a free software site. I think they are paid hackers from some company. >:(

herrcoolness
16-04-2011, 12:40 PM
Hi folks. I just updated the engine with per-tile mip-mapping.

I was thinking about one problem. Is the rasterizer really fast? I compared it with the s-buffer technique. The s-buffer can reject almost a whole scanline of pixels (the full x resolution) at once, but mine can reject at most 64 pixels at a time. When all scanlines of the polygon (triangle) are rejected, the y-loop needs just 1050 iterations, assuming a y screen resolution of 1050 pixels and a fullscreen triangle, but my rasterizer needs ( 1680 div 8 )*( 1050 div 8 ) = 210 * 131 = 27510 iterations, so by comparison it is about 26x slower... GOD DAMN !!

What do I get when I use scanline based rasterizing in conjunction with an s-buffer?
-faster rejection of pixels in the x direction
-the ability to render into a texture
-more natural memory organization
-fewer pixels wasted because of tile alignment
-...
Problem? Yes, when I do light calculations. With pixels sorted into tiles I could create a bounding box for the tile from all its pixel positions. Then, when I check this bbox (with its corner positions in world space) against a sphere with the maximal radius of the light, I could reject or accept all 64 pixels at once. In scanline based rasterization I don't have the ability to do this. Yes, the pixels could be converted, but I think that would be slow. So I am thinking about treating a horizontal span in the s-buffer as a line segment, which I check against the sphere or cone of the light to see how many pixels I need to calculate.
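
The tile-versus-light test mentioned above is the usual axis-aligned box against sphere check. A minimal sketch follows; the function name is hypothetical and the box is assumed to be axis-aligned in the same space as the light.

```python
def aabb_intersects_sphere(box_min, box_max, center, radius: float) -> bool:
    """True if the axis-aligned box touches the sphere of the light."""
    dist_sq = 0.0
    for lo, hi, c in zip(box_min, box_max, center):
        nearest = min(max(c, lo), hi)   # closest point of the box on this axis
        dist_sq += (c - nearest) ** 2
    return dist_sq <= radius * radius
```

If the test fails, none of the 64 pixels of the tile can be reached by that light, so the whole tile is skipped for it.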

ok people.. see you later.. i think .. much later. But don't forget .. i am working on it.. ;)

herrcoolness
11-06-2011, 10:54 PM
Hi folks. After doing some research, Windows crash tests and comparing the speed of scanline based rasterization against tile half-space rasterization, I came to the result that scanline rasterization is much slower than tile rasterization. Why is that? First, here are the techniques I used. (You know that not all these ideas are mine. I am just a human like you and not an alien master-brain :D ... and it's fair to show you the sources of my knowledge, which will help you understand me and my source code):

Scanline based:
-The rasterization algorithm is based on Chris Hecker's floating point rasterizer (http://chrishecker.com/Miscellaneous_Technical_Articles), modified for SSE and deferred texturing. Guys, if you are new to rasterization, this is the site where you can learn all the basics and tricks.
-The clipping algorithm is from the latest source code of Nick's swShader 0.3.0. (ftp://nic.funet.fi/pub/sci/graphics/packages/swshader-softwire/sw-shader/swShader-0.3.0.zip) Nick, nice trick to shift the clipping planes from -1<x<1 to 0<x<1. This helps a lot when you want to calculate the clipping flags with SSE (see my source code)
-The transformation code is based on the article at http://www.cortstratton.org/articles/OptimizingForSSE.php
-The s-buffer idea is based on Paul Nettle's "S-buffer FAQ" (http://www.gamedev.net/page/resources/_/reference/programming/140/algorithms-and-data-structures/s-buffer-faq-r668), and the source code, which saved me from going through the hell of coding an s-buffer insert routine, is based on Bero's software renderer (http://vserver.rosseaux.net/stuff/BeRoSoftRender.zip), which is based on the C++ code of The Swine (http://www.luki.webzdarma.cz/eng_07_en.htm)
-The bilinear filtering is based on Nick's swShader source code, optimized for SSE2 by me.

Tile based:
-The rasterization algorithm is based on Nick's article (http://www.devmaster.net/codespotlight/show.php?id=17), extended to 2D homogeneous rasterization (http://www.cs.unc.edu/~olano/papers/2dh-tri/) with a little help from the source code of the Attila GPU simulator (https://attila.ac.upc.edu/wiki/index.php/Attila_Project), but the normalization code for the edge equations, which helps convert the edge equation from FPU format to fixed-point format, I coded myself
-The calculation of the triangle's 2D screen coverage bounding box is based on the Attila GPU simulator source code too, which is in turn based on the article "Jim Blinn's Corner: Calculating Screen Coverage"
-The early accept-reject of block-in-triangle idea is based on Intel's Larrabee article (http://software.intel.com/en-us/articles/rasterization-on-larrabee/)
-The deferred rendering idea is based on the PowerVR technology article (http://www.imgtec.com/factsheets/SDK/PowerVR%20Technology%20Overview.1.0.2e.External.pdf)
-Hierarchical Zmin updating with a coverage mask is based on the article "Two-level hierarchical z-buffer for 3D graphics hardware" (http://www.si2lab.org/publications/cnf/chchen_iscas02.pdf)
-Transformation code is based on article from http://www.cortstratton.org/articles/OptimizingForSSE.php

Pros & Cons:

Scanline based:
+fewer calculations when calculating the address of a pixel
+fewer wasted pixels for render targets and textures
+rasterization is calculated pixel-precise
-variable scanline size, so we cannot directly unroll the drawing loop
-not a cache-friendly representation of render targets and textures for random access and texture size (can't access pixels in the y direction without destroying the data in the cache, problematic are calculations of dx/dy derivatives and
+can reject more than, exactly, or fewer than 64 pixels at once with the s-buffer
-we need to draw from front to back and from right to left to gain the speed of the s-buffer and reject pixels without traversing the linked-list segments, which hurts the cache
-demo cubefield rendered at 20-25 fps (the cubes are not drawn from front to back or from right to left, just in random order)


Tile based
-more calculations when calculating the address of a pixel (MMX or SSE instructions can help)
-more wasted pixels (because we need to align the texture and render target data to tile sizes)
-rasterization is calculated per tile, and some CPU cycles are wasted on pixels that are not in the triangle (SSE can help to reduce this)
+constant drawing loop which can be unrolled because of the constant size of the tile
+more cache-friendly representation of render targets and textures for random access and texture (we can access the pixels in the y direction without destroying the data in the cache, good for pixel shaders, dx/dy derivative calculation, multithreading)
-rejects exactly 64 pixels, yes or no, nothing in between
+fast access to the z-buffer through the x,y coordinates, just comparing the higher level, which is one floating point number; a better organised structure (grid based)
+demo cubefield rendered at 44 fps and more
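
To make the address-calculation trade-off concrete, here is a small sketch comparing a linear scanline layout with an 8x8 tiled layout. The function names are hypothetical and offsets are counted in pixels, not bytes; the actual layout in the engine may differ.

```python
TILE = 8  # tile edge in pixels, so one tile holds 64 pixels


def linear_offset(x: int, y: int, width: int) -> int:
    """Scanline layout: one multiply and one add."""
    return y * width + x


def tiled_offset(x: int, y: int, width: int) -> int:
    """Tiled layout: tiles stored one after another, 64 pixels each."""
    tiles_per_row = (width + TILE - 1) // TILE      # width padded up to whole tiles
    tile_index = (y >> 3) * tiles_per_row + (x >> 3)
    return tile_index * 64 + ((y & 7) << 3) + (x & 7)
```

In the tiled layout, the pixel directly below another one inside the same tile is only 8 pixels away in memory, while in the linear layout it is a full scanline away (1680 pixels at 1680x1050). That is the cache advantage paid for with the extra shifts and masks.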

The problem with the s-buffer is that we sometimes need to traverse the linked list of segments. This is slow compared to the hierarchical z-buffer, where we have direct access to memory by calculating the address from the x,y coordinates. So the s-buffer is good for scenes without many polygons, because every new polygon adds a new segment to the s-buffer structure, and traversing the growing structure really slows down the whole process. See Quake 1, which uses this technique. But there the s-buffer is used only for drawing the rooms, static structures with a small number of polygons. The monsters and characters are drawn separately, in another pipeline. Without the s-buffer, though, there would be overdraw, which can cause more slowdowns.

Another problem in scanline based rasterization is the need for clipping. Clipping a polygon in 3D is a slow process, and drawing the polygon with the triangle routine is even slower. We need clipping because of the W coordinates: we can't use W coordinates behind the camera, and the perspective division fails when W equals 0. In tile based 2D homogeneous rasterization we don't have this problem because we directly calculate the visible pixels. We don't even need to divide, because we are operating in homogeneous space. We divide only the interpolated values by W, after checking that the pixel is visible, i.e. that it lies between the triangle edges and between the near and far planes.
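
The coverage test of 2D homogeneous rasterization can be sketched in a few lines. This follows the general idea of the 2dh-tri paper linked earlier, not the engine's fixed-point code. Vertices are (x, y, w) after projection but before the division by w. The function names are hypothetical, a positive determinant is taken as front-facing by assumption, and the near/far plane checks are left out.

```python
def dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def setup_edges(v0, v1, v2):
    """Edge functions from undivided (x, y, w) vertices; no clipping required."""
    edges = (cross(v1, v2), cross(v2, v0), cross(v0, v1))
    det = dot(v0, edges[0])   # sign gives the facing, zero means degenerate
    return edges, det


def covers(edges, det, px: float, py: float) -> bool:
    """True if screen point (px, py) lies inside the visible part of the triangle."""
    if det <= 0:
        return False          # back-facing or degenerate under this sketch's convention
    p = (px, py, 1.0)
    return all(dot(e, p) >= 0 for e in edges)
```

Note that no vertex is ever divided by w, so a vertex with w <= 0 (behind the camera) needs no special handling in the coverage test.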

Anyway. I uploaded the scanline renderer for learning purposes. If something is not clear, just ask me. But I think the FPS numbers say it all. Now I know that the tile renderer is the right way and I will continue with it. It's deep in the night and I can barely see anything, so... any feedback, any question, any reaction is welcome. So... let's get back to the TILE WORLD !!!

https://sourceforge.net/projects/phenomenonngsw/files/old%20scanline%20based%20version/

code_glitch
12-06-2011, 08:51 AM
Just wondering, what hardware spec was all this done on? The "demo cubefield rendered at 44 fps and more" could use a little more context right? I mean, if that was an i7 and radeon 5770, I might be worried... Or if its an Amstrad/386 then I'd switch like right now :D lol

Dan
12-06-2011, 09:41 AM
I have made a simple software rasterizer some time ago. It's not very well optimized though. Would be interesting to see how it compares to yours.

herrcoolness
12-06-2011, 05:26 PM
Just wondering, what hardware spec was all this done on? The "demo cubefield rendered at 44 fps and more" could use a little more context right? I mean, if that was an i7 and radeon 5770, I might be worried... Or if its an Amstrad/386 then I'd switch like right now :D lol

Sorry .. my mistake. I tested the demo on an Intel Core 2 Quad Q8300 at 2.5 GHz, and the demo uses just one thread. .. and of course it doesn't depend on the graphics card. ;) But I have a GeForce GTX 275

herrcoolness
12-06-2011, 05:35 PM
I have made a simple software rasterizer some time ago. It's not very well optimized though. Would be interesting to see how it compares to yours.

Hey Dan. What type of rasterizer is it? Scanline? Tile based? How many polys are in the scene? What is the output resolution? Any texture filters? Is there any occlusion procedure or just a brute-force z-buffer?

WILL
12-06-2011, 07:00 PM
This stuff all sounds pretty neat. Would you be able to put together a small demo so we can see whats going on visually?

herrcoolness
14-06-2011, 01:58 PM
This stuff all sounds pretty neat. Would you be able to put together a small demo so we can see whats going on visually?

Hi Will. Just download the 2 demos. One is scanline based, the other tile based; check the fps, for example at the start. The objects and polygons are not sorted, so the scene is rendered in pseudo-random order. This is a nightmare for the rasterizers. But the biggest nightmare is polys perfectly sorted from back to front = ultimate slowdown. ;)

tile :http://sourceforge.net/projects/phenomenonngsw/files/current%20halfspace%20based%20version/Phenomenon%20engine%2013-06-2011.rar/download
scanline:http://sourceforge.net/projects/phenomenonngsw/files/old%20scanline%20based%20version/Phenomenon%20engine%2012-06-2011.rar/download

herrcoolness
01-07-2011, 09:27 PM
Hi folks. I updated the engine with a new memory representation of textures. The texture is divided into 8x8 tiles like in the old version, but now the stored tiles are bigger: 9x9 pixels. Why? Because I implemented bilinear filtering and there was a small problem with the right and bottom texels of a tile. When I wanted to access the 3 other texels there, I had to read the colors from other tiles and calculate a new address because of the tile change. So now, when I encode the texture into tiles, I write the colors from the neighbor tiles into the extra right and bottom texels. When I read the texels now, I read them all from one tile, and calculating the addresses of the other 3 texels is very cheap.
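
Here is a small sketch of the padded-tile idea. It is only an illustration under several assumptions: single-channel texels, texture dimensions that are multiples of 8, clamping at the texture border, and hypothetical function names. The engine itself works on packed colors with SSE.

```python
TILE = 8
PAD = TILE + 1   # 9x9 stored texels per 8x8 tile


def build_padded_tiles(texture):
    """Split a texture (list of rows) into 9x9 tiles with a copied right/bottom border."""
    height, width = len(texture), len(texture[0])
    tiles = {}
    for ty in range(height // TILE):
        for tx in range(width // TILE):
            tile = []
            for y in range(PAD):
                sy = min(ty * TILE + y, height - 1)          # clamp at the texture edge
                tile.append([texture[sy][min(tx * TILE + x, width - 1)]
                             for x in range(PAD)])
            tiles[(tx, ty)] = tile
    return tiles


def bilinear(tiles, u: float, v: float) -> float:
    """Bilinear sample where all 4 texels come from the same padded tile."""
    x0, y0 = int(u), int(v)
    fx, fy = u - x0, v - y0
    tile = tiles[(x0 // TILE, y0 // TILE)]
    lx, ly = x0 % TILE, y0 % TILE
    top = tile[ly][lx] * (1 - fx) + tile[ly][lx + 1] * fx
    bottom = tile[ly + 1][lx] * (1 - fx) + tile[ly + 1][lx + 1] * fx
    return top * (1 - fy) + bottom * fy
```

Even when the sample sits in the last column or row of a tile, `lx + 1` and `ly + 1` stay inside the 9x9 block, so no second tile address is needed.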

I implemented per-triangle occlusion: we calculate the triangle's Z coordinate nearest to the camera and then compare it against the Zmin of the tiles that are in the bounding box of the triangle. Zmin is kept in a separate array from the full z-buffer, so traversing the Zmin array is fast. It simulates the rough on-chip z-buffer memory in hardware. If there is at least one tile that is behind the triangle's Zmin, the triangle needs to be rendered; otherwise it can be skipped. Skipping the triangle can save a nice amount of computation and makes drawing faster. This method is described in the article "Method for accelerated triangle occlusion culling" (http://www.freepatentsonline.com/20030043148.pdf), but if you think about the hierarchical z-buffer, you get the idea automatically. I had a similar idea when I was programming the n-level z-buffer.
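
The per-triangle test can be sketched as a loop over the coarse per-tile array. The names are hypothetical, smaller z is assumed to be closer, and `tile_far` stands for the coarse per-tile depth bound described above (the farthest depth currently visible in each tile, under this sketch's convention).

```python
def triangle_is_occluded(tri_z_near: float, bbox, tile_far) -> bool:
    """True if every tile under the triangle's bounding box already hides it.

    bbox is (tx0, ty0, tx1, ty1) in tile coordinates, inclusive.
    tile_far[ty][tx] is the coarse depth bound of one tile.
    """
    tx0, ty0, tx1, ty1 = bbox
    for ty in range(ty0, ty1 + 1):
        for tx in range(tx0, tx1 + 1):
            if tri_z_near <= tile_far[ty][tx]:
                return False   # this tile may still show part of the triangle
    return True
```

The loop touches only one value per tile, so it stays cheap even for triangles with a large bounding box.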

http://sourceforge.net/projects/phenomenonngsw/files/current%20halfspace%20based%20version/Phenomenon%20engine%2001-07-2011.rar/download

herrcoolness
01-09-2011, 04:49 PM
Ok guys, new update of my project.

Every programmer knows that the 64-bit environment brings new possibilities: more memory, more registers, and the extension of the old general-purpose registers to 64 bits. As an assembler programmer I had problems with some functions because of the small number of CPU registers. I had to work around this with memory accesses to temporary variables and constants, which slows down the whole function. Speed is very important for software rendering, so I changed the OS environment to get more horsepower from my CPU.

Changing from 32-bit to 64-bit brings some problems, because now we strictly need 64-bit versions of the device drivers. Some old hardware does not get new 64-bit drivers from its vendor, so we need to buy new hardware... like me. Now the OS is fully functional.

I changed FPC to the 64-bit version too. I needed to rewrite the source code of the renderer (changing the pointers and pointer operations from dword size to qword size). I changed the output from DirectDraw to GDI, because I had problems running the program under the 64-bit environment. I optimized some parts for 64-bit, because now I have 16 general purpose and 16 xmm registers. Man, I was feeling like a kid in a toy shop .. so many registers... :) . I renamed some variables, for example in the gradient calculation, to make it clearer what they mean.

I compared the old rectangular traversal algorithm with the recursive algorithm, but the old one was still faster. Anyway, I uploaded the recursive algorithm http://sourceforge.net/projects/phenomenonngsw/files/current%2064-bit%20version/old%20recursive%20half-space%20version/Phenomenon%20engine%2031-08-2011.rar/download for study purposes. I added some more comments to make the code easier to understand, for example in the trivial reject & accept calculations. Cya ;)

http://sourceforge.net/projects/phenomenonngsw/files/current%2064-bit%20version/current%20half-space%20version/Phenomenon%20engine%2001-09-2011.rar/download

pixelwriter
27-01-2012, 06:44 AM
Hi Herrcoolness,

I tested your very fast demo and my plan is to research this type of software-rendering
in the second half of this year, but I already have some initial questions:
Do you know if your rasterizer is faster than http://code.google.com/p/msr-zbethel-tu/
Could you make a pascal-only reference version? The engine would be easier
portable (beyond x86) and modifications would be faster possible without ASM knowledge.

Thank you!