Hi folks. After doing some research, a few Windows crash tests, and comparing the speed of scanline-based rasterization against tile-based half-space rasterization, I came to the conclusion that scanline rasterization is much slower than tile rasterization. Why is that? First, here are the techniques I used. (You know that none of these ideas are mine. I am just a human like you and not an alien master-brain... and it's fair to show you the sources of my knowledge, which will help you understand me and my source code):

Scanline based:
-The rasterization algorithm is based on Chris Hecker's floating-point rasterizer (http://chrishecker.com/Miscellaneous_Technical_Articles), modified for SSE and deferred texturing. Guys, if you are new to rasterization, this is the site where you can learn all the basics and tricks.
-The clipping algorithm is from the latest source code of Nick's SwShader 0.3.0 (ftp://nic.funet.fi/pub/sci/graphics/...ader-0.3.0.zip). Nick, nice trick to shift the clipping planes from -1<x<1 to 0<x<1. This helps a lot when you want to calculate the clipping flags with SSE (see my source code).
-The transformation code is based on an article from http://www.cortstratton.org/articles...zingForSSE.php
-The s-buffer idea is based on Paul Nettle's "S-buffer FAQ" (http://www.gamedev.net/page/resource...uffer-faq-r668). The source code that saved me from going through the hell of coding an s-buffer insert routine is based on Bero's software renderer (http://vserver.rosseaux.net/stuff/BeRoSoftRender.zip), which in turn is based on the C++ code of The Swine (http://www.luki.webzdarma.cz/eng_07_en.htm).
-The bilinear filtering is based on Nick's SwShader source code, optimized for SSE2 by me.
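To make the Hecker-style setup a bit more concrete, here is a minimal scalar sketch (not the SSE code from the download) of the per-triangle gradient calculation and perspective-correct interpolation. Hecker's articles interpolate 1/z, u/z and v/z; here the clip-space w plays the role of z, which assumes a standard perspective projection where w is the view-space depth. The names `ScreenVert`, `plane_gradients` and `perspective_u` are made up for illustration. The plane is evaluated directly per pixel for clarity; a real scanline loop would instead add the x gradient once per pixel step.

```c
#include <assert.h>

/* Screen-space vertex: x, y in pixels, w = clip-space w of the vertex. */
typedef struct { float x, y, w; } ScreenVert;

/* Gradients of an attribute c that varies linearly over the triangle in
   screen space: c(x, y) = c[2] + dcdx * (x - v[2].x) + dcdy * (y - v[2].y).
   Returns 0 for a degenerate (zero-area) triangle. */
static int plane_gradients(const ScreenVert v[3], const float c[3],
                           float *dcdx, float *dcdy)
{
    float dx0 = v[0].x - v[2].x, dy0 = v[0].y - v[2].y;
    float dx1 = v[1].x - v[2].x, dy1 = v[1].y - v[2].y;
    float d = dx1 * dy0 - dx0 * dy1;            /* twice the signed area */
    if (d == 0.0f)
        return 0;
    *dcdx = ((c[1] - c[2]) * dy0 - (c[0] - c[2]) * dy1) / d;
    *dcdy = ((c[0] - c[2]) * dx1 - (c[1] - c[2]) * dx0) / d;
    return 1;
}

/* Perspective-correct u at pixel (px, py): u/w and 1/w are linear in screen
   space, u itself is not, so interpolate both and divide once per pixel. */
static float perspective_u(const ScreenVert v[3], const float u[3],
                           float px, float py)
{
    float uw[3], iw[3], duwdx, duwdy, diwdx, diwdy;
    for (int i = 0; i < 3; ++i) {
        assert(v[i].w > 0.0f);                  /* vertex must be in front of the camera */
        iw[i] = 1.0f / v[i].w;
        uw[i] = u[i] * iw[i];
    }
    if (!plane_gradients(v, uw, &duwdx, &duwdy) ||
        !plane_gradients(v, iw, &diwdx, &diwdy))
        return u[0];                            /* degenerate triangle */
    float ddx = px - v[2].x, ddy = py - v[2].y;
    float uw_p = uw[2] + duwdx * ddx + duwdy * ddy;
    float iw_p = iw[2] + diwdx * ddx + diwdy * ddy;
    return uw_p / iw_p;
}
```

For a triangle with w = 1 and w = 4 at the two ends of an edge (u = 0 and u = 1), u at the screen-space midpoint of that edge comes out as 0.2 instead of the 0.5 that affine interpolation would give; that difference is the texture swimming that perspective correction removes.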
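The shifted clipping planes can be illustrated with a small outcode routine. This is a plain C sketch, assuming a projection that maps the visible volume to 0 <= x, y, z <= w; the flag names and `classify_triangle` are hypothetical, not taken from SwShader. The point is that the three lower-plane tests compare against 0, so in SSE they reduce to reading sign bits.

```c
#include <assert.h>

/* Clip-space vertex. Assumes a projection that maps the view volume to
   0 <= x <= w, 0 <= y <= w, 0 <= z <= w (the shifted planes) instead of
   -w <= x <= w and so on. */
typedef struct { float x, y, z, w; } ClipVert;

enum {
    CLIP_X_LO = 1,  CLIP_X_HI = 2,
    CLIP_Y_LO = 4,  CLIP_Y_HI = 8,
    CLIP_Z_LO = 16, CLIP_Z_HI = 32
};

static unsigned clip_flags(ClipVert v)
{
    unsigned f = 0;
    /* Lower planes: with the bound at 0 these are essentially sign tests.
       With SSE, _mm_movemask_ps on a register of four x values returns
       their four sign bits at once, without a comparison. */
    if (v.x < 0.0f) f |= CLIP_X_LO;
    if (v.y < 0.0f) f |= CLIP_Y_LO;
    if (v.z < 0.0f) f |= CLIP_Z_LO;
    /* Upper planes still need one compare each. */
    if (v.x > v.w) f |= CLIP_X_HI;
    if (v.y > v.w) f |= CLIP_Y_HI;
    if (v.z > v.w) f |= CLIP_Z_HI;
    return f;
}

/* 0 = fully inside (draw without clipping), 1 = trivially rejected
   (all vertices outside the same plane), 2 = needs real clipping. */
static int classify_triangle(const ClipVert t[3])
{
    unsigned a = clip_flags(t[0]), b = clip_flags(t[1]), c = clip_flags(t[2]);
    if ((a | b | c) == 0) return 0;
    if ((a & b & c) != 0) return 1;
    return 2;
}
```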
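A fixed-point bilinear filter of the usual kind looks roughly like the sketch below. It is not Nick's code: it assumes 32-bit texels with four 8-bit channels, power-of-two texture sizes with wrapping, non-negative 16.16 texel coordinates and 8-bit blend weights, and the function name is made up. The per-channel loop is the part that SSE2 can do for all four channels at once in 16-bit lanes.

```c
#include <assert.h>
#include <stdint.h>

/* Bilinear sample from a 32-bit texture (4 x 8-bit channels). u_fx and v_fx
   are texel coordinates in 16.16 fixed point; width = 1 << w_log2,
   height = 1 << h_log2, coordinates wrap. Texel centres are not offset
   by 0.5 here. */
static uint32_t bilinear_sample(const uint32_t *tex, int w_log2, int h_log2,
                                int32_t u_fx, int32_t v_fx)
{
    assert(u_fx >= 0 && v_fx >= 0);
    int wmask = (1 << w_log2) - 1, hmask = (1 << h_log2) - 1;
    int x0 = (u_fx >> 16) & wmask, y0 = (v_fx >> 16) & hmask;
    int x1 = (x0 + 1) & wmask,     y1 = (y0 + 1) & hmask;
    uint32_t fu = (uint32_t)(u_fx >> 8) & 0xFF;   /* 8-bit fractional weights */
    uint32_t fv = (uint32_t)(v_fx >> 8) & 0xFF;
    uint32_t t00 = tex[(y0 << w_log2) + x0], t10 = tex[(y0 << w_log2) + x1];
    uint32_t t01 = tex[(y1 << w_log2) + x0], t11 = tex[(y1 << w_log2) + x1];
    uint32_t out = 0;
    for (int s = 0; s < 32; s += 8) {             /* one 8-bit channel per pass */
        uint32_t c00 = (t00 >> s) & 0xFF, c10 = (t10 >> s) & 0xFF;
        uint32_t c01 = (t01 >> s) & 0xFF, c11 = (t11 >> s) & 0xFF;
        uint32_t top = c00 * (256 - fu) + c10 * fu;   /* at most 255 * 256 */
        uint32_t bot = c01 * (256 - fu) + c11 * fu;
        uint32_t c = (top * (256 - fv) + bot * fv) >> 16;
        out |= c << s;
    }
    return out;
}
```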

Tile based:
-The rasterization algorithm is based on Nick's article (http://www.devmaster.net/codespotlight/show.php?id=17), extended to 2D homogeneous rasterization (http://www.cs.unc.edu/~olano/papers/2dh-tri/) with a little help from the source code of the Attila GPU simulator (https://attila.ac.upc.edu/wiki/index.php/Attila_Project). The normalization code for the edge equations, which converts them from floating-point to fixed-point format, is my own.
-The calculation of the triangle's 2D screen-coverage bounding box is also based on the Attila GPU simulator source code, which in turn is based on the article "Jim Blinn's Corner: Calculating Screen Coverage".
-The early accept/reject of blocks against the triangle is based on Intel's Larrabee article (http://software.intel.com/en-us/arti...n-on-larrabee/)
-The deferred rendering idea is based on the PowerVR technology article (http://www.imgtec.com/factsheets/SDK...e.External.pdf)
-Hierarchical Zmin updating with a coverage mask is based on the article "Two-level hierarchical z-buffer for 3D graphics hardware" (http://www.si2lab.org/publications/c...en_iscas02.pdf)
-The transformation code is based on an article from http://www.cortstratton.org/articles...zingForSSE.php
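The 2D homogeneous setup from the Olano/Greer paper fits in a few lines. This is a minimal float sketch, assuming vertices given as (x, y, w) with x and y already scaled to pixels but not divided by w; the fixed-point normalization mentioned above is not shown, and `HVert`, `Edge` and the function names are hypothetical. Each edge function is the cross product of the other two vertices; after dividing by the determinant, the three edge values at a pixel are all positive exactly when the pixel sees the part of the triangle in front of the camera, and their sum is 1/w.

```c
#include <assert.h>

/* Homogeneous screen-space vertex: x and y are scaled to pixels but NOT
   divided by w, so vertices behind the camera (w < 0) are allowed. */
typedef struct { float x, y, w; } HVert;

/* Edge function E(px, py) = a * px + b * py + c. */
typedef struct { float a, b, c; } Edge;

/* Edge i is the cross product of the two other vertices (a row of the
   adjoint of the matrix [v0 v1 v2]). Dividing by the determinant makes
   e_i . v_i = 1, so the three values at a pixel are homogeneous
   barycentrics whose sum is 1/w. Returns 0 for a degenerate triangle. */
static int setup_edges_2dh(const HVert v[3], Edge e[3])
{
    for (int i = 0; i < 3; ++i) {
        HVert p = v[(i + 1) % 3], q = v[(i + 2) % 3];
        e[i].a = p.y * q.w - p.w * q.y;
        e[i].b = p.w * q.x - p.x * q.w;
        e[i].c = p.x * q.y - p.y * q.x;
    }
    float det = e[0].a * v[0].x + e[0].b * v[0].y + e[0].c * v[0].w;
    if (det == 0.0f)
        return 0;
    float inv = 1.0f / det;   /* also normalizes the winding: no back-face culling here */
    for (int i = 0; i < 3; ++i) {
        e[i].a *= inv; e[i].b *= inv; e[i].c *= inv;
    }
    return 1;
}

static float edge_eval(Edge e, float px, float py)
{
    return e.a * px + e.b * py + e.c;
}

/* Covered when all three normalized edge functions are positive; this also
   rejects pixels that would only see the triangle behind the camera. */
static int inside_2dh(const Edge e[3], float px, float py)
{
    return edge_eval(e[0], px, py) > 0.0f &&
           edge_eval(e[1], px, py) > 0.0f &&
           edge_eval(e[2], px, py) > 0.0f;
}

/* 1/w at the pixel; interpolated attribute/w values are divided by this. */
static float one_over_w_2dh(const Edge e[3], float px, float py)
{
    return edge_eval(e[0], px, py) + edge_eval(e[1], px, py) + edge_eval(e[2], px, py);
}
```

No vertex is divided by w and nothing is clipped against the near plane; the only division left is the per-pixel one by 1/w, done after the coverage test.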
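The Larrabee-style early accept/reject evaluates each edge at one corner of the block only. In this sketch (float instead of fixed point, pixel samples at integer coordinates, edge functions positive inside, hypothetical names) a block is rejected if any edge is non-positive even at its most favourable corner, and fully accepted if every edge is positive even at its least favourable corner.

```c
#include <assert.h>

typedef struct { float a, b, c; } Edge;   /* E(x, y) = a*x + b*y + c, inside if E > 0 */
typedef enum { BLOCK_OUTSIDE, BLOCK_PARTIAL, BLOCK_INSIDE } BlockClass;

/* Classify the block of samples bx .. bx+size-1, by .. by+size-1. E is
   linear, so its maximum over the block is at one corner (the trivial
   reject corner) and its minimum at the opposite corner (the trivial
   accept corner); which corner depends only on the signs of a and b. */
static BlockClass classify_block(const Edge e[3], int bx, int by, int size)
{
    assert(size > 0);
    float lo_x = (float)bx, hi_x = (float)(bx + size - 1);
    float lo_y = (float)by, hi_y = (float)(by + size - 1);
    int all_inside = 1;
    for (int i = 0; i < 3; ++i) {
        float rx = e[i].a > 0.0f ? hi_x : lo_x;   /* corner maximizing E */
        float ry = e[i].b > 0.0f ? hi_y : lo_y;
        float ax = e[i].a > 0.0f ? lo_x : hi_x;   /* corner minimizing E */
        float ay = e[i].b > 0.0f ? lo_y : hi_y;
        if (e[i].a * rx + e[i].b * ry + e[i].c <= 0.0f)
            return BLOCK_OUTSIDE;                 /* no sample passes this edge */
        if (e[i].a * ax + e[i].b * ay + e[i].c <= 0.0f)
            all_inside = 0;                       /* this edge crosses the block */
    }
    return all_inside ? BLOCK_INSIDE : BLOCK_PARTIAL;
}
```

An inside block can be filled with the unrolled constant-size loop and no per-pixel edge tests, a partial block needs the per-pixel tests, and an outside block is skipped.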
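The deferred, PowerVR-style approach starts with a binning pass that sorts triangles into per-tile lists, so each tile can later be rasterized, depth-tested and shaded on its own while its data stays in the cache. This is a conservative sketch using screen bounding boxes, assuming 8x8 tiles, fixed-size bins and vertices that are already projected; all names are hypothetical.

```c
#include <assert.h>

#define TILE_SIZE 8
#define MAX_TRIS_PER_TILE 64

typedef struct { float x[3], y[3]; } Tri2D;   /* projected screen-space vertices */
typedef struct { int count; int tri[MAX_TRIS_PER_TILE]; } TileBin;

/* Binning pass: append each triangle's index to every tile its screen
   bounding box touches. Conservative: a tile may get a triangle that only
   its bounding box overlaps. Assumes all vertices are in front of the
   camera (already projected). */
static void bin_triangles(const Tri2D *tris, int ntris,
                          TileBin *bins, int tiles_x, int tiles_y)
{
    float w = (float)(tiles_x * TILE_SIZE), h = (float)(tiles_y * TILE_SIZE);
    for (int t = 0; t < ntris; ++t) {
        float minx = tris[t].x[0], maxx = minx, miny = tris[t].y[0], maxy = miny;
        for (int k = 1; k < 3; ++k) {
            if (tris[t].x[k] < minx) minx = tris[t].x[k];
            if (tris[t].x[k] > maxx) maxx = tris[t].x[k];
            if (tris[t].y[k] < miny) miny = tris[t].y[k];
            if (tris[t].y[k] > maxy) maxy = tris[t].y[k];
        }
        if (maxx < 0.0f || maxy < 0.0f || minx >= w || miny >= h)
            continue;                                     /* off screen */
        int tx0 = minx < 0.0f ? 0 : (int)minx / TILE_SIZE;
        int ty0 = miny < 0.0f ? 0 : (int)miny / TILE_SIZE;
        int tx1 = maxx >= w ? tiles_x - 1 : (int)maxx / TILE_SIZE;
        int ty1 = maxy >= h ? tiles_y - 1 : (int)maxy / TILE_SIZE;
        for (int ty = ty0; ty <= ty1; ++ty)
            for (int tx = tx0; tx <= tx1; ++tx) {
                TileBin *bin = &bins[ty * tiles_x + tx];
                assert(bin->count < MAX_TRIS_PER_TILE);
                if (bin->count < MAX_TRIS_PER_TILE)      /* a real binner would grow or flush */
                    bin->tri[bin->count++] = t;
            }
    }
}
```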
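For the hierarchical z-buffer, one way to use a coverage mask is to accumulate it per 8x8 tile and move the tile's far bound forward only once the accumulated mask is full. This is a simplified sketch in that spirit, not the update rule of the cited paper (which may differ); it assumes depth stored as 1/w with larger meaning closer, and the structure and function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Depth convention assumed here: d = 1/w, larger means closer, and a fragment
   passes if d_fragment > d_buffer. Each 8x8 tile keeps zmin, a conservative
   bound on the farthest depth stored in any of its 64 pixels. */
typedef struct {
    float    zmin;      /* farthest depth in the tile (conservative)          */
    float    zmin_acc;  /* farthest depth of the triangles accumulated below  */
    uint64_t mask;      /* pixels covered since the accumulator was reset     */
} HizTile;

#define HIZ_FULL UINT64_MAX   /* one bit per pixel of an 8x8 tile */

/* Coarse test: tri_dmax is the triangle's nearest depth over this tile.
   If it is not nearer than the tile's farthest pixel, skip the whole tile. */
static int hiz_may_pass(const HizTile *t, float tri_dmax)
{
    return tri_dmax > t->zmin;
}

/* Called after a triangle was rasterized into the tile. `coverage` has a bit
   for every pixel the triangle touched and tri_dmin is its farthest depth
   over the tile, so every touched pixel now holds a depth >= tri_dmin
   (written, or already nearer). Once the accumulated coverage fills the
   tile, zmin can move forward. */
static void hiz_update(HizTile *t, uint64_t coverage, float tri_dmin)
{
    if (coverage == 0)
        return;
    if (t->mask == 0 || tri_dmin < t->zmin_acc)
        t->zmin_acc = tri_dmin;
    t->mask |= coverage;
    if (t->mask == HIZ_FULL) {
        if (t->zmin_acc > t->zmin)
            t->zmin = t->zmin_acc;
        t->mask = 0;                  /* start a new accumulation layer */
    }
}
```

Rejecting a tile then costs one float compare, which is the "just comparing the higher level" advantage listed in the pros below.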
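A common SSE approach to vertex transformation is to transform four vertices at once in structure-of-arrays layout. The scalar sketch below (names hypothetical, row-major matrix and column vectors assumed; not the article's code) shows why that layout maps naturally onto the four SSE lanes.

```c
#include <assert.h>

/* Four vertices in structure-of-arrays form: each array is one SSE register. */
typedef struct { float x[4], y[4], z[4], w[4]; } Vec4x4;

/* out = M * in for four vertices at once (column vectors, M row-major).
   The loop over i runs across the four SSE lanes: with SoA data each output
   component is four multiplies and three adds of whole registers, with the
   matrix element broadcast to all lanes. `in` and `out` may alias. */
static void transform4(const float m[4][4], const Vec4x4 *in, Vec4x4 *out)
{
    for (int i = 0; i < 4; ++i) {
        float x = in->x[i], y = in->y[i], z = in->z[i], w = in->w[i];
        out->x[i] = m[0][0] * x + m[0][1] * y + m[0][2] * z + m[0][3] * w;
        out->y[i] = m[1][0] * x + m[1][1] * y + m[1][2] * z + m[1][3] * w;
        out->z[i] = m[2][0] * x + m[2][1] * y + m[2][2] * z + m[2][3] * w;
        out->w[i] = m[3][0] * x + m[3][1] * y + m[3][2] * z + m[3][3] * w;
    }
}
```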

Pros & Cons:

Scanline based:
+fewer calculations when computing the address of a pixel
+fewer wasted pixels in render targets and textures
+rasterization is pixel-precise
-variable scanline length, so we cannot directly unroll the drawing loop
-not a cache-friendly representation of render targets and textures for random access and for texture size (we can't access pixels in the y direction without evicting data from the cache, which is a problem for dx/dy derivative calculations and ...)
+the s-buffer can reject any number of pixels at once, more, exactly or fewer than 64
-we need to draw from front to back and from right to left to benefit from the speed of the s-buffer and to reject pixels without traversing the linked list of segments, which hurts the cache
-the cubefield demo renders at 20-25 fps (the cubes are not drawn front to back or right to left, just in random order)
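The s-buffer trade-off (rejecting arbitrary runs of pixels, but paying for walking the segments) shows up even in a stripped-down, coverage-only span buffer for one scanline. This sketch assumes strictly front-to-back drawing, so unlike a real s-buffer it stores no depth per segment, and it uses a sorted array instead of a linked list; the names are hypothetical. The `while` loop at the top is the traversal whose cost grows with the number of segments.

```c
#include <assert.h>
#include <string.h>

#define MAX_SEGS 64

/* One scanline of a coverage-only span buffer: sorted, disjoint, half-open
   covered intervals [x0, x1). Assumes polygons arrive strictly front to back,
   so anything already covered is hidden. */
typedef struct { int n; int x0[MAX_SEGS], x1[MAX_SEGS]; } SpanLine;

typedef void (*DrawSpanFn)(int y, int x0, int x1, void *user);

/* Insert span [x0, x1) on scanline y: draw only its uncovered gaps, then
   merge it with the segments it overlaps. */
static void span_insert(SpanLine *s, int y, int x0, int x1,
                        DrawSpanFn draw, void *user)
{
    if (x0 >= x1)
        return;
    int i = 0;
    while (i < s->n && s->x1[i] <= x0)   /* walk past segments to the left */
        ++i;
    int first = i, cur = x0;
    for (; i < s->n && s->x0[i] < x1; ++i) {
        if (s->x0[i] > cur)
            draw(y, cur, s->x0[i], user); /* visible gap before this segment */
        if (s->x1[i] > cur)
            cur = s->x1[i];
    }
    if (cur < x1)
        draw(y, cur, x1, user);           /* visible tail */

    /* Segments [first, i) overlap the new span: replace them by one segment. */
    int mx0 = x0, mx1 = x1;
    if (first < i) {
        if (s->x0[first] < mx0) mx0 = s->x0[first];
        if (s->x1[i - 1] > mx1) mx1 = s->x1[i - 1];
    }
    int new_n = s->n - (i - first) + 1;
    assert(new_n <= MAX_SEGS);
    if (new_n > MAX_SEGS)
        return;                           /* out of room: a real version would grow */
    memmove(&s->x0[first + 1], &s->x0[i], (size_t)(s->n - i) * sizeof s->x0[0]);
    memmove(&s->x1[first + 1], &s->x1[i], (size_t)(s->n - i) * sizeof s->x1[0]);
    s->x0[first] = mx0;
    s->x1[first] = mx1;
    s->n = new_n;
}

/* Example callback: counts drawn pixels (a real one would shade them). */
static void count_pixels(int y, int x0, int x1, void *user)
{
    (void)y;
    *(int *)user += x1 - x0;
}
```

Drawing in random order breaks the front-to-back assumption this variant relies on; the full s-buffer handles it by storing depth per segment and splitting segments, which makes the traversal even more expensive.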


Tile based
-more calculations when computing the address of a pixel (MMX or SSE instructions can help)
-more wasted pixels (because the texture and render target data must be aligned to the tile size)
-rasterization is done per tile, so some CPU cycles are wasted on pixels that are not in the triangle (SSE can help reduce this)
+constant-size drawing loop, which can be unrolled because the tile size is fixed
+more cache-friendly representation of render targets and textures for random access (we can access pixels in the y direction without evicting data from the cache, which is good for pixel shaders, dx/dy derivative calculation and multithreading)
-rejects exactly 64 pixels at a time, yes or no, nothing in between
+fast access to the z-buffer through the x,y coordinates: we just compare the higher level, which is a single floating-point number, and the structure is better organised (grid-based)
+the cubefield demo renders at 44 fps and more
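The extra address calculation of the tiled layout, sketched for 8x8 tiles stored one after another, tile rows left to right (an assumed layout; the helper names are hypothetical). Both functions return a pixel index, not a byte offset.

```c
#include <assert.h>
#include <stddef.h>

#define TILE_LOG2 3                     /* 8x8 tiles */
#define TILE_DIM  (1 << TILE_LOG2)

/* Linear (scanline) layout: one multiply-add. */
static size_t addr_linear(int x, int y, int pitch)
{
    return (size_t)y * (size_t)pitch + (size_t)x;
}

/* Tiled layout: each 8x8 tile is 64 consecutive pixels. */
static size_t addr_tiled(int x, int y, int padded_width)
{
    assert(padded_width % TILE_DIM == 0);
    size_t tiles_per_row = (size_t)padded_width >> TILE_LOG2;
    size_t tile  = (size_t)(y >> TILE_LOG2) * tiles_per_row + (size_t)(x >> TILE_LOG2);
    size_t inner = ((size_t)(y & (TILE_DIM - 1)) << TILE_LOG2) + (size_t)(x & (TILE_DIM - 1));
    return (tile << (2 * TILE_LOG2)) + inner;
}

/* Width or height rounded up to a whole number of tiles (the wasted pixels). */
static int pad_to_tile(int n)
{
    return (n + TILE_DIM - 1) & ~(TILE_DIM - 1);
}
```

In the tiled layout the pixel below (x, y+1) is usually only 8 pixels away, inside the same 256-byte tile of 32-bit pixels, while in the linear layout it is a whole pitch away; the price is the shifts and masks plus the padding.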

The problem with the s-buffer is that we sometimes need to traverse the linked list of segments. This is slow compared to the hierarchical z-buffer, where we have direct access to memory by computing the address from the x,y coordinates. So the s-buffer is good for scenes without many polygons, because every new polygon adds a new segment to the s-buffer structure, and traversing this growing structure really slows down the whole process. Look at Quake 1: it uses this technique, but only for drawing the rooms, static structures with a small number of polygons. The monsters and characters are drawn separately, in another pipeline. But without the s-buffer there would be overdraw, which can cause even more slowdown.

Another problem with scanline-based rasterization is the need for clipping. Clipping a polygon in 3D is a slow process, and drawing the clipped polygon with the triangle routine is even slower. We need clipping because of the W coordinate: we can't use W coordinates behind the camera, and the perspective division fails where W equals 0. In tile-based 2D homogeneous rasterization we don't have this problem, because we directly calculate the visible pixels. We don't even need to divide the vertices, because we are operating in homogeneous space. We only divide the interpolated values by W after checking that the pixel is visible, that is, that it lies between the triangle edges and between the near and far planes.
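For comparison, this is the kind of clipping the scanline path needs before it can divide by W: one Sutherland-Hodgman pass against the near plane, sketched for the shifted convention 0 <= z <= w (the vertex layout and names are hypothetical, and the other planes work the same way). A triangle that crosses the plane comes out as a quad, which then has to be split into two triangles for the triangle routine.

```c
#include <assert.h>

/* Clip-space vertex plus two texture coordinates. */
typedef struct { float x, y, z, w, u, v; } CVert;

/* Linear interpolation in clip space is correct for every attribute,
   because the perspective divide has not happened yet. */
static CVert lerp_vert(CVert a, CVert b, float t)
{
    CVert r;
    r.x = a.x + t * (b.x - a.x);
    r.y = a.y + t * (b.y - a.y);
    r.z = a.z + t * (b.z - a.z);
    r.w = a.w + t * (b.w - a.w);
    r.u = a.u + t * (b.u - a.u);
    r.v = a.v + t * (b.v - a.v);
    return r;
}

/* One Sutherland-Hodgman pass against the near plane z >= 0.
   `out` needs room for n + 1 vertices. Returns the output vertex count;
   0 means the polygon was completely behind the plane. */
static int clip_near(const CVert *in, int n, CVert *out)
{
    assert(n >= 3);
    int m = 0;
    for (int i = 0; i < n; ++i) {
        CVert a = in[i], b = in[(i + 1) % n];
        float da = a.z, db = b.z;                 /* signed distance to the plane */
        if (da >= 0.0f)
            out[m++] = a;                         /* keep the inside vertex */
        if ((da >= 0.0f) != (db >= 0.0f))
            out[m++] = lerp_vert(a, b, da / (da - db));   /* edge crosses the plane */
    }
    return m;
}
```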

Anyway, I uploaded the scanline renderer for learning purposes. If something is not clear, just ask me. But I think the FPS numbers say it all. Now I know that the tile renderer is the right way, and I will continue with it. It's deep in the night and I can barely see anything, so... any feedback, any question, every reaction is welcome. So... let's get back to the TILE WORLD !!!

https://sourceforge.net/projects/phe...sed%20version/