View Full Version : Radix Sort Class

ggs
07-08-2003, 05:08 PM
Here is my Radix Sort class that I've written.

The Radix sort (sorting from the least significant byte first) is an O( N ) bin sort. It is temporally coherent and independent of the data being sorted, with no worst-case scenarios. However, it requires extra memory.

It can handle the following data types: longword, longint & single. However, it does not support arbitrary data lengths. Changing the sorted data length shouldn't be too hard (but the length has to be fixed).

As I've implemented it, it is O( k*N ), k=1,2-5 (k=1 when the data is already sorted, k=2-5 depending on how ordered the bytes are). Depending on the sort type (it supports either outputting the sorted data to a separate buffer or generating a list of indices which sort the data), it requires O( N ) or O( 2*N ) extra memory.

Included is a quicksort implementation which is used to verify the integrity of the sort and as a timing comparison. Only with ultra-small data sets does quicksort beat radix sort.

At some point I'll make a post explaining the theory behind it.

ggs
08-08-2003, 06:27 AM
The radix sort is a bin-based sort. This means it doesn't do comparisons between elements, and thus can (and probably will) run under the O(N*log N) lower bound for comparison-based sorts. Bin-based sorts work best on fixed-length data. A bin sort works by creating an array of bins, one bin for each unique element value. Then, using each element as an index into the bin array, you append the key to that bin.

ElementCount: number of unique elements in the key
source: list of keys
dest[ ElementCount ]: ElementCount lists of keys. Each list should have
enough space to hold source.length elements.

In pseudo-code, the sort looks like this:

for i = 0 to source.length-1 do
  dest[ source[i] ].append( source[i] )

As you can see this is very, very fast (it just moves data around), but incredibly memory intensive. It requires (number of unique elements) * (length of the source data) * (size of each key) bytes of memory. So to sort:
100 bytes would require: 2^8 * 100 = 25 KB
10k bytes would require: 2^8 * 10000 = 2.4 MB
100 longwords would require: 2^32 * 100 * 4 = 1.6 terabytes
10k longwords would require: 2^32 * 10000 * 4 = 156 terabytes

As you can see it requires a vast amount of memory, enough that even sorting a small number (100) of 32-bit values just isn't going to work. Heck, sorting 100 bytes is prohibitively expensive, memory-wise. But there is a way to make it usable. One of the bigger problems is that it requires n buffers of SourceData.length elements each to sort the bins into. This is a waste, since at the very most only one bin can ever be full at any time.

So we need to count how many of each element we have.

Assuming we are dealing with bytes:

var
  Histogram : array [0..255] of integer;
...
// Initialize the histogram to all zeros
fillchar( Histogram[0], sizeof(Histogram), 0 );
// Read through the source data, count the elements & store the counts in the histogram
for index := 0 to length(source)-1 do
  inc( Histogram[ source[index] ] );

This gives us a table which tells us exactly how much memory we need, which is exactly the length of the source data. So instead of allocating lots of little buffers, we allocate one big buffer & output all the data there. This is probably the best we can do to reduce the memory requirements. But now we need some way to tell us where to move the data. We need some kind of offset table:

var
  OffsetTable : array [0..255] of integer;
...
OffsetTable[0] := 0;
for index := 1 to 255 do
  OffsetTable[ index ] := OffsetTable[ index-1 ] + Histogram[ index-1 ];

This creates a table which allocates a block of space in the dest buffer for each byte value. Byte values which don't exist in the source data don't get a unique location, but since they don't exist in the source data those offsets will never be touched. However, to ensure that each byte that does get moved gets a unique spot, we need to update the offset table. The easiest way is to do that as we are moving the data.

Now the code to actually move the data in order:

for index := 0 to length(Source)-1 do
begin
  // move the data
  dest[ offsetTable[ Source[ index ] ] ] := Source[ index ];
  // update the offset table, so the next byte that is equal to
  // the current byte gets a unique location
  inc( offsetTable[ Source[ index ] ] );
end;
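
Putting the three snippets above together, a complete counting sort for an array of bytes might look like this (a minimal sketch; the procedure name and open-array parameters are my own, not from the class):

procedure CountingSortBytes(const Source : array of byte; var Dest : array of byte);
var
  Histogram, OffsetTable : array [0..255] of integer;
  Index : integer;
begin
  // count how many times each byte value occurs
  fillchar(Histogram, sizeof(Histogram), 0);
  for Index := 0 to high(Source) do
    inc( Histogram[ Source[Index] ] );
  // turn the counts into starting offsets (a running total)
  OffsetTable[0] := 0;
  for Index := 1 to 255 do
    OffsetTable[Index] := OffsetTable[Index-1] + Histogram[Index-1];
  // move each byte to its slot, bumping its offset so equal
  // bytes land in consecutive locations
  for Index := 0 to high(Source) do
  begin
    Dest[ OffsetTable[ Source[Index] ] ] := Source[Index];
    inc( OffsetTable[ Source[Index] ] );
  end;
end;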

But this is only for an array of bytes, and it isn't practical to extend the offset table & histogram to accommodate the larger basic types. However, we are sorting via the most fundamental element, a byte. Wouldn't it be good to do this on the larger types, since they are just a collection of bytes? The answer is: yes, you can.

A basic radix sort to sort an array of longwords (4 bytes, 32 bits, unsigned):

Type
  TArrayOfLongword = Array of longword;

Procedure byteRadixSort(ShiftCount : integer; source, dest : TArrayOfLongword);
var
  Histogram : array [0..255] of integer;
  OffsetTable : array [0..255] of integer;
  Index : integer;
  Value, ValueToIndex : Longword;
begin
  // build the histogram
  fillchar(Histogram[0], sizeof(Histogram), 0 );
  for index := 0 to length(source)-1 do
  begin
    Value := source[index];
    ValueToIndex := ( Value shr ShiftCount ) and $FF;
    inc( Histogram[ ValueToIndex ] );
  end;
  // build the offset table
  OffsetTable[0] := 0;
  for index := 1 to 255 do
    OffsetTable[index] := OffsetTable[index-1] + Histogram[index-1];
  // do the inner loop of the radix sort
  for index := 0 to length(source)-1 do
  begin
    Value := source[index];
    ValueToIndex := ( Value shr ShiftCount ) and $FF;
    dest[ OffsetTable[ ValueToIndex ] ] := Value;
    inc( OffsetTable[ ValueToIndex ] );
  end;
end;

Procedure RadixSort(var source : TArrayOfLongword);
var
  tmp : TArrayOfLongword;
begin
  if length(source) < 2 then exit;
  // create the tmp buffer
  SetLength(tmp, length(source));
  // sort based on the 1st byte
  byteRadixSort(0, source, tmp);
  // sort based on the 2nd byte
  byteRadixSort(8, tmp, source);
  // sort based on the 3rd byte
  byteRadixSort(16, source, tmp);
  // sort based on the 4th byte
  byteRadixSort(24, tmp, source);
end;

Since the byteRadixSort procedure only sorts based on byte values, we need to extract the byte we are going to manipulate (that is what the ShiftCount parameter is for). We take advantage of the fact that you can sort in multiple passes instead of one single pass.
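
To see why sorting least-significant byte first works, here is a worked trace with three made-up 16-bit values (so two byte passes instead of four):

input:                 $0102  $0201  $0101
after low-byte pass:   $0201  $0101  $0102   (low bytes $01,$01,$02; the tie keeps its order)
after high-byte pass:  $0101  $0102  $0201   (sorted)

Because each pass is stable, the high-byte pass only reorders by the high byte, and elements with equal high bytes stay in the order the low-byte pass left them in.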

This can be optimized significantly. It currently does 10 reads & 4 writes of the source data; is there any way to improve that? The answer is yes. All the histograms & offset tables can be calculated at once. A separate offset table array isn't even needed; you can transform the histogram table into the offset table in the same memory space. This alone drops the number of reads down to 5! And since the main limitation is how fast you can read the memory in, you might as well do some calculations along the way. Thus the loop which builds all the histograms can detect when a pass is not needed and when the data is already sorted! This will cut 2-4 reads and 1-4 writes of the source data. Another optimization is to hint to the processor to preload the source data into the cache during the histogram building & the actual inner loop. This will drastically reduce cache misses during reads, which helps a lot.
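
As a sketch of the "all histograms at once" idea (the procedure and variable names here are my own, not from the class above): one read of the source fills four histograms, one per byte position, and a pass can be skipped when a single bin holds every element:

procedure BuildAllHistograms(const Source : array of longword);
var
  Hist  : array [0..3, 0..255] of integer; // one histogram per byte position
  Index : integer;
  Value : longword;
begin
  fillchar(Hist, sizeof(Hist), 0);
  // a single read of the source data fills all four histograms
  for Index := 0 to high(Source) do
  begin
    Value := Source[Index];
    inc( Hist[0, Value and $FF] );          // 1st (least significant) byte
    inc( Hist[1, (Value shr 8) and $FF] );  // 2nd byte
    inc( Hist[2, (Value shr 16) and $FF] ); // 3rd byte
    inc( Hist[3, Value shr 24] );           // 4th byte
  end;
  // if one bin of a histogram holds every element, that pass would
  // move nothing and can be skipped, e.g. for the 4th byte:
  // if Hist[3, Source[0] shr 24] = length(Source) then skip that pass
end;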

cairnswm
08-08-2003, 07:27 AM
I downloaded your example file and it took me about 5 minutes to get it working. There are a couple of errors in your code - procedures that aren't declared properly, as well as redeclarations of other procedures.

I'd suggest putting in a readme file on how to get the program working as well. It took me a while to realize that the results were only put into a text file.

What about creating a demo that allows me to enter a lot of integers into a listbox or memo that can then be sorted when I press a button.

ggs
08-08-2003, 01:54 PM
Oops. I had some leftovers from experimental code I was testing.

:edit:
I've uploaded a new version. I've written a readme & cleaned up the code (it should compile now).

Also reduced the test case to something which shouldn't take as long.