Page 4 of 5 FirstFirst ... 2345 LastLast
Results 31 to 40 of 45

Thread: Handling huge amounts of data

  1. #31
    Legendary Member cairnswm's Avatar
    Join Date
    Nov 2002
    Location
    Randburg, South Africa
    Posts
    1,537

    Handling huge amounts of data

    You might want to look at one of Jan's Delphi components (http://jansfreeware.com/jfdelphi.htm):

    TjanSQL 1.1
    2-April-2002 size:379kb
    TjanSQL is a single user relational Database engine implemented as a Delphi object using plain text files with semi-colon separated data for data storage. Supported SQL: SELECT (with table joins, field aliases and calculated), UPDATE, INSERT (values and sub-select), DELETE, CREATE TABLE, DROP TABLE, ALTER TABLE, CONNECT TO, COMMIT, WHERE (rich bracketed expression), IN (list or sub query), GROUP BY, HAVING, ORDER BY ( ASC, DESC), nested sub queries, statistics (COUNT, SUM, AVG, MAX, MIN), operators (+,-,*,/, and, or,>,>=,<,<=,=,<>,Like), functions (UPPER, LOWER, TRIM, LEFT, MID, RIGHT, LEN, FIX, SOUNDEX, SQR, SQRT). High performance: complete in-memory handling of tables and recordsets; semi-compiled expressions. Released under MOZILLA PUBLIC LICENSE Version 1.1. NEW FEATURES: fixed memory leak, calculated fields (in select and update statements), field aliases, table aliases, join "unlimited" tables, stdDev aggregate function, ASSIGN TO for named temporary tables, SAVE TABLE for persisting recordsets, INSERT INTO, ISO 8601 dates, numerous extra functions.
    I have never used it myself, but I've always sort of kept it in mind for the day I want an SQL-based text file system.
    William Cairns
    My Games: http://www.cairnsgames.co.za (Currently very inactive)
    MyOnline Games: http://TheGameDeveloper.co.za (Currently very inactive)

  2. #32

    Handling huge amounts of data

    Thanks for the support, guys.

    I forgot to say that there are around 6000+ cards (and growing).

    So I can't store them all in a collection. I will use collections only for built decks, so I have to load the text from -somewhere- (i.e. a text file, .xls or other) and the images from a folder.

    You can understand that looking up a specific card's text in a file of 6000 cards, for a deck of 59+ cards (maybe all unique), is really laggy! That's why I asked about SQL queries.

    I will take a look at the link above, thanks again.
    Will: "Before you learn how to cook a fish you must first learn how to catch a fish." coolest

  3. #33
    Legendary Member cairnswm's Avatar
    Join Date
    Nov 2002
    Location
    Randburg, South Africa
    Posts
    1,537

    Handling huge amounts of data

    SQL by its nature will always be SLOWER than a custom-built binary search. If you take the time to think about it, each SQL query must be parsed for correctness, then 'interpreted' [size=9px](1)[/size] before the data is accessed. The data itself may be fragmented across multiple disk segments, etc., so it will not be the fastest option for single-record searches.

    A binary file using standard Delphi record structures will always be faster, unless it needs too many disk accesses to find the data.

    Code:
    Type
      TCardRecord = Record
         CardName : String[30];
         CardText : String[255];
         ....
       End;

    Var
      CardFile : File of TCardRecord;
    Then ensure the file is stored in sorted order, and do a binary search across the file. (If you want an example, just ask.)

    Or alternatively, store a hash along with the card's record number in an index file. Hash the card name, look it up in the index file to get the record number, then access the record directly in the card file.

    Lastly - if you want to show off - create the card file and then index the card names, along with their record numbers, in a B+ tree structure. This will be fast, possibly faster than the hash table idea, but while I've always wanted to implement a B+ tree I never have.





    [size=9px](1) I say interpreted but it could be compiled or similar as well.[/size]
    William Cairns
    My Games: http://www.cairnsgames.co.za (Currently very inactive)
    MyOnline Games: http://TheGameDeveloper.co.za (Currently very inactive)

  4. #34

    Handling huge amounts of data

    OK, but doing a binary search on a file means I'd have to load it all into RAM first; and at the moment I can't use a record as you posted, since the card text doesn't have a fixed length in the .xls that I have.
    (If you know a way to save fields with a fixed length from Excel, tell me.)
    Otherwise I'd have to read and parse every row :/ or write a program to parse it and convert it to the format I want.

    edited: I have corrected the post, my english sucks!
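    For the "program to parse and convert" idea: a one-off converter could read a semicolon-separated text export of the .xls and write fixed-length records - a sketch only, where the two-field layout and the ';' separator are assumptions (assigning to a String[N] ShortString truncates silently, which is what gives the fixed on-disk size):

    [pascal]Type
      TCardRecord = Record
        CardName : String[30];   // ShortStrings have a fixed on-disk size
        CardText : String[255];
      End;

    // One-off converter: read a semicolon-separated export and write
    // fixed-length records suitable for File of TCardRecord access.
    Procedure ConvertExport(const InName, OutName : String);
    Var
      T    : TextFile;
      F    : File of TCardRecord;
      Line : String;
      CR   : TCardRecord;
      P    : Integer;
    Begin
      AssignFile(T, InName);
      Reset(T);
      AssignFile(F, OutName);
      Rewrite(F);
      While not Eof(T) do
      Begin
        ReadLn(T, Line);
        P := Pos(';', Line);
        CR.CardName := Copy(Line, 1, P-1);            // truncated to 30 chars
        CR.CardText := Copy(Line, P+1, Length(Line)); // truncated to 255 chars
        Write(F, CR);
      End;
      CloseFile(T);
      CloseFile(F);
    End;
    [/pascal]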
    Will: "Before you learn how to cook a fish you must first learn how to catch a fish." coolest

  5. #35
    Legendary Member cairnswm's Avatar
    Join Date
    Nov 2002
    Location
    Randburg, South Africa
    Posts
    1,537

    Handling huge amounts of data

    You can do a binary search using the FileSeek and FilePos functions and run it against the disk instead of in memory.

    If I remember, I'll put together a little example for you tomorrow - I need to go home and take the kids out now.
    William Cairns
    My Games: http://www.cairnsgames.co.za (Currently very inactive)
    MyOnline Games: http://TheGameDeveloper.co.za (Currently very inactive)

  6. #36

    Handling huge amounts of data

    I'm sorry for crashing the conversation so late. From what I understood, you want to store and search a "huge" amount of data - so much data that you can't load it all into memory.
    Therefore you need to store the information with an indexing table. There are many ways to implement that; for example, use two files (or more): a data file and an index file (or files).
    * When adding a new element, calculate its hash (based on the fields you're going to search on) and append the element to the data file, saving its position.
    * Save the position + hash in the index file.
    At runtime you only need to load the index file (which is small). When you need to find an element, you calculate its hash and jump to the correct location in the data file (this also works if the hash isn't a unique ID).

    Deletion is a small problem, as you will run into fragmentation... but it can be solved.
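    The add-element step could be sketched like this (all the record and routine names here are illustrative, not from any library; TElement stands in for whatever record type the data file holds):

    [pascal]Type
      TElement = Record               // stand-in for the real data record
        Name : String[30];
        Text : String[255];
      End;

      TIndexEntry = Record
        Hash     : Cardinal;          // hash of the search key
        RecordNo : Integer;           // element's position in the data file
      End;

      TElementFile = File of TElement;
      TIndexFile   = File of TIndexEntry;

    // Append the element to the data file and record its
    // (position, hash) pair in the index file.
    Procedure AddElement(Var DataF : TElementFile; Var IndexF : TIndexFile;
                         const E : TElement; KeyHash : Cardinal);
    Var
      IE : TIndexEntry;
    Begin
      Seek(DataF, FileSize(DataF));     // append at the end of the data file
      IE.RecordNo := FilePos(DataF);    // remember where the element lands
      IE.Hash := KeyHash;
      Write(DataF, E);
      Seek(IndexF, FileSize(IndexF));   // append the (hash, position) pair
      Write(IndexF, IE);
    End;
    [/pascal]

    A lookup then loads the (small) index file into memory, finds the entries whose Hash matches, and Seeks straight to each candidate RecordNo in the data file.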

    I hope I'm answering the right thing :?

    Goodluck

  7. #37
    PGD Community Manager AthenaOfDelphi's Avatar
    Join Date
    Dec 2004
    Location
    South Wales, UK
    Posts
    1,245
    Blog Entries
    2

    Handling huge amounts of data

    Quote Originally Posted by Paizo
    Thanks for the support, guys.

    I forgot to say that there are around 6000+ cards (and growing).

    So I can't store them all in a collection. I will use collections only for built decks, so I have to load the text from -somewhere- (i.e. a text file, .xls or other) and the images from a folder.

    You can understand that looking up a specific card's text in a file of 6000 cards, for a deck of 59+ cards (maybe all unique), is really laggy! That's why I asked about SQL queries.

    I will take a look at the link above, thanks again.

    With regards to the speed... I have just done a quick test. I populated a string list with 10000 random strings, each 10 characters long, and then ran through the list looking for ABCDEFGHIJ. I also searched each string using Pos() for the substring 'AB'. The whole process (populating and scanning the list) took less than 200ms. That's on an Athlon 800.

    Searching for strings etc. on in-memory data can be exceedingly fast.

    As for memory requirements... if you limit yourself to, say, 20MB, then each of your 6000 cards could carry about 3KB of data before you blow the 20MB limit. If a card really does carry that much data, can it be optimised once loaded? Link images to cards via an integer (4 bytes, as opposed to a string which could be any length), for example - that could save a whack of data.
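    A test along those lines might look like this (an illustrative reconstruction, not the original test code; timings will vary by machine, and GetTickCount from the Windows unit is only rough):

    [pascal]program ListSearchTest;

    uses Windows, Classes;

    Var
      List : TStringList;
      I, J, Hits : Integer;
      S  : String;
      T0 : Cardinal;
    Begin
      Randomize;
      List := TStringList.Create;
      try
        T0 := GetTickCount;
        For I := 1 to 10000 do            // populate with random 10-char strings
        Begin
          S := '';
          For J := 1 to 10 do
            S := S + Chr(Ord('A') + Random(26));
          List.Add(S);
        End;
        Hits := 0;
        For I := 0 to List.Count-1 do     // scan: exact match plus substring test
        Begin
          if List[I] = 'ABCDEFGHIJ' then
            Inc(Hits);
          if Pos('AB', List[I]) > 0 then
            Inc(Hits);
        End;
        WriteLn('Hits: ', Hits, ', elapsed: ', GetTickCount - T0, ' ms');
      finally
        List.Free;
      end;
    End.
    [/pascal]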
    :: AthenaOfDelphi :: My Blog :: My Software ::

  8. #38

    Handling huge amounts of data

    Quote Originally Posted by AthenaOfDelphi
    ...
    Searching for strings etc. on in-memory data can be exceedingly fast.
    ....

    I agree.
    I think saving a bit of RAM (1MB?) just to do the search on disk isn't worth it. Maybe a future version of the app will offer the option to run queries on the database and, looking at Athena's test, it seems that would be lag-free.
    Will: "Before you learn how to cook a fish you must first learn how to catch a fish." coolest

  9. #39
    Legendary Member cairnswm's Avatar
    Join Date
    Nov 2002
    Location
    Randburg, South Africa
    Posts
    1,537

    Handling huge amounts of data

    In my above post I meant Seek, not FileSeek.

    Here is an example of a custom data structure used to store data, along with the binary search function - note that the data must be inserted in alphabetical order.

    I cannot get timings for the search - I've tried files of up to 200000 records and all searches give me 0 millisecond response times....

    [pascal]unit QuickDB;

    interface

    Uses
      SysUtils;

    Type
      TDataRecord = Record
        Name : String[100];
        Data : Array[1..4] of String[255];
      End;

    Procedure MakeData(FileName : String; NumOfRecord : Integer);
    Function GetRecord(FileName : String; inName : String) : TDataRecord;

    implementation

    // Build a test file of NumOfRecord records, already in sorted order.
    Procedure MakeData(FileName : String; NumOfRecord : Integer);
    Var
      I  : Integer;
      F  : File of TDataRecord;
      DR : TDataRecord;
    Begin
      AssignFile(F,FileName);
      Rewrite(F);
      For I := 0 to NumOfRecord-1 do
      Begin
        DR.Name := 'Rec'+FormatFloat('000000000',I);
        DR.Data[1] := 'Data1';
        DR.Data[2] := 'Data2';
        DR.Data[3] := 'Data3';
        DR.Data[4] := 'Data4';
        Write(F,DR);
      End;
      CloseFile(F);
    End;

    // Binary search against the file on disk; the file must be sorted by Name.
    Function GetRecord(FileName : String; inName : String) : TDataRecord;
    Var
      L,H,M : Integer;
      F  : File of TDataRecord;
      DR : TDataRecord;
    Begin
      AssignFile(F,FileName);
      Reset(F);
      L := 0;
      H := FileSize(F)-1;
      // Standard binary search; the L <= H condition guarantees
      // termination even when the name is not present in the file.
      While L <= H do
      Begin
        M := (L+H) div 2;
        Seek(F,M);
        Read(F,DR);
        if DR.Name = inName then
          Break
        else if DR.Name > inName then
          H := M-1
        else
          L := M+1;
      End;
      CloseFile(F);
      // Caller should check Result.Name = inName; if it differs,
      // the record was not found.
      Result := DR;
    End;

    end.
    [/pascal]
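    A quick (hypothetical) driver to exercise the unit - the file name and record count are just examples:

    [pascal]program TestQuickDB;

    uses QuickDB;

    Var
      DR : TDataRecord;
    Begin
      MakeData('cards.dat', 200000);                 // build a sorted test file
      DR := GetRecord('cards.dat', 'Rec000000123');  // binary search on disk
      if DR.Name = 'Rec000000123' then
        WriteLn('Found ', DR.Name)
      else
        WriteLn('Not found');
    End.
    [/pascal]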
    William Cairns
    My Games: http://www.cairnsgames.co.za (Currently very inactive)
    MyOnline Games: http://TheGameDeveloper.co.za (Currently very inactive)

  10. #40

    Handling huge amounts of data

    I appreciate your way of solving problems by posting some code.
    I will run some tests soon.
    Will: "Before you learn how to cook a fish you must first learn how to catch a fish." coolest
