Problems with a chunk-based file

**noeska** · 13-03-2009, 07:15 PM

Originally Posted by chronozphere

Could you not store the record structure inside a stream/file that is also stored in the blocks. E.g. the record structure is just another file inside the archive. No interference that way.

That is an interesting idea, but I would implement it in a slightly different way. If you use the existing file system to store file-system data, you will not be able to retrieve this data, because it is stored in the system itself and you need that metadata to tell in which file the metadata is stored. It's an endless circle, isn't it?

....

You are simply encapsulating your filesystem in another block system. Doing it this way allows you to keep adding data to the end of your file without having to move data around to let parts of the file grow. This system allows for fragmentation and calls for a defragmentation method. You also need to scan all blocks at startup to be able to enumerate all files that exist inside the VFS.
If the added complexity and work is no problem for you, you could try this method.

The circle gets a starting point by having the 'file' index always start in block 2, leaving block 1 for the header. Now the 'file' index can be used to retrieve metadata, and data tables and data indexes can also be stored as other files inside the chunked file. The 'file' index can be considered a data table too.

Have a look at my vfs: http://www.noeska.net/projects/nvfs/
and look at
procedure TVirtualFileSystem.GetFileList(aname: string; List: TStrings);
that uses FDir: TVirtualFileStream;
which is a file stream inside the chunked file that starts in block 2.
It can be loaded into a stream by the name '/' because
function TVirtualFileSystem.FindFile(aname: string; var afilerecord: TFileInfo): int64;
returns the first record in block 1:
self.ReadBlock(1,FileRecordEntry,0,SizeOf(TFileInfo));
And yes, this record is fixed size :-) because the name is declared as
filename: string[255];
Variable-sized records are somewhere low on my todo list.
And yes, nvfs potentially needs to be defragmented after deleting files and making files larger/smaller.
But I have no need to enumerate all files at startup. All I need to do when reading a file is look up its name in '/'. That gives me the starting chunk, and each chunk knows the follow-up chunk for its file.
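
To make the "each chunk knows the follow-up chunk" idea concrete, here is a minimal sketch, assuming Free Pascal. It is not the nvfs code: TChunk, PayloadSize, NoBlock, Fill and ReadChain are hypothetical names, and an in-memory array stands in for the chunked file on disk.

[pascal]
program ChainSketch;
{$mode objfpc}{$H+}

const
  PayloadSize = 8;  // hypothetical payload size, tiny so the example stays readable
  NoBlock = 0;      // hypothetical end-of-chain marker

type
  TChunk = packed record
    NextBlock: Cardinal;  // number of the follow-up chunk, NoBlock at the end
    Used: Byte;           // payload bytes in use in this chunk
    Data: array[0..PayloadSize - 1] of AnsiChar;
  end;

var
  Blocks: array[1..5] of TChunk;  // stands in for the chunked file

{ store S in block Index and link it to block Next }
procedure Fill(Index, Next: Cardinal; const S: AnsiString);
begin
  Blocks[Index].NextBlock := Next;
  Blocks[Index].Used := Length(S);
  if Length(S) > 0 then
    Move(S[1], Blocks[Index].Data[0], Length(S));
end;

{ collect the contents of a file by following the chain from its first chunk }
function ReadChain(First: Cardinal): AnsiString;
var
  Cur: Cardinal;
  Part: AnsiString;
begin
  Result := '';
  Cur := First;
  while Cur <> NoBlock do
  begin
    SetString(Part, PAnsiChar(@Blocks[Cur].Data[0]), Blocks[Cur].Used);
    Result := Result + Part;
    Cur := Blocks[Cur].NextBlock;
  end;
end;

begin
  // one file spread over blocks 3 -> 5 -> 4, deliberately out of order
  Fill(3, 5, 'Hello, c');
  Fill(5, 4, 'hunked w');
  Fill(4, NoBlock, 'orld');
  WriteLn(ReadChain(3));  // prints: Hello, chunked world
end.
[/pascal]

In a real archive the starting block number (3 here) would come from the file's record in '/', and each array access would be a seek and read on the container file.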

**Brainer** · 14-03-2009, 06:00 AM

Thanks for all your ideas.

But they don't seem to resolve my problem.

The block structure I wanna use is:
[pascal]
type
  { .: TBlock :. }
  TBlock = packed record
    Flag: Byte;
    NextBlock: Cardinal; // used when reading a file's contents
  end;
[/pascal]

The problem is not what the structure would look like, but how to implement the file format. What I mean is that I don't have the slightest clue how to manage a file structure and file data inside one file. Also, I want my format to support directory trees (what's important: a directory can contain files and subdirectories).

That's what the problem really is, nevertheless the idea of blocks proposed by chronozphere suits me.

And yes, sorry for not being clear from the very beginning.

**noeska** · 14-03-2009, 12:53 PM

It is not easy.

Don't try to do it all at once. Try to make chunks/blocks work with a directory first. Then consider your directory to be a file that starts in block 1, and your header to be block 0. Then think about subdirectories. Oh, and try to avoid loading too much into memory: the VFS on disk should be leading, not internal memory.
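
As a rough illustration of that first step, here is a minimal sketch, assuming Free Pascal and its Classes unit. The record layouts, BlockSize and BlockOffset are hypothetical and are not taken from nvfs. It puts a header in block 0 and one fixed-size directory entry in block 1, then reads the entry back.

[pascal]
program LayoutSketch;
{$mode objfpc}{$H+}
uses
  Classes;

const
  BlockSize = 64;  // hypothetical block size

type
  THeader = packed record
    BlockSize: Cardinal;
    BlockCount: Cardinal;
  end;

  TDirEntry = packed record
    Name: string[31];      // fixed-size short string keeps the record fixed size
    FirstBlock: Cardinal;  // where the file's chunk chain starts
    Size: Cardinal;        // file size in bytes
  end;

{ byte offset of a block inside the container }
function BlockOffset(Index: Cardinal): Int64;
begin
  Result := Int64(Index) * BlockSize;
end;

var
  S: TMemoryStream;  // stands in for the container file
  H: THeader;
  E, R: TDirEntry;
begin
  S := TMemoryStream.Create;
  try
    S.Size := 3 * BlockSize;

    // block 0: header
    H.BlockSize := BlockSize;
    H.BlockCount := 3;
    S.Position := BlockOffset(0);
    S.WriteBuffer(H, SizeOf(H));

    // block 1: the directory, here holding a single entry
    FillChar(E, SizeOf(E), 0);
    E.Name := 'example.txt';
    E.FirstBlock := 2;
    E.Size := 5;
    S.Position := BlockOffset(1);
    S.WriteBuffer(E, SizeOf(E));

    // read the entry back from block 1
    S.Position := BlockOffset(1);
    S.ReadBuffer(R, SizeOf(R));
    WriteLn(R.Name, ' starts in block ', R.FirstBlock);
  finally
    S.Free;
  end;
end.
[/pascal]

Swapping the TMemoryStream for a TFileStream would give the same layout on disk. Once the directory grows past one block, it can be chained like any other file.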

Also, didn't you have something working before? Could you try to adapt that? You said it had problems when lots of files were added?

I should also make a better example for nvfs to show it off better. But that won't be until next week.

**ize** · 14-03-2009, 06:15 PM

noeska's right. Get your header/chunk code sorted and after that directories and sub-directories are easy. Just specify the virtual path and filename when adding the file.

E.g. the file c:\example.txt can be stored as df0:\dir1\subdir1\example.txt in the file system (with df0: being the virtual root drive).

All you need to do is store that filename; the directories can be worked out from it without affecting how you save the chunk data. Of course, this doesn't take empty directories into account, because I can't see the point of storing them.

Here's some code. It's from a file packer I started to write a little while back but didn't finish (the packing worked, but I couldn't get my head round compressing the data).

edit: found this Compression Library, which was exactly what I was looking for. I'm sure you guys know all about it.

I think I'll have another go at finishing my project now.
[pascal]
const
  rootdir = 'df0:';

{add a trailing character C to a string S}
function TrailC(S: string; C: Char): string;
begin
  if (Length(S)>0) and (S[Length(S)]<>C) then S:=S+C;
  Result:=S;
end;

{split a filename into separate parts. Returns the path when dir is true (the default), otherwise the file name}
function split(f0: string; dir: boolean = true): string;
var
  k: integer;
begin
  // no path specified
  if (pos('\', f0)=0) then begin
    if dir then result:=trailc(rootdir, '\')
    else result:=f0;
    exit;
  end;

  // find last path separator (k is tested first so f0[0] is never read)
  k:=length(f0);
  while (k>0) and (f0[k]<>'\') do dec(k);
  if dir then begin
    result:=copy(f0, 1, k);
    if length(result)=1 then result:=trailc(rootdir, '\'); // convert '\' to 'df0:\'
    if (result[1]='\') then insert(rootdir, result, 1) // convert '\directory1' to 'df0:\directory1'
    else if (pos(rootdir,result)=0) then insert(trailc(rootdir, '\'), result, 1); // force rootdir into filename
  end
  else result:=copy(f0, k+1, length(f0));
end;

{string satisfy function. Allows wildcard matches of a string S using Mask like *.txt or file??.txt. Returns True on success}
function StrSatisfy(S,Mask: PChar) : Boolean;
label
  next_char;
begin
next_char:
  Result:=True;
  if (S^=#0) and (Mask^=#0) then exit;       // both exhausted: match
  if (Mask^='*') and (Mask[1]=#0) then exit; // trailing '*' matches the rest
  if S^=#0 then begin
    while Mask^ = '*' do Inc(Mask);
    Result:=Mask^=#0;
    exit;
  end;

  Result:=False;
  if Mask^=#0 then exit;
  if Mask^='?' then begin                    // '?' matches any single character
    Inc(S);Inc(Mask);goto next_char;
  end;
  if Mask^='*' then begin                    // '*' tries every possible remainder of S
    Inc(Mask);
    while S^<>#0 do begin
      Result:=StrSatisfy(S,Mask);
      if Result then exit;
      Inc(S);
    end;
    exit; // (Result = False)
  end;
  Result:=S^=Mask^;
  Inc(S);Inc(Mask);
  if Result then goto next_char;
end;

{enumerate all files using mask. findexes was the array for my file index}
function enumfiles(mask: string): tstrings;
var
  f, p: string;
  i: tindex;
begin
  result:=nil;
  if (trim(mask)='') or (length(findexes)=0) then exit;
  result:=tstringlist.create;
  p:=split(mask);        // path part of the mask (relies on split returning the path by default)
  f:=split(mask, false); // file part of the mask
  for i in findexes do
    if (split(i.filename)=p) and (strsatisfy(pchar(split(i.filename,false)),pchar(f))) then result.add(i.filename);
end;

{enumerate all folders using Mask}
function enumfolders(mask: string): tstrings;
var
  k: integer;
  p: string;
  i: tindex;
begin
  result:=nil;
  if (trim(mask)='') or (length(findexes)=0) then exit;
  result:=tstringlist.create;
  for i in findexes do begin
    p:=split(i.filename, true);
    p:=copy(p,1,length(p)-1); // remove the last '\' from filepath
    k:=result.indexof(p); // only add new directories
    if (strsatisfy(pchar(p), pchar(mask))) and (k=-1) then result.add(p);
  end;
end;
[/pascal]

So given 5 files(with df0:\ being the root directory):
df0:\a.txt
df0:\directory1\b.txt
df0:\directory1\c.txt
df0:\directory1\directory1\d.txt <- same named sub-dirs are allowed
df0:\directory2\e.txt

enumfiles('*') = df0:\a.txt (if you wanted all files, just read the index - it would be easier and quicker)
enumfiles('\directory1\*') = df0:\directory1\b.txt & df0:\directory1\c.txt
enumfolders('directory1\*') = directory1