PDA

View Full Version : My own archive format



Brainer
21-10-2008, 05:59 PM
Hello there. :)

I've been struggling with this for a week and I haven't worked it out yet. What I mean is my own archive file format. It's not that hard if you just want to store files without directories - I've done it once before. Now I'd like to extend its possibilities and add directories support.

My question is - what's the best way of implementing them? I mean, I can easily make appropiate structures in Pascal and save them to a stream as a tree. But I have no idea how to properly calculate offsets. :S Any ideas?

Or maybe you can put me on the right track and propose me another solution?

I'd appreciate any ideas, replies.

noeska
21-10-2008, 06:29 PM
I am using degisy tdatafile: http://www.torry.net/quicksearchd.php?String=datafile&Title=Yes

Using sections you have a flat directory structure. (and you can always cheat using / in the section name :-) )
I modified it a bit to support lha compression also.

The only problem is that when deleting a file the archive size does not become smaller. Only copying all files to a new archive helps for that.

Currently i am thinking of moving to sqlite for this purpose. Yes using a database for file storage :-)

Brainer
21-10-2008, 06:41 PM
I thought of making it on my own, it's like a challenge. But thanks for the component, I'll take a look at it.

Any idea how it's done by WinRAR? I also thought of making an XML file and putting it inside the archive. It's easy to maintain the structure, but it's still a pain to calculate the offsets. :S

arthurprs
21-10-2008, 07:14 PM
thats how i do

ps: it will be nice to sort the order of the files by their folders, but it's not necessary

generate the xml
-> start the offset at 0 and sum it with the size of previous file entry

file structure:
-> array of char[0..3] with your header
-> uint32 with the size of the xml
-> the xml with offsets
-> store the files

-----------------------------

when opening, read the header
if header is not good abort

read the size of the structure xml, then read and parse the xml

for acessing a file, sum ( 8 + xmlsize ) to the offset described on the xml,

or seek from the end of xml file with the offset described on the xml


there are other possible structures, like storing the xml on the end of the file etc..

Brainer
21-10-2008, 07:19 PM
Yip, that's what I've been thinking of, too, but I don't quite get the idea of calculating the offsets. And what do you mean by:


start the offset at 0 and sum it with the size of previous file entry

Does that mean I should generate an XML for each file? Am I getting it wrong?

arthurprs
21-10-2008, 09:10 PM
Yip, that's what I've been thinking of, too, but I don't quite get the idea of calculating the offsets. And what do you mean by:


start the offset at 0 and sum it with the size of previous file entry

Does that mean I should generate an XML for each file? Am I getting it wrong?

no, something like


offset = 0

for entry in files:
wirtetoxml(entry.name,entry.size,offset);
offset += entry.size

Robert Kosek
21-10-2008, 10:59 PM
I wrote my own archive format, so let me give you some advice. If you are in need of any specific pointers, just ask away and I'll chip in some more advice.

The Basics

You need: A header record. File list records. A way to save file names easily.

So, you must decide if this is the kind of archive you regenerate every time you add files, and therefore change internal orders every time, or if you need to be able to add new files. If you regenerate it every time you can sort the records and file names, and so use a binary search to find the items.

How to Save Directories

Well, the obvious way is to use a recursive search. If you do this, then you know the source directory and can easily transform the absolute filenames you get into relative ones. Obviously this doesn't include how you store these directories internally. Frankly, the easiest way is to make the filename "my dir\file.txt" within the data structure, but it isn't as elegant as something like WinRAR or WinZIP.

Here's a fast recursive search I wrote as an example:
procedure TIAWriter.AddFiles(const directory: string; Options: TAddFilesOptions
= [afoRecurse, afoIgnoreHidden]);
var
SearchRec: TSearchRec;
Dir: string;

procedure SearchSubDir(const sub: string);
var SearchRec2: TSearchRec;
temp: string;
begin
if FindFirst(dir+sub+'\*.*',faAnyFile,SearchRec2) = 0 then
repeat
if (afoIgnoreHidden in Options) and (SearchRec2.Attr and faHidden > 0) then
Continue;
temp := IncludeTrailingPathDelimiter(sub) + SearchRec2.Name;
if &#40;SearchRec2.Name <> '.'&#41; and &#40;SearchRec2.Name <> '..'&#41; then
if &#40;SearchRec2.Attr and faDirectory = 0&#41; then
AddFile&#40;dir + temp, temp, afoEncrypt in Options&#41;
else
SearchSubDir&#40;temp&#41;;
until FindNext&#40;SearchRec2&#41; <> 0;
end;

begin
fHeader.arcEncrypted &#58;= afoEncrypt in Options;
dir &#58;= IncludeTrailingPathDelimiter&#40;directory&#41;;
if FindFirst&#40;dir+'*.*',faAnyFile,SearchRec&#41; = 0 then
repeat
if &#40;afoIgnoreHidden in Options&#41; and &#40;SearchRec.Attr and faHidden > 0&#41; then
Continue;
if &#40;SearchRec.Name <> '.'&#41; and &#40;SearchRec.Name <> '..'&#41; then
if &#40;SearchRec.Attr and faDirectory = 0&#41; then
AddFile&#40;dir + SearchRec.Name,SearchRec.Name,afoEncrypt in Options&#41;
else if afoRecurse in Options then
SearchSubDir&#40;SearchRec.Name&#41;;
until FindNext&#40;SearchRec&#41; <> 0;
FindClose&#40;SearchRec&#41;;
end;

Cheating in the file records and offsets...

This is really easy, and I'm glad I figured it out early when writing my format.

The trick is to structure the archive like so:
HEADER
----------------
FILES
----------------
FILE RECORDS


The second trick is to assemble the archive procedurally. Create your stream, write a blank header, and then compress each file and write it to the stream. Of course you should be making the file list the whole time, and storing the compressed length plus the file offset in the stream (before you write the file to the stream). Then when you're done writing the files write the position to the header's file list offset. Write the file list, seek to the beginning and write your real header ... and close.

It's quite simple really. If it's hard to follow I'll give you a numbered list. ;)

Misc. Tips

Don't: Make the archive solid, because all seek orders you do will require the archive to be decompressed every time. All additional writes will take slightly more time each time as the archive becomes large. Be indecisive about your requirements, they are excruciatingly difficult to factor in to your code if they involve a major methodology change; sometimes a full rewrite if you're sloppy.

Edit: Forgot about the code mangling if HTML wasn't disabled ... so I fixed that and disabled it.

Brainer
22-10-2008, 05:09 PM
Robert, I had an identical structure on my mind like yours:


HEADER
----------------
FILES
----------------
FILE RECORDS

But how to store directories using this structure and what about the offsets? :S

noeska
22-10-2008, 08:22 PM
You should know the size of the element/file you are going to add to the archive. Using that you can calculate where the file is inside the archive.

e.g. if the header size is 10 bytes. file1 is 100 bytes and file3 is 120 bytes you can add a file at 10+100+120 or put the list of files with sizes there.

Also if you want to read the second file you know it is at 10+100.

Also in the header you should store the total size of files so you know where the file list is (FILE_RECORD).

folders/directories are easy as they are not existing files. Enumerate them and you can refer them in the file record.

example of possible file record (id, name, size, folderid)
1 file1 100 3
2 file2 120 3
3 folder1 0
4 folder2 3

Brainer
22-10-2008, 08:46 PM
Thanks noeska, it dissipated a few doubts. :)

I'll give my best shot at it and post the results so that you can test it. If anyone is interested, the first version of my archive format can be downloaded from there: http://uploadingit.com/files/738520_pqpnc/FileManager_%5B19072008%5D.zip.

I'd like to hear some feedback. :)

Robert Kosek
23-10-2008, 01:54 PM
I'll check your format out shortly, I was just writing this for some general advice plus some code examples.

You should know where to put the archive is always at the end, so you needn't tally a thing. This should be programmatic, rather than arithmetic.

Archive creation:
//Write the header at position 0
//Write the compressed files to the stream
Header.FileListOffset = Stream.Position;
//Write the array of records to the stream, the file list
//Refresh the header with the actual file list offset
Stream.Seek&#40;0, soFromBeginning&#41;;
Stream.Write&#40;Header, sizeof&#40;header&#41;&#41;;
Stream.Free;
end;

Archive update:
Stream.Position &#58;= Header.FileListOffset;
//Add new compressed files&#58; this overwrites the header
Header.FileListOffset = Stream.Position;
//Write the array of records to the stream, the file list
//Refresh the header with the actual file list offset
Stream.Seek&#40;0, soFromBeginning&#41;;
Stream.Write&#40;Header, sizeof&#40;header&#41;&#41;;
Stream.Free;
end

My archive file records always have these elements:
FileRecord = record
// It could be anywhere, and an int64 is just a wise choice here.
start&#58; Int64;
// Compressed file size &#40;because you can read until there's no input,
// you don't need an uncompressed length field.&#41;
len&#58; Longword;
// And this says whether I used BZip or ZLib, which you can omit.
bzipped&#58; boolean;
end;

To save an array of these to the stream is quite simple:
Stream.Write&#40;files&#91;0&#93;, Count * SizeOf&#40;FileRecord&#41;&#41;;
To read:
SetLength&#40;files,fCount&#41;; // store the file count in the header!
Stream.Read&#40;files&#91;0&#93;,SizeOf&#40;FileRecord&#41;*fCount&#41;;

People like to say this saves the metadata for a dynamic array, but I disagree. The main reason I disagree is that I factor the size of only the elements of the array, and not its meta data; because I write from element 0 to the length of the elements, it acts like an extraction of the elements from the metadata. At least, so far as I know. ;)

If you want a short string, 255 characters worth, as your ID then just add "id: shortstring" to it. Pascal can easily handle this data's transference to the stream. Now, if you want longer file names ... you must use a stringlist and treat both the list plus your array of records as an indexed list, and only change positions of records and names together. (You can even compress the file list for better compression ratios.)

Directories, in my format, are just a part of the filename within the archive. Thus if I want "my dir/my file.txt" I tell it to extract just that. Why worry about virtual folders? That's more trouble than it is worth. If you want more fancy folder enumeration stuff, you could build that into your file list class.

noeska
23-10-2008, 04:51 PM
ugh gnu gpl :-( i dont like it, i prefer mpl.
but its your choice to use it.

Also does not that filetypes part complicate things? It is not that an mp3 file is written differently then a jpeg?

I see you have not gotten to implementing (virtual)folders?

What worries me is the delete routine. Does it move al files after the deleted file over the deleted ones position? Could get slow if the file becomes larger. Also if the deleted file/one is 10 bytes and the next is 100 bytes you end up reading and writing within the next 100bytes file. Should work, but i dont like the idea of it.

yet another method:
if you want to make it really dynamic you should implement something like a fat file structure. E.g. make the stream consist of 32bytes blocks with pointer to previous and next block. So a file can end up taking up blocks 1,2,3 and 4 and another file can take up blocks 5.6 and 7. even the file list and folder structure could take up block 8. on adding a file that one could take up 9,10 if block 8 becomes to small to hold filelist and folder it could extend to block 11. etc. This means you can wast some space as not all blocks are competely filled. Blocks 4,7,10 could contain less data then 32bytes. Also on deleting the second file block 5,6,7 can be marked as deleted. So on a next file insert these block can be used again. Yes and before you know it you need to write a defrag tool :-)

Brainer
23-10-2008, 04:55 PM
noeska I actually don't care about the licence, you can do anything you like with it. :)

You see, it was my first attempt at archive formats, so I guess it could have errors and many things could have been done better.

Regarding deletion, what's the best option to do it?

noeska
23-10-2008, 05:03 PM
there is no best way to do deletion :-) That just depends on the usage of the archive file. Also note my fat addition to my previous post.

Brainer
23-10-2008, 05:17 PM
Man, I don't want to implement a VFS, just an archive format. xD
But you know, once I have my archive created, I wouldn't need to delete files. Imagine a game that deletes its own data. :)

Brainer
25-10-2008, 10:16 PM
Another question. How should I load file data? I guess it shouldn't be loaded at once into memory, because an archive could be 1 GB. Any suggestions? :?

noeska
26-10-2008, 10:25 AM
Use tfilestream?

Have a look at this for the basics: http://dn.codegear.com/article/26416

So .position if your friend here :-)

together with

function Seek(Offset: Longint; Origin: Word): Longint; virtual; abstract;

Brainer
09-11-2008, 08:05 AM
I didn't want to start a new thread here, so I posted this here, I hope it's all right.

I decided not to make my own archive format, I want to create my own virtual file system instead. I've been doing some serious thinking recently, reading up on various file systems and I have two questions, I hope you can give me in-depth answers. :)

1.) What's journaling? I read it's used to monitor changes of disk's structure. But how does it really work? This data is stored on HDD or in memory?
2.) Do files on HDD using NTFS get fragmented? How does the deletion work in NTFS?

noeska
09-11-2008, 09:56 AM
Your question on writing an archive format made me curious again on writing an virtual file system again. I made a bare bones test version. But that still lacks a working directory structure. Also while writing i discover i can simplify things. I will post my results so far if you want them. But it is a Work in Progress.

As to your questions:
1) http://en.wikipedia.org/wiki/Journaling_file_system (according to this it is stored on hdd before an action takes place)

2) If you use xp with ntfs you still have to use defrag once in a while. So yes. Only ntfs is supposed to be smarter with realocating space then fat32. Also since there are undelete tools for ntfs also it just throws away the file recod entry and keeps the file intact until the space is used again. No hard evidence on this though. But do read this: http://technet.microsoft.com/en-us/library/cc781134.aspx

PS google is my friend in research :-)

Brainer
09-11-2008, 10:37 AM
I made a bare bones test version. But that still lacks a working directory structure. Also while writing i discover i can simplify things. I will post my results so far if you want them.
Sure, it would be cool to have a look. :)

I have almost everything planned - now I'm busting my brain trying to figure out how to handle file deletion. I'm not sure it can be done without the fragmentation of files concerned. And I want to avoid moving blocks of data to the end of the file and truncating the file size as it's very slow once the archive gets bigger. :S

What'd you suggest me?

noeska
09-11-2008, 11:09 AM
Fragmentation cannot be avoided.

My idea is to keep a list of unused blocks. And reuse them asap. This gets you fragmention in the end. But for the moment i do not care.

Analyzing if you have fragmented files will be easy. Defragmenting it won't be as you said.

During the day i will upload what i have this far. Just wrote some new code on writing data using tvirtualfilestream into a tvirualfilesystem. But i still need to test that. Writing and reading a complete stream at once already works.

Brainer
09-11-2008, 11:26 AM
My idea is to create something very similar to NTFS' LCN and, just like your idea was, store a list of removed blocks so that I know where I can store data. A block would be 1 kB in size.

If I want to store a file, I just calculate how many blocks I'd need, look up the list for removed blocks, seek there and write a 1 kB of data. And so on until the the list of removed blocks is empty, then I'd just write the rest of the file at the end of the archive.

The allocation tree would be placed just after the header and the list of removed blocks. Maybe I'll post some more detailed specification later, I'd really like to have look at your code, noeska, just to make sure I haven't copied much from you. :)

noeska
09-11-2008, 01:51 PM
Here it is: http://www.noeska.net/downloads/VirtualFileSystem.zip MPL license

Be aware the the TVirtualFileStream is broken for writing. And it only supports one file. Because it have not yet written the directory part as i need TVirtualFileStream to work for that. I also need that one to keep the list of deleted blocks (yes blocks not files). So it is not realy useable yet.

In the example try the following
bNew
bFilleMemo
bSaveMemo
Close the application
bOpen
bLoadMemo

everything else may lead to exceptions :-)

Fixed TVirtualFileStream: Here it is: http://www.noeska.net/downloads/VirtualFileSystemFixed.zip MPL license
Not tested fully yet.
Small files e.g. < 1block seem to be broken now

Brainer
11-11-2008, 06:09 PM
So you make the virtual file system have a fixed size?

Also looking at:
TBrainVFSBlock = packed record
Size: Word; // rozmiar bloku danych (domyœlnie: 512)
// data block size (default: 512)
Offset: Int64; // offset bloku w pliku. Int64 to m&#185;dry wyb??r
// block's offset in archive. Int64 is a wide choice here
Flag: Byte; // flaga opisuj&#185;ca blok:
// (0 - wolny blok, 7 - blok u&#191;ywany, 23 - blok usuni&#234;ty)
// flag describing the block:
// (0 - free block, 7 - dirty block, 23 - block deleted)
end;

Makes me wonder why store size and offset? The idea for fat/ntsf or virtualfilesystem is that all blocks are equal size.
In you case you can number your blocks 1,2,3 etc. The offset can be calculated as: header+(blockid*blocksize).
Thus giving:
TBrainVFSBlock = packed record
BlockId: Int64;
Flag: Byte; //could also be boolean?
end;
Or to go haywire. Only put deleted blocks in FBlocks :-) and assume other blocks are used.
Or are you not planning on splitting up larger files over multiple blocks. I do not see code for that yet in your code.
Be aware on what block together make a file and in what order.

Referring to your mail, no. :) Every block has the same size, I just store the info in the record for convenience. :)

About splitting files, yes. My idea is to split large files into several blocks, but I haven't implemented it yet. I sent you the code just to let you know what my idea actually is.

To everyone, here's the code I've got so far:

//************************************************** ****************************
//
// BrainVirtualFileSystem - Wirtualny system plik??w / Virtual file system.
// Copyright &#169; 2008 Patryk Nusbaum.
// Wszystkie prawa zastrze¬řone. / All rights reserved.
//
// Data modyfikacji / Last modified: 10.11.2008
// Mail: patrick.nusbaum@gmail.com
// WWW: www.pateman.net76.net
//
// Licencja / License
// -------------------
// Redistribution and use in source and binary forms, with or without
// modification, are permitted provided that the following conditions are
// met:
//
// * Redistributions of source code must retain the above copyright notice,
// this list of conditions and the following disclaimer.
// * Redistributions in binary form must reproduce the above copyright
// notice, this list of conditions and the following disclaimer in the
// documentation and/or other materials provided with the distribution.
// * Neither the name of Patryk Nusbaum nor the names of its contributors
// may be used to endorse or promote products derived from this software
// without specific prior written permission.
//
// THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
// "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED
// TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR
// PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
// CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
// EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
// PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS;
// OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY,
// WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR
// OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF
// ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//************************************************** ****************************
unit BrainVirtualFileSystem;
// TODO 5 -o Patryk Nusbaum: Doda?¶ obs¬?ug?™ wyj¬±tk??w. / Add exception handling.
// TODO 4 -o Patryk Nusbaum: Doda?¶ obs¬?ug?™ klasy dziennika. / Add logging support.
// DONE 5 -o Patryk Nusbaum: Opracowa?¶ list?™ przechowuj¬±c¬± wolne bloki. / Make a list storing free blocks.

// prosz?™ si?™ zg¬?osi?¶ do mnie, gdy kto¬? odgadnie znaczenie warto¬?ci
// "BrainVFSProtectionFlag" i flag blok??w. ;)
// hit me up when you guess the meaning of "BrainVFSProtectionFlag"
// and the block flags' values. ;)

interface

{$DEFINE DEBUG}

uses
Windows, Classes, {$IFDEF DEBUG}SysUtils, Dialogs,{$ENDIF}
// --
BrainIntegerList;

const
// sta¬?e
// constants
BrainVFSSignature = 'BRAINVFS';
BrainVFSVersion = 10;
BrainVFSProtectionFlag = 92;
BrainVFSBlockSize = 512;
BrainVFSDiskSize = 1024;
BrainVFSMaxStringLength = 128;

type
// 128-bitowy ¬?a?±cuch tekstu
// a 128-byte string
TBrainVFSString = String[BrainVFSMaxStringLength];

// rekord nag¬???wka pliku archiwum
// archive file's header record
{ .: TBrainVFSHeader :. }
TBrainVFSHeader = packed record
VFSSignature: array[0..7] of Char; // sygnatura pliku (domy¬?lnie: "BRAINVFS")
// file signature (default: "BRAINVFS")
VFSVersion: Byte; // wersja pliku (domy¬?lnie "10", czyli 1.0)
// file version (default: "10", means 1.0)
VFSProtectionFlag: Byte; // flaga ochrony pliku:
// (0 - brak ochrony, 92 - ochrona w¬?¬±czona)
// file protection flag:
// (0 - no protection, 92 - protection on)
VFSTreeOffset: Int64; // offset drzewa alokacji plik??w
// offset of the file allocation tree
end;

// rekord reprezentuj±cy pojedynczy blok danych
// record representing a single data block
{ .: TBrainVFSBlock :. }
TBrainVFSBlock = packed record
Size: Word; // rozmiar bloku danych (domy¬?lnie: 512)
// data block size (default: 512)
Offset: Int64; // offset bloku w pliku. Int64 to m±dry wyb??r
// block's offset in archive. Int64 is a wide choice here
Flag: Byte; // flaga opisuj±ca blok:
// (0 - wolny blok, 7 - blok u¬řywany, 23 - blok usuni?™ty)
// flag describing the block:
// (0 - free block, 7 - dirty block, 23 - block deleted)
end;

// klasa obs¬?uguj¬±ca dzia¬?anie wirtualnego dysku
// class working as a virtual disk
{ .: TBrainVFSDisk :. }
TBrainVFSDisk = class sealed(TObject)
private
{ Private declarations }
FBlocks: array of TBrainVFSBlock;
FFreeBlocks: TBrainIntegerList;
// --
FBlockCount: Word;
FBlockSize: Word;
FDiskSize: Word;
FStream: TStream;
// funkcja sprawdza, czy podany blok istnieje
// this function checks if "ABlockIndex" is a valid block index
function IsValidBlock(const ABlockIndex: Word): Boolean;
public
{ Public declarations }
constructor Create(const S: TStream = nil; const ABlockSize: Word = BrainVFSBlockSize;
const ADiskSize: Word = BrainVFSDiskSize);
destructor Destroy(); override;

// procedura dokonuje formatowania dysku
// this procedure formats a disk
procedure FormatDisk();

// funkcja zwraca pierwszy wolny blok danych
// this function returns a free block's index
function GetFreeBlock(): Word;
// funkcja zwraca "True", gdy flaga bloku wynosi "0"
// this function returns "True" when a block's flag is "0"
function IsBlockEmpty(const ABlockIndex: Word): Boolean;
// funkcja zwraca "True", gdy flaga bloku wynosi "23"
// this function returns "True" when a block's flag is "23"
function IsBlockRemoved(const ABlockIndex: Word): Boolean;
// funkcja zwraca "True", gdy flaga bloku wynosi "7"
// this function returns "True" when a block's flag is "7"
function IsBlockDirty(const ABlockIndex: Word): Boolean;
// funkcja zwraca offset bloku
// this function returns a block's offset
function GetBlockOffset(const ABlockIndex: Word): Int64;

// procedura odczytuje zawarto¬??¶ danego bloku do strumienia
// this procedure reads block's contents to a stream
procedure ReadBlock(const ABlockIndex: Word; out S: TStream);
// procedura zapisuje zawarto¬??¶ strumienia do bloku
// this procedure saves stream's contents to a block
procedure WriteBlock(const ABlockIndex: Word; const S: TStream);
// procedura ustawia flag?™ bloku na "23"
// this procedure sets a block's flag to "23"
procedure MarkBlockRemoved(const ABlockIndex: Word);
// procedura ustawia flag?™ bloku na "7"
// this procedure sets a block's flag to "7"
procedure MarkBlockDirty(const ABlockIndex: Word);

{$IFDEF DEBUG}
procedure DisplayFreeBlocks();
{$ENDIF}

property BlockCount: Word read FBlockCount;
property DiskSize: Word read FDiskSize;
property BlockSize: Word read FBlockSize;
end;

// klasa imituj¬±ca dzia¬?anie systemu dyskowego
// class behaving like a disk system
{ .: TBrainVFSDiskSystem :. }
TBrainVFSDiskSystem = class sealed(TObject)
private
{ Private declarations }
FDisk: TBrainVFSDisk;
public
{ Public declarations }
constructor Create(const ADisk: TBrainVFSDisk = nil);
destructor Destroy(); override;

(*
// procedura zapisuje plik do naszego wirtualnego dysku
// this procedure stores a file inside our virtual disk
procedure WriteFile(const AFileName: String);
*)

property Disk: TBrainVFSDisk read FDisk;
end;

implementation

{ TBrainVFSDisk }

constructor TBrainVFSDisk.Create(const S: TStream; const ABlockSize, ADiskSize: Word);
begin
inherited Create();

if (ABlockSize = 0) then
FBlockSize := BrainVFSBlockSize
else
FBlockSize := ABlockSize;
if (ADiskSize = 0) then
FDiskSize := BrainVFSDiskSize
else
FDiskSize := ADiskSize;

FStream := S;
FBlockCount := (FDiskSize div FBlockSize);

FFreeBlocks := TBrainIntegerList.Create();
FFreeBlocks.Duplicates := dupIgnore;
FFreeBlocks.Sorted := True;

SetLength(FBlocks, FBlockCount);
FormatDisk();
end;

destructor TBrainVFSDisk.Destroy;
begin
FStream := nil;
FBlocks := nil;
FFreeBlocks.Free();

inherited Destroy();
end;

procedure TBrainVFSDisk.DisplayFreeBlocks;
var
S: String;
I: Integer;
begin
S := 'FREE BLOCKS' + #13#10;
for I := 0 to FFreeBlocks.Count -1 do
S := S + IntToStr(FFreeBlocks.Integers[I]) + #13#10;
ShowMessage(S);
end;

procedure TBrainVFSDisk.FormatDisk();
const
AByte: Byte = 0;
var
I, J: Integer;
begin
FFreeBlocks.Clear();
FStream.Seek(SizeOf(TBrainVFSHeader), soFromBeginning);

for I := 0 to FBlockCount -1 do
with FBlocks[I] do
begin
Size := FBlockSize;
Offset := SizeOf(TBrainVFSHeader) + I * SizeOf(TBrainVFSBlock);
Flag := 0; // blok wolny / free block

FFreeBlocks.Add(I);

for J := 0 to FBlockSize -1 do
FStream.Write(AByte, SizeOf(Byte));
end;
end;

function TBrainVFSDisk.GetBlockOffset(const ABlockIndex: Word): Int64;
begin
if IsValidBlock(ABlockIndex) then
Result := FBlocks[ABlockIndex].Offset
else
Result := -1;
end;

function TBrainVFSDisk.GetFreeBlock: Word;
begin
Result := FFreeBlocks.First();
end;

function TBrainVFSDisk.IsBlockDirty(const ABlockIndex: Word): Boolean;
begin
if IsValidBlock(ABlockIndex) then
Result := (FBlocks[ABlockIndex].Flag = 7)
else
Result := False;
end;

function TBrainVFSDisk.IsBlockEmpty(const ABlockIndex: Word): Boolean;
begin
if IsValidBlock(ABlockIndex) then
Result := (FBlocks[ABlockIndex].Flag = 0)
else
Result := False;
end;

function TBrainVFSDisk.IsBlockRemoved(const ABlockIndex: Word): Boolean;
begin
if IsValidBlock(ABlockIndex) then
Result := (FBlocks[ABlockIndex].Flag = 23)
else
Result := False;
end;

function TBrainVFSDisk.IsValidBlock(const ABlockIndex: Word): Boolean;
begin
Result := (ABlockIndex < FDiskSize);
end;

procedure TBrainVFSDisk.MarkBlockDirty(const ABlockIndex: Word);
begin
if IsValidBlock(ABlockIndex) then
begin
FBlocks[ABlockIndex].Flag := 7;
FFreeBlocks.Remove(ABlockIndex);
end;
end;

procedure TBrainVFSDisk.MarkBlockRemoved(const ABlockIndex: Word);
begin
if IsValidBlock(ABlockIndex) then
begin
FBlocks[ABlockIndex].Flag := 23;
FFreeBlocks.Add(ABlockIndex);
end;
end;

procedure TBrainVFSDisk.ReadBlock(const ABlockIndex: Word; out S: TStream);
begin
if IsValidBlock(ABlockIndex) and Assigned(S) then
begin
FStream.Seek(FBlocks[ABlockIndex].Offset, soFromBeginning);
S.CopyFrom(FStream, FBlockSize);
end;
end;

procedure TBrainVFSDisk.WriteBlock(const ABlockIndex: Word; const S: TStream);
begin
if IsValidBlock(ABlockIndex) and Assigned(S) then
begin
FStream.Seek(FBlocks[ABlockIndex].Offset, soFromBeginning);
FStream.CopyFrom(S, FBlockSize);

// to samo, co MarkBlockDirty
// the same as MarkBlockDirty
FBlocks[ABlockIndex].Flag := 7;
FFreeBlocks.Remove(ABlockIndex);
end;
end;

{ TBrainVFSDiskSystem }

constructor TBrainVFSDiskSystem.Create(const ADisk: TBrainVFSDisk);
begin
inherited Create();

FDisk := ADisk;
end;

destructor TBrainVFSDiskSystem.Destroy;
begin
FDisk := nil;

inherited Destroy();
end;

end.


I'll keep you posted. :)

noeska
11-11-2008, 06:41 PM
for now i have an interesting problem.

start situation:
directory is at block 1.

next i save a file so as nextfreeblock i get 2.
next i write an entry in the directory (file is not yet written)
unfortunately the dir entry is to large to fit into block 1 together with the previous entry so it request a nextfreeblock to use and that gets 2 also
now i write the file now on writing it i do another nextfreeblock and get 3.
now i have wrong directory entry for the file as that now states 2 and not 3 as it should be.

hmm how to solve that:
1) update the directory entry after writing the file
2) write the file entry only after writing the file

Download: http://www.noeska.net/download/VirtualFileSystemDirTest.zip MPL license

Brainer
11-11-2008, 06:44 PM
Ooops, now I realised I made I mistake putting all the code here, next time I'll provide a link. :)

Noeska, do you have to store the file structure as a block? Can't you just use XML or a B-tree and just store it in the file archive?

noeska
11-11-2008, 06:50 PM
Noeska, do you have to store the file structure as a block? Can't you just use XML or a B-tree and just store it in the file archive?


What file stucture are you refering to? If you mean a directory entry then yes. See what i do with fdir. Essentialy that is a tfilestream inside a tfilestream (the virtual filesystem).

Brainer
11-11-2008, 06:56 PM
Yip, I meant the directory structure. :) My plan is to store it as a tree, at the end of the file. Also, I'm not sure if you noticed, I set a limit to the maximum size of a disk. It's 4 gigabytes (65536 bytes/block * 65536 blocks). 4 gigs is enough for now, I don't plan on making such big archives yet, but who knows? :)

I haven't checked your demo yet, will do soon. :) Why do you use a TStream-descendant class, btw? I saw it multiple times in other projects, couldn't understand it at all. :S

noeska
11-11-2008, 07:07 PM
My plan is to store it as a tree, at the end of the file. Also, I'm not sure if you noticed, I set a limit to the maximum size of a disk.
Is not that a waste of disk space.? E.g. if you plan on using 2gb at the max and store the tree at the end you make a 2gb empty file first?



Why do you use a TStream-descendant class, btw? I saw it multiple times in other projects, couldn't understand it at all. :S
Yes it is not easy and i encountered lots of bugs while writing it :-)
But in general streams are an easy to use. E.g. lots of components in delphi provide easy ways of using them. E.g. Memo1.Lines.SaveToStream(MyStream);

Brainer
11-11-2008, 07:12 PM
Is not that a waste of disk space.? E.g. if you plan on using 2gb at the max and store the tree at the end you make a 2gb empty file first?

No, you can always choose the size of the disk. All the blocks are written at once, that's the fault of my idea. So if you create a 2 GB disk, all the blocks will be immediately written. I have to find a way round it...

EDIT
Actually, I've found a way. :) Will introduce it later once I make some progress. Stay tuned. :lol:

noeska
16-11-2008, 03:39 PM
after lots of bugfixing a better working version. The TFileStream withing TFileStream now works. Phew :-)
Now i better go clean up the source as that is a real mess right now. :-(

Anyway here is the download: http://www.noeska.net/downloads/VirtualFileSystemAlpha4.zip (mpl license)

i hope this download works as i seem to have mistyped the previous ones

noeska
23-11-2008, 07:15 PM
Another version: http://www.noeska.net/downloads/VirtualFileSystemAlpha5.zip

So for this release i could use some alpha testers...

bug 1 is already there: the xth directory entry is written in the wrong block at the wrong position? Why?

Bug fixed so: http://www.noeska.net/downloads/VirtualFileSystemAlpha6.zip

noeska
26-11-2008, 06:20 PM
Now it is alpha 7 time already. Not it is also possible to delete files. Deleted blocks are reused again.
This only works for tvirtualfilestream bases access to the tvirtualfilesystem. So writedata (and readdata) need to be rewritten to use a tvirtualfilestream. Besides making reusing deleted blocks work with those it save some double code.
With reusing deleted block the tvirtualfilesystem may end up to be defragmented. Hmm how to write such an thing?

Who are willing to give this alpha version a test run? It comes with an basic example i use for testing.
I still have to give the virtualfilesystem a proper name and i am out of ideas so suggestions are welcome.

Download: http://www.noeska.net/downloads/VirtualFileSystemAlpha7.zip

arthurprs
27-11-2008, 04:27 PM
noeska, why you implemented it using blocks, any special reason?

noeska
27-11-2008, 05:14 PM
the alternative would be complete files, but that would lead to disaster with deleting and reusing filespace.

With blocks it is relative easy deleting and reusing filespace. Also a file can grow in size without the need of moving it completely. Addressing is also relative simple because of the static size of a block within a virtualfile system. With complete files you dont have that advantage.

The only drawback is waste of space with file smaller then the blocksize and defragmentation.

E.g. also real filesystems like fat en ntfs use blocks.

But if you have another alternative besides using block do let me know.

Brainer
27-11-2008, 06:04 PM
But if you have another alternative besides using block do let me know.
Yip, I'd like to hear about such, too. :)

arthurprs
27-11-2008, 08:49 PM
I usually put a header with some signature and filecount, then some description structures * filecount and finally the files =s

it works fine to pack game files :?

noeska
27-11-2008, 09:42 PM
yes, but i want to do more then pack game files. This is way more dynamic. Allowing to change files within the virtual file system without having to recreate the data file. E.g. it should be possible to use it as a datastorage for a database.

arthurprs
27-11-2008, 10:43 PM
yes, but i want to do more then pack game files. This is way more dynamic. Allowing to change files within the virtual file system without having to recreate the data file. E.g. it should be possible to use it as a datastorage for a database.

uhm, interesting.

Brainer
28-11-2008, 04:20 PM
I've got the same aim as noeska, I'm not interested in pack files, I want to do more than that. :)

Anyway, I'm not sure there's any other way to store files than just diving them into blocks. Of course you can write them at once, but then removing these from package would mean copying a file to the end of the package file and then truncating its size by the size of the removed file. And this is well-known to be a slow operation, especially when talking about "bigger" files (>1 GB).

And AFAIK, fragmentation cannot be avoided. I just can't think of any other way of storing files inside a package.