My own archive format

**Robert Kosek** · 23-10-2008, 01:54 PM

I'll check your format out shortly, I was just writing this for some general advice plus some code examples.

You always know where the file list goes: it is always at the end of the archive, so you needn't tally a thing. Its offset should be found programmatically (from the stream position), rather than arithmetically.

Archive creation:

Code:

  // Write the header at position 0 (the file list offset is not known yet)
  // Write the compressed files to the stream
  Header.FileListOffset := Stream.Position;
  // Write the array of records to the stream, the file list
  // Refresh the header with the actual file list offset
  Stream.Seek(0, soFromBeginning);
  Stream.Write(Header, SizeOf(Header));
  Stream.Free;
end;

Archive update:

Code:

  Stream.Position := Header.FileListOffset;
  // Add new compressed files: this overwrites the old file list
  Header.FileListOffset := Stream.Position;
  // Write the array of records to the stream, the file list
  // Refresh the header with the actual file list offset
  Stream.Seek(0, soFromBeginning);
  Stream.Write(Header, SizeOf(Header));
  Stream.Free;
end;
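To make the two sequences above concrete, here is a minimal, self-contained sketch rather than the code from my archiver. It uses a TMemoryStream so it runs without touching the disk, and the header layout (TArchiveHeader with Magic, FileCount and FileListOffset), the packed records and the AppendBlob helper are all assumptions made up for illustration. The data is stored uncompressed to keep it short.

```pascal
program ArchiveLayoutSketch;
{$IFDEF FPC}{$mode objfpc}{$H+}{$ENDIF}

uses
  Classes, SysUtils;

type
  // Hypothetical header; the field names are assumptions.
  TArchiveHeader = packed record
    Magic: LongWord;
    FileCount: LongWord;
    FileListOffset: Int64;
  end;

  // Same fields as the file record described in this post.
  TFileRecord = packed record
    Start: Int64;
    Len: LongWord;
    BZipped: Boolean;
  end;

var
  Stream: TMemoryStream;
  Header: TArchiveHeader;
  Files: array of TFileRecord;

// Writes one blob where the file list currently starts (overwriting
// the old list), then rewrites the list and refreshes the header.
procedure AppendBlob(const Data: AnsiString);
var
  N: Integer;
begin
  Stream.Position := Header.FileListOffset;
  N := Length(Files);
  SetLength(Files, N + 1);
  Files[N].Start := Stream.Position;
  Files[N].Len := Length(Data);
  Files[N].BZipped := False;
  if Length(Data) > 0 then
    Stream.WriteBuffer(Data[1], Length(Data));

  // The list always lands at whatever position the stream is at now.
  Header.FileListOffset := Stream.Position;
  Header.FileCount := Length(Files);
  Stream.WriteBuffer(Files[0], Length(Files) * SizeOf(TFileRecord));

  // Refresh the header with the actual file list offset.
  Stream.Position := 0;
  Stream.WriteBuffer(Header, SizeOf(Header));
end;

begin
  Stream := TMemoryStream.Create;
  try
    // Archive creation: header first, empty file list right after it.
    FillChar(Header, SizeOf(Header), 0);
    Header.Magic := $48435241;
    Header.FileListOffset := SizeOf(Header);
    Stream.WriteBuffer(Header, SizeOf(Header));

    // Archive update: each call follows the "update" steps above.
    AppendBlob('first file');
    AppendBlob('second, somewhat longer file');

    WriteLn('files: ', Header.FileCount,
            ', file list at offset ', Header.FileListOffset,
            ', archive size ', Stream.Size);
  finally
    Stream.Free;
  end;
end.
```

Because the new data plus the new (longer) list always extends past the end of the old list, there is nothing to truncate after an append in this sketch.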

My archive file records always have these elements:

Code:

  FileRecord = record
    // It could be anywhere, and an Int64 is just a wise choice here.
    start: Int64;
    // Compressed file size (because you can read until there's no input,
    // you don't need an uncompressed length field.)
    len: Longword;
    // And this says whether I used BZip or ZLib, which you can omit.
    bzipped: boolean;
  end;

To save an array of these to the stream is quite simple:

Code:

  Stream.Write(files[0], Count * SizeOf(FileRecord));

To read:

Code:

  SetLength(files, fCount);  // store the file count in the header!
  Stream.Read(files[0], SizeOf(FileRecord) * fCount);

People like to say this saves the metadata for a dynamic array, but I disagree. The main reason is that I pass the first element, files[0], together with a byte count that covers only the elements. Because I write from element 0 through the total size of the elements, only the element data reaches the stream and the array's own metadata is left out. At least, so far as I know.
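A quick way to check this for yourself is a round trip through a memory stream. This is only a sketch: TFileRecord is declared as a packed record here (my assumption, so the on-disk size doesn't depend on compiler alignment settings), and the values are dummy data.

```pascal
program ArrayRoundTrip;
{$IFDEF FPC}{$mode objfpc}{$H+}{$ENDIF}

uses
  Classes, SysUtils;

type
  TFileRecord = packed record
    Start: Int64;
    Len: LongWord;
    BZipped: Boolean;
  end;

var
  Stream: TMemoryStream;
  Written, Loaded: array of TFileRecord;
  I: Integer;
begin
  // Fill a small dynamic array with dummy entries.
  SetLength(Written, 3);
  for I := 0 to High(Written) do
  begin
    Written[I].Start := 100 * I;
    Written[I].Len := 10 + I;
    Written[I].BZipped := Odd(I);
  end;

  Stream := TMemoryStream.Create;
  try
    // Write starting at element 0, counting only the element bytes.
    Stream.WriteBuffer(Written[0], Length(Written) * SizeOf(TFileRecord));
    WriteLn('bytes in stream: ', Stream.Size,
            ', element bytes: ', Length(Written) * SizeOf(TFileRecord));

    // Read them back; the count would normally come from the header.
    Stream.Position := 0;
    SetLength(Loaded, Length(Written));
    Stream.ReadBuffer(Loaded[0], Length(Loaded) * SizeOf(TFileRecord));

    for I := 0 to High(Written) do
      if not CompareMem(@Written[I], @Loaded[I], SizeOf(TFileRecord)) then
        raise Exception.Create('record mismatch after round trip');
    WriteLn('round trip OK');
  finally
    Stream.Free;
  end;
end.
```

The two numbers printed on the first line should be equal, which is the point: the stream holds the elements and nothing else. Note that this only works for records without managed fields (no long strings or nested dynamic arrays).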

If you want a short string, 255 characters worth, as your ID then just add "id: shortstring" to the record. Pascal can easily handle transferring this data to the stream. Now, if you want longer file names ... you must use a stringlist and treat the list and your array of records as parallel indexed lists, only ever changing the positions of records and names together. (You can even compress the file list for better compression ratios.)

Directories, in my format, are just a part of the filename within the archive. Thus if I want "my dir/my file.txt" I tell it to extract just that. Why worry about virtual folders? That's more trouble than it is worth. If you want more fancy folder enumeration stuff, you could build that into your file list class.
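For the simple enumeration case, a prefix match over the name list is already enough. A rough sketch, assuming the names live in a TStringList and that a "directory" is just a prefix ending in "/" (ListDir is a made-up helper name, not part of my format):

```pascal
program PrefixDirs;
{$IFDEF FPC}{$mode objfpc}{$H+}{$ENDIF}

uses
  Classes, SysUtils;

// Prints every archive entry whose name starts with Dir, with the
// prefix stripped. Dir is expected to end with '/'.
procedure ListDir(Names: TStrings; const Dir: string);
var
  I: Integer;
begin
  for I := 0 to Names.Count - 1 do
    if Copy(Names[I], 1, Length(Dir)) = Dir then
      WriteLn(Copy(Names[I], Length(Dir) + 1, MaxInt));
end;

var
  Names: TStringList;
begin
  Names := TStringList.Create;
  try
    Names.Add('my dir/my file.txt');
    Names.Add('my dir/sub/other.txt');
    Names.Add('readme.txt');
    ListDir(Names, 'my dir/');
  finally
    Names.Free;
  end;
end.
```

This lists everything below the prefix, nested entries included; stopping at the next "/" would give a one-level listing if you need that.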

**noeska** · 23-10-2008, 04:51 PM

ugh gnu gpl :-( i don't like it, i prefer mpl.
but it's your choice to use it.

Also, doesn't that filetypes part complicate things? It's not as if an mp3 file is written any differently than a jpeg, is it?

I see you have not gotten to implementing (virtual) folders?

What worries me is the delete routine. Does it move all files after the deleted file over the deleted one's position? That could get slow as the archive becomes larger. Also, if the deleted file is 10 bytes and the next one is 100 bytes, you end up reading and writing within that same 100-byte file. It should work, but i don't like the idea of it.

yet another method:
if you want to make it really dynamic you should implement something like a FAT file structure. E.g. make the stream consist of 32-byte blocks, each with a pointer to the previous and next block. So a file can end up taking up blocks 1, 2, 3 and 4 and another file can take up blocks 5, 6 and 7. Even the file list and folder structure could take up block 8. On adding a file, that one could take up blocks 9 and 10; if block 8 becomes too small to hold the file list and folders, it could extend to block 11, etc. This means you can waste some space, as not all blocks are completely filled: blocks 4, 7 and 10 could contain less than 32 bytes of data. Also, on deleting the second file, blocks 5, 6 and 7 can be marked as deleted, so on the next file insert these blocks can be used again. Yes, and before you know it you need to write a defrag tool :-)
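A very rough in-memory sketch of that idea, just to show the chaining and the reuse of freed blocks. It is simplified on purpose: only a "next" pointer is kept (no "previous"), the blocks live in a fixed array instead of a stream, and all names (TBlock, AllocBlock, StoreFile, FreeChain) are made up for this example.

```pascal
program BlockChainSketch;
{$IFDEF FPC}{$mode objfpc}{$H+}{$ENDIF}

const
  BlockSize = 32;
  MaxBlocks = 64;
  NoBlock = -1;

type
  TBlock = record
    InUse: Boolean;
    Used: Integer;                        // payload bytes actually used
    Next: Integer;                        // index of next block or NoBlock
    Data: array[0..BlockSize - 1] of Byte;
  end;

var
  Blocks: array[0..MaxBlocks - 1] of TBlock;  // globals start zeroed

// Returns the first free block, or NoBlock when everything is taken.
function AllocBlock: Integer;
var
  I: Integer;
begin
  Result := NoBlock;
  for I := 0 to MaxBlocks - 1 do
    if not Blocks[I].InUse then
    begin
      Blocks[I].InUse := True;
      Blocks[I].Used := 0;
      Blocks[I].Next := NoBlock;
      Result := I;
      Exit;
    end;
end;

// Spreads Data over a chain of blocks; returns the first block index.
function StoreFile(const Data: AnsiString): Integer;
var
  Offset, Chunk, Cur, Prev: Integer;
begin
  Result := NoBlock;
  Prev := NoBlock;
  Offset := 1;
  while Offset <= Length(Data) do
  begin
    Cur := AllocBlock;
    if Cur = NoBlock then
      Halt(1);  // out of blocks; a real version would grow the stream
    Chunk := Length(Data) - Offset + 1;
    if Chunk > BlockSize then
      Chunk := BlockSize;
    Move(Data[Offset], Blocks[Cur].Data[0], Chunk);
    Blocks[Cur].Used := Chunk;
    if Prev = NoBlock then
      Result := Cur
    else
      Blocks[Prev].Next := Cur;
    Prev := Cur;
    Inc(Offset, Chunk);
  end;
end;

// Marks every block of a chain as free so later files can reuse them.
procedure FreeChain(First: Integer);
var
  Next: Integer;
begin
  while First <> NoBlock do
  begin
    Next := Blocks[First].Next;
    Blocks[First].InUse := False;
    First := Next;
  end;
end;

procedure PrintChain(const Name: string; First: Integer);
begin
  Write(Name, ':');
  while First <> NoBlock do
  begin
    Write(' ', First);
    First := Blocks[First].Next;
  end;
  WriteLn;
end;

var
  A, B, C: Integer;
begin
  A := StoreFile(StringOfChar('a', 70));  // needs three blocks
  B := StoreFile(StringOfChar('b', 40));  // needs two blocks
  PrintChain('A', A);
  PrintChain('B', B);
  FreeChain(A);                           // "delete" the first file
  C := StoreFile(StringOfChar('c', 50));  // should land in A's old blocks
  PrintChain('C', C);
end.
```

The last blocks of each chain are only partly filled, which is the wasted space mentioned above, and the third file picks up blocks released by the first one, which is where fragmentation starts once files of different sizes come and go.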

**Brainer** · 23-10-2008, 04:55 PM

noeska I actually don't care about the licence, you can do anything you like with it.

You see, it was my first attempt at archive formats, so I guess it could have errors and many things could have been done better.

Regarding deletion, what's the best option to do it?

**noeska** · 23-10-2008, 05:03 PM

there is no best way to do deletion :-) That just depends on the usage of the archive file. Also note my fat addition to my previous post.

**Brainer** · 23-10-2008, 05:17 PM

Man, I don't want to implement a VFS, just an archive format. xD
But you know, once I have my archive created, I wouldn't need to delete files. Imagine a game that deletes its own data.

**Brainer** · 25-10-2008, 10:16 PM

Another question. How should I load file data? I guess it shouldn't be loaded at once into memory, because an archive could be 1 GB. Any suggestions? :?

**noeska** · 26-10-2008, 10:25 AM

Use tfilestream?

Have a look at this for the basics: http://dn.codegear.com/article/26416

So .Position is your friend here :-)

together with

function Seek(Offset: Longint; Origin: Word): Longint; virtual; abstract;
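As a small sketch of what that looks like in practice: position the stream at the start of one entry and copy it out through a fixed-size buffer, so memory use stays constant no matter how big the archive is. ExtractRange is a made-up helper, and a TMemoryStream stands in for the archive only so the example runs on its own; a TFileStream opened on the archive would be used the same way, since both are TStream descendants.

```pascal
program ChunkedRead;
{$IFDEF FPC}{$mode objfpc}{$H+}{$ENDIF}

uses
  Classes, SysUtils;

// Copies Len bytes starting at Start from Source to Dest, 4 KB at a time.
procedure ExtractRange(Source, Dest: TStream; Start, Len: Int64);
var
  Buffer: array[0..4095] of Byte;
  Chunk: Integer;
begin
  Source.Position := Start;
  while Len > 0 do
  begin
    if Len > SizeOf(Buffer) then
      Chunk := SizeOf(Buffer)
    else
      Chunk := Integer(Len);
    Source.ReadBuffer(Buffer, Chunk);
    Dest.WriteBuffer(Buffer, Chunk);
    Dec(Len, Chunk);
  end;
end;

var
  Archive, Output: TMemoryStream;
  I: Integer;
  B, Expected: Byte;
begin
  Archive := TMemoryStream.Create;
  Output := TMemoryStream.Create;
  try
    // Fake archive contents: 10000 bytes with a recognisable pattern.
    for I := 0 to 9999 do
    begin
      B := I mod 251;
      Archive.WriteBuffer(B, 1);
    end;

    // Pretend a file record said: start = 3000, len = 5000.
    ExtractRange(Archive, Output, 3000, 5000);

    Output.Position := 0;
    Output.ReadBuffer(B, 1);
    Expected := 3000 mod 251;
    if (Output.Size <> 5000) or (B <> Expected) then
      raise Exception.Create('extracted range does not match');
    WriteLn('extracted ', Output.Size, ' bytes');
  finally
    Output.Free;
    Archive.Free;
  end;
end.
```

TStream.CopyFrom does much the same job in one call; the explicit loop is only there to show where .Position and the buffer come in.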

**Brainer** · 09-11-2008, 08:05 AM

I didn't want to start a new thread here, so I posted this here, I hope it's all right.

I decided not to make my own archive format; I want to create my own virtual file system instead. I've been doing some serious thinking recently and reading up on various file systems, and I have two questions. I hope you can give me in-depth answers.

1.) What's journaling? I read it's used to track changes to the disk's structure. But how does it really work? Is this data stored on the HDD or in memory?
2.) Do files on an HDD using NTFS get fragmented? How does deletion work in NTFS?

**noeska** · 09-11-2008, 09:56 AM

Your question on writing an archive format made me curious about writing a virtual file system again. I made a bare bones test version, but it still lacks a working directory structure. Also, while writing it i discovered i can simplify things. I will post my results so far if you want them. But it is a Work in Progress.

As to your questions:
1) http://en.wikipedia.org/wiki/Journaling_file_system (according to this it is stored on hdd before an action takes place)

2) If you use xp with ntfs you still have to run defrag once in a while. So yes. ntfs is just supposed to be smarter about reallocating space than fat32. Also, since there are undelete tools for ntfs as well, i assume it just throws away the file record entry and keeps the file data intact until the space is used again. No hard evidence on this though. But do read this: http://technet.microsoft.com/en-us/l.../cc781134.aspx

PS google is my friend in research :-)

**Brainer** · 09-11-2008, 10:37 AM

Originally Posted by noeska

I made a bare bones test version. But that still lacks a working directory structure. Also while writing i discover i can simplify things. I will post my results so far if you want them.

Sure, it would be cool to have a look.

I have almost everything planned - now I'm busting my brain trying to figure out how to handle file deletion. I'm not sure it can be done without the fragmentation of files concerned. And I want to avoid moving blocks of data to the end of the file and truncating the file size as it's very slow once the archive gets bigger.

What would you suggest?
