Thread: My own archive format

  1. #1

    My own archive format

    Hello there.

    I've been struggling with this for a week and haven't worked it out yet: I want to design my own archive file format. It's not that hard if you only want to store files without directories; I've done that once before. Now I'd like to extend it and add directory support.

    My question is: what's the best way to implement directories? I can easily define appropriate structures in Pascal and save them to a stream as a tree, but I have no idea how to calculate the offsets properly. Any ideas?

    Or maybe you can put me on the right track and suggest another solution?

    I'd appreciate any ideas or replies.

  2. #2

    I am using Degisy TDataFile: http://www.torry.net/quicksearchd.ph...file&Title=Yes

    Its sections give you a flat directory structure (and you can always cheat by using / in the section name :-) ).
    I modified it a bit so that it also supports LHA compression.

    The only problem is that deleting a file does not make the archive any smaller. The only way to shrink it is to copy all the files to a new archive.

    I am currently thinking of moving to SQLite for this purpose. Yes, using a database for file storage :-)
    http://3das.noeska.com - create adventure games without programming

  3. #3

    I wanted to write it on my own, as a kind of challenge. But thanks for the component, I'll take a look at it.

    Any idea how WinRAR does it? I also thought of making an XML file and putting it inside the archive. That makes the structure easy to maintain, but calculating the offsets is still a pain.

  4. #4

    That's how I do it.

    PS: it would be nice to sort the files by folder, but it's not necessary.

    Generate the XML:
    -> start the offset at 0 and, for each file entry, add the size of the previous file entry to it

    File structure:
    -> array of char[0..3] with your header
    -> uint32 with the size of the XML
    -> the XML with the offsets
    -> the files themselves

    -----------------------------

    When opening, read the header.
    If the header is not valid, abort.

    Read the size of the XML structure, then read and parse the XML.

    To access a file, add ( 8 + xmlsize ) to the offset given in the XML (8 = 4 bytes of header + 4 bytes of XML size),

    or seek from the end of the XML by the offset given in the XML.


    There are other possible layouts, such as storing the XML at the end of the file, etc.
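
    A rough sketch of the read side of this layout, in case it helps. It is only an illustration under assumptions: the 'MARC' signature, the short text standing in for the XML and the name ReadEntry are all made up here, and the XML is not actually parsed (the offset and size are passed in by hand).

    ```pascal
    program ReadByOffset;
    {$mode objfpc}{$H+}

    uses
      Classes, SysUtils;

    type
      TSignature = array[0..3] of AnsiChar;

    const
      Signature: TSignature = ('M', 'A', 'R', 'C');  // made-up 4-byte header

    // Returns Size bytes stored at Offset, where Offset is relative to the
    // first byte after the XML (the value that would be written in the XML).
    function ReadEntry(Arc: TStream; Offset, Size: LongWord): AnsiString;
    var
      Sig: TSignature;
      XmlSize: LongWord;
    begin
      Result := '';
      Arc.Position := 0;
      Arc.ReadBuffer(Sig, SizeOf(Sig));
      if not CompareMem(@Sig, @Signature, SizeOf(Sig)) then
        raise Exception.Create('Not an archive: bad header');
      Arc.ReadBuffer(XmlSize, SizeOf(XmlSize));
      // 8 = 4 bytes of signature + 4 bytes of XML size
      Arc.Position := 8 + Int64(XmlSize) + Offset;
      SetLength(Result, Size);
      if Size > 0 then
        Arc.ReadBuffer(Result[1], Size);
    end;

    var
      Arc: TMemoryStream;
      XmlText, Data: AnsiString;
      XmlLen: LongWord;
    begin
      // Stand-in for the XML: a.txt is 5 bytes at offset 0, b.txt 3 bytes at 5.
      XmlText := '<f n="a.txt" s="5" o="0"/><f n="b.txt" s="3" o="5"/>';
      Data := 'helloabc';
      XmlLen := Length(XmlText);
      Arc := TMemoryStream.Create;
      try
        Arc.WriteBuffer(Signature, SizeOf(Signature));
        Arc.WriteBuffer(XmlLen, SizeOf(XmlLen));
        Arc.WriteBuffer(XmlText[1], Length(XmlText));
        Arc.WriteBuffer(Data[1], Length(Data));
        WriteLn(ReadEntry(Arc, 5, 3));  // reads the second file back
      finally
        Arc.Free;
      end;
    end.
    ```

    Because the offsets in the XML are relative to the end of the XML, the XML can grow or shrink without any stored offset having to change.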
    From brazil (:

    Pascal pownz!

  5. #5

    Yip, that's what I've been thinking of, too, but I don't quite get the idea of calculating the offsets. And what do you mean by:
    start the offset at 0 and sum it with the size of previous file entry
    Does that mean I should generate an XML for each file? Am I getting it wrong?

  6. #6

    Quote (originally posted by Brainer):
    Yip, that's what I've been thinking of, too, but I don't quite get the idea of calculating the offsets. And what do you mean by:
    start the offset at 0 and sum it with the size of previous file entry
    Does that mean I should generate an XML for each file? Am I getting it wrong?

    No, something like this:
    Code:
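    offset = 0   # position of the next file, counted from the start of the file data

    for entry in files:
        # note where this file starts, then move past it
        writetoxml(entry.name, entry.size, offset)
        offset += entry.size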
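
    For anyone who prefers to see the same loop in Pascal, here is a minimal sketch. The TEntry record, the AssignOffsets name and the sample file names and sizes are assumptions for the example; instead of writing XML it simply fills in the offsets, which is the part the question was about. There is one index for the whole archive, not one per file.

    ```pascal
    program RunningOffsets;
    {$mode objfpc}{$H+}

    type
      TEntry = record
        Name: string;
        Size: LongWord;    // length of the file data in bytes
        Offset: LongWord;  // filled in by AssignOffsets
      end;

    // Gives every entry the offset at which its data starts, counted from the
    // beginning of the data area, and returns the total size of that area.
    function AssignOffsets(var Entries: array of TEntry): LongWord;
    var
      i: Integer;
    begin
      Result := 0;
      for i := Low(Entries) to High(Entries) do
      begin
        Entries[i].Offset := Result;  // starts where the previous file ended
        Inc(Result, Entries[i].Size);
      end;
    end;

    var
      Files: array[0..2] of TEntry;
      i: Integer;
      Total: LongWord;
    begin
      Files[0].Name := 'readme.txt';      Files[0].Size := 100;
      Files[1].Name := 'gfx\player.png';  Files[1].Size := 120;
      Files[2].Name := 'gfx\enemy.png';   Files[2].Size := 30;
      Total := AssignOffsets(Files);
      for i := 0 to High(Files) do
        WriteLn(Files[i].Name, ': offset ', Files[i].Offset,
                ', size ', Files[i].Size);
      WriteLn('data area: ', Total, ' bytes');
    end.
    ```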

  7. #7

    I wrote my own archive format, so let me give you some advice. If you are in need of any specific pointers, just ask away and I'll chip in some more advice.

    The Basics

    You need:
    • A header record.
    • File list records.
    • A way to save file names easily.


    So, you must first decide whether this is the kind of archive you regenerate from scratch every time you add files (so the internal order changes every time), or whether you need to be able to add new files to an existing archive. If you regenerate it every time, you can sort the records and file names, and then use a binary search to find items.
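
    To make the binary search idea concrete, here is a small sketch. The TFileRecord layout and the FindByName name are assumptions for the example, not the format described in this post, and it only works if the records were sorted with the same comparison (CompareStr, so case-sensitive) when the archive was built.

    ```pascal
    program FindRecord;
    {$mode objfpc}{$H+}

    uses
      SysUtils;

    type
      TFileRecord = record
        Name: string;
        Offset, Size: LongWord;
      end;

    // Binary search over records sorted by Name in CompareStr order.
    // Returns the index of the match, or -1 when the name is not present.
    function FindByName(const Recs: array of TFileRecord;
      const AName: string): Integer;
    var
      L, H, M, C: Integer;
    begin
      Result := -1;
      L := 0;
      H := High(Recs);
      while L <= H do
      begin
        M := L + (H - L) div 2;
        C := CompareStr(Recs[M].Name, AName);
        if C = 0 then
        begin
          Result := M;
          Exit;
        end
        else if C < 0 then
          L := M + 1   // the wanted name sorts after Recs[M]
        else
          H := M - 1;  // the wanted name sorts before Recs[M]
      end;
    end;

    var
      Recs: array[0..2] of TFileRecord;
    begin
      // Names must already be in sorted order for the search to work.
      Recs[0].Name := 'gfx\enemy.png';
      Recs[1].Name := 'gfx\player.png';
      Recs[2].Name := 'readme.txt';
      WriteLn(FindByName(Recs, 'gfx\player.png'));  // prints 1
      WriteLn(FindByName(Recs, 'missing.txt'));     // prints -1
    end.
    ```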

    How to Save Directories

    The obvious way to collect the files is a recursive search. Since you know the source directory, you can easily turn the absolute filenames you get into relative ones. That doesn't settle how you store the directories internally, though. Frankly, the easiest way is to store the relative path as the filename within the data structure, e.g. "my dir\file.txt", but it isn't as elegant as something like WinRAR or WinZIP.

    Here's a fast recursive search I wrote as an example:
    Code:
    procedure TIAWriter.AddFiles(const directory: string; Options: TAddFilesOptions
     = [afoRecurse, afoIgnoreHidden]);
    var
      SearchRec: TSearchRec;
      Dir: string;

      // Walks one subdirectory; sub is a path relative to Dir.
      procedure SearchSubDir(const sub: string);
      var SearchRec2: TSearchRec;
          temp: string;
      begin
        if FindFirst(dir+sub+'\*.*',faAnyFile,SearchRec2) = 0 then
          repeat
            if (afoIgnoreHidden in Options) and (SearchRec2.Attr and faHidden > 0) then
              Continue;
            // temp is the relative name that ends up in the archive
            temp := IncludeTrailingPathDelimiter(sub) + SearchRec2.Name;
            if (SearchRec2.Name <> '.') and (SearchRec2.Name <> '..') then
              if (SearchRec2.Attr and faDirectory = 0) then
                AddFile(dir + temp, temp, afoEncrypt in Options)
              else
                SearchSubDir(temp);
          until FindNext(SearchRec2) <> 0;
        FindClose(SearchRec2); // release the search handle (missing in the original)
      end;

    begin
      fHeader.arcEncrypted := afoEncrypt in Options;
      dir := IncludeTrailingPathDelimiter(directory);
      if FindFirst(dir+'*.*',faAnyFile,SearchRec) = 0 then
        repeat
          if (afoIgnoreHidden in Options) and (SearchRec.Attr and faHidden > 0) then
            Continue;
          if (SearchRec.Name <> '.') and (SearchRec.Name <> '..') then
            if (SearchRec.Attr and faDirectory = 0) then
              AddFile(dir + SearchRec.Name,SearchRec.Name,afoEncrypt in Options)
            else if afoRecurse in Options then
              SearchSubDir(SearchRec.Name);
        until FindNext(SearchRec) <> 0;
      FindClose(SearchRec);
    end;
    Cheating in the file records and offsets...

    This is really easy, and I'm glad I figured it out early when writing my format.

    The trick is to structure the archive like so:
    Code:
    HEADER
    ----------------
    FILES
    ----------------
    FILE RECORDS
    The second trick is to assemble the archive procedurally. Create your stream, write a blank header, then compress each file and write it to the stream. Build the file list as you go: for each file, record its offset (the stream position just before you write it) and its compressed length. When you're done writing the files, store the current stream position in the header as the file list offset. Then write the file list, seek back to the beginning, write your real header ... and close.

    It's quite simple really. If it's hard to follow I'll give you a numbered list.
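
    The steps above can be sketched roughly like this. This is an illustration under assumptions rather than the actual format: the THeader and TFileRecord layouts, the 'MARC' magic and the WriteString helper are invented for the example, the "files" are short strings held in memory, and compression is left out, so the stored length is just the raw length.

    ```pascal
    program BuildArchive;
    {$mode objfpc}{$H+}

    uses
      Classes, SysUtils;

    type
      TMagic = array[0..3] of AnsiChar;

      // Hypothetical fixed-size header.
      THeader = packed record
        Magic: TMagic;
        FileCount: LongWord;
        ListOffset: Int64;   // where the file records start
      end;

      TFileRecord = record
        Name: AnsiString;
        Offset: Int64;       // stream position of the file data
        Size: LongWord;      // stored length (uncompressed in this sketch)
      end;

    const
      Magic: TMagic = ('M', 'A', 'R', 'C');
      Contents: array[0..1] of AnsiString = ('hello', 'archive!');

    // Writes a length-prefixed string.
    procedure WriteString(S: TStream; const Value: AnsiString);
    var
      Len: LongWord;
    begin
      Len := Length(Value);
      S.WriteBuffer(Len, SizeOf(Len));
      if Len > 0 then
        S.WriteBuffer(Value[1], Len);
    end;

    var
      Arc: TMemoryStream;
      Header: THeader;
      List: array[0..1] of TFileRecord;
      i: Integer;
    begin
      List[0].Name := 'a.txt';
      List[1].Name := 'docs\b.txt';
      Arc := TMemoryStream.Create;
      try
        // 1. Reserve room for the header with a blank placeholder.
        FillChar(Header, SizeOf(Header), 0);
        Arc.WriteBuffer(Header, SizeOf(Header));
        // 2. Write each file, noting its offset just before the write.
        for i := 0 to High(List) do
        begin
          List[i].Offset := Arc.Position;
          List[i].Size := Length(Contents[i]);
          Arc.WriteBuffer(Contents[i][1], List[i].Size);
        end;
        // 3. The file list starts at the current position; write it out.
        Header.Magic := Magic;
        Header.FileCount := Length(List);
        Header.ListOffset := Arc.Position;
        for i := 0 to High(List) do
        begin
          WriteString(Arc, List[i].Name);
          Arc.WriteBuffer(List[i].Offset, SizeOf(List[i].Offset));
          Arc.WriteBuffer(List[i].Size, SizeOf(List[i].Size));
        end;
        // 4. Go back and overwrite the placeholder with the real header.
        Arc.Position := 0;
        Arc.WriteBuffer(Header, SizeOf(Header));
        WriteLn('file list at ', Header.ListOffset, ', total ', Arc.Size, ' bytes');
      finally
        Arc.Free;
      end;
    end.
    ```

    Nothing has to be known in advance with this order: each offset is simply read from the stream position at the moment the file is written, and a reader only needs the header to find the file list.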

    Misc. Tips

    Don't:
    • Make the archive solid, because every seek will then require the archive to be decompressed all over again. Each additional write will also take slightly longer as the archive grows.
    • Be indecisive about your requirements. Late changes are excruciatingly difficult to factor into your code if they involve a major change of approach, and can even mean a full rewrite if you're sloppy.


    Edit: Forgot about the code mangling if HTML wasn't disabled ... so I fixed that and disabled it.

  8. #8

    Robert, I had the same structure in mind as you:
    Code:
    HEADER
    ----------------
    FILES
    ----------------
    FILE RECORDS
    But how do I store directories in this structure, and what about the offsets?

  9. #9

    You should know the size of each element/file you are going to add to the archive. Using that, you can calculate where each file is inside the archive.

    E.g. if the header is 10 bytes, file1 is 100 bytes and file2 is 120 bytes, you can add the next file at 10+100+120, or put the list of files with their sizes there.

    Likewise, if you want to read the second file, you know it is at 10+100.

    In the header you should also store the total size of the files, so you know where the file list (FILE RECORDS) is.

    Folders/directories are easy, as they are not actual files. Number them and you can refer to them from the file records.

    Example of possible file records (id, name, size, folderid); folders have no size:
    id  name     size  folderid
    1   file1    100   3
    2   file2    120   3
    3   folder1  -     0
    4   folder2  -     3
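
    A small sketch of how such records could be turned back into full paths. It rests on an assumption about the example above: that folderid 0 means the archive root and that the last number on the folder lines is the parent folder. The TEntry record, the IsFolder flag and the FullPath function are names made up for the illustration.

    ```pascal
    program FolderIds;
    {$mode objfpc}{$H+}

    type
      TEntry = record
        Id: Integer;
        Name: string;
        Size: LongWord;
        IsFolder: Boolean;
        FolderId: Integer;  // id of the containing folder, 0 = archive root
      end;

    const
      Entries: array[0..3] of TEntry = (
        (Id: 1; Name: 'file1';   Size: 100; IsFolder: False; FolderId: 3),
        (Id: 2; Name: 'file2';   Size: 120; IsFolder: False; FolderId: 3),
        (Id: 3; Name: 'folder1'; Size: 0;   IsFolder: True;  FolderId: 0),
        (Id: 4; Name: 'folder2'; Size: 0;   IsFolder: True;  FolderId: 3)
      );

    // Builds the full path of an entry by walking up through its parents.
    // Returns '' for an unknown id. Assumes the parent links contain no cycle.
    function FullPath(Id: Integer): string;
    var
      i: Integer;
    begin
      Result := '';
      for i := Low(Entries) to High(Entries) do
        if Entries[i].Id = Id then
        begin
          Result := Entries[i].Name;
          if Entries[i].FolderId <> 0 then
            Result := FullPath(Entries[i].FolderId) + '\' + Result;
          Exit;
        end;
    end;

    var
      i: Integer;
    begin
      for i := Low(Entries) to High(Entries) do
        WriteLn(Entries[i].Id, ' ', FullPath(Entries[i].Id));
    end.
    ```

    With the sample records this lists file1 and file2 under folder1, and folder2 as a subfolder of folder1. Since folders take up no space in the data area, they do not affect the offset calculation at all.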

  10. #10

    Thanks noeska, that cleared up a few doubts.

    I'll give it my best shot and post the results so that you can test them. If anyone is interested, the first version of my archive format can be downloaded here: http://uploadingit.com/files/738520_...9072008%5D.zip.

    I'd like to hear some feedback.
