Thread: Using a file as raw memory

  1. #1

    Using a file as raw memory

    OK, I know I can do this, and I know I've done it, but I'll be darned if I can remember how. Basically, I'm building index files for a very large set of raw data, large enough that it won't fit into memory very well. So, instead, I'd like to build the index out to a file and perform all of my calculations, reads, and writes directly on disk. As this will eventually be running on an SMD SAN, data access speeds and read/write block speeds won't be a concern.

    Now, I know that I can set up the application's memory manager to use a TFileStream for its allocator and deallocator, but that is a bit of overkill. In this case I simply need my root records and pointers to go to the file system.

    Hopefully this makes sense to someone, but if not, ask questions and I'll answer what I can. Due to the nature of the project I can't post source yet. I'm working out a simple, unrelated version of the problem so I can post something to work from.

    - Jeremy

  2. #2

    Re: Using a file as raw memory

    Have you ever heard of B+Trees? Maybe you can use them to prepare an index for your file system, as I'm guessing you're writing one.

  3. #3

    Re: Using a file as raw memory

    Yep, what I'm implementing is similar to a B+ trie, but is actually a redundant and recursive DAWG (directed acyclic word graph) giving absolute indexing of all data sources. Since I'm indexing XML, the relations are File <-> Node <-> Value <-> File based. The problem isn't the structures or the schema (it's fast, real fast), it's the actual disk utilization LOL

    Guess I could change all of the existing packed record structures to objects and tie them to a Stream, but that seems a bit of a mess. I'd rather just read headers and utilize offsets.
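
    For what it's worth, the "read headers and utilize offsets" idea could look something like the sketch below, assuming Free Pascal or Delphi. Pointers inside the records become Int64 file offsets, and a small header at offset 0 keeps 0 free to mean "no link". TIndexHeader, TNodeRec, AppendNode, ReadNode, IndexMagic, and the index.tmp file name are all made up for illustration; they are not the project's actual structures.

```pascal
program OffsetRecords;
{$IFDEF FPC}{$mode delphi}{$ENDIF}
{$IFDEF MSWINDOWS}{$APPTYPE CONSOLE}{$ENDIF}

uses
  SysUtils, Classes;

const
  IndexMagic = $31584449; // arbitrary file tag, hypothetical

type
  // Hypothetical layouts; the real index structures are not shown in the thread.
  TIndexHeader = packed record
    Magic: LongWord;
    RootOffset: Int64;   // where the root node lives in the file
  end;

  TNodeRec = packed record
    Ch: AnsiChar;        // edge label
    FirstChild: Int64;   // file offset of the first child, 0 = none
    NextSibling: Int64;  // file offset of the next sibling, 0 = none
  end;

// Append a node at the end of the stream and return the offset it landed at.
function AppendNode(S: TStream; const Rec: TNodeRec): Int64;
begin
  S.Position := S.Size;
  Result := S.Position;
  S.WriteBuffer(Rec, SizeOf(Rec));
end;

// Load the node stored at the given offset.
procedure ReadNode(S: TStream; Offset: Int64; out Rec: TNodeRec);
begin
  S.Position := Offset;
  S.ReadBuffer(Rec, SizeOf(Rec));
end;

var
  F: TFileStream;
  Hdr: TIndexHeader;
  Node: TNodeRec;
begin
  F := TFileStream.Create('index.tmp', fmCreate);
  try
    // Write the header first so that no node can ever sit at offset 0.
    Hdr.Magic := IndexMagic;
    Hdr.RootOffset := 0;
    F.WriteBuffer(Hdr, SizeOf(Hdr));

    Node.Ch := 'a';
    Node.FirstChild := 0;
    Node.NextSibling := 0;
    Hdr.RootOffset := AppendNode(F, Node);

    // Patch the header in place now that the root offset is known.
    F.Position := 0;
    F.WriteBuffer(Hdr, SizeOf(Hdr));

    ReadNode(F, Hdr.RootOffset, Node);
    WriteLn('root at offset ', Hdr.RootOffset, ', label ', Node.Ch);
  finally
    F.Free;
  end;
  DeleteFile('index.tmp');
end.
```

    Because the helpers take a plain TStream, the same records can be exercised against a TMemoryStream for testing and a TFileStream in production, without turning the packed records into objects.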

    Then again, for another project I'd love to find a B+ tree implementation written in Pascal.

    - Jeremy

  4. #4

    Re: Using a file as raw memory

    So actually, what exactly do you need help with?

  5. #5

    Re: Using a file as raw memory

    I need an optimized file read/write scheme that will allow me to read and write my structures in blocks to and from the drive itself. I'll just have to get a sample put together and see if anyone has any good ideas.

    The basic idea is to get rid of the swapping that occurs now due to the massive index size (it quickly outruns the 64GB of RAM when dealing with terabytes of data). Since I'm already swapping to disk, it makes more sense to utilize the drive directly.

    I.e.:
    OpenIndexFile;
    LocateRootRecord;
    TraverseTheRabbitHole;
    RetrieveRecord;
    CloseIndexFile;
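
    The steps above could be sketched along these lines, assuming Free Pascal or Delphi and a first-child/next-sibling node layout where links are stored as file offsets (0 meaning "none"). TNodeRec, AppendNode, ReadNode, Lookup, and the walk.tmp file are hypothetical names for illustration only, and the tiny hand-built index stands in for the real one.

```pascal
program WalkIndex;
{$IFDEF FPC}{$mode delphi}{$ENDIF}
{$IFDEF MSWINDOWS}{$APPTYPE CONSOLE}{$ENDIF}

uses
  SysUtils, Classes;

type
  // Hypothetical node layout, for illustration only.
  TNodeRec = packed record
    Ch: AnsiChar;        // edge label
    Value: LongInt;      // payload; -1 = no record ends at this node
    FirstChild: Int64;   // file offset, 0 = none
    NextSibling: Int64;  // file offset, 0 = none
  end;

// Append a node and return the offset it was written at.
function AppendNode(S: TStream; Ch: AnsiChar; Value: LongInt;
  FirstChild, NextSibling: Int64): Int64;
var
  Rec: TNodeRec;
begin
  Rec.Ch := Ch;
  Rec.Value := Value;
  Rec.FirstChild := FirstChild;
  Rec.NextSibling := NextSibling;
  S.Position := S.Size;
  Result := S.Position;
  S.WriteBuffer(Rec, SizeOf(Rec));
end;

procedure ReadNode(S: TStream; Offset: Int64; out Rec: TNodeRec);
begin
  S.Position := Offset;
  S.ReadBuffer(Rec, SizeOf(Rec));
end;

// Returns the value stored for Key, or -1 when Key is not indexed.
function Lookup(S: TStream; const Key: AnsiString): LongInt;
var
  Node: TNodeRec;
  Offset: Int64;
  I: Integer;
begin
  Result := -1;
  // LocateRootRecord: the first 8 bytes of the file hold the root's offset.
  S.Position := 0;
  S.ReadBuffer(Offset, SizeOf(Offset));
  ReadNode(S, Offset, Node);
  // TraverseTheRabbitHole: one level per character of the key.
  for I := 1 to Length(Key) do
  begin
    Offset := Node.FirstChild;
    while Offset <> 0 do
    begin
      ReadNode(S, Offset, Node);
      if Node.Ch = Key[I] then
        Break;
      Offset := Node.NextSibling;
    end;
    if Offset = 0 then
      Exit; // ran out of siblings: no such key
  end;
  Result := Node.Value; // RetrieveRecord
end;

var
  F: TFileStream;
  Root, T, A, D, C: Int64;
begin
  F := TFileStream.Create('walk.tmp', fmCreate); // OpenIndexFile
  try
    Root := 0;
    F.WriteBuffer(Root, SizeOf(Root));           // placeholder header
    // Children are written before parents so their offsets are known.
    T := AppendNode(F, 't', 42, 0, 0);
    A := AppendNode(F, 'a', -1, T, 0);
    D := AppendNode(F, 'd', 7, 0, 0);
    C := AppendNode(F, 'c', -1, A, D);
    Root := AppendNode(F, #0, -1, C, 0);
    F.Position := 0;
    F.WriteBuffer(Root, SizeOf(Root));           // patch in the root offset

    WriteLn('cat -> ', Lookup(F, 'cat'));
    WriteLn('d   -> ', Lookup(F, 'd'));
    WriteLn('dog -> ', Lookup(F, 'dog'));
  finally
    F.Free;                                      // CloseIndexFile
  end;
  DeleteFile('walk.tmp');
end.
```

    Each step down the structure costs one small read at a known offset, so only the nodes on the search path are ever pulled in from disk.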

    For an idea of the record structure, you could look at my article on tries in the first Pascal Gamer mag. The basic concept is the same, just disk-bound instead of memory-bound.

    - Jeremy

  6. #6

    Re: Using a file as raw memory

    Would memory mapped files be what you are looking for?
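
    In case it helps, here is a rough sketch of the idea, assuming Windows and the Win32 calls CreateFileMapping and MapViewOfFile from the Windows unit. Only a window of the file is mapped at a time, so the whole index never has to fit in the address space. TNodeRec, MapWindowAt, the window size, and the mapped.tmp scratch file are made up for illustration.

```pascal
program MapWindow;
{$IFDEF FPC}{$mode delphi}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  Windows, SysUtils, Classes;

type
  // Hypothetical node layout, for illustration only.
  TNodeRec = packed record
    Ch: AnsiChar;
    FirstChild: Int64;
    NextSibling: Int64;
  end;
  PNodeRec = ^TNodeRec;

const
  ScratchName = 'mapped.tmp';
  ScratchBytes = 4 * 1024 * 1024; // stand-in for a far larger index file
  WindowBytes = 64 * 1024;        // how much of the file is visible at once

// Map WindowBytes of the file starting at Offset.
function MapWindowAt(Mapping: THandle; Access: DWORD; Offset: Int64): Pointer;
begin
  Result := MapViewOfFile(Mapping, Access, DWORD(Offset shr 32),
    DWORD(Offset and $FFFFFFFF), WindowBytes);
  if Result = nil then
    RaiseLastOSError;
end;

var
  F: TFileStream;
  Mapping: THandle;
  Info: TSystemInfo;
  Offset: Int64;
  Node: PNodeRec;
begin
  F := TFileStream.Create(ScratchName, fmCreate);
  try
    F.Size := ScratchBytes;
    // View offsets must be multiples of the allocation granularity.
    GetSystemInfo(Info);
    Offset := Int64(Info.dwAllocationGranularity) * 4;

    Mapping := CreateFileMapping(F.Handle, nil, PAGE_READWRITE, 0, 0, nil);
    if Mapping = 0 then
      RaiseLastOSError;
    try
      // Write a record through one view...
      Node := MapWindowAt(Mapping, FILE_MAP_WRITE, Offset);
      Node^.Ch := 'a';
      Node^.FirstChild := 0;
      Node^.NextSibling := 0;
      UnmapViewOfFile(Node);

      // ...and read it back through a fresh read-only view.
      Node := MapWindowAt(Mapping, FILE_MAP_READ, Offset);
      WriteLn('label read back through the view: ', Node^.Ch);
      UnmapViewOfFile(Node);
    finally
      CloseHandle(Mapping);
    end;
  finally
    F.Free;
  end;
  SysUtils.DeleteFile(ScratchName);
end.
```

    Inside a view the records are reached through ordinary typed pointers, so existing packed records can stay as they are; the cost is keeping track of which window each offset falls into.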

    cheers,
    Paul

  7. #7

    Re: Using a file as raw memory

    Don't know, never used them. Will have to do some research and see if that will achieve what I'm looking for. Right now, I'm busy tracking down a small bug I noticed last night that could cause unnecessary node duplication in my DAWG.

  8. #8

    Re: Using a file as raw memory

    Quote (originally posted by paul_nicholls):
    "Would memory mapped files be what you are looking for?"

    Memory-mapped files are a good solution when we're not talking about big files; otherwise they're quite useless, if you ask me.

    I think the solution would be B+Trees. I couldn't find any decent B+Tree implementation in Pascal, but I've found a good one written in C++ that you might find useful: http://www.sendspace.com/file/96vjoa
