Buffer Memory Management

**igmac** · 01-04-2012, 11:09 AM

In a library I'm starting on, I need to handle fairly large and frequent transfers of buffers from the program to the library. The question on methodology is this: Would it not be more efficient for the program to pass ownership of the buffer to the library rather than use the standard method of copying every single buffer? All this dual buffer stuff seems frightfully wasteful to me.

To illustrate, consider a network library where you pump about 10 MB/sec (bytes, not bits) through the network, i.e. roughly what you would get with a normal 100 Mbps network card. The normal method for a non-blocking call is that you populate your buffer with data and call the library, which creates its own buffer, copies your buffer into it, and then returns to you, at which point you can reuse or discard your buffer.
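The copy-on-submit pattern described above can be sketched roughly as follows. This is a minimal illustration in C rather than Pascal, and every name in it (`lib_packet`, `lib_send_copy`, `lib_packet_free`) is hypothetical, not part of any real library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical library-side packet: the library owns `data`. */
typedef struct {
    unsigned char *data;
    size_t len;
} lib_packet;

/* Copy-based submit: the library allocates its own buffer and copies the
   caller's bytes into it, so the caller may reuse `src` as soon as the
   call returns. Returns NULL if allocation fails. */
lib_packet *lib_send_copy(const void *src, size_t len)
{
    lib_packet *p;

    assert(src != NULL || len == 0);
    p = malloc(sizeof *p);
    if (p == NULL)
        return NULL;
    p->data = malloc(len ? len : 1);   /* avoid a zero-sized allocation */
    if (p->data == NULL) {
        free(p);
        return NULL;
    }
    if (len)
        memcpy(p->data, src, len);     /* the copy under discussion */
    p->len = len;
    return p;
}

/* Called by the library once the packet has been sent. */
void lib_packet_free(lib_packet *p)
{
    if (p != NULL) {
        free(p->data);
        free(p);
    }
}
```

The `memcpy` is the cost being questioned: it buys the caller the freedom to overwrite its buffer immediately.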

This is very simple, very easy, and very encapsulated. But it is not efficient.

The alternative is that the program calls the library and passes ownership of the existing buffer to the library, which is then free to use it and to discard it when finished. Of course the caller must not touch the buffer after calling the library, so setting buffer := nil; after the call is probably a good idea.
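As a rough sketch of that ownership hand-off, again in C with hypothetical names (`owned_packet`, `lib_send_take`, `lib_packet_done`): the library keeps the caller's pointer instead of copying, and clears the caller's variable itself, which plays the role of `buffer := nil`. The sketch assumes both sides share one allocator (`malloc`/`free` here).

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical packet whose data buffer was allocated by the caller. */
typedef struct {
    unsigned char *data;
    size_t len;
} owned_packet;

/* Ownership-transfer submit: no bytes are copied. On success the library
   owns *buf and the caller's pointer is set to NULL. On failure NULL is
   returned and ownership stays with the caller. */
owned_packet *lib_send_take(unsigned char **buf, size_t len)
{
    owned_packet *p;

    assert(buf != NULL && *buf != NULL);
    p = malloc(sizeof *p);
    if (p == NULL)
        return NULL;
    p->data = *buf;   /* only the pointer moves */
    p->len = len;
    *buf = NULL;      /* caller can no longer touch the buffer by accident */
    return p;
}

/* Called by the library when it has finished with the buffer. Assumes the
   buffer came from the same allocator the library uses. */
void lib_packet_done(owned_packet *p)
{
    if (p != NULL) {
        free(p->data);
        free(p);
    }
}
```

Taking a pointer to the caller's pointer makes the hand-off explicit in the signature, at the cost of a slightly clumsier call site.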

This is not without drawbacks. First, the library and the program have to use the same memory manager. That is not an issue for an embedded library where you pull the library's source into your program, but it is a factor when you use it as a linked library. Secondly, you can't pool your buffers, so there is an overhead of buffer creation and destruction. Though not critical, this can increase memory fragmentation.

Your thoughts?

**LP** · 01-04-2012, 01:34 PM

I think you shouldn't bother with this until you have your actual application working and you realize that this is an issue.

Otherwise, what you are suggesting falls under premature optimization; I would recommend following the YAGNI principle here and working on this buffering topic when it becomes an issue. As the idiom goes: you'll cross that bridge when you come to it.

A few years back, Delphi integrated a fast memory manager (FastMM), which should suffice even for more demanding server/client applications. In any case, I don't think you'll have to worry about memory fragmentation and buffer usage.

**igmac** · 02-04-2012, 10:31 AM

Well, I am one of those who believe that the slowest way to write a program is to start coding immediately, i.e. I believe in design before coding.

I might be sensitive to the task though, because I have recently done a bit of work on very high performance networking where using zero-copy with scatter/gather made phenomenal differences in an iSCSI setup. In essence I am talking about zero-copy.

On the other hand, simply designing the code with the thought in mind that I may later wish to implement zero-copy, may be sufficient. It's much easier to modify or add something in if you have designed around that possibility in the beginning.

**User137** · 02-04-2012, 11:30 AM

Originally Posted by igmac

Would it not be more efficient for the program to pass ownership of the buffer to the library rather than use the standard method of copying every single buffer? All this dual buffer stuff seems frightfully wasteful to me.

When it comes to a network library, wouldn't it only need a pointer to the data that you have in your application? Copying it over to the library's variable space is time consuming, but probably takes much, much less time than moving it over the network.

Much of the work regarding TCP buffers is done by the operating system. You don't necessarily need to add an extra layer yourself to encapsulate packets. TCP guarantees that everything arrives at its destination in the same order as you sent it, without loss.

If it works, it's ok. The less the data is moved around the better.

**igmac** · 02-04-2012, 11:55 AM

You are right, just a pointer. However, a normal system will, in the call, copy the data into its own buffer. Here's the typical sequence:

Code:

.----------------.
| Program Buffer |
'----------------'
         |
         |    .----------------.
         '--->| Library Buffer |
              '----------------'
                       |
Kernel Mode ===========|=======================
                       |
                       v
               .---------------.
               | Kernel Buffer |
               '---------------'
                       |
                       |      .---------------.
                       '----->| Device Buffer |
                              '---------------'

If your driver uses zero-copy with scatter/gather, the kernel -> device buffer copy can be skipped.
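Scatter/gather also exists at the user-space boundary, which is easier to demonstrate than the driver-level case mentioned above. The sketch below uses the POSIX `writev` call to hand the kernel a header and a payload that live in separate buffers, without first copying them into one contiguous block. `send_gathered` and `demo_gather` are hypothetical names, and a pipe stands in for a socket. Note that this only avoids the user-space copy; the kernel still copies the data into its own buffers.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stddef.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Gather write: the kernel reads the two buffers in order, so the caller
   never has to assemble header + payload in a single block. */
ssize_t send_gathered(int fd, const void *hdr, size_t hdr_len,
                      const void *payload, size_t payload_len)
{
    struct iovec iov[2];

    assert(hdr != NULL && payload != NULL);
    iov[0].iov_base = (void *)hdr;      /* iov_base is declared non-const */
    iov[0].iov_len  = hdr_len;
    iov[1].iov_base = (void *)payload;
    iov[1].iov_len  = payload_len;
    return writev(fd, iov, 2);
}

/* Usage sketch: push a 4-byte header and a 5-byte payload through a pipe
   and read them back. Returns the number of bytes read, or -1 on error. */
ssize_t demo_gather(char *out, size_t out_size)
{
    static const char hdr[4]     = {'H', 'D', 'R', ':'};
    static const char payload[5] = {'h', 'e', 'l', 'l', 'o'};
    int fds[2];
    ssize_t n;

    if (pipe(fds) != 0)
        return -1;
    n = send_gathered(fds[1], hdr, sizeof hdr, payload, sizeof payload);
    if (n > 0)
        n = read(fds[0], out, out_size);
    close(fds[0]);
    close(fds[1]);
    return n;
}
```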

As you can see, that's a lot of RAM copies. And they are CPU RAM copies, not DMA, so it's very wasteful.

The library would receive a pointer to the data, and a length. It would then copy that data over into its buffer(s) before calling the kernel, again passing a pointer to the data and the length, and the kernel will again copy it into its buffer(s) before returning from the call.

The whole reason this is done is so that the caller can then consider its buffer 'handled' and can overwrite it with new data for the next packet. If the library didn't do this, the caller would need a complicated callback mechanism through which it would be told when the buffer was really free. That's just not feasible. Hence the idea of passing ownership of the buffer back and forth, where the only data being copied is the pointer, not the block of data itself.
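For comparison, the callback mechanism dismissed above might look something like this in its simplest form. It is only a sketch with hypothetical names (`pending_send`, `lib_send_async`, `lib_complete`, `count_released`) and a single in-flight buffer; a real library would need a queue of pending sends and some thread-safety story, which is where the complexity comes from.

```c
#include <assert.h>
#include <stddef.h>

/* Called by the library when it no longer needs the caller's buffer. */
typedef void (*buf_done_fn)(void *buf, size_t len, void *ctx);

/* One in-flight send; the library borrows `buf` until `done` is invoked. */
typedef struct {
    void *buf;
    size_t len;
    buf_done_fn done;
    void *ctx;
    int busy;
} pending_send;

/* Queue a buffer without copying it. Returns 0 on success, or -1 if a
   send is already pending in this slot. */
int lib_send_async(pending_send *slot, void *buf, size_t len,
                   buf_done_fn done, void *ctx)
{
    assert(slot != NULL && buf != NULL && done != NULL);
    if (slot->busy)
        return -1;
    slot->buf = buf;
    slot->len = len;
    slot->done = done;
    slot->ctx = ctx;
    slot->busy = 1;
    return 0;
}

/* Invoked from the library's I/O loop once the data has left the buffer. */
void lib_complete(pending_send *slot)
{
    assert(slot != NULL);
    if (!slot->busy)
        return;
    slot->busy = 0;
    slot->done(slot->buf, slot->len, slot->ctx);
}

/* Example callback: tallies how many bytes have been handed back. */
void count_released(void *buf, size_t len, void *ctx)
{
    (void)buf;
    *(size_t *)ctx += len;
}
```

One side effect of this design is that the buffer is always released by the caller's own code, so the library and the program do not need to share a memory manager.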

**igmac** · 02-04-2012, 11:56 AM

Oh, and of course it takes the same path back for received packets, just in reverse order.

**LP** · 02-04-2012, 01:26 PM

Originally Posted by igmac

Well, I am one of those who believe that the slowest way to write a program is to start coding immediately, i.e. I believe in design before coding.

No, you seem inclined toward low-level performance optimizations even before you have started the project. You haven't said a word about the design, just some bits of (bandwidth) requirements.

Originally Posted by igmac

I might be sensitive to the task though, because I have recently done a bit of work on very high performance networking where using zero-copy with scatter/gather made phenomenal differences in an iSCSI setup. In essence I am talking about zero-copy.

In the original post you mentioned 10 MB/sec of bandwidth. For this kind of bandwidth, using simple Pascal arrays and passing parameters by reference will suffice.

**igmac** · 02-04-2012, 02:28 PM

Yes indeed, I am definitely talking about a single, very specific, low-level performance optimisation here. Hence the post in 'tools and methodologies' rather than somewhere more specifically related to an individual project.

Having now gone and looked at the Linux kernel implementation since I first posted, I am more certain that for very high bandwidth (for example, but by no means exclusively, a 10 Gbit network) this is the way to do it.

They have a very elegant mechanism, essentially shared memory with status flags, that allows synchronised access to buffers between the two parties. This is a much better idea than buffer passing, and better and simpler than what I was thinking of.
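The general idea of shared buffers guarded by status flags can be sketched as a small ring in which each slot carries an owner flag. This is not the Linux kernel's actual data structure, only an illustration of the technique under simplifying assumptions: one producer, one consumer, fixed-size slots, and C11 atomics for the flags. All names (`shared_ring`, `ring_acquire`, `ring_commit`, `ring_peek`, `ring_release`) are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

enum { SLOT_COUNT = 4, SLOT_SIZE = 2048 };
enum { OWNER_PRODUCER = 0, OWNER_CONSUMER = 1 };

/* Each slot's flag says which party may currently touch its data. */
typedef struct {
    atomic_int owner;
    size_t len;
    unsigned char data[SLOT_SIZE];
} ring_slot;

typedef struct {
    ring_slot slots[SLOT_COUNT];
    size_t head;   /* next slot to fill; touched by the producer only  */
    size_t tail;   /* next slot to drain; touched by the consumer only */
} shared_ring;

void ring_init(shared_ring *r)
{
    size_t i;
    for (i = 0; i < SLOT_COUNT; i++)
        atomic_init(&r->slots[i].owner, OWNER_PRODUCER);
    r->head = 0;
    r->tail = 0;
}

/* Producer: get the next slot to write into in place, or NULL if full. */
unsigned char *ring_acquire(shared_ring *r)
{
    ring_slot *s = &r->slots[r->head];
    if (atomic_load_explicit(&s->owner, memory_order_acquire) != OWNER_PRODUCER)
        return NULL;
    return s->data;
}

/* Producer: publish the slot just filled by flipping its flag. */
void ring_commit(shared_ring *r, size_t len)
{
    ring_slot *s = &r->slots[r->head];
    assert(len <= SLOT_SIZE);
    s->len = len;
    atomic_store_explicit(&s->owner, OWNER_CONSUMER, memory_order_release);
    r->head = (r->head + 1) % SLOT_COUNT;
}

/* Consumer: look at the next filled slot, or NULL if none is ready. */
ring_slot *ring_peek(shared_ring *r)
{
    ring_slot *s = &r->slots[r->tail];
    if (atomic_load_explicit(&s->owner, memory_order_acquire) != OWNER_CONSUMER)
        return NULL;
    return s;
}

/* Consumer: hand the slot back to the producer for reuse. */
void ring_release(shared_ring *r)
{
    ring_slot *s = &r->slots[r->tail];
    atomic_store_explicit(&s->owner, OWNER_PRODUCER, memory_order_release);
    r->tail = (r->tail + 1) % SLOT_COUNT;
}
```

Data is written once, directly into the slot, and only the flag changes hands. The buffers are effectively pooled and never freed, which sidesteps both the shared memory manager requirement and the allocation overhead raised in the first post.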

I should have thought of this because I recall that, many years ago, Falcon 4 used a similar technique to provide state information, rather than providing a full API. It worked very well.

Thanks for all the thoughts.
