Buffer gap

The buffer gap technique is used in text editors to store text compactly, exploiting the fact that most changes to the text occur at or near the current location of the cursor. The text is stored in a large buffer in two contiguous segments, with a gap between them for inserting new text. Moving the cursor involves copying text from one side of the gap to the other (sometimes this copying is delayed until the next operation that changes the text). Insertion writes new text at the end of the first segment, shrinking the gap; deletion simply enlarges the gap.
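These operations can be sketched in Python as a minimal, fixed-capacity gap buffer. This is an illustrative sketch under stated assumptions, not any particular editor's implementation: the class name `GapBuffer` and its methods are hypothetical, the buffer never grows (an insert that does not fit raises an error), and cursor movement copies characters eagerly rather than delaying the copy until the next edit. The cursor always sits at `gap_start`.

```python
class GapBuffer:
    """Hypothetical minimal gap buffer with a fixed capacity (no gap growth).

    buf[:gap_start] is the first text segment, buf[gap_end:] is the second,
    and buf[gap_start:gap_end] is the gap. The cursor sits at gap_start.
    """

    def __init__(self, capacity):
        self.buf = [" "] * capacity
        self.gap_start = 0
        self.gap_end = capacity

    def __len__(self):
        # Length of the text, excluding the gap.
        return len(self.buf) - (self.gap_end - self.gap_start)

    def insert(self, s):
        # New text is written at the end of the first segment, shrinking the gap.
        if len(s) > self.gap_end - self.gap_start:
            raise OverflowError("gap is full (this sketch never enlarges it)")
        for ch in s:
            self.buf[self.gap_start] = ch
            self.gap_start += 1

    def move_cursor(self, pos):
        # Moving the cursor copies characters across the gap, one at a time.
        if not 0 <= pos <= len(self):
            raise IndexError("cursor position out of range")
        while self.gap_start > pos:  # moving left: last char of segment 1 goes after the gap
            self.gap_start -= 1
            self.gap_end -= 1
            self.buf[self.gap_end] = self.buf[self.gap_start]
        while self.gap_start < pos:  # moving right: first char of segment 2 goes before the gap
            self.buf[self.gap_start] = self.buf[self.gap_end]
            self.gap_start += 1
            self.gap_end += 1

    def delete_before(self, n):
        # Deleting characters before the cursor simply widens the gap.
        if not 0 <= n <= self.gap_start:
            raise IndexError("cannot delete past the start of the text")
        self.gap_start -= n

    def text(self):
        return "".join(self.buf[:self.gap_start] + self.buf[self.gap_end:])

    def render(self):
        # Draw the gap as blank space between square brackets.
        gap = " " * (self.gap_end - self.gap_start)
        before = "".join(self.buf[:self.gap_start])
        after = "".join(self.buf[self.gap_end:])
        return before + "[" + gap + "]" + after


b = GapBuffer(39)
b.insert("This is the wayout.")
b.move_cursor(15)           # cursor just after "way"
print(b.render())           # This is the way[                    ]out.
b.insert(" the world began ")
print(b.render())           # This is the way the world began [   ]out.
b.move_cursor(len(b) - 1)   # cursor just after "out"
print(b.render())           # This is the way the world began out[   ].
b.delete_before(4)          # delete " out"
print(b.render())           # This is the way the world began[       ].
```

The usage replays a short editing session; note that moving the cursor right by three characters copies exactly those three characters ("out") across the gap, while the deletion copies nothing.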

The advantage of using a buffer gap over more sophisticated data structures is that the text is represented simply as two literal strings, which take very little extra space and which can be searched and displayed very quickly. The disadvantage is that some operations on very large files require recopying most of the text, particularly edits made at many widely separated locations, or insertions that fill up the gap and force a new gap to be created. Buffer gaps work well in practice on the assumption that such recopying is rare enough for its cost to be amortized over the far more common cheap operations.
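One common way to keep the recopying rare, assumed here rather than taken from the text, is to at least double the buffer's capacity whenever the gap runs out. The sketch below (the functions `grow_gap` and `type_chars` are hypothetical) types n characters at the cursor, growing the buffer when the gap is exhausted, and counts how many characters are recopied. With doubling, the total number of recopied characters stays below 2n, so each insertion costs constant time on average even though an individual growth step copies the whole text.

```python
def grow_gap(buf, gap_start, gap_end, min_gap):
    """Copy buf into a larger list whose gap holds at least min_gap slots.

    Returns (new_buf, new_gap_end); gap_start is unchanged. Both text
    segments are recopied, so this step is O(len(buf)).
    """
    text_len = len(buf) - (gap_end - gap_start)
    new_cap = max(2 * len(buf), text_len + min_gap)  # at least double the capacity
    new_buf = [None] * new_cap
    new_buf[:gap_start] = buf[:gap_start]  # first segment stays at the front
    tail = buf[gap_end:]
    new_gap_end = new_cap - len(tail)
    new_buf[new_gap_end:] = tail           # second segment moves to the end
    return new_buf, new_gap_end


def type_chars(n, initial_capacity=4):
    """Insert n characters at the cursor, growing the gap when it runs out.

    Returns the final text and the number of characters recopied by grow_gap.
    """
    buf = [None] * initial_capacity
    gap_start, gap_end = 0, initial_capacity
    copied = 0
    for i in range(n):
        if gap_start == gap_end:
            copied += len(buf)  # the gap is empty, so every slot holds text
            buf, gap_end = grow_gap(buf, gap_start, gap_end, 1)
        buf[gap_start] = chr(ord("a") + i % 26)
        gap_start += 1
    return "".join(buf[:gap_start] + buf[gap_end:]), copied


text, copied = type_chars(1000)
print(len(text), copied)  # 1000 1020
```

Here the buffer grows at capacities 4, 8, ..., 512, so 1,000 insertions trigger only 1,020 character copies in total. Insertions scattered across a large file add the separate cost of moving the gap, which doubling does not address.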

A buffer gap is used in most Emacs editors.

Below are some examples of operations with buffer gaps. The gap is represented pictorially by the empty space between the square brackets. This representation is a bit misleading: in a typical implementation, the endpoints of the gap are tracked using pointers, and the contents of the gap are ignored; this allows, for example, deletions to be done by adjusting a pointer without changing the text in the buffer.
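To make the pointer-based representation concrete, here is a small Python sketch in which the buffer is a single list and the gap is delimited by two indices. The `Gap` class, its method names, and the `_` filler for unused gap slots are hypothetical choices for illustration. Deleting text on either side of the cursor only moves a pointer: the deleted characters remain physically in the list but are no longer part of the text, because whatever lies between the pointers is ignored.

```python
from dataclasses import dataclass


@dataclass
class Gap:
    buf: list       # raw storage: first segment, gap, second segment
    gap_start: int  # first index inside the gap (the cursor)
    gap_end: int    # first index after the gap

    def text(self):
        # Only the two segments count as text; the gap's contents are ignored.
        return "".join(self.buf[:self.gap_start] + self.buf[self.gap_end:])

    def delete_after(self, n):
        # Delete n characters after the cursor by advancing gap_end.
        if not 0 <= n <= len(self.buf) - self.gap_end:
            raise IndexError("cannot delete past the end of the text")
        self.gap_end += n

    def delete_before(self, n):
        # Delete n characters before the cursor by moving gap_start back.
        if not 0 <= n <= self.gap_start:
            raise IndexError("cannot delete past the start of the text")
        self.gap_start -= n


g = Gap(list("This is the way") + ["_"] * 4 + list("out."), 15, 19)
g.delete_after(3)      # removes "out" by moving one pointer
print(g.text())        # This is the way.
print("".join(g.buf))  # This is the way____out.  (storage untouched)
```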

Initial state:

This is the way[                    ]out.

User inserts some new text:

This is the way the world began [   ]out.

User moves the cursor to just after "out":

This is the way the world began out[   ].

User deletes "out" and the space before it:

This is the way the world began[       ].