
8-bit clean

"Eight-bit clean" is a term which describes a computer system that deals correctly with extended character sets which (unlike ASCII) use all eight bits of a byte. Up to the early 1990s, many programs and communications systems used to assume that all characters have codes in the range 0 to 127. This leaves the top bit of each byte free for use as a parity bit or some kind of flag bit. These assumptions make such systems unusable on text data that contains characters with higher character codes, which is commonplace in non-English-speaking countries with larger alphabets.

If a binary file is transmitted over a communications link that is not eight-bit clean, it will be corrupted. To combat this, encodings have been devised which use only ASCII characters. The most popular of these have been uuencode and MIME base64 encoding. Some communication links are not even "seven-bit clean" because they use non-ASCII character sets internally (EBCDIC, for example); they cause problems even for uuencoded data. This is the reason for the introduction of base64 encoding, which has largely replaced uuencode in practice.
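As a rough illustration of why an ASCII-only encoding helps, the following Python sketch sends every possible byte value through a simulated 7-bit link, once raw and once base64-encoded. The link is modelled by a hypothetical strip_high_bit function; base64.b64encode and base64.b64decode are from the Python standard library.

```python
import base64

def strip_high_bit(data: bytes) -> bytes:
    """Model a link that is not eight-bit clean (hypothetical helper)."""
    return bytes(b & 0x7F for b in data)

payload = bytes(range(256))  # binary data using every byte value

# Sent raw, every byte above 127 is damaged.
raw_result = strip_high_bit(payload)

# Base64 output uses only ASCII characters, so the link leaves it intact.
encoded = base64.b64encode(payload)
b64_result = base64.b64decode(strip_high_bit(encoded))

print(raw_result == payload)  # False
print(b64_result == payload)  # True
```

The cost is size: base64 represents every three input bytes as four ASCII characters, so the encoded data is about one third larger.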

By the mid-1990s, practically all computer and communication system implementations had been updated to be 8-bit clean, as the systems became widely used outside the US and UK. However, for reasons of legacy compatibility, 8-bit cleanliness is still a concern in some places, for example in the Internet e-mail protocol SMTP.


This article (or an earlier version of it) contains material from FOLDOC, used with permission.