|Table of contents|
2 String utilities
3 String manipulation
5 Strings in theoretical computer science
6 See also
Representation in programming languages
A common representation is an array of character codes, occupying one byte (e.g. in ASCII code) or two bytes (e.g. in unicode) each. The length can be stored implicitly by using a special terminating character (often NUL, ASCII code 0) -- the C programming language uses this convention (see C string) -- or explicitly, for example by prefixing the string with integer value (convention used in Pascal).
The length of a string in the above example 5 characters, but note that it occupies 6 bytes. Characters after the terminator do not form part of the representation; they may be either part of another string or just garbage.
Here is the equivalent Pascal string:
Of course, other representations are possible. Using trees and lists makes certain string operations, such as character insertions or deletions, more efficient.
Strings are such a useful datatype that several languages have been designed in order to make string processing applications easy to write. Examples include:
UNIX utilities perform simple string manipulations and can be used to easily program some powerful string processing algorithms. Files and finite streams may be viewed as strings.
A simple operation on strings is concatenation. Other common operations include searching a substring in a longer string, sorting a list of strings and parsing a string. Because there are so many practical uses for strings there are many associated algorithms with various tradeoffs.
Advanced string algorithms often employ complex mechanisms and data structures, among them suffix trees and finite state machines.
There are a variety of string-processing algorithms for doing various things with strings: