
Myhill-Nerode Theorem

In the theory of formal languages, the Myhill-Nerode Theorem provides a necessary and sufficient condition for a language to be regular. In practice it is most often used to prove that a given language is not regular, although it also underlies the minimization of deterministic finite automata.

Given a language L, define a relation $R_L$ on strings by the rule $x \mathrel{R_L} y$ if and only if there is no string z such that exactly one of the strings xz and yz is in L. Such a z, when it exists, is called a distinguishing extension for x and y. It is easy to show that $R_L$ is an equivalence relation on strings, and thus it partitions the set of all finite strings into one or more equivalence classes.
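As a rough illustration of the definition, the following Python sketch searches for a distinguishing extension by brute force. The function name `find_distinguishing_extension` and the example language `even_number_of_as` (strings with an even number of a's) are hypothetical choices made for this sketch. Because it only tries extensions up to a fixed length, finding nothing is evidence, not proof, that $x \mathrel{R_L} y$.

```python
from itertools import product
from typing import Callable, Optional


def find_distinguishing_extension(
    in_lang: Callable[[str], bool],
    x: str,
    y: str,
    alphabet: str,
    max_len: int,
) -> Optional[str]:
    """Search for z with len(z) <= max_len such that exactly one of xz, yz is in L.

    Returns the first such z found (shortest lengths are tried first), or None
    if none exists up to max_len. None does NOT prove that x and y are
    equivalent, because a longer z might still distinguish them.
    """
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            z = "".join(letters)
            if in_lang(x + z) != in_lang(y + z):
                return z
    return None


def even_number_of_as(s: str) -> bool:
    # Hypothetical example language: strings containing an even number of a's.
    return s.count("a") % 2 == 0


print(repr(find_distinguishing_extension(even_number_of_as, "a", "aa", "ab", 3)))  # ''
print(find_distinguishing_extension(even_number_of_as, "b", "aa", "ab", 3))  # None
```

Here the empty string already separates "a" (odd) from "aa" (even), while "b" and "aa" have the same parity, so no extension of any length separates them.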

The Myhill-Nerode Theorem states that, when L is regular, the number of states in the smallest deterministic finite automaton (DFA) accepting L is equal to the number of equivalence classes of $R_L$. The intuition is that if one starts with such a minimal automaton, then any strings x and y that drive it to the same state belong to the same equivalence class; conversely, given the partition into equivalence classes, one can construct an automaton whose state records the equivalence class of the part of the string seen so far.
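One standard way to compute this number for a language given by a complete DFA is partition refinement (Moore's algorithm), which merges states that no extension can tell apart. The sketch below assumes the DFA is given as a transition dictionary `delta[(state, symbol)]`; the function name `minimal_dfa_size` is hypothetical. The number of blocks left after refinement is the number of states of the minimal complete DFA, and hence the number of equivalence classes of $R_L$ (a dead state, if present, counts as one class).

```python
from typing import Dict, Hashable, Set, Tuple


def minimal_dfa_size(
    alphabet: str,
    delta: Dict[Tuple[Hashable, str], Hashable],
    start: Hashable,
    accepting: Set[Hashable],
) -> int:
    """Number of states in the minimal DFA equivalent to a complete DFA."""
    # 1. Keep only the states reachable from the start state.
    reachable = {start}
    stack = [start]
    while stack:
        q = stack.pop()
        for c in alphabet:
            r = delta[(q, c)]
            if r not in reachable:
                reachable.add(r)
                stack.append(r)

    # 2. Initial partition: block 1 = accepting, block 0 = non-accepting.
    block = {q: int(q in accepting) for q in reachable}
    num_blocks = len(set(block.values()))
    while True:
        # A state's signature is its own block plus the block reached on each
        # symbol; states with different signatures are split apart.
        ids: Dict[Tuple[int, ...], int] = {}
        new_block = {}
        for q in reachable:
            sig = (block[q],) + tuple(block[delta[(q, c)]] for c in alphabet)
            new_block[q] = ids.setdefault(sig, len(ids))
        if len(ids) == num_blocks:
            return num_blocks  # no block was split: the partition is stable
        block, num_blocks = new_block, len(ids)


# Hypothetical example: a DFA for "contains aa" over {a, b}.
contains_aa = {
    (0, "a"): 1, (0, "b"): 0,
    (1, "a"): 2, (1, "b"): 0,
    (2, "a"): 2, (2, "b"): 2,
}
print(minimal_dfa_size("ab", contains_aa, 0, {2}))  # 3
```

For the "contains aa" language the three states correspond to the three equivalence classes: no progress yet, just read an a, and already seen aa.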

A consequence of the Myhill-Nerode Theorem is that a language L is regular (i.e., accepted by a finite state machine) if and only if the number of equivalence classes of $R_L$ is finite.

The immediate corollary is that if $R_L$ has infinitely many equivalence classes, then L is not regular. It is this corollary that is most frequently used to prove that a language is non-regular.
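The finite/infinite distinction can be explored experimentally, though never proven, with a bounded computation: group short prefixes by which short suffixes take them into L. Prefixes with different membership patterns are certainly in different classes, so the number of distinct patterns is a lower bound on the number of classes. In this sketch the helper names and the two example languages are hypothetical; the count stays at 2 for the regular language of strings with an even number of a's, while for $\{a^n b^n : n \ge 0\}$ it grows as the bounds increase.

```python
from itertools import product
from typing import Callable, Iterator


def strings_up_to(alphabet: str, max_len: int) -> Iterator[str]:
    """All strings over `alphabet` of length 0..max_len."""
    for n in range(max_len + 1):
        for letters in product(alphabet, repeat=n):
            yield "".join(letters)


def class_lower_bound(
    in_lang: Callable[[str], bool], alphabet: str, prefix_len: int, suffix_len: int
) -> int:
    """Lower bound on the number of equivalence classes of R_L.

    Two prefixes with different membership patterns over the test suffixes
    are certainly in different classes; equal patterns prove nothing.
    """
    suffixes = list(strings_up_to(alphabet, suffix_len))
    patterns = {
        tuple(in_lang(x + z) for z in suffixes)
        for x in strings_up_to(alphabet, prefix_len)
    }
    return len(patterns)


def even_number_of_as(s: str) -> bool:
    # Hypothetical regular example: an even number of a's.
    return s.count("a") % 2 == 0


def a_n_b_n(s: str) -> bool:
    # Hypothetical non-regular example: {a^n b^n : n >= 0}.
    n = len(s) // 2
    return s == "a" * n + "b" * n


for bound in (2, 4, 6):
    print(bound,
          class_lower_bound(even_number_of_as, "ab", bound, bound),
          class_lower_bound(a_n_b_n, "ab", bound, bound))
```

A growing count only suggests infinitely many classes; an actual proof needs an argument covering every length, as in the example proof of non-regularity.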

Example proof of non-regularity

Consider a language L over the alphabet {a, b} that contains every string of the form $a^k b^k$ but no string $a^i b^k$ with $i < k$. Now consider the infinite set of strings $\{a^i : i \ge 0\}$. For any two strings from this set, $x = a^i$ and $y = a^k$ with $i < k$, we can append $z = b^k$ to each. This gives $xz = a^i b^k$, which is not in L, and $yz = a^k b^k$, which is in L, so z is a distinguishing extension for x and y. Thus each string of the form $a^i$ belongs to a different equivalence class, so $R_L$ has infinitely many equivalence classes, and by the Myhill-Nerode Theorem L is not regular.
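The key step of the argument can be spot-checked mechanically for a concrete language. The sketch below assumes $L = \{a^n b^n : n \ge 0\}$, chosen only as one language that has the properties the argument uses; the helper names `in_language` and `first_failure` are hypothetical. It confirms, for all $0 \le i < k \le 50$, that $z = b^k$ distinguishes $a^i$ from $a^k$. A check like this covers only finitely many pairs, so it illustrates the proof rather than replacing it.

```python
from typing import Optional, Tuple


def in_language(s: str) -> bool:
    # Assumption: L = {a^n b^n : n >= 0}, one language in which a^k b^k is in L
    # and a^i b^k (i < k) is not.
    n = len(s) // 2
    return s == "a" * n + "b" * n


def first_failure(limit: int) -> Optional[Tuple[int, int]]:
    """Check the argument for every pair 0 <= i < k <= limit.

    Returns None if z = b^k distinguishes a^i from a^k in every case,
    otherwise the first pair (i, k) for which it does not.
    """
    for k in range(1, limit + 1):
        for i in range(k):
            x, y, z = "a" * i, "a" * k, "b" * k
            if in_language(x + z) or not in_language(y + z):
                return (i, k)
    return None


print(first_failure(50))  # None
```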

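Note that this language can be "pumped" in the sense that any nonempty string in the language can be expanded by replacing a single a with arbitrarily many copies of a. This does not, however, put the language beyond the reach of the pumping lemma, which also requires that the repeated part can be deleted. If p is the pumping length and $k > p$, the repeated part of $a^k b^k$ lies within its first p symbols, all of which are a's, so deleting it yields a string $a^i b^k$ with $i < k$, which is not in L. What the Myhill-Nerode Theorem offers over the pumping lemma is a condition that is both necessary and sufficient: every non-regular language has infinitely many equivalence classes, whereas some non-regular languages do satisfy the pumping lemma.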