Halting problem

The halting problem is a decision problem which can be informally stated as follows:

Given a description of an algorithm and a description of its initial arguments, determine whether the algorithm, when executed with these arguments, ever halts (the alternative is that it runs forever without halting).

Alan Turing proved in 1936 that there is no general method or algorithm which can solve the halting problem for all possible inputs.

The importance of the halting problem lies in the fact that it was one of the first problems to be proved undecidable. Subsequently, many other such problems have been described; the typical method of proving a problem to be undecidable is to reduce the halting problem to it, that is, to show that an algorithm for the new problem would yield an algorithm for the halting problem.

One consequence of the halting problem's undecidability is that the Entscheidungsproblem is unsolvable, and in particular there cannot be a general algorithm that decides whether a given statement about natural numbers is true or not. The reason for this is that the proposition that states that a certain algorithm will halt given a certain input can be automatically reformulated as a statement about numbers. Since there is no algorithm that can decide if the original statement about algorithms is true or not, it follows that there is no algorithm that can decide whether the corresponding statement about numbers is true or not.

Another striking consequence of the undecidability of the halting problem is Rice's theorem, which states that the truth of any non-trivial statement about the function that is defined by an algorithm is undecidable. So, for example, the decision problem "will this algorithm halt for the input 0" is already undecidable. Note that this theorem holds for the function defined by the algorithm and not the algorithm itself. It is, for example, quite possible to decide whether an algorithm will halt within 100 steps, but this is not a statement about the function that is defined by the algorithm.
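The step-bounded question is decidable simply by running the algorithm for a bounded number of steps. The following Python sketch illustrates this under an assumed model that is not part of the original text: a program is a generator function, and each yield marks the end of one step. The names halts_within, countdown and forever are hypothetical.

```python
def halts_within(program, arg, max_steps=100):
    """Return True if program(arg) finishes within max_steps resumptions.

    Assumption: `program` is a generator function and every `yield`
    marks the end of one step of the computation.
    """
    run = program(arg)
    for _ in range(max_steps):
        try:
            next(run)          # execute one more step
        except StopIteration:
            return True        # the program finished: it halts
    return False               # still running after max_steps steps


def countdown(n):
    """Halts after n steps."""
    while n > 0:
        yield
        n -= 1


def forever(_):
    """Never halts."""
    while True:
        yield
```

A False answer here only means "not within the bound"; it says nothing about whether the program halts later, which is why this check does not contradict Rice's theorem.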

Gregory Chaitin has given an undecidable problem in algorithmic information theory which does not depend on the halting problem. Chaitin also gave the intriguing definition of the halting probability which represents the probability that a randomly produced program halts.

While Turing's proof shows that there can be no general method or algorithm to determine whether algorithms halt, individual instances of that problem may very well be susceptible to attack. Given a specific algorithm, one can often show that it must halt for any input, and in fact computer scientists often do just that as part of a correctness proof. But every such proof requires new arguments: there is no mechanical, general way to determine whether algorithms halt.

There is another caveat. The undecidability of the halting problem relies on the fact that algorithms are assumed to have potentially infinite storage: at any one time they can only store finitely many things, but they can always store more and they never run out of memory. If the memory and external storage of a machine are limited, as they are for any computer which actually exists, then the machine has only finitely many configurations, so any run that does not halt must eventually repeat a configuration. The halting problem for programs running on that machine can therefore be solved with a general algorithm (albeit an extremely inefficient one).
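The general algorithm for a bounded machine can be sketched as follows. This is a minimal illustration, assuming a deterministic machine whose complete configuration (memory plus control state) is a hashable value and whose transition function returns None when the machine halts; the names halts_with_finite_memory, step and start are hypothetical.

```python
def halts_with_finite_memory(step, start):
    """Decide halting for a deterministic machine with finitely many configurations.

    Assumptions: `step(config)` returns the next configuration, or None
    once the machine has halted, and configurations are hashable.
    """
    seen = set()
    config = start
    while config is not None:
        if config in seen:
            # A repeated configuration means the deterministic run cycles forever.
            return False
        seen.add(config)
        config = step(config)
    return True


def count_down(c):
    """Toy machine: halts when the counter reaches 0."""
    return None if c == 0 else c - 1


def spin(c):
    """Toy machine: an 8-bit counter that never halts."""
    return (c + 2) % 256
```

The set of seen configurations can grow as large as the number of possible configurations, which for a machine with b bits of storage is up to 2^b. That is the sense in which the algorithm is extremely inefficient.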

Table of contents

1 Sketch of Proof
2 Formalization of the Halting Problem
3 Relationship with Gödel's Incompleteness Theorem
4 Can Humans Solve The Halting Problem?
5 Recognizing Partial Solutions

Sketch of Proof

In this sketch we represent algorithms with pseudo-code similar to Pascal, and assume a straightforward representation of the algorithms as strings.

The proof proceeds by reductio ad absurdum. We will assume that there is an algorithm Halt(a, i) that decides if the algorithm encoded by the string a will halt when given as input the string i, and then show that this leads to a contradiction.

Concretely, suppose that Halt(a, i) returns the string "yes" if the algorithm represented by the string a halts when given as input the string i, and returns the string "no" otherwise. Given this algorithm we can construct another algorithm Trouble(s) as follows:

 function Trouble ( s : string ) : string
 begin
   if Halt(s, s) = "no"
   then return("yes")
   else while true do begin end  { loop forever }
 end

This algorithm takes a string s as its argument and runs the algorithm Halt, giving it s both as the description of the algorithm to check and as the initial data to feed to that algorithm. If Halt outputs "no", then Trouble outputs "yes", otherwise Trouble goes into an infinite loop. Since all algorithms can be represented by strings there will be a string T that represents the algorithm Trouble. We can now ask the following question:

What is the output of Halt(T, T)?

We know by the definition of Halt that the output must be either "yes" or "no". Let us consider both cases:

Assume that the output is "no". Since Halt decides if the encoded algorithm halts for the given input, it follows that Trouble does not halt on input T. If we look at the algorithm of Trouble we can see that this is only the case if the result of Halt(T, T) is "yes", which contradicts the assumption that the output is "no".
Assume that the output is "yes". It then follows by the definition of Halt that Trouble halts on input T. Looking again at the algorithm of Trouble, we see that this happens only if the result of Halt(T, T) is "no", which contradicts the assumption that it is "yes".

Since both cases lead to a contradiction, the initial assumption that the algorithm Halt exists must be false.

Note the close analogy of this proof to Russell's Barber paradox: Halt(U, V) stands for "U shaves V" and T represents the "barber". Trouble is defined to ensure that T shaves precisely those that don't shave themselves.
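The construction of Trouble can be mirrored in Python to show how any proposed Halt is defeated by its own Trouble. This is only a sketch under stated assumptions: functions are passed directly instead of being encoded as strings, the candidate halt(f, x) is assumed to be a total, deterministic function returning True or False, and the names make_trouble and refute are hypothetical.

```python
def make_trouble(halt):
    """Build Trouble from a claimed halting decider halt(f, x) -> bool."""
    def trouble(s):
        if not halt(s, s):
            return "yes"
        while True:   # halt claimed that s(s) halts, so do the opposite
            pass
    return trouble


def refute(halt):
    """Describe how `halt` is wrong about Trouble applied to itself."""
    trouble = make_trouble(halt)
    if halt(trouble, trouble):
        # Not executed: by construction trouble(trouble) would loop forever.
        return "halt said 'yes', but Trouble(Trouble) loops forever"
    # Safe to run: halt answered "no", so trouble returns immediately.
    result = trouble(trouble)
    return "halt said 'no', but Trouble(Trouble) returned " + repr(result)
```

For example, refute(lambda f, x: False) actually runs Trouble on itself and observes that it returns, while refute(lambda f, x: True) reports the contradiction without running the looping branch.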

Formalization of the Halting Problem

In his original proof Turing formalized the concept of algorithm by introducing Turing machines. However, the actual choice of formalization turns out to be of little importance. One can choose any of the many known models of computability, such as Markov algorithms, lambda calculus, Post systems, or any Turing-complete programming language.

What is important is that the formalization allows a straightforward mapping of algorithms to some data type that the algorithm can operate upon. For example, if the formalism lets algorithms define functions over strings (such as Turing machines) then there should be a mapping of these algorithms to strings, and if the formalism lets algorithms define functions over natural numbers (such as recursive functions) then there should be a mapping of algorithms to natural numbers. The mapping to strings is usually the most straightforward, but strings over an alphabet with n characters can also be mapped to numbers by interpreting them as numbers in an n-ary numeral system.
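The mapping between strings and numbers can be sketched in Python. Note one deliberate substitution: read literally, an n-ary numeral maps strings that differ only in leading "zero" characters to the same number, so this sketch uses the bijective variant, in which the digits run from 1 to n and every string corresponds to exactly one natural number. The function names encode and decode and the example alphabet are hypothetical.

```python
def encode(s, alphabet):
    """Map a string over `alphabet` to a natural number (bijective base n)."""
    n = len(alphabet)
    value = 0
    for ch in s:
        # Digits run from 1 to n, so the empty string maps to 0.
        value = value * n + alphabet.index(ch) + 1
    return value


def decode(value, alphabet):
    """Inverse of encode: recover the string from its number."""
    n = len(alphabet)
    chars = []
    while value > 0:
        value, digit = divmod(value - 1, n)
        chars.append(alphabet[digit])
    return "".join(reversed(chars))
```

With the alphabet "ab", the numbers 0 through 6 decode to the empty string, "a", "b", "aa", "ab", "ba" and "bb" respectively.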

Relationship with Gödel's Incompleteness Theorem

The concepts raised by Gödel's incompleteness theorems are very similar to those raised by the halting problem, and the proofs are quite similar. In fact, a weaker form of the First Incompleteness Theorem is an easy consequence of the undecidability of the halting problem. This weaker form differs from the standard statement of the incompleteness theorem by asserting that a complete, consistent and sound axiomatization of all statements about natural numbers is unachievable. The "sound" part is the weakening: it means that we require the axiomatic system in question to prove only true statements about natural numbers. (The standard form of Gödel's First Incompleteness Theorem, by contrast, is unconcerned with the question of truth and concerns only provability.)

The weaker form of the theorem can be proved from the undecidability of the halting problem as follows. Assume that we have a complete, consistent and sound axiomatization of all true first-order logic statements about natural numbers, with axioms and rules that can be checked mechanically. Then we can build an algorithm that enumerates all the statements it proves. This means that there is an algorithm N(n) that, given a natural number n, computes a true first-order logic statement about natural numbers such that, for all the true statements, there is at least one n such that N(n) is that statement. Now suppose we want to decide if the algorithm with representation a halts on input i. We know that this statement can be expressed with a first-order logic statement, say H(a, i). Since the axiomatization is complete, it follows that either there is an n such that N(n) = H(a, i) or there is an n' such that N(n') = ¬H(a, i). So if we iterate over all n until we find either H(a, i) or its negation, we will always halt, and because the axiomatization is sound, the statement we find is true. This gives us an algorithm to decide the halting problem. Since we know that there cannot be such an algorithm, it follows that the assumption that there is a complete, consistent and sound axiomatization of all true first-order logic statements about natural numbers must be false.
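The search described in this argument can be written out as a short Python sketch. It is purely illustrative: by the argument itself no real enumerator N for arithmetic can exist, so the example runs against a hypothetical toy enumerator. Representing statements as tuples and a negation as ("not", statement) is an assumption made for the sketch, as are the names decide_halting, TOY_THEOREMS and the toy program names "p" and "q".

```python
from itertools import count


def decide_halting(N, H, a, i):
    """Search the enumeration N for H(a, i) or its negation."""
    target = H(a, i)
    negation = ("not", target)
    for n in count():
        statement = N(n)
        if statement == target:
            return True    # "a halts on i" is a theorem
        if statement == negation:
            return False   # "a does not halt on i" is a theorem


def H(a, i):
    """Toy encoding of the statement 'program a halts on input i'."""
    return ("halts", a, i)


# Hypothetical toy theory: it proves that p halts on 0 and that q does not.
TOY_THEOREMS = [("halts", "p", "0"), ("not", ("halts", "q", "0"))]


def N(n):
    """Toy enumerator: cycles through the theorems of the toy theory."""
    return TOY_THEOREMS[n % len(TOY_THEOREMS)]
```

Completeness is what guarantees that the loop terminates for every a and i. On the toy theory, the loop would run forever for any statement that the theory leaves undecided.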

Can Humans Solve The Halting Problem?

It might seem like humans could solve the halting problem. After all, a programmer can often look at a program and tell whether it will halt. It is useful to understand why this cannot be true. For simplicity, we will consider the halting problem for programs with no input, which is also undecidable.

To "solve" the halting problem means to be able to look at any program and tell whether it halts. It is not enough to be able to look at some programs and decide. A Turing machine can do that. In fact, even a finite state machine can do that. There exists a finite state machine that, given any program less than a gigabyte long, returns whether the program will halt (although we do not know which finite state machine it is; see below for why). But if a program is long enough, the usual bound on lifespan would prevent a human from even reading the entire program. After just a few decades of reading spaghetti code, the human would already start to forget some of the details. Humans can't solve the halting problem, due to the sheer size of the input.

Even for short programs, it isn't clear that humans can always tell whether they halt. For example, suppose i, j, and k are arbitrary-precision integer variables. Then, does this C program halt?

for(i=j=k=1;--j||k;k=j?i%j?k:k-j:(j=i+=2));

This program searches until it finds an odd perfect number, then halts. For each odd i it starts with k = i, subtracts from k every divisor j of i with j < i, and stops when k reaches zero, that is, when i equals the sum of its proper divisors. It halts if and only if such a number exists, which is a major open question in mathematics. So, after centuries of work, mathematicians have yet to discover whether a simple, 43-byte C program halts. This could be considered the ultimate obfuscated C program, and it makes it difficult to see how humans could solve the halting problem.
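For readability, the same search can be written out in Python, whose integers have arbitrary precision. This is a sketch, not a translation of the C program's control flow: unlike the original, it takes an upper bound so that it always terminates, and the names proper_divisor_sum and find_odd_perfect are hypothetical.

```python
def proper_divisor_sum(n):
    """Sum of the divisors of n that are smaller than n."""
    return sum(d for d in range(1, n) if n % d == 0)


def find_odd_perfect(limit):
    """Return the first odd perfect number below `limit`, or None if there is none."""
    for i in range(1, limit, 2):
        if proper_divisor_sum(i) == i:
            return i
    return None
```

Removing the bound, so that the loop runs over all odd numbers, gives a program that halts if and only if an odd perfect number exists, exactly like the C version.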

Recognizing Partial Solutions

No program can solve the halting problem. There are programs that give correct answers for some instances of it, and run forever on all other instances. A program that returns answers for some instances of the halting problem might be called a partial halting solver (PHS). Can we recognize a correct PHS when we see it? Let the PHS recognition problem be this: given a PHS, determine whether it returns only correct answers. This problem sounds like it might be easier than the halting problem itself. It is not. It is just as undecidable as the halting problem. This follows trivially from Rice's theorem. It also follows from the undecidability of the halting problem, as will now be shown.

For every instance of the halting problem, there is an instance of the PHS recognition problem that is just as hard to solve. For example, the previous section showed an instance of the halting problem for which no one knows the answer. Here is how it can be converted into an instance of the PHS recognition problem for which no one knows the answer:

    Input a program P
    If P = "for(i=j=k=1;--j||k;k=j?i%j?k:k-j:(j=i+=2));"
         Output "Yes, that program halts"
    else
         loop forever

Does that program return only correct answers? No one knows. This shows further just how difficult the halting problem is. There is no way to solve it in general. There isn't even a general way to know whether a program partially solves it.

References:

Alan Turing, On computable numbers, with an application to the Entscheidungsproblem, Proceedings of the London Mathematical Society, Series 2, 42 (1936), pp. 230-265. This is the epochal paper where Turing defines Turing machines, formulates the halting problem, and shows that it (as well as the Entscheidungsproblem) is unsolvable.