Turing test

The Turing test is a proposal for a test of a machine's capability to perform human-like conversation. Described by Alan Turing in 1950, it proceeds as follows: a human judge engages in a natural language conversation with two other parties, one a human and the other a machine; if the judge cannot reliably tell which is which, the machine is said to pass the test. Both the human and the machine are assumed to be trying to appear human. To keep the test simple and universal, and to test the machine's linguistic capability specifically, the conversation is usually limited to a text-only channel.
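The phrase "cannot reliably tell which is which" is informal. One common way to make it concrete, swapped in here for illustration rather than taken from Turing, is to repeat the game over many rounds and ask whether the judge identifies the machine more often than guessing would. The following is a minimal Python sketch using an exact one-sided binomial test. The function names `p_value_better_than_chance` and `machine_passes` and the 0.05 threshold are hypothetical choices, not part of any standard specification of the test.

```python
from math import comb


def p_value_better_than_chance(correct: int, trials: int) -> float:
    """One-sided exact binomial test against pure guessing (p = 0.5).

    Returns P(X >= correct) if every verdict were a fair coin flip.
    A small value suggests the judge CAN tell human from machine.
    """
    if not 0 <= correct <= trials:
        raise ValueError("correct must lie between 0 and trials")
    tail = sum(comb(trials, k) for k in range(correct, trials + 1))
    return tail / 2 ** trials


def machine_passes(correct: int, trials: int, alpha: float = 0.05) -> bool:
    # Hypothetical pass rule: the judge's identification accuracy is not
    # significantly better than chance at significance level alpha.
    return p_value_better_than_chance(correct, trials) >= alpha
```

For example, a judge who picks out the machine in 12 of 20 rounds gets a p-value of roughly 0.25, so under this assumed rule the machine passes; 18 of 20 correct identifications gives a p-value far below 0.05, so it does not.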

The test was inspired by a party game in which guests try to guess the gender of a person in another room by writing a series of questions and reading the answers sent back. In Turing's original proposal, the human participants had to pretend to be the other gender, and the test was limited to a five-minute conversation. These features are no longer considered essential and are generally not included in the specification of the Turing test.

Turing originally proposed the test in order to replace the emotionally charged and, in his view, meaningless question "Can machines think?" with a more well-defined one.

Turing predicted that machines would eventually be able to pass the test. He estimated that by the year 2000, machines with 10⁹ bits of memory (125 million bytes, or about 119 MiB) would be able to fool 30% of human judges during a five-minute test. He also predicted that people would then no longer consider the phrase "thinking machine" contradictory. He further predicted that machine learning would be an important part of building powerful machines, a claim that contemporary researchers in artificial intelligence consider plausible.
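The memory figure in Turing's estimate can be checked with a simple unit conversion. The sketch below assumes 8-bit bytes and shows that 10⁹ bits comes to about 119 in binary megabytes (MiB, 2^20 bytes) but 125 in decimal megabytes (10^6 bytes), which is why quoted figures for this estimate vary.

```python
# Turing's 1950 memory estimate, expressed in bits.
BITS = 10**9

# Assumption: 8 bits per byte (the modern convention).
BYTES = BITS // 8          # 125,000,000 bytes

MIB = BYTES / 2**20        # binary megabytes (mebibytes)
MB = BYTES / 10**6         # decimal megabytes

print(f"{BYTES:,} bytes = {MIB:.1f} MiB = {MB:.0f} MB")
# prints "125,000,000 bytes = 119.2 MiB = 125 MB"
```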

It has been argued that the Turing test cannot serve as a valid definition of machine intelligence or "machine thinking" for at least three reasons:

A machine passing the Turing test may be able to simulate human conversational behavior, but this may be much weaker than true intelligence. The machine might just follow some cleverly devised rules. (A common rebuttal in the AI community has been to ask: how do we know humans don't just follow some cleverly devised rules?)
A machine may very well be intelligent without being able to chat like a human.
Many humans whom we would probably consider intelligent might fail this test (for example, young children or the illiterate). (On the other hand, we almost always judge the intelligence of other humans solely by what they say.)

Another potential problem, related to the first objection above, is that even if the Turing test is a good operational definition of intelligence, it may not indicate that the machine has consciousness or intentionality. It may be, for example, that intelligence and consciousness are such that neither necessarily implies the other. In that case, the Turing test might fail to capture one of the key differences between intelligent machines and intelligent people.

One notable feature of Turing's proposal was that answers would have to be delivered at controlled intervals and rates. He considered this necessary to prevent the judge from drawing a conclusion from the fact that the computer answered much more slowly than the human. Such control is still necessary, but the concern now is that computers respond much faster than people.
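One way to picture such pacing is a relay that holds each reply until a deadline that depends only on the reply's length, so delivery time reveals nothing about who composed it. The following is a minimal Python sketch; the function `release_delay`, the typing speed, and the minimum pause are hypothetical values chosen for illustration, not figures from Turing's paper.

```python
# Hypothetical pacing parameters (assumptions, not from Turing's paper).
CHARS_PER_SECOND = 5.0    # assumed human-like typing speed
MIN_DELAY_SECONDS = 2.0   # assumed minimum "thinking" pause


def release_delay(reply: str, compute_seconds: float) -> float:
    """Seconds from question to delivery of `reply`.

    The paced delay depends only on the reply's length, so a fast machine
    and a human typist look the same as long as both finish composing
    within the paced window.
    """
    paced = MIN_DELAY_SECONDS + len(reply) / CHARS_PER_SECOND
    # A responder slower than the window cannot be sped up, only delayed.
    return max(paced, compute_seconds)
```

With these assumed settings, a ten-character reply such as "Yes, I do." is released after 4.0 seconds whether it was produced in milliseconds or typed by hand; only a responder slower than the window changes the timing.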

So far, no computer has passed the Turing test as such. Simple conversational programs such as ELIZA have fooled people into believing they were talking to another human being, for example in an informal experiment termed AOLiza. However, such "successes" are not the same as passing a Turing test. Most obviously, the human party in such a conversation has no reason to suspect they are talking to anything other than a human, whereas in a real Turing test the questioner is actively trying to determine the nature of the entity they are chatting with. Documented cases usually come from environments such as Internet Relay Chat, where conversation is often stilted and meaningless and no real understanding of it is necessary. Additionally, many chat participants use English as a second or third language, which makes them more likely to assume that an unintelligent comment from the program is simply something they have misunderstood; they are also probably unfamiliar with chat bot technology and so do not recognize the very non-human errors such programs make. See ELIZA effect.

The Loebner Prize is an annual competition among Turing test entrants. Its organizers award an annual prize to the computer system that, in the judges' opinion, demonstrates the "most human" conversational behaviour, and they offer an additional prize for a system that, in their opinion, passes a Turing test. This second prize has not yet been awarded.

See also: Artificial intelligence, CAPTCHA, Chatterbot, Chinese Room, Loebner Prize, Mark V Shaney (computer program)

Roger Penrose discusses these subjects in his book The Emperor's New Mind.

References

Alan Turing, "Computing Machinery and Intelligence", Mind, vol. LIX, no. 236, October 1950, pp. 433-460. Online at:
- http://cogprints.soton.ac.uk/documents/disk0/00/00/04/99/
- http://www.loebner.net/Prizef/TuringArticle.html
Loebner Prize home page
Stanford Encyclopedia of Philosophy entry on the Turing test, by G. Oppy and D. Dowe.
Turing Test: 50 Years Later reviews a half-century of work on the Turing Test, from the vantage point of 2000.