Transmission Control Protocol

Transmission Control Protocol (TCP) is a connection-oriented, reliable, byte-stream transport layer protocol, currently documented by IETF RFC 793.

In the TCP/IP model, TCP sits between the network layer below and the application layer above. Applications send streams of 8-bit bytes to TCP for delivery onto the network. TCP divides the byte stream into appropriately sized segments, with the size usually determined by the maximum transmission unit (MTU) of the data link layer below.
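The segmentation step can be sketched in Python as follows. This is only an illustration: the function name `segment_stream` is hypothetical, and the 1460-byte default is an assumption (what remains of a 1500-byte Ethernet MTU after a 20-byte IP header and a 20-byte TCP header without options). Real TCP implementations choose and adjust the segment size themselves.

```python
from typing import List


def segment_stream(data: bytes, mss: int = 1460) -> List[bytes]:
    """Split a byte stream into chunks of at most `mss` bytes.

    `mss` (maximum segment size) is an assumed, fixed value here.
    """
    if mss <= 0:
        raise ValueError("mss must be positive")
    return [data[i:i + mss] for i in range(0, len(data), mss)]


segments = segment_stream(b"x" * 3000)
print([len(s) for s in segments])  # [1460, 1460, 80]
```

Joining the segments back together in order yields the original byte stream, which is the property the receiving TCP has to preserve.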

OSI model

Application layer    FTP, SMTP, HTTP, ...
Transport layer      TCP, UDP
Network layer        IP, ICMP, ARP
Data link layer      Ethernet, Token Ring, FDDI, ...

Table of contents

1 Protocol Operation

1.1 Connection establishment
1.2 Data transfer
1.3 Connection termination

2 TCP ports
3 TCP development
4 Alternatives to TCP
5 External Links

Protocol Operation

A TCP connection has three phases: connection establishment, data transfer and connection termination. A 3-way handshake is used to establish a connection, and a four-way handshake is used to tear it down. During connection establishment, parameters such as sequence numbers are initialized to help ensure ordered delivery and robustness.

Connection establishment

While it is possible for a pair of end hosts to initiate a connection between themselves simultaneously, typically one end opens a socket and listens passively for a connection from the other. This is commonly referred to as a passive open, and it designates the server side of a connection. The client side of a connection initiates an active open by sending an initial SYN segment to the server as part of the 3-way handshake. The server side should respond to a valid SYN request with a SYN/ACK. Finally, the client side should respond to the server with an ACK, completing the 3-way handshake and the connection establishment phase.
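The exchange described above can be illustrated with a small Python model. It does not open a real connection; it only shows how the sequence and acknowledgement numbers of the three segments relate to each other. The `Segment` type, the function name and the example initial sequence numbers are all hypothetical.

```python
from typing import List, NamedTuple

MOD = 2 ** 32  # sequence numbers are 32-bit and wrap around


class Segment(NamedTuple):
    sender: str
    flags: str
    seq: int
    ack: int  # only meaningful when the segment acknowledges something


def three_way_handshake(client_isn: int, server_isn: int) -> List[Segment]:
    """Return the three segments of a normal (non-simultaneous) open."""
    syn = Segment("client", "SYN", client_isn, 0)
    # A SYN consumes one sequence number, so it is acknowledged as ISN + 1.
    syn_ack = Segment("server", "SYN/ACK", server_isn, (client_isn + 1) % MOD)
    ack = Segment("client", "ACK", (client_isn + 1) % MOD, (server_isn + 1) % MOD)
    return [syn, syn_ack, ack]


for segment in three_way_handshake(client_isn=1000, server_isn=5000):
    print(segment)
```

In a real implementation the initial sequence numbers are chosen by each host rather than passed in by the caller.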

Data transfer

During the data transfer phase, a number of key mechanisms determine TCP's reliability and robustness. These include using sequence numbers for ordering received TCP segments and detecting duplicate data, checksums for segment error detection, and acknowledgements and timers for detecting and adjusting to loss or delay.

During the TCP connection establishment phase, initial sequence numbers (ISNs) are exchanged between the two TCP speakers. These sequence numbers identify (and count) the user data bytes in the byte stream. Every TCP segment carries a pair of numbers, referred to as the sequence number and the acknowledgement number. A TCP sender refers to its own sequence number simply as the sequence number, and to the receiver's sequence number as the acknowledgement number. To maintain reliability, a receiver acknowledges TCP segment data by indicating that it has received all contiguous bytes up to some position in the stream. An enhancement to TCP, called selective acknowledgement (SACK), allows a TCP receiver to acknowledge out-of-order blocks.
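The idea of acknowledging only contiguous data can be sketched in Python. The function below is a simplified illustration, not how any particular stack is written: its name is hypothetical, segments are represented as (sequence number, length) pairs, and sequence number wraparound is ignored for brevity.

```python
from typing import Iterable, Tuple


def cumulative_ack(next_expected: int, received: Iterable[Tuple[int, int]]) -> int:
    """Return the acknowledgement number after the given segments arrive.

    `received` holds (sequence_number, length) pairs in any order.
    Wraparound is ignored to keep the sketch short.
    """
    for seq, length in sorted(received):
        if seq > next_expected:
            break  # gap: later bytes cannot be acknowledged cumulatively
        next_expected = max(next_expected, seq + length)
    return next_expected


# Bytes 0-99 and 200-299 have arrived; bytes 100-199 are missing.
print(cumulative_ack(0, [(200, 100), (0, 100)]))  # 100
# Once the missing segment arrives, everything can be acknowledged.
print(cumulative_ack(0, [(200, 100), (0, 100), (100, 100)]))  # 300
```

In the first call the receiver holds bytes 200-299 but cannot report them with a plain cumulative acknowledgement, which is the situation SACK is meant to improve.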

Through the use of sequence and acknowledgement numbers, end hosts can properly deliver received segments in the correct byte stream order to a receiving application. Sequence numbers are 32-bit, unsigned numbers, which wrap to zero on the next byte in the stream after 2^32-1. One key to maintaining robustness and security for TCP connections is in the selection of the ISN.
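A short Python sketch of the wraparound arithmetic follows. The helper names are hypothetical, and the comparison rule (one number precedes another if it is less than 2^31 behind it on the circle) is an assumed convention that only gives meaningful answers when the two numbers are less than 2^31 apart.

```python
MOD = 2 ** 32


def seq_add(seq: int, n: int) -> int:
    """Advance a sequence number by n bytes, wrapping after 2**32 - 1."""
    return (seq + n) % MOD


def seq_lt(a: int, b: int) -> bool:
    """True if a precedes b, treating the sequence space as circular."""
    return a != b and (b - a) % MOD < 2 ** 31


print(seq_add(2 ** 32 - 1, 1))    # 0
print(seq_lt(2 ** 32 - 10, 5))    # True: 5 lies just past the wrap point
print(seq_lt(5, 2 ** 32 - 10))    # False
```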

A 16-bit checksum, computed as the one's complement of the one's complement sum of the 16-bit words in the TCP segment header and data, is calculated by the sender and included in the transmitted segment. The TCP receiver recomputes the checksum over the received TCP header and data. If the receiver's computed checksum matches the received checksum, the segment is assumed to have arrived intact and without error.
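The arithmetic behind such a checksum can be sketched in Python. The function names are hypothetical, and the sketch operates on an arbitrary byte string: it does not build a TCP header, and it omits the pseudo-header of IP address information that the real TCP checksum also covers.

```python
def ones_complement_sum16(data: bytes) -> int:
    """One's complement sum of the 16-bit big-endian words in `data`."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)  # end-around carry
    return total


def internet_checksum(data: bytes) -> int:
    """Checksum value a sender would place in the checksum field."""
    return ~ones_complement_sum16(data) & 0xFFFF


def verify(data_with_checksum: bytes) -> bool:
    """Receiver-side check: the sum including the checksum must be 0xFFFF."""
    return ones_complement_sum16(data_with_checksum) == 0xFFFF


payload = bytes.fromhex("0001f203f4f5f6f7")
checksum = internet_checksum(payload)
print(hex(checksum))                                  # 0x220d
print(verify(payload + checksum.to_bytes(2, "big")))  # True
```

Because the sum is a simple addition of words, some error patterns leave the result unchanged; for example, swapping two 16-bit words goes undetected.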

The TCP one's complement checksum is quite weak by modern standards: it restricts TCP to links with quite low bit error rates for data in received packets. If TCP were to be redesigned today, it would most probably specify a 32-bit CRC as an error check instead of the current checksum. The weak checksum is partially compensated for by the common use of a CRC or better integrity check at layer 2, below both TCP and IP, such as is used in PPP or the Ethernet frame. However, this does not mean that the 16-bit TCP checksum is redundant: remarkably, surveys of Internet traffic have shown that software and hardware errors that corrupt packets between CRC-protected hops are common, and that the end-to-end 16-bit TCP checksum catches most of these simple errors.

Acknowledgements for data sent, or the lack of acknowledgements, are used by senders to implicitly infer network conditions between the TCP sender and receiver. Coupled with timers, this lets TCP senders and receivers alter the flow of data. This is more generally referred to as flow control, congestion control and/or congestion avoidance. TCP uses a number of mechanisms to achieve both robustness and high performance. These mechanisms include the use of a sliding window, the slow start algorithm, the congestion avoidance algorithm, the fast retransmit and fast recovery algorithms, and more. Enhancing TCP to handle loss effectively, minimize errors, manage congestion and perform well in very high-speed environments is an ongoing area of research and standards development.
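How slow start and congestion avoidance interact can be shown with a deliberately simplified Python model. It counts the congestion window in whole segments and updates it once per round trip. The function name, the parameters and the reaction to loss (halve the threshold and restart from one segment, resembling timeout handling) are assumptions made for illustration. Fast retransmit and fast recovery are not modelled.

```python
from typing import List, Set


def simulate_cwnd(rounds: int, ssthresh: int, loss_rounds: Set[int]) -> List[int]:
    """Return the congestion window (in segments) at the start of each round."""
    cwnd = 1
    history = []
    for rnd in range(rounds):
        history.append(cwnd)
        if rnd in loss_rounds:
            # Assumed loss reaction: remember half the window, start over.
            ssthresh = max(cwnd // 2, 2)
            cwnd = 1
        elif cwnd < ssthresh:
            cwnd = min(cwnd * 2, ssthresh)  # slow start: exponential growth
        else:
            cwnd += 1  # congestion avoidance: linear growth
    return history


print(simulate_cwnd(rounds=10, ssthresh=8, loss_rounds={6}))
# [1, 2, 4, 8, 9, 10, 11, 1, 2, 4]
```

The output shows the window doubling up to the threshold, growing by one segment per round trip afterwards, and collapsing when a loss is detected in round 6.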

Connection termination

The connection termination phase uses a four-way handshake, with each side of the connection terminating independently. Therefore, a typical teardown requires a pair of FIN and ACK segments for each end.
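The four segments of a typical teardown can be listed with a small Python model. As with any such illustration, no real connection is involved: the function name, the host labels "A" and "B" and the example sequence numbers are hypothetical. In practice a FIN segment normally also carries an acknowledgement, which the labels here leave out for clarity.

```python
from typing import List, Tuple

MOD = 2 ** 32  # sequence numbers are 32-bit and wrap around


def four_way_teardown(a_seq: int, b_seq: int) -> List[Tuple[str, str, int, int]]:
    """Return (sender, flag, seq, ack) tuples for a teardown started by A."""
    a_next = (a_seq + 1) % MOD  # a FIN consumes one sequence number
    b_next = (b_seq + 1) % MOD
    return [
        ("A", "FIN", a_seq, b_seq),   # A has no more data to send
        ("B", "ACK", b_seq, a_next),  # B acknowledges A's FIN
        ("B", "FIN", b_seq, a_next),  # later, B finishes sending too
        ("A", "ACK", a_next, b_next), # A acknowledges B's FIN
    ]


for step in four_way_teardown(a_seq=3000, b_seq=7000):
    print(step)
```

Between the second and third segments, B may continue sending data, since each direction of the connection is closed independently.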

TCP ports

TCP uses the notion of port numbers to identify sending and receiving applications. Each side of the TCP connection has an associated 16-bit unsigned port number assigned to the sending or receiving application. Ports fall into three basic categories: well known, registered and dynamic/private. The well known ports are assigned by the Internet Assigned Numbers Authority (IANA) and are typically used by system-level or root processes. Well known applications running as servers and passively listening for connections typically use these ports. Some examples include: FTP (21), TELNET (23), SMTP (25) and HTTP (80). Registered ports are typically used by end user applications as ephemeral source ports when contacting servers, but they can also identify named services that have been registered by a third party. Dynamic/private ports can also be used by end user applications, but less commonly so. They typically do not carry any meaning outside of a particular TCP connection.
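The three categories correspond to numeric ranges, which a short Python function can make concrete. The function name is hypothetical; the boundaries used (0-1023, 1024-49151 and 49152-65535) are the ranges used by IANA and are not stated in the text above.

```python
def port_category(port: int) -> str:
    """Classify a 16-bit TCP port number by its IANA range."""
    if not 0 <= port <= 0xFFFF:
        raise ValueError("port must fit in 16 bits")
    if port <= 1023:
        return "well known"
    if port <= 49151:
        return "registered"
    return "dynamic/private"


for port in (21, 23, 25, 80, 8080, 50000):
    print(port, port_category(port))
```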

TCP development

TCP is a fairly complex and evolving protocol. While significant enhancements have been made and proposed over the years, its basic operation has not changed significantly since RFC 793, published in 1981. RFC 1122, Requirements for Internet Hosts -- Communication Layers, clarified a number of TCP protocol implementation requirements. RFC 2581, TCP Congestion Control, one of the most important TCP related RFCs in recent years, describes updated algorithms used by TCP to avoid congestion. In 2001, RFC 3168 was written to describe explicit congestion notification (ECN), a congestion avoidance signalling mechanism, adding to the list of important RFCs that update the original specification. In the early 21st century, TCP carries approximately 95% of all Internet packets. Common applications that use TCP include HTTP/HTTPS (world wide web), SMTP/POP3/IMAP (email) and FTP (file transfer). Its widespread use is a testament to the quality of the original design.

Alternatives to TCP

TCP is not appropriate for many applications, however, and newer transport layer protocols are being designed and deployed to address some of its inherent weaknesses. For example, many real-time applications do not need, and will suffer from, TCP's reliable delivery mechanisms. In those types of applications, it is often better to tolerate some loss, errors or congestion than to try to adjust for them. Example applications that do not typically use TCP include real-time streaming multimedia (such as Internet radio) and some real-time multiplayer games. Any application that doesn't require reliability, or that wants to minimize functionality, may choose to avoid using TCP. In many cases, the User Datagram Protocol (UDP) may be used in place of TCP when only application multiplexing services are required.

External Links

RFC 793
IANA Port Assignments
Sally Floyd's homepage
John Kristoff's Overview of TCP
When The CRC and TCP Checksum Disagree
Introduction to TCP/IP - with some pictures