HTTP

This is a pocket introduction to HTTP. HTTP, which stands for Hypertext Transfer Protocol, was first developed by Tim Berners-Lee in 1991. Thanks to its flexibility, and the Web’s ubiquity, HTTP is now used for all kinds of applications.

HTTP is a client–server protocol. An HTTP client (such as a browser) opens a connection to an HTTP server, which is waiting on a listening socket. The client then sends a request message. The server responds with a response message. This pair of messages effectively forms a remote procedure call (RPC).
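The exchange described above can be sketched end to end on a single machine. This is a minimal illustration, not a real server: the function names serve_once and exchange, the loopback address, and the fixed "hello" response are all assumptions made for the example.

```python
import socket
import threading


def serve_once(listener: socket.socket) -> None:
    """Accept one connection, read one request, send one fixed response."""
    conn, _ = listener.accept()
    with conn:
        request = b""
        # Read until the blank line that ends the request's header section.
        while b"\r\n\r\n" not in request:
            chunk = conn.recv(4096)
            if not chunk:
                break
            request += chunk
        body = b"hello\n"
        head = (
            "HTTP/1.0 200 OK\r\n"
            "Content-Type: text/plain\r\n"
            f"Content-Length: {len(body)}\r\n"
            "\r\n"
        ).encode("ascii")
        conn.sendall(head + body)
    # Leaving the "with" block closes the connection.


def exchange() -> bytes:
    """Run one request/response pair over a loopback connection."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as listener:
        listener.bind(("127.0.0.1", 0))  # port 0: let the OS pick a free port
        listener.listen(1)               # the server's listening socket
        port = listener.getsockname()[1]
        server = threading.Thread(target=serve_once, args=(listener,))
        server.start()

        with socket.create_connection(("127.0.0.1", port)) as client:
            client.sendall(b"GET / HTTP/1.0\r\nHost: localhost\r\n\r\n")
            response = b""
            # The server closes the connection after responding, so read to EOF.
            while chunk := client.recv(4096):
                response += chunk
        server.join()
    return response
```

Calling exchange() returns the raw bytes of the response message, first line and all, which is the "return value" of this one remote procedure call.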

Every HTTP message (request or response) consists of a header section followed by an optional body.

Headers

HTTP headers are made of text lines. Each line is terminated by "\r\n" (a “carriage return” character, ASCII number 13, followed by a newline, ASCII number 10).
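To make the line terminator concrete, here is a small sketch that turns a list of header lines into the bytes sent on the connection. The helper name encode_header_lines is hypothetical.

```python
def encode_header_lines(lines: list[str]) -> bytes:
    """Join header lines, ending each one with CR (13) followed by LF (10)."""
    return "".join(line + "\r\n" for line in lines).encode("ascii")


data = encode_header_lines(["GET / HTTP/1.0", "Host: cs61.seas.harvard.edu"])
# The last two bytes are the terminator of the final line.
last_two = list(data[-2:])
```

Here last_two holds the two ASCII numbers named above, in that order.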

The headers from a request might look like this:

GET / HTTP/1.0
Host: cs61.seas.harvard.edu

The headers from a response might look like this:

HTTP/1.1 200 OK
Content-Type: text/plain
Content-Length: 56
Date: Thu, 29 Nov 2012 13:47:35 GMT
Connection: close

The first line of the header section has a special format.

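The first line of a request includes the method (GET in the example above), the resource being requested (/), and the HTTP version (HTTP/1.0), separated by spaces.

The first line of a response includes the HTTP version (HTTP/1.1 in the example above), a numeric status code (200), and a short human-readable description of that status (OK).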
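Both kinds of first line can be taken apart by splitting on spaces. A minimal sketch follows, with hypothetical helper names. It assumes well-formed input and does no validation.

```python
def parse_request_line(line: str) -> tuple[str, str, str]:
    """Split a request's first line into (method, target, version)."""
    method, target, version = line.split(" ", 2)
    return method, target, version


def parse_status_line(line: str) -> tuple[str, int, str]:
    """Split a response's first line into (version, status code, reason)."""
    # Split at most twice: the reason text may itself contain spaces.
    version, code, reason = line.split(" ", 2)
    return version, int(code), reason
```

For example, parse_status_line("HTTP/1.1 404 Not Found") keeps "Not Found" together as the third element, because only the first two spaces are treated as separators.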

After the first line comes an arbitrary number of header fields, one per line. Each line contains a field name followed by a colon, usually a single space, and then a field value. Hundreds of fields have been defined. Clients and servers ignore fields they don’t understand. This important property has made HTTP easy to extend and is key to its popularity.
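Parsing these lines takes only a few steps. The sketch below uses a hypothetical helper, parse_fields, and makes two simplifying assumptions: field names are lowercased (they are case-insensitive in HTTP), and a repeated field simply overwrites the earlier value.

```python
def parse_fields(lines: list[str]) -> dict[str, str]:
    """Turn 'Name: value' lines into a dict keyed by lowercased field name."""
    fields = {}
    for line in lines:
        # Split at the first colon only; values such as dates contain colons.
        name, sep, value = line.partition(":")
        if not sep:
            continue  # not a 'Name: value' line; this sketch just skips it
        fields[name.strip().lower()] = value.strip()
    return fields


fields = parse_fields([
    "Content-Type: text/plain",
    "Content-Length: 56",
    "Date: Thu, 29 Nov 2012 13:47:35 GMT",
])
```

A receiver then looks up only the names it cares about, such as fields["content-length"], and leaves every other entry alone. That is all "ignoring fields it doesn’t understand" amounts to.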

The header section is terminated by a blank line ("\r\n"). Then comes the body.
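Since the last header line ends with "\r\n" and the blank line is another "\r\n", the boundary between header section and body is the first occurrence of "\r\n\r\n". A sketch, with the hypothetical name split_message:

```python
def split_message(data: bytes) -> tuple[list[str], bytes]:
    """Split a raw message into its header lines and its body bytes."""
    head, sep, body = data.partition(b"\r\n\r\n")
    if not sep:
        raise ValueError("header section is not terminated by a blank line")
    # Header lines are text; the body is left as raw bytes.
    return head.decode("latin-1").split("\r\n"), body


lines, body = split_message(b"HTTP/1.1 200 OK\r\nContent-Length: 2\r\n\r\nhi")
```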

Body

The body is included in some requests and most responses. It’s just a sequence of bytes.

A body is normally accompanied by a header field called Content-Length. This defines the number of body bytes included after the blank line that terminates the header section. For example, the response above has 56 bytes of body.
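On the sending side, the safest way to get Content-Length right is to compute it from the body itself. The sketch below is illustrative only: build_response is a hypothetical helper that always produces a 200 response.

```python
def build_response(body: bytes, content_type: str = "text/plain") -> bytes:
    """Build a complete response whose Content-Length matches the body."""
    head = (
        "HTTP/1.1 200 OK\r\n"
        f"Content-Type: {content_type}\r\n"
        f"Content-Length: {len(body)}\r\n"  # counted in bytes, not characters
        "\r\n"
    )
    return head.encode("ascii") + body


message = build_response("café".encode("utf-8"))
```

Note that the length counts bytes: "café" is four characters but five bytes in UTF-8, so the field value here is 5.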

Content-Length is mandatory on requests that contain a body (most requests don’t contain a body). It is optional for responses: if a response doesn’t define Content-Length, the client will read bytes from the connection until the connection closes. But in practice it is better to provide a length. Our server always provides explicit lengths on responses.
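The two ways a client can find the end of a response body can be sketched as follows. The helper read_body is hypothetical, and the stream is assumed to be a buffered binary file object (for instance one made with socket.makefile("rb")) that has already been read past the blank line.

```python
import io
from typing import BinaryIO, Optional


def read_body(stream: BinaryIO, content_length: Optional[int]) -> bytes:
    """Read a message body from a stream positioned just after the blank line."""
    if content_length is None:
        # No length given: the body runs until the connection closes (EOF).
        return stream.read()
    body = stream.read(content_length)
    if len(body) < content_length:
        raise EOFError("connection closed before the whole body arrived")
    return body


with_length = read_body(io.BytesIO(b"hello world"), 5)
until_close = read_body(io.BytesIO(b"hello world"), None)
```

The explicit length has a practical advantage visible in the code: a short read can be detected and reported, whereas with read-until-close a truncated body looks the same as a complete one.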

Connection handling

In the initial versions of HTTP, each request required a separate connection. But this isn’t particularly efficient: setting up a new connection takes time and induces network overhead. Modern browsers and servers support keep-alive or persistent connections, in which an HTTP connection can carry multiple RPCs. A client or server can close a connection at any time, however, so a client must be prepared to start a new connection if the server closes an existing connection.

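Persistent connections only work if the client can tell where one response ends and the next begins. In the model described here, that means the server must include Content-Length headers on its responses. (HTTP/1.1 also defines chunked transfer encoding for the same purpose.)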
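To see why the length matters, consider the bytes a client receives when two responses arrive back to back on one connection. The sketch below, using a hypothetical helper split_responses, can only separate them because each response states its body length.

```python
def split_responses(data: bytes) -> list[bytes]:
    """Return the body of each response in a back-to-back byte sequence."""
    bodies = []
    while data:
        head, sep, rest = data.partition(b"\r\n\r\n")
        if not sep:
            raise ValueError("incomplete header section")
        length = None
        for line in head.split(b"\r\n")[1:]:  # skip the first (status) line
            name, _, value = line.partition(b":")
            if name.strip().lower() == b"content-length":
                length = int(value.decode("ascii").strip())
        if length is None:
            raise ValueError("no Content-Length: cannot find the end of the body")
        bodies.append(rest[:length])
        data = rest[length:]  # whatever follows belongs to the next response
    return bodies


stream = (
    b"HTTP/1.1 200 OK\r\nContent-Length: 3\r\n\r\none"
    b"HTTP/1.1 200 OK\r\nContent-Length: 3\r\n\r\ntwo"
)
bodies = split_responses(stream)
```

Without the length, the first body would have no visible end, and the second response's first line would be mistaken for body bytes.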

Standards

Shockingly, you can read about HTTP all over the Web. But don’t neglect the standards. They’re not always easy to read, but they define Truth, and learning to read standards is a useful skill.