HTTP for CS 61

This is a pocket introduction to the HTTP protocol. HTTP, which stands for Hypertext Transfer Protocol, was first developed by Tim Berners-Lee in 1991. Thanks to its flexibility, and the Web’s ubiquity, HTTP is now used for all kinds of applications.

HTTP is a client–server protocol. An HTTP client (such as a browser) opens a connection to an HTTP server, which is waiting on a listening socket. The client then sends a request message. The server responds with a response message. This pair of messages effectively forms a remote procedure call (RPC).
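To make the exchange concrete, here is a minimal sketch of one request/response pair over a loopback connection, using Python's standard socket module. The helper names serve_once and fetch, the fixed "hello" body, and the choice of 127.0.0.1 are assumptions made for illustration; the message format itself is explained in the sections that follow.

```python
import socket
import threading

def serve_once(listener):
    """Accept one connection, read the request header, send a fixed response."""
    conn, _ = listener.accept()
    with conn:
        data = b""
        # Read until the blank line that ends the request's header section.
        while b"\r\n\r\n" not in data:
            chunk = conn.recv(4096)
            if not chunk:
                break
            data += chunk
        body = b"hello\n"  # illustrative body, not from any real server
        header = (
            "HTTP/1.0 200 OK\r\n"
            "Content-Type: text/plain\r\n"
            f"Content-Length: {len(body)}\r\n"
            "\r\n"
        ).encode("ascii")
        conn.sendall(header + body)
    # Leaving the "with" block closes the connection.

def fetch(host, port):
    """Open a connection, send one GET request, return the raw response bytes."""
    with socket.create_connection((host, port)) as sock:
        sock.sendall(f"GET / HTTP/1.0\r\nHost: {host}\r\n\r\n".encode("ascii"))
        response = b""
        while True:
            chunk = sock.recv(4096)
            if not chunk:  # the server closed the connection
                break
            response += chunk
    return response

# The server waits on a listening socket; port 0 lets the OS pick a free port.
listener = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
listener.bind(("127.0.0.1", 0))
listener.listen(1)
port = listener.getsockname()[1]

server = threading.Thread(target=serve_once, args=(listener,), daemon=True)
server.start()
response = fetch("127.0.0.1", port)
server.join()
listener.close()
```

The client here simply reads until the server closes the connection, which is the simplest way to find the end of a response.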

Every HTTP message (request or response) consists of a header followed by optional content.

Headers

HTTP headers are made of text lines. Each line is terminated by "\r\n" (a “carriage return” character, ASCII number 13, followed by a newline, ASCII number 10).

The headers from a request might look like this:

 GET / HTTP/1.0
 Host: cs61.seas.harvard.edu

The headers from a response might look like this:

 HTTP/1.1 200 OK
 Content-Type: text/plain
 Content-Length: 56
 Date: Thu, 29 Nov 2012 13:47:35 GMT
 Connection: close

The first line of the header section has a special format.

The first line of a request includes:

Method (GET, POST, etc.): Identifies the kind of operation the client wants to perform. For normal web page loads, this is GET. For form submissions (such as entering your password), and for RPC requests that change state on the server, it should be POST. There are some technical differences between the methods. For example, Web caches can cache the responses to GET requests and reuse them for future requests, whereas responses to POST requests are rarely cached. Furthermore, a Web page can ask a user’s browser to make an arbitrary GET request to any other server (by putting the request’s URL in an HTML tag such as <img>), but browsers limit the ability to make POSTs. Nevertheless, in practice, the distinction between the methods is somewhat blurry. Some developers have outlined philosophies of when to use each method.

Request URI: The name of the resource being loaded on the server. It’s the part of a URL that follows the host name.

HTTP version: Identifies the connection as HTTP and says what version of the HTTP protocol the client is using.
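The three parts of a request's first line are separated by single spaces, so pulling them apart takes only a few lines. This is a sketch; the function name parse_request_line is hypothetical, and a real server would be more careful about malformed input.

```python
def parse_request_line(line):
    """Split a request's first line into (method, request URI, HTTP version)."""
    # Drop the trailing "\r\n", then split on the two separating spaces.
    # Unpacking raises ValueError if there are not exactly three parts.
    method, uri, version = line.rstrip("\r\n").split(" ")
    if not version.startswith("HTTP/"):
        raise ValueError(f"not an HTTP request line: {line!r}")
    return method, uri, version

method, uri, version = parse_request_line("GET / HTTP/1.0\r\n")
```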

The first line of a response includes:

HTTP version: See above.

Status code: A three-digit number that indicates whether the request succeeded. There’s a long list of status codes. The most common status code is 200, which means “OK”.

Reason phrase: An optional English phrase that describes the status further. The browser might print this out on error.
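A response's first line can be taken apart the same way, with one wrinkle: the reason phrase may contain spaces, or be missing entirely. A minimal sketch (parse_status_line is a hypothetical name):

```python
def parse_status_line(line):
    """Split a response's first line into (HTTP version, status code, reason)."""
    # Split at most twice so a reason phrase like "Not Found" stays in one piece.
    parts = line.rstrip("\r\n").split(" ", 2)
    if len(parts) < 2:
        raise ValueError(f"not an HTTP status line: {line!r}")
    version = parts[0]
    status = int(parts[1])  # the three-digit status code
    reason = parts[2] if len(parts) > 2 else ""  # the reason phrase is optional
    return version, status, reason

ok = parse_status_line("HTTP/1.1 200 OK\r\n")
missing = parse_status_line("HTTP/1.1 404 Not Found\r\n")
```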

After the first line comes any number of header fields, one per line. Each line contains a field name followed by a colon, usually a single space, and then a field value. Hundreds of fields have been defined. Clients and servers ignore fields they don’t understand. This important property has made HTTP easy to extend and is key to its popularity.
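Here is one way a receiver might collect header fields and then act only on the ones it understands. The names parse_header_fields and UNDERSTOOD, and the X-Made-Up-Field field, are invented for this sketch. Field names are case-insensitive, so the sketch lowercases them; it splits at the first colon only, because values such as dates contain colons of their own.

```python
def parse_header_fields(lines):
    """Turn 'Name: value' lines into a dict keyed by lowercased field name."""
    fields = {}
    for line in lines:
        # Split at the first colon only; the value may contain more colons.
        name, sep, value = line.partition(":")
        if not sep:
            raise ValueError(f"malformed header field: {line!r}")
        fields[name.strip().lower()] = value.strip()
    return fields

# Hypothetical set of fields this particular receiver knows how to use.
UNDERSTOOD = {"content-type", "content-length", "date"}

fields = parse_header_fields([
    "Content-Type: text/plain",
    "Content-Length: 56",
    "Date: Thu, 29 Nov 2012 13:47:35 GMT",
    "X-Made-Up-Field: whatever",  # unknown to this receiver
])

# Unknown fields are skipped rather than treated as errors.
used = {name: value for name, value in fields.items() if name in UNDERSTOOD}
```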

The header section is terminated by a blank line ("\r\n"). Then comes the body.
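Since every header line ends with "\r\n" and the section ends with one more "\r\n", the boundary between header and body is the first occurrence of "\r\n\r\n". A sketch of splitting a complete message held in memory (split_message is a hypothetical helper, and the sample message is made up):

```python
def split_message(raw):
    """Split raw message bytes into (first line, header field lines, body)."""
    # The header section ends at the first blank line.
    header_section, sep, body = raw.partition(b"\r\n\r\n")
    if not sep:
        raise ValueError("header section is not terminated by a blank line")
    lines = header_section.decode("ascii").split("\r\n")
    return lines[0], lines[1:], body

raw = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-Type: text/plain\r\n"
    b"Content-Length: 6\r\n"
    b"\r\n"
    b"hello\n"
)
first_line, field_lines, body = split_message(raw)
```

Only the header section is decoded as text; the body is left as bytes.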

Body

The body is included in some requests and most responses. It’s just a sequence of bytes.

A body is normally accompanied by a header field called Content-Length. This defines the number of body bytes included after the blank line that terminates the header section. For example, the response above has 56 bytes of body.

Content-Length is mandatory on requests that contain a body (most requests don’t contain a body). It is optional for responses: if a response doesn’t define Content-Length, the client will read bytes from the connection until the connection closes. But in practice it is better to provide a length. Our server always provides explicit lengths on responses.
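The two ways of finding the end of a body can be sketched as follows. The function name read_body is hypothetical, and io.BytesIO stands in for a connection here; reaching the end of the BytesIO plays the role of the connection closing.

```python
import io

def read_body(stream, content_length=None):
    """Read a message body from a binary stream positioned after the blank line."""
    if content_length is None:
        # No Content-Length: the body is everything until the stream ends.
        return stream.read()
    body = stream.read(content_length)
    if len(body) != content_length:
        raise ValueError("stream ended before the whole body arrived")
    return body

# With a length, reading stops exactly at the end of the body.
with_length = read_body(io.BytesIO(b"hello\nNEXT MESSAGE"), content_length=6)

# Without a length, everything up to the end of the stream counts as body.
without_length = read_body(io.BytesIO(b"hello\n"))
```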

Connection handling

In the initial versions of HTTP, each request required a separate connection. But this isn’t particularly efficient: setting up a new connection takes time and induces network overhead. Modern browsers and servers support keep-alive or persistent connections, in which an HTTP connection can carry multiple RPCs. A client or server can close a connection at any time, however, so a client must be prepared to start a new connection if the server closes an existing connection.

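Persistent connections don’t work unless the server includes Content-Length headers on its responses: without a length, the client can only find the end of a response body by waiting for the connection to close, and then the connection can’t carry another request.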
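To see why the length matters, consider two responses arriving back to back on one connection. The sketch below uses Content-Length to find where each one ends; read_response is a hypothetical helper, io.BytesIO stands in for the connection, and the sketch deliberately ignores chunked transfer encoding, which HTTP/1.1 also allows as a way to mark the end of a body.

```python
import io

def read_response(stream):
    """Read one response from a binary stream; return (status line, body).

    Returns None if the stream has ended (the connection was closed).
    """
    status_line = stream.readline()
    if not status_line:
        return None
    content_length = None
    while True:
        line = stream.readline()
        if line in (b"\r\n", b""):  # blank line: end of the header section
            break
        name, _, value = line.partition(b":")
        if name.strip().lower() == b"content-length":
            content_length = int(value.strip())
    if content_length is None:
        raise ValueError("cannot find the end of this response without Content-Length")
    body = stream.read(content_length)
    return status_line.rstrip(b"\r\n").decode("ascii"), body

# Two responses on the same "connection", with nothing between them.
stream = io.BytesIO(
    b"HTTP/1.1 200 OK\r\nContent-Length: 5\r\n\r\nfirst"
    b"HTTP/1.1 404 Not Found\r\nContent-Length: 6\r\n\r\nsecond"
)
first = read_response(stream)
second = read_response(stream)
third = read_response(stream)  # nothing left to read
```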

Standards

Shockingly, you can read about HTTP all over the Web. But don’t neglect the standards. They’re not always easy to read, but they define Truth, and learning to read standards is a useful skill.