Network Engineering #4 - HTTP & HTTPS

So far, we have learned how IP addresses can be used to identify, locate, route, and send messages, and how TCP and UDP attach the information needed to reach the right application and establish rules for communication. Now, we are ready to discuss the layers above the transport layer (layer 4), which dictate how data is organized and how TCP or UDP is used.

HTTP Semantics

Hypertext Transfer Protocol (HTTP) is an application layer protocol designed for communication between a client and a web server. It specifies how data should be organized and how communication should happen at the highest level.

Request Format

GET /index.html HTTP/1.0
Host: www.tkdev.blog
User-Agent: Mozilla/5.0
...

HTTP specifies that a request should follow a specific format, as shown above. GET is one of the request methods, indicating that the client wants to retrieve a resource from the server. Other methods include POST, PUT, and DELETE, which indicate that the client wants to send data to the server, replace or update data on the server, or delete data from the server, respectively.

After the method come the request target (here, the path /index.html) and the HTTP version being used. Host and User-Agent are HTTP headers: Host names the server the request is addressed to, and User-Agent describes the client software (typically the browser) making the request. Other headers include Content-Type, which describes the format of the body, and Authorization, which carries credentials and can accompany any method, including GET. Additionally, requests using methods such as POST and PUT usually carry data in the body, which can be of any type (JSON, HTML, CSS, PNG, JPEG, etc.) as declared by Content-Type.
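To make the request format concrete, here is a minimal Python sketch that serializes a request into the text format described above and parses it back. The helper names build_request and parse_request and the /comments path are hypothetical, chosen only for illustration; real clients and servers handle many more edge cases.

```python
def build_request(method, target, host, headers=None, body=b""):
    """Serialize a request: request line, headers, blank line, optional body."""
    lines = [f"{method} {target} HTTP/1.1", f"Host: {host}"]
    all_headers = dict(headers or {})
    if body:
        # The receiver needs to know where the body ends.
        all_headers["Content-Length"] = str(len(body))
    lines += [f"{name}: {value}" for name, value in all_headers.items()]
    head = "\r\n".join(lines) + "\r\n\r\n"  # a blank line ends the headers
    return head.encode("ascii") + body


def parse_request(raw):
    """Split raw request bytes back into method, target, version, headers, body."""
    head, _, body = raw.partition(b"\r\n\r\n")
    request_line, *header_lines = head.decode("ascii").split("\r\n")
    method, target, version = request_line.split(" ")
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # header names are case-insensitive
    return method, target, version, headers, body


# Hypothetical POST request carrying a JSON body.
raw = build_request(
    "POST",
    "/comments",
    "www.tkdev.blog",
    headers={"Content-Type": "application/json"},
    body=b'{"text": "hi"}',
)
method, target, version, headers, body = parse_request(raw)
print(method, target, version)
print(headers["content-type"], headers["content-length"])
```

Note that the header block and the body are separated by an empty line, which is how the receiver knows where the headers stop.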

Response Format

HTTP/1.0 200 OK
Access-Control-Allow-Origin: *
Content-Type: text/html; charset=utf-8
Date: Mon, 23 Sep 2024 00:00:00 GMT
...

HTTP also specifies that the response should follow a specific format, as shown above. The first line contains the HTTP version, status code, and status message. Status codes are assigned specific meanings, such as 200 for "OK," 404 for "Not Found," and 400 for "Bad Request." Similar to requests, HTTP responses contain headers like Content-Type and Access-Control-Allow-Origin, which communicate the content type of the data in the body (if a body exists) and the origins whose pages are allowed to read the response. (A CORS or Cross-Origin Resource Sharing error occurs in the browser when the page making the request comes from an origin that is not allowed.)
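The response format can be read back in the same way. The following is a small Python sketch, not a full parser: parse_response is a hypothetical helper, and the HTML body is made up for the example.

```python
def parse_response(raw):
    """Split raw response bytes into version, status code, reason, headers, body."""
    head, _, body = raw.partition(b"\r\n\r\n")
    status_line, *header_lines = head.decode("iso-8859-1").split("\r\n")
    # The reason phrase may contain spaces ("Not Found"), so split at most twice.
    version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        # Split on the first colon only: values such as dates contain colons too.
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return version, int(code), reason, headers, body


raw_response = (
    b"HTTP/1.0 200 OK\r\n"
    b"Access-Control-Allow-Origin: *\r\n"
    b"Content-Type: text/html; charset=utf-8\r\n"
    b"Date: Mon, 23 Sep 2024 00:00:00 GMT\r\n"
    b"\r\n"
    b"<h1>Hello</h1>"
)
version, code, reason, headers, body = parse_response(raw_response)
print(version, code, reason)
print(headers["content-type"])
```

The first digit of the status code gives its class: 2xx for success, 4xx for client errors, and 5xx for server errors.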

HTTP 1.0 & 1.1

The way data is transmitted differs depending on the HTTP version. The first formally documented and widely deployed version of HTTP, HTTP 1.0 (it was preceded by the minimal HTTP 0.9), was built on top of TCP. By default, HTTP 1.0 opened and closed a TCP connection for each request and response pair. This meant that when a client requested HTML, CSS, and images from the web server to load a webpage, the client and server had to establish and close a connection for each resource.

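Since opening and closing a TCP connection for every resource was inefficient, HTTP 1.0 did not last long. Instead, HTTP 1.1 became dominant, as it solved this inefficiency by making persistent connections the default: the TCP connection remains open across requests unless one side sends Connection: close. (The Connection: Keep-Alive header began as an unofficial extension to HTTP 1.0 for the same purpose.) HTTP 1.1 also introduced chunked transfer encoding, which lets a body be sent as a stream of chunks, so the sender does not need to know the total size of the resource before it starts transmitting.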
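As a rough illustration of sending a body in chunks, the sketch below encodes and decodes a body using the HTTP 1.1 chunked format: each chunk is its size in hexadecimal, a line break, the data, and another line break, and a zero-size chunk marks the end. The function names are hypothetical, and optional features such as trailers are ignored.

```python
def encode_chunked(chunks):
    """Encode an iterable of byte strings as a chunked body."""
    out = b""
    for chunk in chunks:
        if chunk:  # a zero-length chunk would signal the end, so skip empty pieces
            out += f"{len(chunk):x}".encode("ascii") + b"\r\n" + chunk + b"\r\n"
    return out + b"0\r\n\r\n"  # terminating chunk


def decode_chunked(data):
    """Reassemble the original body from a chunked body."""
    body = b""
    pos = 0
    while True:
        line_end = data.index(b"\r\n", pos)
        # The size line may carry extensions after ';', which this sketch discards.
        size = int(data[pos:line_end].split(b";")[0], 16)
        pos = line_end + 2
        if size == 0:
            return body
        body += data[pos:pos + size]
        pos += size + 2  # skip the chunk data and its trailing CRLF


wire = encode_chunked([b"Hello, ", b"world!"])
print(wire)
print(decode_chunked(wire))
```

Because every chunk announces its own size, the receiver can start processing data as it arrives instead of waiting for a Content-Length worth of bytes.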

HTTP 2.0 & 3.0

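Further improvements were made with the creation of HTTP 2.0 (officially named HTTP/2), which introduced header compression and multiplexing. Multiplexing allows multiple requests and responses to share a single TCP connection as independent, interleaved streams, so a slow response no longer holds up the others at the HTTP level. HTTP 2.0 is also the transport used by gRPC, which encodes messages with Protocol Buffers, a binary format that tends to be more lightweight and faster than text formats such as JSON (this might be covered in more detail in a future article).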
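Multiplexing can be pictured with a toy model in Python. This is only a sketch of the idea, not the actual HTTP 2.0 binary framing: each response is cut into small frames tagged with a stream ID, frames from different streams are interleaved on one connection, and the receiver regroups them by ID. The function names, the frame size, and the sample payloads are assumptions made for the example.

```python
from collections import defaultdict
from itertools import zip_longest


def to_frames(stream_id, payload, frame_size):
    """Cut one message into (stream_id, data) frames."""
    return [
        (stream_id, payload[i:i + frame_size])
        for i in range(0, len(payload), frame_size)
    ]


def interleave(*frame_lists):
    """Send one frame from each stream in turn over the shared connection."""
    wire = []
    for group in zip_longest(*frame_lists):
        wire.extend(frame for frame in group if frame is not None)
    return wire


def reassemble(wire):
    """Regroup frames by stream ID on the receiving side."""
    streams = defaultdict(bytes)
    for stream_id, data in wire:
        streams[stream_id] += data
    return dict(streams)


# Three responses in flight at once (client-initiated streams use odd IDs).
responses = {
    1: b"<html><body>page</body></html>",
    3: b"body { color: red }",
    5: b"fake image bytes",
}
wire = interleave(*(to_frames(sid, data, 8) for sid, data in responses.items()))
print([sid for sid, _ in wire])
print(reassemble(wire) == responses)
```

The printed list of stream IDs shows the three responses taking turns on the wire rather than being sent one after another.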

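All HTTP versions up to and including HTTP 2.0 were built on top of TCP, which makes sense given the reliability discussed in previous articles. However, TCP has costs: setting up a connection takes extra round trips, and because TCP delivers bytes strictly in order, a single lost packet stalls every multiplexed stream on the connection (head-of-line blocking). For this reason, HTTP 3.0 (officially named HTTP/3) utilizes QUIC, a multiplexed transport protocol built on top of UDP that implements its own per-stream reliability, for faster and still reliable communication. As of now, HTTP 1.1, 2.0, and 3.0 are all supported by modern browsers and by many web servers.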

HTTPS

Now that we understand the request and response formats and how communications are set up in various HTTP versions, we need to figure out how to secure these communications. Specifically, we want to ensure confidentiality (data cannot be seen by a third party), integrity (data is not altered), and authenticity (communication is happening with the intended server).

To achieve these qualities, Hypertext Transfer Protocol Secure (HTTPS) uses Transport Layer Security (TLS). TLS uses both symmetric and asymmetric key encryption; how these algorithms work internally is, unfortunately, outside the scope of this article. For context, symmetric key encryption means you can encrypt and decrypt a message with the same key, while asymmetric key encryption uses a pair of keys: a public key for encryption and a private key for decryption. Since it is generally much faster to encrypt and decrypt with a symmetric key, TLS aims to facilitate the secure exchange of a symmetric key between the client and the authentic server.
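The difference between the two kinds of encryption can be seen in a toy Python sketch. Everything here is for illustration only and is not secure: the symmetric cipher is a simple XOR with a hash-derived keystream, and the asymmetric part is textbook RSA with two small, well-known Mersenne primes. The function names are hypothetical.

```python
import hashlib


def xor_with_keystream(key, data):
    """Toy symmetric cipher: the same key and function encrypt and decrypt."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(4, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


# Toy asymmetric key pair (textbook RSA, far too small for real use).
P, Q = 2**61 - 1, 2**89 - 1        # two known Mersenne primes
N = P * Q
E = 65537                          # public exponent
D = pow(E, -1, (P - 1) * (Q - 1))  # private exponent (Python 3.8+)
PUBLIC_KEY, PRIVATE_KEY = (E, N), (D, N)


def rsa_apply(number, key):
    """Raise a number to the key's exponent modulo N."""
    exponent, modulus = key
    return pow(number, exponent, modulus)


message = b"hello"

# Symmetric: one shared key does both jobs.
shared_key = b"shared secret key"
ciphertext = xor_with_keystream(shared_key, message)
print(xor_with_keystream(shared_key, ciphertext))

# Asymmetric: encrypt with the public key, decrypt with the private key.
as_number = int.from_bytes(message, "big")
encrypted = rsa_apply(as_number, PUBLIC_KEY)
decrypted = rsa_apply(encrypted, PRIVATE_KEY)
print(decrypted.to_bytes(len(message), "big"))
```

In the symmetric case both sides must already share the key, which is exactly the problem the asymmetric key pair is used to solve.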

TLS

To securely share the symmetric key, the client generates a random number, encrypts it with the server’s public key, and sends it to the server. Since only the server holds the private key for decryption, only the server can decrypt and retrieve the random number. After both the client and the server possess the random number, they can each derive the same symmetric key from it (following an agreed method), ensuring confidentiality.
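The exchange described above can be sketched in a few lines of Python. This is a toy model under stated assumptions, not real TLS: it uses textbook RSA with small known primes, and the "agreed method" for turning the random number into a key is assumed here to be a plain SHA-256 hash. All names are hypothetical.

```python
import hashlib
import secrets

# Toy server key pair (textbook RSA with small Mersenne primes; not secure).
P, Q = 2**61 - 1, 2**89 - 1
N = P * Q
E = 65537
D = pow(E, -1, (P - 1) * (Q - 1))
SERVER_PUBLIC_KEY = (E, N)   # known to everyone
SERVER_PRIVATE_KEY = (D, N)  # known only to the server
SECRET_SIZE = (N.bit_length() + 7) // 8


def client_create_secret(public_key):
    """Client: pick a random number and encrypt it with the server's public key."""
    exponent, modulus = public_key
    secret = secrets.randbelow(modulus - 2) + 2
    return secret, pow(secret, exponent, modulus)


def server_recover_secret(private_key, encrypted_secret):
    """Server: only the private key can undo the encryption."""
    exponent, modulus = private_key
    return pow(encrypted_secret, exponent, modulus)


def derive_symmetric_key(secret):
    """Both sides: turn the shared random number into a 32-byte key."""
    return hashlib.sha256(secret.to_bytes(SECRET_SIZE, "big")).digest()


client_secret, on_the_wire = client_create_secret(SERVER_PUBLIC_KEY)
server_secret = server_recover_secret(SERVER_PRIVATE_KEY, on_the_wire)

client_key = derive_symmetric_key(client_secret)
server_key = derive_symmetric_key(server_secret)
print(client_key == server_key)
```

An eavesdropper sees only the encrypted value on the wire, which is useless without the server's private key.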

However, this does not guarantee that the server is authentic. An attacker could impersonate the server and send a fake public key to the client. The data could also be altered by an attacker, compromising integrity. To address these issues, the public key, domain, and other server-related information are packaged together and sent ahead of time to a trusted Certificate Authority (CA). The CA verifies the server's identity, attaches a digital signature to the information, and creates a TLS certificate. (The server's request asking the CA to sign this information is called a certificate signing request (CSR).)

Digital Signature

The CA generates a digital signature by first hashing the message, that is, the certificate information (hashing produces a fixed-size, seemingly random string from any input). The details of hashing are outside the scope of this article. The CA then encrypts the hash using its own private key (strictly speaking this is a signing operation, but thinking of it as encryption with the private key is a common simplification). The resulting string is the digital signature, which is attached to the message to form the TLS certificate that the client will use to verify the server’s identity.

The server, having already obtained the certificate from the CA through the CSR, sends the certificate (rather than just the public key) to the client. The client receives the certificate and decrypts the digital signature using the CA’s public key. (The certificate names the CA that issued it, and browsers and operating systems usually ship with the public keys of well-known CAs.) The client then hashes the original message using the same hashing algorithm that the CA used and compares the decrypted digital signature to the hashed original message. If they match, it means that the CA genuinely verified the original message (since only the CA holds its private key) and that the message has not been modified (since even a small change to the input produces a different hash). This ensures both authenticity and integrity.
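The sign-then-verify steps can be sketched with a toy Python example. It follows the simplified "encrypt the hash with the private key" description from this article, using textbook RSA with small known primes; real CAs use much larger keys and padded signature schemes. The certificate fields and function names are made up for the illustration.

```python
import hashlib

# Toy CA key pair (textbook RSA with small Mersenne primes; not secure).
P, Q = 2**61 - 1, 2**89 - 1
N = P * Q
E = 65537
D = pow(E, -1, (P - 1) * (Q - 1))
CA_PUBLIC_KEY = (E, N)   # shipped with browsers and operating systems
CA_PRIVATE_KEY = (D, N)  # kept secret by the CA


def hash_to_number(message, modulus):
    """Hash the message and reduce it so it fits under the toy modulus."""
    return int.from_bytes(hashlib.sha256(message).digest(), "big") % modulus


def ca_sign(message, private_key):
    """CA: hash the message, then transform the hash with the private key."""
    exponent, modulus = private_key
    return pow(hash_to_number(message, modulus), exponent, modulus)


def client_verify(message, signature, public_key):
    """Client: undo the signature with the public key and compare hashes."""
    exponent, modulus = public_key
    return pow(signature, exponent, modulus) == hash_to_number(message, modulus)


# Hypothetical certificate contents submitted in the CSR.
certificate_info = b"domain=www.tkdev.blog;server_public_key=<server key>"
signature = ca_sign(certificate_info, CA_PRIVATE_KEY)

print(client_verify(certificate_info, signature, CA_PUBLIC_KEY))

# An attacker who swaps in another domain cannot produce a matching signature.
tampered_info = certificate_info.replace(b"www.tkdev.blog", b"evil.example")
print(client_verify(tampered_info, signature, CA_PUBLIC_KEY))
```

The first check succeeds and the second fails, which corresponds to the authenticity and integrity guarantees described above.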

TLS Cont'd

Once the client verifies authenticity and integrity, it creates a random number called the pre-master secret and encrypts it with the server’s public key before sending it to the server. Since only the server holds the private key, no one else can decrypt the pre-master secret, ensuring confidentiality. Then, both the client and the server derive the same symmetric key from the pre-master secret using an agreed-upon algorithm. They notify each other once the symmetric key has been created, and they are ready to communicate securely. (This describes the RSA key exchange of TLS 1.2 and earlier; TLS 1.3 replaces it with a Diffie-Hellman exchange, but the goal of agreeing on a shared symmetric key is the same.) The following diagram illustrates the entire process in a nutshell.
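The final derivation step can be sketched as follows. This is a simplified stand-in, not the actual TLS key derivation: it assumes the pre-master secret (the secret random number the client sent) has already been exchanged, and mixes it with the random numbers from both sides using a single HMAC-SHA256 call. The function name and the label string are hypothetical.

```python
import hashlib
import hmac
import secrets


def derive_session_key(pre_master_secret, client_random, server_random):
    """Mix the secret with both public random numbers into a 32-byte key."""
    return hmac.new(
        pre_master_secret,
        b"session key" + client_random + server_random,  # label is an assumption
        hashlib.sha256,
    ).digest()


# Random numbers exchanged in the clear at the start of the handshake.
client_random = secrets.token_bytes(32)
server_random = secrets.token_bytes(32)

# Secret value that only the client and the server know.
pre_master_secret = secrets.token_bytes(48)

# Each side runs the same derivation independently.
client_side_key = derive_session_key(pre_master_secret, client_random, server_random)
server_side_key = derive_session_key(pre_master_secret, client_random, server_random)
print(client_side_key == server_side_key)
```

Mixing in fresh random numbers from both sides means that every connection ends up with a different symmetric key.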

Since the CSR is done beforehand, the only communication that happens in real time is the exchange of the supported encryption algorithms, random numbers, certificate, and pre-master secret between the client and the server. This process is called the TLS handshake, and it occurs immediately after the TCP handshake (which happens after DNS resolution) for all HTTP versions except HTTP 3.0, where QUIC runs over UDP and performs the TLS handshake as part of its own connection setup.
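In practice, application code rarely performs these steps by hand. As a minimal sketch using Python's standard library, the default TLS context loads the system's trusted CA certificates and requires both a valid certificate and a matching hostname, and the handshake runs when the connection is opened. The helper names are hypothetical, and fetch_status is only defined here, not called, because it needs network access.

```python
import http.client
import ssl


def make_verifying_context():
    """Create a TLS context that trusts the system's well-known CAs."""
    return ssl.create_default_context()


def fetch_status(host, path="/"):
    """Open an HTTPS connection (TCP handshake, then TLS handshake) and send a GET."""
    connection = http.client.HTTPSConnection(
        host, 443, context=make_verifying_context(), timeout=10
    )
    try:
        connection.request("GET", path)
        return connection.getresponse().status
    finally:
        connection.close()


context = make_verifying_context()
print(context.verify_mode == ssl.CERT_REQUIRED)  # a certificate must be presented and valid
print(context.check_hostname)                    # the certificate must match the requested host
```

If the certificate cannot be traced back to a trusted CA or does not match the host, the connection attempt raises an error instead of silently continuing.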

Protocol Stacks

In previous articles, we discussed various protocols, from the lower to the higher layers of the OSI model, that are primarily used for the web. To summarize, we can represent the protocol stacks used for the web as shown below.

Since there are many abstractions and acronyms in these descriptions, I recommend reading the articles multiple times and consulting additional resources until everything makes sense.

Resources

Computerphile. 2021. TLS Handshake Explained - Computerphile. YouTube.
Computerphile. 2021. Transport Layer Security (TLS) - Computerphile. YouTube.
Nasser, H. 2019. Hyper Text Transfer Protocol Crash Course - HTTP 1.0, 1.1, HTTP/2, HTTP/3. YouTube.