Yes, this is how all servers start out.

This is the first in a series of posts in which I’m going to go through the process of building a web application (and its web server) from scratch in Python. For the purposes of this series, I’m going to rely solely on the Python standard library and I’m going to ignore the WSGI standard.

Without further ado, let’s get to it!

The web server

To begin with, we’re going to write the HTTP server that will power our web app. But first, we need to spend a little time looking into how the HTTP protocol works.

How HTTP works

Simply put, HTTP clients connect to HTTP servers over the network and send them a string of data representing the request. The server then interprets that request and sends the client back a response. The entire protocol and the formats of those requests and responses are described in RFC2616, but I’m going to informally describe them below so you don’t have to read the whole thing.

Request format

Requests are represented by a series of \r\n-separated lines, the first of which is called the “request line”. The request line is made up of an HTTP method, followed by a space, followed by the path of the file being requested, followed by another space, followed by the HTTP protocol version the client speaks and, finally, followed by a carriage return (\r) and a line feed (\n) character:

    GET /some-path HTTP/1.1\r\n

After the request line come zero or more header lines. Each header line is made up of the header name, followed by a colon, followed by an optional value, followed by \r\n:

    Host: example.com\r\nAccept: text/html\r\n

The end of the headers section is signalled by an empty line:

    \r\n

Finally, the request may contain a “body”, an arbitrary payload that is sent to the server with the request.
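To make the format above a little more concrete, here’s a minimal sketch that splits a raw request into its parts. The parse_request name and the tuple it returns are hypothetical, made up purely for illustration, and the sketch assumes the whole request is already sitting in memory as bytes (a real server has to read it off a socket piece by piece).

```python
def parse_request(data: bytes):
    """Split raw request bytes into (method, path, version, headers, body).

    Hypothetical helper for illustration only; assumes a well-formed request.
    """
    # The empty line (i.e. two consecutive CRLFs) separates the request line
    # and headers from the optional body.
    head, _, body = data.partition(b"\r\n\r\n")
    lines = head.decode("ascii").split("\r\n")

    # Request line: METHOD, space, PATH, space, VERSION.
    method, path, version = lines[0].split(" ")

    # Header lines: NAME, colon, optional VALUE.
    headers = {}
    for line in lines[1:]:
        name, _, value = line.partition(":")
        headers[name.lower()] = value.strip()

    return method, path, version, headers, body


raw = b"GET /some-path HTTP/1.1\r\nHost: example.com\r\nAccept: text/html\r\n\r\n"
print(parse_request(raw))
```

Header names are lowercased here because HTTP header names are case-insensitive, which makes later lookups simpler.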
Putting it all together, here’s a simple GET request:

    GET / HTTP/1.1\r\nHost: example.com\r\nAccept: text/html\r\n\r\n

and here’s a simple POST request with a body:

    POST / HTTP/1.1\r\nHost: example.com\r\nAccept: application/json\r\nContent-type: application/json\r\nContent-length: 2\r\n\r\n{}

Response format

Responses, like requests, are made up of a series of \r\n-separated lines. The first line in the response is called the “status line” and it is made up of the HTTP protocol version, followed by a space, followed by the response status code, followed by another space, then the status code reason, followed by \r\n:

    HTTP/1.1 200 OK\r\n

After the status line come the response headers, then an empty line and then an optional response body:

    HTTP/1.1 200 OK\r\nContent-type: text/html\r\nContent-length: 15\r\n\r\n<h1>Hello!</h1>

A simple server

Based on what we know so far about the protocol, let’s write a server that sends the same response regardless of the incoming request.

To start out, we need to create a socket, bind it to an address and then start listening for connections.

If you try to run this code now, it’ll print to standard out that it’s listening on 127.0.0.1:9000 and then exit. In order to actually process incoming connections we need to call the accept method on our socket. Doing so will block the process until a client connects to our server.

Once we have a socket connection to the client, we can start to communicate with it. Using the sendall method, let’s send the connecting client an example response.

If you run the code now and then visit http://127.0.0.1:9000 in your favourite browser, it should render the string “Hello!”. Unfortunately, the server will exit after it sends the response so refreshing the page will fail. Let’s fix that by accepting connections in a loop.

At this point we have a web server that can serve a simple HTML web page on every request, all in about 25 lines of code. That’s not too bad!
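The steps described in this section (create a socket, bind, listen, accept, sendall, loop) can be sketched roughly as follows. This is a sketch under assumptions rather than the exact listing from the series: the function names make_server_socket, handle_one_connection and serve_forever are hypothetical, and the work is split into functions only so that each step is easy to see in isolation.

```python
import socket

# The canned response from the "Response format" section.
RESPONSE = (
    b"HTTP/1.1 200 OK\r\n"
    b"Content-type: text/html\r\n"
    b"Content-length: 15\r\n"
    b"\r\n"
    b"<h1>Hello!</h1>"
)


def make_server_socket(host: str, port: int) -> socket.socket:
    """Create a TCP socket, bind it to an address and start listening."""
    server_sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Lets us restart the server right away instead of waiting for the
    # operating system to release the address.
    server_sock.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    server_sock.bind((host, port))
    server_sock.listen(5)
    return server_sock


def handle_one_connection(server_sock: socket.socket) -> None:
    """Block until a client connects, then send it the canned response."""
    client_sock, client_addr = server_sock.accept()
    print(f"Received connection from {client_addr}...")
    with client_sock:
        client_sock.sendall(RESPONSE)


def serve_forever(host: str = "127.0.0.1", port: int = 9000) -> None:
    """Keep accepting connections so that refreshing the page works."""
    with make_server_socket(host, port) as server_sock:
        print(f"Listening on {host}:{port}...")
        while True:
            handle_one_connection(server_sock)
```

Calling serve_forever() and visiting http://127.0.0.1:9000 should then behave as described above. Note that the server never reads the request before replying, which is fine for a demo but not something a real server would do.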
A file server

Let’s extend the HTTP server so that it can serve files off of disk.

Request abstraction

Before we can do that, we have to be able to read and parse incoming request data from the client. Since we know that request data is represented by a series of lines, each separated by \r\n characters, let’s write a generator function that reads data from a socket and yields each individual line.

This may look a bit daunting, but essentially what it does is it reads as much data as it can from the socket (in bufsize chunks), joins that data together in a buffer (buff) and continually splits the buffer into individual lines, yielding one at a time. Once it finds an empty line, it returns the extra data that it read.

Using iter_lines, we can begin printing the requests we get from our clients.

If you run the server now and visit http://127.0.0.1:9000, you should see something like this in your console:

    Received connection from ('127.0.0.1', 62086)...
    b'GET / HTTP/1.1'
    b'Host: localhost:9000'
    b'Connection: keep-alive'
    b'Cache-Control: max-age=0'
    b'Upgrade-Insecure-Requests: 1'
    b'User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_2) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/64.0.3282.167 Safari/537.36'
    b'Accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8'
    b'Accept-Encoding: gzip, deflate, br'
    b'Accept-Language: en-US,en;q=0.9,ro;q=0.8'

Pretty neat! Let’s abstract over that data by defining a Request class.

For now, the request class is only going to know about methods, paths and request headers. We’ll leave parsing query string parameters and reading request bodies for later.

To encapsulate the logic needed to build up a request, we’ll add a class method to Request called from_socket.

It uses the iter_lines function we defined earlier to read the request line. That’s where it gets the method and the path, then it reads each individual header line and parses those. Finally, it builds the Request object and returns it.
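Here’s a sketch of what iter_lines and Request.from_socket could look like, based purely on the descriptions above. The names iter_lines, bufsize, buff, Request and from_socket come from the text; the default chunk size, the use of a NamedTuple and the exact error handling are assumptions.

```python
import socket
import typing


def iter_lines(
    sock: socket.socket, bufsize: int = 16_384
) -> typing.Generator[bytes, None, bytes]:
    """Yield each CRLF-terminated line read from sock.

    Stops at the first empty line and returns any extra data read past it.
    """
    buff = b""
    while True:
        data = sock.recv(bufsize)
        if not data:
            return b""  # the client closed the connection

        buff += data
        while True:
            line, sep, rest = buff.partition(b"\r\n")
            if not sep:
                break  # no complete line in the buffer yet, read more
            buff = rest
            if not line:
                return buff  # empty line: end of the headers section
            yield line


class Request(typing.NamedTuple):
    method: str
    path: str
    headers: typing.Mapping[str, str]

    @classmethod
    def from_socket(cls, sock: socket.socket) -> "Request":
        """Read and parse the request line and headers from a socket."""
        lines = iter_lines(sock)

        try:
            request_line = next(lines).decode("ascii")
        except StopIteration:
            raise ValueError("Request line missing.")

        try:
            method, path, _ = request_line.split(" ")
        except ValueError:
            raise ValueError(f"Malformed request line {request_line!r}.")

        headers = {}
        for line in lines:
            # Non-ASCII bytes raise UnicodeDecodeError, a ValueError subclass.
            name, _, value = line.decode("ascii").partition(":")
            headers[name.lower()] = value.lstrip()

        return cls(method=method.upper(), path=path, headers=headers)
```

Because iter_lines is a generator, its return value (the leftover bytes) is only available through StopIteration.value; this sketch ignores it since request bodies aren’t handled yet.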
Let’s plug that into our server loop. If you connect to the server now, you should see lines like this one get printed out:

    Request(method='GET', path='/', headers={'host': 'localhost:9000', 'user-agent': 'curl/7.54.0', 'accept': '*/*'})

Because from_socket can raise an exception under certain circumstances, the server might crash if given an invalid request right now. To simulate this, you can use telnet to connect to the server and send it some bogus data:

    ~> telnet 127.0.0.1 9000
    Trying 127.0.0.1...
    Connected to localhost.
    Escape character is '^]'.
    hello
    Connection closed by foreign host.

Sure enough, the server crashed:

    Received connection from ('127.0.0.1', 62404)...
    Traceback (most recent call last):
      File "server.py", line 53, in parse
        request_line = next(lines).decode("ascii")
    ValueError: not enough values to unpack (expected 3, got 1)

    During handling of the above exception, another exception occurred:

    Traceback (most recent call last):
      File "server.py", line 82, in <module>
        with client_sock:
      File "server.py", line 55, in parse
        raise ValueError("Request line missing.")
    ValueError: Malformed request line 'hello'.

To handle these kinds of issues a little more gracefully, let’s wrap the call to from_socket in a try-except block and send the client a “400 Bad Request” response when we get a malformed request.

If we try to break it now, our client will get a response back and the server will stay up:

    ~> telnet 127.0.0.1 9000
    Trying 127.0.0.1...
    Connected to localhost.
    Escape character is '^]'.
    hello
    HTTP/1.1 400 Bad Request
    Content-type: text/plain
    Content-length: 11

    Bad Request
    Connection closed by foreign host.

At this point we’re ready to start implementing the file serving part, but first let’s make our default response a “404 Not Found” response. Additionally, let’s add a “405 Method Not Allowed” response. We’re going to need it for when we get anything other than a GET request.
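The error handling described above can be sketched like this. To keep the example self-contained it parses only the request line instead of building a full request object, and it assumes the request line arrives in the first chunk read from the socket. The names make_response, choose_response and handle_client are hypothetical.

```python
import socket


def make_response(status: str, body: bytes) -> bytes:
    """Build a plain-text response; the content length is computed for us."""
    head = (
        f"HTTP/1.1 {status}\r\n"
        "Content-type: text/plain\r\n"
        f"Content-length: {len(body)}\r\n"
        "\r\n"
    )
    return head.encode("ascii") + body


BAD_REQUEST_RESPONSE = make_response("400 Bad Request", b"Bad Request")
NOT_FOUND_RESPONSE = make_response("404 Not Found", b"Not Found")
METHOD_NOT_ALLOWED_RESPONSE = make_response(
    "405 Method Not Allowed", b"Method Not Allowed"
)


def choose_response(request_line: bytes) -> bytes:
    """Pick a canned response for a raw request line."""
    try:
        # Both a failed decode and a failed unpack raise ValueError.
        method, path, version = request_line.decode("ascii").split(" ")
    except ValueError:
        return BAD_REQUEST_RESPONSE

    if method != "GET":
        return METHOD_NOT_ALLOWED_RESPONSE

    # Nothing is being served yet, so "not found" is the default.
    return NOT_FOUND_RESPONSE


def handle_client(client_sock: socket.socket) -> None:
    """Read a request from the client and always answer with something."""
    with client_sock:
        # Simplification: assume the request line fits in the first chunk.
        data = client_sock.recv(4096)
        request_line = data.split(b"\r\n", 1)[0]
        client_sock.sendall(choose_response(request_line))
```

The important part is that a malformed request now produces a response instead of an unhandled exception, so one bad client can’t take the server down.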
Let’s define a SERVER_ROOT constant to represent where the server should serve files from and a serve_file function for handling GET requests.

serve_file takes the client socket and a path to a file. It then tries to resolve that path to a real file inside of the SERVER_ROOT, returning a “not found” response if the file resolves outside of the server root. Then it tries to open the file and figure out its mime type and size (using os.fstat), then it constructs the response headers and uses the sendfile system call to write the file to the socket. If it can’t find the file on disk, then it sends a “not found” response.

Finally, let’s add serve_file into the mix in our server loop.

If you add a file called www/index.html next to your server.py file and visit http://localhost:9000 you should see the contents of that file. Cool, eh?

Winding down

That’s it for part 1. In part 2 we’re going to cover extracting Server and Response abstractions as well as making the server handle multiple concurrent connections. If you’d like to check out the full source code and follow along, you can find it here.

See ya next time!

Thanks for reading! If you liked the article, give it a clap! You can also find me on my website at https://defn.io, on GitHub and Twitter.
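As a postscript, here’s a rough sketch of the serve_file function described earlier in this post. SERVER_ROOT and serve_file are names from the text; everything else is an assumption, including the extra root parameter (added so the function is easy to try against a temporary directory), the mapping of “/” to index.html and the fallback content type.

```python
import mimetypes
import os
import socket

# Assumption: files are served from a "www" folder next to the script.
SERVER_ROOT = os.path.abspath("www")

NOT_FOUND_RESPONSE = (
    b"HTTP/1.1 404 Not Found\r\n"
    b"Content-type: text/plain\r\n"
    b"Content-length: 9\r\n"
    b"\r\n"
    b"Not Found"
)


def serve_file(sock: socket.socket, path: str, root: str = SERVER_ROOT) -> None:
    """Send the file at path (relative to root) to the client, or a 404."""
    if path == "/":
        path = "/index.html"

    # Resolve symlinks and ".." segments, then make sure the result is still
    # inside the server root (so "/../secret" can't escape it).
    root = os.path.realpath(root)
    abspath = os.path.realpath(os.path.join(root, path.lstrip("/")))
    if not abspath.startswith(root + os.sep):
        sock.sendall(NOT_FOUND_RESPONSE)
        return

    try:
        with open(abspath, "rb") as f:
            # fstat gives us the file size for the Content-length header.
            stat = os.fstat(f.fileno())
            content_type, _ = mimetypes.guess_type(abspath)
            if content_type is None:
                content_type = "application/octet-stream"

            headers = (
                "HTTP/1.1 200 OK\r\n"
                f"Content-type: {content_type}\r\n"
                f"Content-length: {stat.st_size}\r\n"
                "\r\n"
            ).encode("ascii")

            sock.sendall(headers)
            # socket.sendfile uses the sendfile system call where available
            # and falls back to plain send otherwise.
            sock.sendfile(f)
    except (FileNotFoundError, IsADirectoryError):
        sock.sendall(NOT_FOUND_RESPONSE)
```

Resolving the path before checking it against the root is what stops directory traversal; checking the raw request path alone would not be enough.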