Throughout human history, communication has been the indispensable thread that weaves connections, understanding, and knowledge. Picture this: in a corner of the comic world, on November 7, 1989, Calvin from Calvin and Hobbes talks to his dad over the phone. This seemingly ordinary scene captures a universal truth: the need to talk.
No one is omniscient; Calvin, for one, didn't know what 11 + 7 is. The complexities of life, the myriad experiences, and the wealth of information that surrounds us demand an incessant quest for understanding. We need to talk to others, at times, to seek answers or perspectives, or simply to share our thoughts. However, this act of communication is not haphazard; it follows a set of unspoken rules, guidelines, and etiquette that turn it into a nuanced art. Communication is, in essence, the bridge that connects minds, allowing ideas to flow, emotions to be shared, and knowledge to be transferred. Yet mastering this art is no small feat. The intricacies of language, the subtleties of tone, and all those non-verbal cues contribute to the richness of human interaction.
Now, as a coder, I find a parallel allure in translating this intricate communication into the realm of machines talking to machines. In the vast landscape of programming, where precision and clarity are paramount, communication takes a different form.
API, an acronym for "Application Programming Interface", may sound too technical, but it is just a term for machines talking to machines. As with any communication, it comes in various kinds: SOAP (Simple Object Access Protocol), ReST (Representational State Transfer), and RPC (Remote Procedure Call), to name just a few.
It's worth noting that communication in the digital realm isn't confined to APIs alone. There's also the intriguing realm of event-driven communication – an asynchronous data exchange that doesn't always wear the API label prominently. Yet, it plays a crucial role in orchestrating seamless interactions between different components, making systems agile and responsive.
I have been dealing with web-apis, flavors of ReST, for almost two decades now. I say flavors of ReST because ReST is not as rigid as protocols such as SOAP, simply because it is not a protocol at all but a set of guidelines, an architectural style. That sets ReST apart from the others in a very distinct way.
ReST makes machine-to-machine communication more humane, simple and natural. Detailing a ReST api is more philosophical than technical, because one always starts with nouns when designing a ReST api.
Seriously, what's in a name? Why are we so obsessed with names and naming things? I blogged about it a decade ago. It was, and still is, my opinion that names are nothing more than an abstraction over details. Without names, conversation would be wordy and verbose and still not very clear. We name things and, to make things clearer, we add more discriminating names. The more nouns there are, the richer the vocabulary. Eskimos are said to have twenty-odd names for different kinds of ice. Names are not limited to things or places; they also cover many actions.
But there is a stark difference between names and actions. If names and actions are sets, in the mathematical sense, you will notice that the cardinality (the number of items in the set) of names increases over time, whereas the same is not true for actions. Cut, as an action, has the same name whether you are cutting paper or cutting vegetables. Actions are thus polymorphic in nature: the same action can be tied to multiple nouns.
Enough with philosophy; let's now focus on the technicalities. ReST came along with HTTP, and the two go hand in glove. There are a few misunderstandings about HTTP, mostly about its usage. Most folks look at HTTP as a transport protocol. They are not to blame: you can download files with HTTP, so it is easy to fall for it. The confusion about HTTP being a transport protocol stems from the fact that the abbreviation literally stands for
Hyper-Text Transfer Protocol
When you perform an action using HTTP, it usually involves some sort of document shuffle: you are either sending a document to a server or receiving one from it. The word you really want to focus on is Hyper-Text. Since the substance of a phrase usually sits in the middle, many unfortunately fell for Transfer rather than Hyper-Text, and then used Transport as a synonym for Transfer, although the two are different terms.
The fact is that HTTP is an “application protocol”.
It works on top of real transport protocols: TCP (connection-oriented) for HTTP/1.1 and HTTP/2, or QUIC, which itself runs over UDP (connectionless), in the case of HTTP/3.
To shuffle a document, HTTP defines only a handful of verbs; the four that matter here are GET, PUT, DELETE and POST.
Let's not talk about POST just yet; we will circle back to it later.
The image above depicts a simple dialogue between a user and some web application. Each message in this dialogue is some sort of document being shuffled between the parties. Let's not worry about what is shuffled for now. The dialogue traces a workflow of the web application.
The data shuttled between the server and the user follows an agreed-upon format so that both parties can make sense of it. This format is identified by a media type. Let's look at a few examples.
They look familiar, except maybe the last one, and I am not referring to the formatting.
Let's read the last media type, application/vnd.example+xml, part by part.
application
The media is meant for consumption by an application, not by humans.
vnd
This is a vendor-defined custom media type.
example
This is the name the vendor chose for the media type.
+xml
This media type is based on the already known xml media type.
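The same reading can be done mechanically. The following is a minimal Python sketch, not a full parser: the `MediaType` tuple and the `parse_media_type` function are names made up for this illustration, and only the `vnd.` prefix is recognised as a registration tree.

```python
from typing import NamedTuple, Optional


class MediaType(NamedTuple):
    top_level: str          # e.g. "application"
    tree: Optional[str]     # "vnd" for vendor-defined types, otherwise None
    name: str               # e.g. "example"
    suffix: Optional[str]   # e.g. "xml" taken from "+xml", otherwise None


def parse_media_type(value: str) -> MediaType:
    """Split a media type such as 'application/vnd.example+xml' into its parts."""
    # Drop optional parameters such as '; charset=utf-8'.
    essence = value.split(";", 1)[0].strip().lower()
    top_level, _, subtype = essence.partition("/")
    if not top_level or not subtype:
        raise ValueError(f"not a media type: {value!r}")

    # The part after the last '+' names the underlying, already known format.
    suffix = None
    if "+" in subtype:
        subtype, _, suffix = subtype.rpartition("+")

    # Simplification: only the vendor tree ("vnd.") is recognised here.
    tree = None
    name = subtype
    if subtype.startswith("vnd."):
        tree, _, name = subtype.partition(".")

    return MediaType(top_level, tree, name, suffix)


print(parse_media_type("application/vnd.example+xml"))
```

A user-agent that does not know the vendor type can still fall back on the suffix and treat the document as plain xml.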
We have seen media types; now what does Hyper mean? A user uses some application that talks to the server on the user's behalf. Such applications are referred to as user-agents. A web browser is an example of a user-agent. User-agents understand and make sense of the media that flows to and from the server.
An application on the server must have an agenda, because the conversation needs closure: it must end either successfully or unsuccessfully. As in any dialogue, the user-agent and the application on the server take turns, and the next step depends on the selection made by the user-agent on behalf of the user. The web application simply nudges the user towards closure of the ongoing interaction.
Consider the following interaction, which depicts one request and its reply.
POST /orders
Host: example.org
Content-Type: application/vnd.example+xml
Accept: application/vnd.example+xml

<order xmlns="http://schemas.example.org/">
  <item qty="2">ITEM-1</item>
</order>

HTTP/1.1 201 Order created
Location: http://example.org/orders/1234
Content-Type: application/vnd.example+xml

<order xmlns="http://schemas.example.org/">
  <item qty="2">ITEM-1</item>
  <next xmlns="http://schemas.example.org/state-machine"
    rel="payment"
    uri="http://example.org/orders/1234/payment"
  />
  <next xmlns="http://schemas.example.org/state-machine"
    rel="self"
    uri="http://example.org/orders/1234"
  />
</order>
Let's see what the user-agent sends as a request.
Request Method
One of the HTTP verbs. In the above request it is POST.
Request URI
The URI of the resource. In the above request it is /orders.
Request Headers
Accept lets the server know which media type the user-agent is interested in.
Content-Type tells the server which format the data sent in the request is in.
Request Body
And finally, the data that is being sent to the server.
Let's break down the response received from the server.
It talks about the consequence of the request.
HTTP/1.1 201 Order created
201 is a status code telling the user-agent that the request was executed successfully and that, as a consequence, a resource has been created. Resource may sound too technical, but it simply means that a new object has been created on the server.
Location: http://example.org/orders/1234
Location is a response header. It carries the URI of, in this case, the newly created resource, which the user-agent can follow if it is interested.
Content-Type: application/vnd.example+xml
It simply tells the user-agent that the data being sent to it uses the application/vnd.example+xml media type.
The rest of the bytes sent by the server are simply the actual data.
Since the media type in this case is a custom one, it is up to the user-agent to make sense of it. In the example, the user-agent may be smart enough to parse the response data and
Proceed to payment
Or simply interact with the newly created order resource.
In either case the user-agent may use various HTTP verbs to interact with these resources.
The server is simply nudging the user, via the user-agent, to finish the workflow of placing an order.
Since the user-agent's decision has two possible outcomes, there are two possible interactions.
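To make the nudging concrete, here is a small Python sketch of how a user-agent might pull the possible next steps out of the order document. It uses only the standard library; the function name `next_steps` is invented for this illustration, and the body is the response shown earlier.

```python
import xml.etree.ElementTree as ET
from typing import Dict

STATE_NS = "http://schemas.example.org/state-machine"

RESPONSE_BODY = """\
<order xmlns="http://schemas.example.org/">
  <item qty="2">ITEM-1</item>
  <next xmlns="http://schemas.example.org/state-machine"
        rel="payment"
        uri="http://example.org/orders/1234/payment"/>
  <next xmlns="http://schemas.example.org/state-machine"
        rel="self"
        uri="http://example.org/orders/1234"/>
</order>
"""


def next_steps(body: str) -> Dict[str, str]:
    """Map every link relation in the document to the URI it points at."""
    root = ET.fromstring(body)
    links = {}
    # ElementTree spells a namespaced tag as "{namespace}localname".
    for link in root.iter(f"{{{STATE_NS}}}next"):
        links[link.get("rel")] = link.get("uri")
    return links


steps = next_steps(RESPONSE_BODY)
print(steps["payment"])  # where to go to pay for the order
print(steps["self"])     # where to go to look at (or cancel) the order
```

The user-agent never has to build these URIs itself; it only needs to know what the relations payment and self mean.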
DELETE /orders/1234
Host: example.org
Accept: application/vnd.example+xml

HTTP/1.1 200 Okay
Content-Type: application/vnd.example+xml

<order xmlns="http://schemas.example.org/">
  <item qty="2">ITEM-1</item>
  <status>Cancelled</status>
</order>
POST /orders/1234/payment
Host: example.org
Content-Type: application/vnd.example+xml
Accept: application/vnd.example+xml

<payment orderId="1234" xmlns="http://schemas.example.org/">
…
</payment>

HTTP/1.1 201 Receipt generated
Location: http://example.org/orders/1234/receipt
Content-Type: application/vnd.example+xml

<order xmlns="http://schemas.example.org/">
  <item qty="2">ITEM-1</item>
  <status>Not Ready</status>
  <next xmlns="http://schemas.example.org/state-machine"
    rel="self"
    uri="http://example.org/orders/1234"
  />
</order>
In all these interactions the user is presented with enough coordination data, and the media itself, to be able to complete the workflow successfully.
If the user-agent followed the payment link and the payment was processed successfully, a receipt is generated as a side effect. The coordination data says so with 201 and gives the URI of the receipt in the Location header. The client can now poll the self URI to see whether the order is ready.
Polling may sound stupid, but it is extremely effective when combined with caches. Caches lessen the burden on the server because, more often than not, the order is not ready immediately; it takes time to work on an order.
The important point is that the client has some part of the application state cached locally or cached along the chain.
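A rough Python sketch of why polling through a cache is cheap follows. Everything in it is a stand-in: `PollingCache`, `fake_server`, the five-second freshness window and the rule that the order becomes ready after ten seconds are assumptions made for the illustration, and the clock is simulated so the example runs instantly.

```python
from typing import Callable, Dict, List, Tuple


class PollingCache:
    """Serve repeated GETs from memory until the cached copy expires."""

    def __init__(self, fetch: Callable[[str], str], max_age: float,
                 clock: Callable[[], float]) -> None:
        self._fetch = fetch        # function that really asks the server
        self._max_age = max_age    # seconds a response stays fresh
        self._clock = clock        # injected so the example needs no sleep()
        self._entries: Dict[str, Tuple[float, str]] = {}

    def get(self, uri: str) -> str:
        now = self._clock()
        cached = self._entries.get(uri)
        if cached is not None and now - cached[0] < self._max_age:
            return cached[1]       # fresh: the server is not contacted
        body = self._fetch(uri)    # stale or missing: go to the server
        self._entries[uri] = (now, body)
        return body


# Hypothetical stand-ins for the network and the wall clock.
server_hits: List[str] = []
current_time = [0.0]


def fake_server(uri: str) -> str:
    server_hits.append(uri)
    return "Ready" if current_time[0] >= 10 else "Not Ready"


cache = PollingCache(fake_server, max_age=5, clock=lambda: current_time[0])

statuses = []
for second in range(12):           # the client polls once per second
    current_time[0] = float(second)
    statuses.append(cache.get("http://example.org/orders/1234"))

print(len(statuses), "polls,", len(server_hits), "requests reached the server")
print("last status seen:", statuses[-1])
```

The client polls twelve times, yet only the requests that find a stale or missing entry reach the server.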
Now about POST, which I put off earlier. I am treating it after the other verbs. Creation is mostly associated with POST, when in fact a GET followed by a PUT can do the job.
For example creating a new order can be a series of operations as
Then why POST?
The answer is quite obvious; I am merely making it explicit now. One important thing to note is that this is a multi-user application.
Let's say the new order-id is the number of orders in the collection plus 1. Since it is a multi-user application, more than one user may want to create a new order at the same time. All these users ask (GET) for the last order-id, and all end up with the same one. When these users PUT their new orders, the last person to do so wins. Users learn about failure or success only by GETting the order by order-id and comparing it with the order they created locally. Users who failed must simply redo the same sequence until they succeed. If instead the users just let the server know of their intention to create a new order, the almighty server, having a holistic view of the application state, is in a better position to process the requests. GET, PUT, DELETE or any sequence of them is not sufficient to express such intent, hence POST. No wonder it sounds and feels alien.
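The lost order can be reproduced with a toy model. The Python sketch below is only an illustration: `OrderStore` is a hypothetical in-memory stand-in for the server, and the two users are interleaved by hand rather than with real threads.

```python
from typing import Dict


class OrderStore:
    """In-memory stand-in for the server's order collection."""

    def __init__(self) -> None:
        self.orders: Dict[int, str] = {}

    def get_last_id(self) -> int:
        # What a GET for the last order-id would report.
        return len(self.orders)

    def put(self, order_id: int, order: str) -> None:
        # PUT stores the order at the id chosen by the client,
        # overwriting whatever was there.
        self.orders[order_id] = order

    def post(self, order: str) -> int:
        # POST states the intent; the server alone picks the id.
        # A real server would also serialise concurrent calls here.
        order_id = len(self.orders) + 1
        self.orders[order_id] = order
        return order_id


def create_with_get_then_put() -> Dict[int, str]:
    store = OrderStore()
    # Both users read the last id before either of them writes.
    alice_id = store.get_last_id() + 1
    bob_id = store.get_last_id() + 1
    store.put(alice_id, "alice's order")
    store.put(bob_id, "bob's order")    # same id: alice's order is gone
    return store.orders


def create_with_post() -> Dict[int, str]:
    store = OrderStore()
    store.post("alice's order")
    store.post("bob's order")
    return store.orders


print(create_with_get_then_put())
print(create_with_post())
```

With GET followed by PUT only one order survives; with POST both do, because the id is chosen where the whole application state is known.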
Caches introduce a consistency problem though. Say the client has been told to cache a response for a few seconds and meanwhile the order becomes ready. The client, based on the cached response, decides to cancel the order. In this case client and server state have diverged. To make it safe to cancel, the client can add a condition in the form of an If-Match header. If the condition does not match, the server simply refuses to process the request, telling the client that the precondition for honoring the request has failed. In real HTTP the If-Match value is an entity tag (ETag) taken from an earlier response; the example below uses the order status in its place for readability. In the unlikely event that the client did not send an If-Match header, the server can still report the clash as 409 Conflict.
DELETE /orders/1234
Host: example.org
If-Match: status=Not Ready
Accept: application/vnd.example+xml

HTTP/1.1 412 Precondition failed.
Content-Type: application/vnd.example+xml

<order xmlns="http://schemas.example.org/">
  <item qty="2">ITEM-1</item>
  <status>Ready</status>
  <next xmlns="http://schemas.example.org/state-machine"
    rel="self"
    uri="http://example.org/orders/1234"
  />
</order>
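On the server side the check could look roughly like the Python sketch below. It rests on several assumptions that are mine, not part of the example above: the `Order` class and `delete_order` function are hypothetical, the entity tag is derived from the order status alone, and a Ready order may no longer be cancelled.

```python
from typing import Optional, Tuple


class Order:
    def __init__(self, status: str) -> None:
        self.status = status

    @property
    def etag(self) -> str:
        # Assumption: the tag is derived from the status only, so it changes
        # whenever the status changes. Real servers often hash the whole body.
        return '"' + self.status.lower().replace(" ", "-") + '"'


def delete_order(order: Order, if_match: Optional[str]) -> Tuple[int, str]:
    """Return (status code, reason phrase) for a DELETE on the given order."""
    if if_match is not None and if_match != order.etag:
        return 412, "Precondition Failed"   # the client acted on a stale copy
    if order.status == "Ready":
        return 409, "Conflict"              # assumed rule: too late to cancel
    order.status = "Cancelled"
    return 200, "OK"


# The client cached the order while it was still "Not Ready".
cached_tag = Order("Not Ready").etag

# Meanwhile the order became ready on the server.
order_on_server = Order("Ready")

print(delete_order(order_on_server, cached_tag))   # conditional request
print(delete_order(order_on_server, None))         # unconditional request

# When nothing changed in between, the cancellation goes through.
fresh_order = Order("Not Ready")
print(delete_order(fresh_order, fresh_order.etag))
```

The conditional request is refused because the tag no longer matches, the unconditional one is refused by the business rule, and only the request made against an unchanged order succeeds.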
As a business user, wouldn't you want to know why the order was cancelled? Such reasons go beyond api design and delve into business asks. If the api has implemented DELETE to cancel the order, a typical interaction may look as below.
DELETE /orders/1234
Host: example.org
Accept: application/vnd.example+xml

HTTP/1.1 200 Okay
Content-Type: application/vnd.example+xml

<order xmlns="http://schemas.example.org/">
  <item qty="2">ITEM-1</item>
  <status>Cancelled</status>
</order>
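The problem with the above interaction is that it has nowhere to put the reason: HTTP defines no semantics for a body on a DELETE request, and many servers ignore or reject one. How can we solve this?
Let's try using POST.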
POST /orders
Host: example.org
Accept: application/vnd.example+xml

OrderId=14
&Operation=Cancel
&Reason=Just to annoy you

HTTP/1.1 200 Okay
Content-Type: application/vnd.example+xml

<order xmlns="http://schemas.example.org/">
  <item qty="2">ITEM-1</item>
  <status>Cancelled</status>
</order>
This does not look right from the get-go. It feels like we are performing a business operation specified by Operation. Operation is now a keyword that must be handled outside the limited set of HTTP verbs. This way the api would keep adding more and more verbs that need special handling.
Once we constrain ourselves to only the four verbs discussed above, we can definitely do better. Let's revisit the same problem with these api design constraints.
Do not forget that you have a limitless supply of nouns, and we are naturally wired to produce new names all the time.
POST /CancelledOrders
Host: example.org
Accept: application/vnd.example+xml

OrderId=14
&Reason=Just to annoy you

HTTP/1.1 200 Okay
Content-Type: application/vnd.example+xml

<order xmlns="http://schemas.example.org/">
  <item qty="2">ITEM-1</item>
  <status>Cancelled</status>
</order>
Though it sounds simple to come up with names, it is very tedious. No wonder die-hard proponents of ReST, such as Mark Baker or Tim Berners-Lee, suggested not attaching any meaning to URIs and keeping them opaque.
Opaque URIs relieve us from the need to produce new nouns every now and then, at the cost of a comprehensible api. They also make documenting the api difficult, as the api end-points are decided at run-time. Also, security and related features can only be put to use once the opaque URIs are established.
Roy Fielding, on the other hand, is of the opinion that URIs are meaningful, and that sort of makes more sense. Opaque URIs are for lazy folks.
Now let's focus on the Re of ReST. It is not the resource itself that is sent to the user-agent. The user-agent asks for a specific media type that it can understand. Let's consider the following HTTP interaction.
GET /orders/1234
Host: example.org
Accept: text/plain
If the web application cannot produce the requested media type, it can simply state this inability as
HTTP/1.1 406 Not Acceptable
Or, if it does indeed support the requested media type
HTTP/1.1 200 Okay
Content-Type: text/plain
Order 1234 is not ready
If the user-agent requests a media type that the web application supports, the application can send the resource data in that media type. It is the same resource, formatted as different media types, and all these variations of the same resource are called representations. Since the data that flows between the user-agent and the server can be different representations, the style is known as Representational State Transfer, or simply ReST.
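As a closing illustration, here is a minimal Python sketch of one resource served as several representations. The function names, the `REPRESENTATIONS` table and the shape of the order dictionary are assumptions made for this example. Quality values and wildcards in the Accept header are ignored, and the sketch answers 406 Not Acceptable, the status HTTP defines for an Accept header the server cannot satisfy.

```python
from typing import Callable, Dict, Tuple

Order = Dict[str, str]


def as_text(order: Order) -> str:
    return f"Order {order['id']} is {order['status'].lower()}"


def as_xml(order: Order) -> str:
    return (
        '<order xmlns="http://schemas.example.org/">'
        f"<status>{order['status']}</status>"
        "</order>"
    )


# One resource, several representations, keyed by media type.
REPRESENTATIONS: Dict[str, Callable[[Order], str]] = {
    "text/plain": as_text,
    "application/vnd.example+xml": as_xml,
}


def get_order(order: Order, accept: str) -> Tuple[int, Dict[str, str], str]:
    """Return (status code, headers, body) for a GET with the given Accept header."""
    # Simplification: media types are tried in the order listed by the client.
    for candidate in accept.split(","):
        media_type = candidate.split(";", 1)[0].strip().lower()
        render = REPRESENTATIONS.get(media_type)
        if render is not None:
            return 200, {"Content-Type": media_type}, render(order)
    return 406, {}, ""   # none of the requested media types can be produced


order = {"id": "1234", "status": "Not Ready"}
print(get_order(order, "text/plain"))
print(get_order(order, "application/vnd.example+xml"))
print(get_order(order, "image/png"))
```

The order itself never changes between the three calls; only the representation handed to the user-agent does.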