What comes to a Python developer’s mind when they need to write an I/O-bound application? Async, of course. Why? Usually because it’s trendy. I’ve seen this many times. But is it always a good idea?
Let’s consider a case where we’re writing a core I/O-bound component but don’t need async.
Suppose we have several data sources that produce data as JSON messages.
We use Kafka as the message broker, so each data source writes its messages to a corresponding topic. For each topic we need a message processor that consumes a message, finds the related records in reference tables, and saves the message to the messages table with the corresponding foreign keys. It’s a typical case. So what do we do next? We think: “Hm, I need an I/O-bound app, so I’ll take async.” We don’t study the subject area and we don’t clarify the requirements. We just take async by default, assuming it’s optimal. But it’s not always optimal. Let’s take a closer look.
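To make the processor described above concrete, here is a minimal synchronous sketch. It is only an illustration under loud assumptions: the standard-library `sqlite3` module stands in for PostgreSQL, a plain Python list stands in for a Kafka topic, and the `devices` table, the `messages` columns, and the `device` field are hypothetical names that do not come from the real project.

```python
import json
import sqlite3


def create_schema(conn):
    # Hypothetical schema: one reference table and the messages table.
    conn.executescript(
        """
        CREATE TABLE devices (
            id INTEGER PRIMARY KEY,
            code TEXT UNIQUE NOT NULL
        );
        CREATE TABLE messages (
            id INTEGER PRIMARY KEY,
            device_id INTEGER NOT NULL REFERENCES devices(id),
            payload TEXT NOT NULL
        );
        """
    )


def process_message(conn, raw):
    """Parse one JSON message, resolve its reference record, and store it."""
    message = json.loads(raw)
    row = conn.execute(
        "SELECT id FROM devices WHERE code = ?", (message["device"],)
    ).fetchone()
    if row is None:
        raise LookupError(f"unknown device: {message['device']}")
    with conn:  # commits on success, rolls back on an exception
        cursor = conn.execute(
            "INSERT INTO messages (device_id, payload) VALUES (?, ?)",
            (row[0], raw),
        )
    return cursor.lastrowid


def run(conn, raw_messages):
    # Strictly one message at a time, in arrival order.
    return [process_message(conn, raw) for raw in raw_messages]


conn = sqlite3.connect(":memory:")
create_schema(conn)
conn.executemany(
    "INSERT INTO devices (code) VALUES (?)", [("sensor-1",), ("sensor-2",)]
)
conn.commit()

# The list plays the role of a topic; a real consumer would poll the broker.
topic = [
    '{"device": "sensor-2", "value": 10}',
    '{"device": "sensor-1", "value": 7}',
]
stored_ids = run(conn, topic)
```

The whole flow reads top to bottom, and the order in which rows are stored is exactly the order in which messages arrive.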
Suppose our database is PostgreSQL (which is what we actually had on the real project), so we need a proper tool to work with it. What are the choices? The most popular and reliable ones are SQLAlchemy and Django ORM.
If we need a powerful library, we choose SQLAlchemy. SQLAlchemy supports async, but at the time of writing that support was in beta. So we’d get a beta library in a core production component. That doesn’t seem right.
If we take Django ORM, we pull in the whole of Django, get a less capable ORM, and get async support only through adapters. Not impressive either.
In our project, the load was not high and we didn’t need to process a large number of messages per second. Moreover, sequential message processing was a strict requirement dictated by the subject area, and in that case an asynchronous design is a fundamental mistake.
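Here is a small sketch of why concurrency and a strict ordering requirement clash. It is a hypothetical illustration, not code from the project: `handle` and the message names are made up, and `asyncio.sleep` stands in for a database round trip whose duration varies from message to message.

```python
import asyncio


async def handle(message, delay, log):
    # The sleep simulates I/O that takes a different time for each message.
    await asyncio.sleep(delay)
    log.append(message)


async def process_concurrently(messages):
    log = []
    # All handlers start together, so they finish in order of their delays.
    await asyncio.gather(*(handle(m, d, log) for m, d in messages))
    return log


async def process_sequentially(messages):
    log = []
    # Awaiting one handler at a time keeps arrival order.
    for m, d in messages:
        await handle(m, d, log)
    return log


messages = [("create", 0.03), ("update", 0.02), ("delete", 0.01)]
concurrent_order = asyncio.run(process_concurrently(messages))
sequential_order = asyncio.run(process_sequentially(messages))

print(concurrent_order)  # ['delete', 'update', 'create']
print(sequential_order)  # ['create', 'update', 'delete']
```

The concurrent version applies the messages in reverse order. The sequential version keeps the order, but it awaits every message one by one, so within a single topic it gains nothing over plain synchronous code.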
So what do we end up with? A wrong design that could cause serious bugs, a “beta library” in a core production component, and higher code complexity. I don’t think anyone would dispute that sync code is much simpler than async code.
But there are cases without such restrictions and with a higher load. What then? Is async a good choice in these cases? Not always.
Async can reduce infrastructure costs through better utilization of CPU time, but it also increases code complexity, which in turn raises the cost of development and maintenance and can cause more bugs. If the load is not too high and the system doesn’t need to scale much, the company can simply add CPU cores instead. And that would be optimal, in my opinion.