Imagine this: you’re in a team meeting, and your manager tells you there’s an old service, written a couple of years ago, that used to work fine but now breaks in some weird way. And you need to fix it. On top of that, they ask you to implement a new feature that the team has been wanting for a long time.

At this point, your happiness for the next few hours, days, or even weeks largely depends on what the previous developer left behind. You open the code and find… something. There are a few possibilities, depending on how careful the previous developer was. Maybe they were a FORTRAN enthusiast doing a proof of concept who planned to rewrite the service before production. Design? Forget about it. Everything is stuffed into a single procedure. Or maybe they were a true engineering guru: they defined a set of simple, strict abstractions that perfectly captured the domain, implemented everything cleanly, covered it with tests, and documented it. Or something in between.

The closer the code is to the second scenario, the less pain you’ll have digging through it. Now, let’s put ourselves in the shoes of that service’s author. How do we get closer to the second scenario and further away from the first one?

My name is Dmitry, I’m a software engineer and Python mentor. My experience in real-world projects has shown me how painful dealing with implicit or messy state can be — very similar to the config issues I covered in my previous article.

In this article, I’ll show one approach — borrowed from distributed systems development — that helps you manage state responsibly, reduce overall code complexity, and ship features faster.
Why complexity matters

Software complexity affects almost every aspect of software engineering. The more complex an application becomes, the harder it is to:

- read and understand its source code
- add new features without breaking existing behaviour
- predict how the system will behave under real load
- reason about its internal state, and therefore…
- …debug it when something inevitably goes wrong

What makes this especially dangerous is that complexity rarely announces itself. The code may still “work”, tests may still pass, and yet every change starts to feel risky. Progress slows down, confidence drops, and even small features begin to require disproportionate effort.

Two parts of complexity

For any non-trivial application, complexity comes in two very different forms.

1. Domain complexity

This is the complexity inherent to the problem you’re solving. If you’re building a weather forecasting system, you need to understand meteorology, statistics, and prediction models. There is no clever architecture or design pattern that can remove this complexity — and that’s a good thing. Domain complexity is exactly what makes the application valuable. If you remove it, there’s no product left.

2. Accidental (or overhead) complexity

This is the complexity introduced by how the application is designed and implemented.

Here’s a simple thought experiment. Take a Python program and list every variable name it uses. Now replace each of them with a unique, randomly generated string. Then do a global search-and-replace in the source code. Apart from some minor details, the program will behave exactly the same. You haven’t changed the domain logic at all — the type-1 complexity is untouched.
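To make the thought experiment concrete, here is a toy sketch. Both functions and every name in them are invented for illustration:

```python
# One small function, before and after the renaming experiment.

def monthly_average(readings):
    """Readable version: the names carry the domain meaning."""
    total = sum(readings)
    return total / len(readings)

def qzx_81fk(j_0p):
    """The same logic after replacing every name with a random string."""
    v_7q = sum(j_0p)
    return v_7q / len(j_0p)

# Behaviour is identical; only the accidental complexity differs.
assert monthly_average([10, 20, 30]) == qzx_81fk([10, 20, 30]) == 20.0
```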
But the code will become dramatically harder to read, reason about, and modify. The overhead complexity has exploded.

The same thing is happening in the example from the introduction. The problem isn’t what the program does — it’s how that behaviour is packaged into the source code.

Why this distinction matters

Managing complexity is largely about reducing accidental complexity while preserving the essential one. And this is not a niche concern: complexity management has a direct and measurable impact on development speed, correctness, and team morale. Most of the pain developers feel when working with legacy systems comes not from the domain being hard, but from the overhead complexity quietly accumulating over time.

Application design “gap”

Let’s look at the kinds of applications that are typically written in Python and loosely arrange them along the spectrum of size and complexity. From smallest to largest, it often looks like this:

1. Simple one-off scripts — essentially a replacement for small Bash scripts
2. Medium-sized services or tools — for example, a configurable CPU fan control service
3. Large distributed systems — systems that process significant amounts of data and must do so efficiently

These categories aren’t meant to be precise. They’re just a mental model.

For class-1 programs, design rarely matters. These scripts are short-lived, often written and maintained by a single person, and rarely evolve over time. Adding too much structure or abstraction usually brings little benefit.

On the other end of the spectrum, class-3 systems simply cannot exist without deliberate design.
They are typically built and maintained by teams, which already requires shared abstractions and conventions. On top of that, distributed systems almost always involve persistent state — usually stored in a database — and multiple processes interacting with it concurrently. You need to define what state exists, where it lives, how it changes over time, and which components are allowed to modify it. Without clear answers to these questions, such systems fall apart very quickly.

For classes 1 and 3, the situation is clear: one doesn’t need design at all, and the other cannot survive without it. The problem lies in the middle.

Class-2 applications are often just small enough that developers can get away with a loosely defined design — or no deliberate design at all. And this is where things quietly go wrong. Projects rarely shrink; they evolve. What starts as a simple tool or service can gradually grow into something that demands careful complexity management. When that moment arrives, the lack of an explicit design suddenly has an outsized impact on developer productivity, confidence, and delivery speed.

This gap — where an application is too big to stay ad-hoc, but not big enough to force proper design — is where most Python services slowly become painful to work with.

In my career, I’ve seen far more class-2 projects with unmanaged complexity than I’d ever like to admit. So instead of complaining about them, I want to share an approach I’ve been using successfully to keep such projects under control. And the idea doesn’t come from typical “Python architecture” discussions. It comes from distributed systems.

A closer look at distributed systems design

Earlier, we talked about class-3 systems and mentioned that their distributed nature — combined with a database — forces developers to maintain a clear design. Let’s take a closer look at why that happens.

You can think of any application as a black box that consumes inputs and produces outputs.
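In the simplest sketch, this black-box view is just a function from inputs to outputs; the record-cleaning behaviour below is a made-up stand-in:

```python
# An application viewed as a black box: inputs go in, outputs come out,
# and whatever state exists is hidden inside the box.
def application(inputs: list[str]) -> list[str]:
    # Invented stand-in behaviour: normalise and deduplicate incoming records.
    return sorted({s.strip().lower() for s in inputs})

assert application(["  B", "a ", "b"]) == ["a", "b"]
```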
Now imagine a horizontally scalable distributed system with a transactional database at its core. In such a system, workers can be added or removed dynamically based on load, and overall throughput increases as more workers are added. A common way to achieve this is to keep workers stateless and move all shared state into the database.

“Stateless” here doesn’t mean that workers have no state at all — every running program does — but that a worker’s local state is never relevant to other workers. To do its job, a worker only needs access to its own state and the database. With this setup, workers are almost completely independent.

The remaining challenge is state updates: workers still need to modify shared state in the database, and those updates could affect other workers. This is where transactions come in. By defining what it means for the database to be in a valid state and using transactions to ensure that state only ever moves between valid states, the database itself enforces consistency. As a result, workers can be written under a simple assumption: the state they read from the database is always valid. This allows developers to reason about each worker in isolation, without caring about what other workers are doing at the same time.

Takeaways from distributed systems design

This design works extremely well for distributed systems and horizontal scalability, but its usefulness doesn’t stop there. It turns out to be a generally strong way to structure software — not just distributed systems, but applications in general.
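Stripped of the distribution machinery, the valid-state contract can be sketched in a few lines of plain Python. Everything here is invented purely for illustration: the `AccountState` model, the non-negative-balance invariant, and the `TinyDb` stand-in for a transactional database.

```python
import threading
from dataclasses import dataclass

@dataclass(frozen=True)
class AccountState:
    balance: int

    def is_valid(self) -> bool:
        # The invariant the "database" enforces: balances never go negative.
        return self.balance >= 0

class TinyDb:
    """A toy stand-in for a transactional database."""

    def __init__(self, state: AccountState) -> None:
        self._state = state
        self._lock = threading.Lock()

    def transact(self, update) -> AccountState:
        """Apply `update` atomically, rejecting transitions to invalid states."""
        with self._lock:
            new_state = update(self._state)
            if not new_state.is_valid():
                raise ValueError("transaction would leave the state invalid")
            self._state = new_state
            return new_state

db = TinyDb(AccountState(balance=100))
db.transact(lambda s: AccountState(s.balance - 30))       # commits: balance is 70
try:
    db.transact(lambda s: AccountState(s.balance - 500))  # rejected by the invariant
except ValueError:
    pass                                                  # state is still valid: 70
```

Because every update goes through `transact`, each component can assume the state it reads is valid, without knowing what the other components are doing.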
At its core, this approach has a few important properties:

- the structure of the application state is explicitly defined (typically via a database schema)
- the application state is always valid
- state evolution is clearly defined by the workers’ code
- all interactions with the state follow a strict contract, which decouples components from each other
- this decoupling allows components to be tested independently
- testing is further simplified by the ability to mock the database
- finally, the application state can be thoroughly inspected by directly querying the database

Together, these properties result in code that is clean, testable, and easy to reason about. This naturally raises a question: can the same ideas be applied to systems that are not distributed?

Applying these ideas to class-2 systems

The fact that class-2 systems don’t strictly require a carefully thought-out design doesn’t mean that such a design wouldn’t help. Having learned valuable lessons from distributed systems, it makes sense to try applying them here as well.

One obvious way to do that would be to turn a class-2 system into a distributed one. In some cases, that might even be justified. However, as a general solution it is rarely practical.
Distributed systems backed by databases introduce significant overhead that plain Python projects don’t otherwise need:

- hardware provisioning, including the database
- networking
- database management
- database deployment and migrations

For many projects, this cost makes such an approach infeasible. The good news is that distribution is not required to apply the core ideas. Below are two ways to borrow them without turning the application into a distributed system.

Option 1: no to distributed, no to a database

Most of the complexity-management benefits can be achieved simply by structuring the code around explicit state:

- strictly define the core application state (for example, using dataclasses or Pydantic models)
- define explicit state-update scenarios
- treat the core state as an external dependency by passing it into the functions that implement those scenarios (similar to dependency injection)
- avoid interaction between scenarios through anything other than the core state

This approach already gets you most of the way:

- the state and its transitions are clearly defined
- the state remains valid
- transitions are decoupled
- transitions are easy to test, especially since the core state is easy to mock

What’s missing here is state inspectability and the transactional
guarantees provided by databases. The latter is usually not an issue in single-threaded applications, but it can become problematic in multi-threaded ones. In such cases, additional synchronization or libraries may be required — or you can move on to the second option.

Option 2: no to distributed, yes to a database

For most database systems, provisioning and operating them is a significant burden, which often rules them out for class-2 Python applications. However, there is at least one practical exception: SQLite.

SQLite is an SQL database that requires almost no setup and stores its entire state in a file. Despite its simplicity, it provides transactions and isolation guarantees, allowing you to retain many of the benefits of a production-grade database without the operational overhead. While SQLite has limitations in both functionality and performance, they are far less restrictive than many developers assume.

By using SQLite to store application state and applying the same design principles discussed for distributed systems, it’s possible to achieve the same complexity-management benefits — including state inspectability and transactional safety in multi-threaded scenarios.

Conclusion

In medium-sized Python projects, complexity rarely explodes all at once. It creeps in quietly and almost always involves poor state management. When the state is implicit, and its guarantees and assumptions are undocumented, every change becomes riskier than it should be.

Distributed systems ran into this problem early and solved it the hard way. By making state explicit and treating state changes as first-class operations, they found a way to keep complexity under control.

The same mindset works just as well in non-distributed Python code. You don’t need heavy infrastructure to apply it — just a clear model of state and disciplined rules for how it changes. Do that, and your codebase stays understandable, even as the project grows.