How we develop in and with distributed systems

Distributed systems are hard. Developing in and with distributed systems is even harder.

What’s so hard? Well, first off we’re dealing with two aspects here:

The cluster, CL in the figure below, that is, the distributed system itself such as DC/OS or Kubernetes or Spark or Kafka.
The development environment, DE in the following, which could be anything from vi/Emacs to IntelliJ IDEA.

Fundamental options for developing in and with distributed systems.

The second distinction we have to make is that of local vs. remote, from the point of view of the developer. Local means: runs on my machine. Remote means: runs somewhere in The Cloud© (or yeah, in your datacenter, welcome to 2017 ;)

Let’s walk through the four fundamental options now:

CLASS I

Both CL and DE are local. Examples are K8S minikube, DC/OS Vagrant and Docker Compose.

From the developer’s POV the pros of this approach are:

No costs for online stuff, can run as long as I want.
Fully under my control.

And the cons:

Can’t realistically cover all cases of a distributed system, such as network delays (or, in the worst case, partitioning) or clock skews. People sometimes forget about the fallacies of distributed computing, but these issues still exist, no matter if you’ve heard of them or not.
Doesn’t really scale. Well, only vertically. Typically must be supplemented by also deploying the code into a (real) distributed dev/test environment.
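One way to get at least a taste of these issues in a local setup is to inject delays and failures on purpose. The following is a minimal sketch, not a replacement for a real distributed test environment: the names flaky and SimulatedPartition, as well as all parameters and defaults, are made up for illustration.

```python
import random
import time


class SimulatedPartition(ConnectionError):
    """Raised when the wrapper pretends the network is partitioned."""


def flaky(call, max_delay_s=0.2, failure_rate=0.1, rng=None, sleep=time.sleep):
    """Wrap `call` so that each invocation is delayed and sometimes fails.

    All names and defaults here are illustrative assumptions.
    """
    rng = rng or random.Random()

    def wrapper(*args, **kwargs):
        # Simulate network latency with a random delay up to max_delay_s.
        sleep(rng.uniform(0, max_delay_s))
        # Simulate a partition by failing roughly failure_rate of all calls.
        if rng.random() < failure_rate:
            raise SimulatedPartition("simulated network partition")
        return call(*args, **kwargs)

    return wrapper
```

Wrapping the client call of a service under test, for example flaky(fetch_status, failure_rate=0.3) with a hypothetical fetch_status function, forces the calling code to deal with slow and failed requests. It does not cover clock skew or real partitions between nodes.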

CLASS II

CL and DE are located where one would expect them: the CL is remote and the DE is local. The CL is made available to the DE via proxy or VPN. One example is DC/OS Tunnel.

From the developer’s POV the pros of this approach are:

Can quickly iterate and deploy/test against the real stuff.
On my machine I only need to run DE.

And the cons:

Requires online connection so offline development is either very limited or not possible at all.
Certain edge cases might not be supported because of the limitations of the tunnel/proxy.
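Since everything in this class depends on the tunnel or proxy being up, a quick reachability check before starting a dev loop can save some head-scratching. Here is a minimal sketch using only the Python standard library; the function name cluster_reachable is made up, and the host and port you pass in depend entirely on your own setup.

```python
import socket


def cluster_reachable(host, port, timeout_s=2.0):
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout_s):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS resolution failures.
        return False


# Example call with a hypothetical cluster endpoint:
# cluster_reachable("master.example.internal", 443)
```

Note that a successful TCP connection only tells you the endpoint is reachable through the tunnel, not that the cluster behind it is healthy.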

CLASS III

Same as CLASS II in terms of separation, but in order to test a service one actually needs to deploy it into the CL. This is the usual setup found in many environments, with or without a CI/CD pipeline in place.

From the developer’s POV the pros of this approach are:

This is the real thing. It’s WYSIWYG and as complete as it gets.

And the cons:

As with CLASS II, it requires connectivity, and offline development is most certainly not possible.
It can be super slow to iterate. You might end up waiting 5 minutes or more to deploy a new version of your service.

CLASS IV

Both CL and DE are remote. Call it Chromebook-based development or whatever, but essentially nothing runs on your machine, really, in this setup. And while I wrote about this topic many years ago, I think by and large we’re still not there yet. Examples of this category are Google Cloud Shell and Cloud9.

From the developer’s POV the pros of this approach are:

Wherever I am, wherever I go, I have all the things set up and available; no local setup/dependencies.
It scales like hell: both in terms of system and team.

And the cons:

Always online is the default. You can’t do anything offline.
You have little to no control over your data (== code and build artifacts) and depend on someone else in terms of availability of your DE.
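To recap the four classes in one place, here is a small sketch that maps where CL and DE run to the class names used above. The Location enum, the classify function and the test_requires_deploy flag are my own illustrative names; the flag encodes the only difference between CLASS II and CLASS III, namely whether you have to deploy into the CL in order to test.

```python
from enum import Enum


class Location(Enum):
    LOCAL = "local"
    REMOTE = "remote"


def classify(cluster, dev_env, test_requires_deploy=False):
    """Map a setup to one of the four classes described in this post."""
    if cluster is Location.LOCAL and dev_env is Location.LOCAL:
        return "CLASS I"
    if cluster is Location.REMOTE and dev_env is Location.LOCAL:
        # CLASS II and CLASS III share the same separation; they differ in
        # whether testing requires deploying the service into the cluster.
        return "CLASS III" if test_requires_deploy else "CLASS II"
    if cluster is Location.REMOTE and dev_env is Location.REMOTE:
        return "CLASS IV"
    # A local cluster with a remote DE is not one of the four options.
    raise ValueError("combination not covered by the four classes")
```

For example, a laptop IDE talking to a remote cluster through a tunnel is classify(Location.REMOTE, Location.LOCAL), which yields CLASS II.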

What to choose? I don’t know your preferences, your use case, your team size, your industry, your regulatory requirements, your budget, … you get it. Personally, I believe we’ll be transitioning to CLASS IV within the next 5 to 10 years. Currently, I mostly use a CLASS II setup: it combines the authenticity of the distributed system with the (necessary) iteration speed, and if you like you can have a look at a concrete example here.