Poisonous Interface Type Domains

Written by allingeek | Published 2017/02/12
Tech Story Tags: api | microservices | swagger | software-engineering


Using types with ambitious domains in interface contracts can encourage consumers to depend on the observed range of values, creating implicit contracts.

What the subtitle is saying is that services and clients implementing APIs that use int64s (for anything but time values or memory addresses), string IDs, or other wide-domain primitives are probably full of bugs that widen the gap between the explicit interface contract and the implicit one. Implicit contracts can make iterating on service implementations very difficult (which is bad for future you).

An illustrative example will help drive the lesson home. Stick around for a best practice and some bad news at the end. The following snippet is from the Swagger definition for the Docker API:

ServiceSpec:
  ...
  properties:
    ...
    Mode:
      properties:
        ...
        Replicas:
          type: "integer"
          format: "int64"

Suggesting that there might be a use-case for scaling to anywhere near 2⁶⁴ replicas of a service is — by any definition of the word — ambitious. There are two problems with an interface definition like this.

First, the probability of anyone using the full domain of values is very low. In fact, I'd wager that values greater than a few hundred are incredibly rare, and values greater than 2¹⁶ will almost never happen. This disparity forces the service provider either to accept and appropriately process an int64 everywhere, or to implement additional input validation and attempt to communicate those domain constraints (BTW, nobody reads the documentation). In most cases basic boundary checks will be implemented (like "greater than zero" checks), but rarely have I seen code validate an upper bound. How well do you think a Docker Swarm manager would handle being asked to create even 2³² (4,294,967,296) replicas of a service? Would it try? Would its internal dependencies be able to handle that many replicas? How would its clever container name generators handle the pressure? I'm not sure.
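If the service does validate, the upper bound has to come from somewhere. Here is a minimal sketch of what that server-side check could look like in Go; maxReplicas and validateReplicas are hypothetical names, and the limit is an illustrative value, not anything the Docker API actually enforces:

package main

import (
    "errors"
    "fmt"
)

// maxReplicas is an illustrative upper bound chosen by the service owner.
// The point is that the bound exists and gets checked, not the exact value.
const maxReplicas int64 = 10000

// validateReplicas narrows the wire-level int64 down to the range the
// service can actually honor.
func validateReplicas(requested int64) error {
    if requested < 0 {
        return errors.New("replicas must be non-negative")
    }
    if requested > maxReplicas {
        return fmt.Errorf("replicas must be <= %d, got %d", maxReplicas, requested)
    }
    return nil
}

func main() {
    fmt.Println(validateReplicas(3))       // <nil>
    fmt.Println(validateReplicas(1 << 32)) // replicas must be <= 10000, got 4294967296
}

The check is trivial; the hard part is deciding on (and documenting) the bound, because from that point on it is part of the contract.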

The second problem is on the client side. Most consumers of the Docker API will not be storing or performing boundary-pushing math on the number of service replicas. But let's suppose for a moment that you are. Suppose you want to build some autoscaling logic and you need to add or subtract from the current number of replicas. The consumer should immediately notice the ridiculous size of the potential input domain and fit the actual value into some reasonable subdomain before doing any typecasting or over/underflow-sensitive operations. If they are storing the value in a database, then they need to make sure that the target column is appropriately sized. In reality, those two best practices are skipped surprisingly often. The issue is worse for users of dynamically typed languages or schema-generating ORMs.
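Here is a minimal sketch of that client-side fitting in Go; clampReplicas and scale are hypothetical helpers, and math.MaxInt32 stands in for whatever your storage column or downstream math can actually tolerate:

package main

import (
    "fmt"
    "math"
)

// clampReplicas fits the wide wire-level value into the narrower range the
// client actually works with, instead of blindly casting it.
func clampReplicas(observed int64) int32 {
    if observed < 0 {
        return 0
    }
    if observed > math.MaxInt32 {
        return math.MaxInt32
    }
    return int32(observed)
}

// scale adds delta to the current replica count without silently
// overflowing the column or counter the result will be written to.
func scale(current int32, delta int32) int32 {
    next := int64(current) + int64(delta) // do the math in the wider type first
    return clampReplicas(next)
}

func main() {
    fmt.Println(scale(5, 3))              // 8
    fmt.Println(scale(math.MaxInt32, 10)) // clamped to 2147483647
}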

This example is not unique. One of the most commonly abused primitive types is the string. Strings can hold pretty much anything and are most commonly bound only by length. For a while there, in the '90s and early 2000s, people were all about numeric IDs. Auto-incrementing primary keys dominated relational table designs. When the number of transactions services were handling started jumping, people started thinking more and more about how to avoid locking on those counters during inserts and updates. They started thinking about what might happen if the ID space became exhausted. Schemes emerged for using those ID spaces sparsely and for recycling IDs that were lost in gaps.

At some point people discovered GUIDs and UUIDs. They were recognized as cool tools that might even save a CRUD application a round trip to the database. People started defining all of their IDs as strings. The problem was that, quite frequently, interfaces that defined string IDs were "initially" implemented with a numeric ID on the backend. You might guess what happened next.

API consumers hit the service a few times and noted that the ID always came back as a number. Or maybe they used a database schema generator from a dynamically typed language. Or maybe the API was foolishly vending ordered numeric IDs (which also became part of the implicit contract) and consumers were parsing numerics out of the strings for comparison. Whatever the reason, strings are tricky to work with in service interfaces.
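To make the trap concrete, here is a sketch of that consumer-side habit in Go; the IDs are made up for illustration:

package main

import (
    "fmt"
    "strconv"
)

func main() {
    // This works only as long as the service happens to vend numeric IDs.
    a, _ := strconv.ParseInt("1041", 10, 64)
    b, _ := strconv.ParseInt("1042", 10, 64)
    fmt.Println(a < b) // true

    // It breaks the day the backend switches to UUIDs, even though the
    // explicit contract (a string ID) never changed.
    _, err := strconv.ParseInt("9b2f1c3e-5d4a-4f6b-8a7c-0e1d2f3a4b5c", 10, 64)
    fmt.Println(err) // invalid syntax
}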

I get it, though. Wide types are attractive for quite a few reasons. Ambition and "future proofing" seem to be the most common. Wide types also appeal to certain "keep it simple, stupid" or "lazy engineer" mindsets. Using narrower types requires deep insight into the product you're building. Making sure it is future proof takes specific engineering effort and insight into potential iteration and migration workflows. Picking a wider type lets you move on quickly. There is one important thing to remember…

Unless you’re a lawyer who is fine with disrupting the business of your API’s consumers: the observed range of values handled by your service over time becomes part of your service’s implicit contract.

What is the best practice?

Use the full domain of your interface types. If you vend an int64, you should be using the full domain. Unless you want consumers to depend on incrementing numerics in some identity field, avoid using "relative" numerics. If you're using a string, you should be vending values that contain at least parts of every major character class. That means numbers, letters (of mixed case), and every symbol you can think of.
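As a sketch of what that looks like for string IDs, here is a hypothetical generator in Go; the alphabet is an illustrative choice (digits, mixed-case letters, and a few URL-safe symbols), not a prescription:

package main

import (
    "crypto/rand"
    "fmt"
    "math/big"
)

// The alphabet deliberately mixes digits, upper- and lower-case letters, and
// a few symbols so consumers never observe a purely numeric ID they could
// start parsing and comparing.
const alphabet = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ-_.~"

// newID vends an opaque identifier that exercises the full string domain the
// interface claims to allow.
func newID(length int) (string, error) {
    id := make([]byte, length)
    for i := range id {
        n, err := rand.Int(rand.Reader, big.NewInt(int64(len(alphabet))))
        if err != nil {
            return "", err
        }
        id[i] = alphabet[n.Int64()]
    }
    return string(id), nil
}

func main() {
    id, err := newID(22)
    if err != nil {
        panic(err)
    }
    fmt.Println(id)
}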

If this sounds like a pain, you could always just use smaller types. Maybe use an int8 or int16 when you're talking about the number of service replicas? It really depends on your use case, but the point is to turn input validation into a domain problem as often as possible (kinda like containers turn access control problems into domain problems).
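A minimal sketch of that idea in Go, where ReplicaCount and ServiceMode are hypothetical types rather than anything from the real Docker client:

package main

import "fmt"

// ReplicaCount is deliberately narrow: a uint16 caps the domain at 65535,
// which makes most range validation a property of the type itself.
type ReplicaCount uint16

// ServiceMode mirrors the shape of the field from the Swagger snippet above,
// but with the narrower replica type.
type ServiceMode struct {
    Replicas ReplicaCount `json:"Replicas"`
}

func main() {
    m := ServiceMode{Replicas: 300}
    fmt.Println(m.Replicas)
    // m.Replicas = 70000 // does not compile: constant 70000 overflows ReplicaCount
}

The narrow type does the "greater than zero" and upper-bound checks for free; the only validation left is whatever business rules remain inside that domain.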

The Bad News

The bad news is that once you "widen" the valid domain of service types in a published API, it is particularly difficult to narrow them. Doing so typically requires versioning the whole API, which in a SaaS scenario means running two versions of the service. If you can get away with narrowing something, then you should do so as quickly as possible. Do it before you take on that one consumer who simply cannot migrate and cannot be broken. Do it before you earn yourself a SaaS prison sentence.
