It’s always interesting to try to debug a software problem you run into in daily life and see whether you can figure out the root cause. I recently ran into a problem with insurance coverage (who hasn’t?) that seemed to arise from a pretty classic software design issue. I think the example gives insight into how easily things can get complicated when software models the real world.
If you studied computer science in school, one thing that might not have been apparent as you first learned about sorted arrays, binary search trees, hash tables, or any other interesting data structure is the striking absence of semantics from those descriptions. These data structures exist in isolation, independent of any real-world meaning. It is only when you try to apply them to a particular problem that you have to deal with the messy business of mapping semantics — deciding what these data structures really mean in the context of the actual real-world problem you are trying to solve.
Let’s walk through a simple example. Let’s say I have a water faucet. I model the faucet using a single bit: zero means off, one means on. (Let’s ignore that some applications might want to model it using a continuous variable reflecting how open it is.)
I now attach a simple Y-valve to this faucet. This allows me to split the flow of water and independently control the left and right flows. It seems obvious I can model these with a single additional bit for each branch, again zero indicating off, one indicating on.
The overall system can be modeled in a direct and obvious way with three bits, one for each valve: main, left, and right. Pretty simple, huh?
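To make that concrete, here is a minimal sketch in Python (the names are mine, not part of any real system): the state is just three bits, and water flows out of a branch only when the main valve and that branch’s valve are both open.

```python
from typing import NamedTuple

class Valves(NamedTuple):
    main: int   # 0 = closed, 1 = open
    left: int
    right: int

def left_flow(v: Valves) -> bool:
    # Water comes out of the left branch only if main and left are both open.
    return bool(v.main and v.left)

def right_flow(v: Valves) -> bool:
    return bool(v.main and v.right)

state = Valves(main=1, left=1, right=0)       # water flowing on the left only
print(left_flow(state), right_flow(state))    # True False
```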
Things start to get complicated when I want to perturb the system. I could require that any modification fully specify the entire new state of the system. That seems simple enough here (after all, there are only 2³ = 8 possible states), but it is clearly infeasible for more complicated systems and an unlikely interface even for something as simple as this application.
Consider if the water is flowing through the left side only. I must be in state [Main: 1, Left: 1, Right: 0] or, more succinctly, [1, 1, 0]. I then request to turn the flowing water off. That high-level goal could be accomplished by many different state transitions: to [1, 0, 0], [0, 0, 0], or [0, 1, 0]; even, perversely, to [1, 0, 1], which closes the left side but opens the right; or to [0, 1, 1] or [0, 0, 1], which close the overall flow but then open the right valve for no external purpose. So a seemingly simple request could leave the system in 6 of the 8 possible states (or maybe only 5 if I impose a sensible top-level business rule that any request directed at one flow should not alter the state of the other flow).
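Continuing the little Valves sketch above, it is easy to enumerate the end states that satisfy the request and confirm those counts:

```python
from itertools import product

start = Valves(main=1, left=1, right=0)   # water flowing through the left only

# "Turn the flowing water off" only constrains the outcome (no left flow);
# it says nothing about which valves should be touched to get there.
candidates = [Valves(*bits) for bits in product((0, 1), repeat=3)]
satisfies = [v for v in candidates if not left_flow(v)]
print(len(satisfies))    # 6 of the 8 possible states stop the left flow

# The business rule that a request aimed at one flow must not disturb the
# other flow rules out Valves(1, 0, 1), which would start water on the right.
sensible = [v for v in satisfies if right_flow(v) == right_flow(start)]
print(len(sensible))     # 5
```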
This trivial example already reflects some common characteristics of much more complex software systems: the stored state does not capture why the system is in that state, a simple high-level request maps to many possible low-level state transitions, and reasonable behavior requires layering additional business rules on top of the raw data model.
And this is a simple 3-bit system with a direct and obvious physical mapping to the real world!
Microsoft Word had a bug/misfeature that I always felt demonstrated an isolated form of this complexity in a vastly more complicated system. In Word, paragraphs have styles associated with them that describe formatting properties like font size or margins and indenting. A bulleted list is represented as a paragraph with a list-type style associated with it. A separate, independent property specifies the level of that paragraph in a multi-level list (e.g. the list style might specify that the second level of a multi-level list uses square bullets rather than round bullets, so a paragraph with list level two would display those square bullets). Through some unknown manipulation, you might end up with a paragraph that specifies a non-list style (so it looks like a normal paragraph) but has a non-zero “list level”.
To the user this is completely innocuous because the list level is ignored for non-list-type styles. The odd behavior occurs when the user tries to turn the paragraph into a bulleted list. Suddenly the paragraph is reformatted as if it is sitting deep within some multi-level list. That anomalous list-level property was just sitting there waiting to be misinterpreted when the system was perturbed.
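Here is a hypothetical, much-simplified sketch of that interaction. This is not Word’s actual data model; the Paragraph type and the rendering rule below are invented purely to illustrate how latent state can sit unnoticed until it is suddenly interpreted.

```python
from dataclasses import dataclass

@dataclass
class Paragraph:
    style: str        # e.g. "Normal" or "ListBullet" (invented names)
    list_level: int   # only consulted when the style is a list-type style

def bullet_prefix(p: Paragraph) -> str:
    if p.style != "ListBullet":
        return ""                                 # list_level silently ignored
    bullet = "▪" if p.list_level == 2 else "•"    # pretend level 2 uses square bullets
    return "    " * p.list_level + bullet + " "

# Some earlier, forgotten manipulation left a stale list level behind...
p = Paragraph(style="Normal", list_level=2)
print(repr(bullet_prefix(p)))    # '' — looks like a perfectly normal paragraph

# ...until the user clicks the bullet-list button and the latent state
# is suddenly interpreted.
p.style = "ListBullet"
print(repr(bullet_prefix(p)))    # '        ▪ ' — deep inside a multi-level list
```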
Application designers deal with these types of issues all the time and my goal here isn’t to give a set of recipes or design guidelines for avoiding or managing them. My main point is that this complexity is inherent to virtually any problem and grows rapidly as the state of the system accrues new interacting properties and features and new, usually ambiguous, operations to perturb those properties. As I talked about in The Math of Easy-to-Use, any time you increase the power or complexity of a system, you introduce ambiguity and overhead in how to interpret user intent within the context of that more complex system state.
OK, what about my insurance story? My son has just started his first post-college job and will be covered by his own insurance in a few days. I went into the health portal and scheduled his removal from my coverage. At that point, the portal displayed a somewhat confusing status: it reported him as “disenrolled”, but also reported a coverage “End Date” of 12/31, meaning his coverage should still be valid through the end of the year.
As Murphy would have it, that evening he proceeded to break his ankle playing Ultimate frisbee. His work coverage hadn’t kicked in yet. When we went to schedule surgery, a query from the surgeon’s office to the insurance company indicated his coverage under my plan was “deactivated”. Apparently the “disenrolled” property was inappropriately overriding the “End Date” property when checking coverage status. The interaction of these micro-states caused a macroscopic user-visible effect. It may be that this also required that we were in the open enrollment period at the same time. As in the Word example, some underlying micro-state ambiguity often requires an additional trigger to result in a user-visible anomaly.
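This is pure speculation about the insurer’s system, of course, but a sketch of the kind of eligibility check that would produce what we saw might look like this (all names and dates below are invented for illustration):

```python
from datetime import date

def covered_buggy(disenrolled: bool, end_date: date, on: date) -> bool:
    if disenrolled:
        return False              # the flag wins; end_date is never consulted
    return on <= end_date

def covered_intended(disenrolled: bool, end_date: date, on: date) -> bool:
    # Disenrolling only schedules the end of coverage; the end date governs.
    return on <= end_date

surgery_day = date(2022, 9, 1)       # illustrative dates only
end_of_year = date(2022, 12, 31)

print(covered_buggy(True, end_of_year, surgery_day))     # False — “deactivated”
print(covered_intended(True, end_of_year, surgery_day))  # True — covered through 12/31
```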
The story ends well as we were able to get the issue resolved (and the surgery looks to be successful). My frustration at spending an hour or two on the phone addressing the problem was slightly mitigated by speculations about the underlying cause and how rooted it probably was in these very basic semantic modeling challenges. Life is complicated and software is too.