Containers are Hard

Containers are hard.

Containers are all over our tech devices. Files in a file system. Tabs in a browser. Windows in a desktop shell. Applications on a smartphone. Objects inside a compound document. (I’ll skip extending this discussion to the common use of the word “containers” in the context of cloud services, but no doubt many of the dynamics discussed here play out in that context as well.)

What’s so hard about containers? The biggest problems arise because we have two opposing dynamics. We would like to keep the container-object interface as simple as possible. The container knows little about the objects it contains and the objects know little about the container. Knowledge implies complexity, tighter binding, more constraints on future development and greater development effort on both sides of the interface. The more knowledge objects have about the container, the harder it is to evolve the container forward — that knowledge represents constraints. More knowledge also means it is generally harder to integrate new object types into a container — lots of “things to know” often map to required features or other constraints on the ways objects need to be implemented.

The opposing dynamic is that this tighter binding often delivers better integration, more functionality and a better overall user experience, at least in the short term. So the container-object interface starts simple and grows more complex as new features are added. Over time this tighter binding makes it harder to create new object types and harder to evolve the container.
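To make the "simple interface" end of this tension concrete, here is a minimal sketch in Python. Every name in it (`Embeddable`, `TextNote`, `Counter`, `Container`) is hypothetical and invented for illustration; it is not modeled on any real system discussed in this post. The container knows exactly two things about an object: how to ask it to draw itself and how to ask it to save itself.

```python
from __future__ import annotations

from typing import Protocol


class Embeddable(Protocol):
    """Hypothetical contract: the only things the container may know."""

    def render(self) -> str: ...

    def save_state(self) -> bytes: ...


class TextNote:
    """One object type; it never references the container."""

    def __init__(self, text: str) -> None:
        self.text = text

    def render(self) -> str:
        return self.text

    def save_state(self) -> bytes:
        return self.text.encode("utf-8")


class Counter:
    """A second, unrelated object type that satisfies the same contract."""

    def __init__(self, value: int) -> None:
        self.value = value

    def render(self) -> str:
        return f"count={self.value}"

    def save_state(self) -> bytes:
        return str(self.value).encode("ascii")


class Container:
    """Holds objects and talks to them only through the two methods above."""

    def __init__(self) -> None:
        self._items: list[Embeddable] = []

    def add(self, item: Embeddable) -> None:
        self._items.append(item)

    def render_all(self) -> list[str]:
        return [item.render() for item in self._items]

    def save_all(self) -> list[bytes]:
        return [item.save_state() for item in self._items]
```

With an interface this narrow, adding a third object type costs two methods and the container can change its internals freely. Every feature that needs a third or fourth method on `Embeddable` raises the cost for all future object types, which is the dynamic described above.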

I have mostly struggled with these issues in the context of desktop systems and document applications but the motivation for this post came when I bought a new Audi A3 with Apple CarPlay. The A3 comes with a small pop-up screen on the dash that displays Audi’s default shell interface and gives you access to car settings, navigation, maps, radio, telephone, etc. There is one big clickable round knob for navigating around and selecting something and then a range of other dedicated hardware buttons to drill directly down into specific functions — radio, map, telephone, top-level menu and a “back” button.

When you plug your iPhone in, Apple CarPlay becomes one of the top-level choices in the Audi menu. When you select CarPlay, it takes over the entire screen and presents an interface that looks like the home screen of your iPhone, but only shows those applications (mostly Apple’s own) that explicitly support a CarPlay interface. These include maps and a phone UI that overlap the functionality of Audi’s components.

You immediately run into a pretty classic container problem — “who’s on top”? Both top-level interfaces want to be on top. They each provide a way to drill into the other container but no other way to expose capabilities from one to the other. If I try to use the hardware buttons, e.g. to drill directly to the telephone interface, I get told that the telephone interface is unavailable (rather than plumbing me through to CarPlay’s telephone interface). The “map” button gets me to Audi’s (inferior) map component rather than the map component in Apple CarPlay. The “back” button works within either container but does not cross containers. So if I select the radio with the hardware button while in CarPlay, I cannot get back to the CarPlay interface without navigating back to Audi’s top-level menu and selecting CarPlay from there. (This scenario arises trivially when I want to interact with the radio while using the Apple navigation feature.)

Each of these issues is relatively minor, and you can certainly imagine extending the interface between Audi’s container and CarPlay to support these features. But each new feature (like drilling through with hardware buttons) involves additional work on both sides of the interface and a tighter binding between the systems. In some cases the binding is further complicated because it introduces ambiguity in the interface. For example, it seems clear that drilling through to Apple’s phone interface is better than saying “phone not available”, but a user might want to choose which mapping interface opens when pressing the map button. As Audi adds better support for CarPlay, it needs to ensure that its own new feature development doesn’t degrade that integration. This type of problem is very common, especially if the internal group thinks that it is in some sense “competing” with the other container, or treats integration as a post-development verification phase rather than a real ongoing constraint on development and feature selection.
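As a rough illustration of what "extending the interface" might involve, here is a small Python sketch of a button router sitting between two shells. It is entirely hypothetical: the names `Shell`, `Router`, `press` and `back` are invented, and nothing here reflects how Audi's software or CarPlay actually works. It assumes each shell can declare which screens it provides, that a user preference can break ties, and that a single history is shared across both shells.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Shell:
    """A hypothetical top-level UI that declares the screens it can show."""

    name: str
    screens: set[str]


@dataclass
class Router:
    """Routes hardware-button intents to whichever shell can serve them."""

    host: Shell
    guest: Shell | None = None
    # Tie-breaker for intents both shells can serve, e.g. {"map": "guest"}.
    preferred: dict[str, str] = field(default_factory=dict)
    # One history shared by both shells, so "back" can cross containers.
    history: list[tuple[str, str]] = field(default_factory=list)

    def press(self, intent: str) -> tuple[str, str] | None:
        shells = [s for s in (self.host, self.guest) if s is not None]
        candidates = [s for s in shells if intent in s.screens]
        if not candidates:
            return None  # nobody can serve this button
        choice = candidates[0]  # default: the host wins when both can serve
        wanted = self.preferred.get(intent)
        for shell in candidates:
            if shell.name == wanted:
                choice = shell
        target = (choice.name, intent)
        self.history.append(target)
        return target

    def back(self) -> tuple[str, str] | None:
        if len(self.history) < 2:
            return None  # nothing to go back to
        self.history.pop()
        return self.history[-1]
```

Even this toy version shows the cost. Both sides now have to agree on a vocabulary of intents, the host needs a place to store a per-button preference, and the shared history means neither shell fully owns its own navigation any more.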

Let’s look at another example — window systems and the evolving relationship between applications and the containing window manager.

This offers a good case study since it has played out over 30 years. Many of the issues associated with these opposing dynamics only arise over time because they are often about how current decisions create future constraints. I’ll look at Windows in particular, but similar issues played out on other operating systems.

(This story is a significant simplification of the full history.) Windows defined a top-level application window and applications could implement a “multiple document interface” (MDI) if they wanted a single application process to support multiple documents — e.g. Excel allowing you to have multiple worksheets open at once. Library routines supported standard ways of interacting with these child windows but applications could relatively easily support their own separate features — e.g. Excel supporting multiple windows on a single document and keeping the contents of the windows aligned or Word supporting side-by-side windows when doing document compare.

Later versions of Windows then moved to extend and promote support for a “single document interface” (SDI), where a single application could maintain multiple separate top-level windows rather than child windows. SDI support came with features in the task bar and other window management functionality to better support applications with multiple windows. The disadvantage was the work required in each application to adapt to this change, plus the work to design how app-specific functionality (like Excel’s side-by-side linked scrolling) could be retrofitted into the new design. The work was complicated enough that it was a decade before Excel fully integrated with this new user experience. Excel implemented “fake SDI” (with specific OS support for this mode) that “sort-of” worked for some usage modes but was totally busted for others.

The work to adapt to the new Windows 8 and WinRT interfaces again broke the user interface for applications with multiple documents that wanted to code to these new APIs. The Windows team was initially focused on applications that did not need this type of support but knew that it would need to add it in the future. It did not want Office going off and designing application-specific solutions that would later need to be retrofitted or would constrain what the OS could do. Office agreed this was an OS (container) responsibility but disagreed with the prioritization. It was a long-standing source of friction but was ultimately of limited impact because of the overall poor reception of Windows 8 and the limited availability of devices running this interface.

Browsers also innovated in this space with the addition of explicit tabs for switching between open pages. Initially tabs were a completely internal application feature, but over time the OS shell added support for directly activating an open tab. The latest release of Windows is adding direct OS support for tabs, both to expand the supported scenarios (e.g. different applications per tab) and to ensure the feature is provided in a consistent way within applications. This required yet another significant development effort within the Office applications in order to integrate smoothly. (Or not: I haven’t actually used the new feature yet, so I can’t speak to how well the feature and application integration works. See this post for more details.)

The general pattern here is that for simple and “vanilla” scenarios, adapting to continued container evolution is relatively straightforward. As scenarios become more complex and objects (applications in this case) integrate more tightly with the current design, evolving that support over time becomes more difficult. This tighter integration is initially welcomed by the container team as an example of a committed partner. Later generations of the same container team see this tight integration as sclerotic legacy — and generally the “fault” of the object team. (This is of the flavor “you screwed up, you trusted us”.)

OLE is another good test case. Object Linking and Embedding (OLE) was initially viewed as a major breakthrough in how to build compound documents. OLE 1.0 was fairly simple, with limited goals and limited integration. Essentially, objects had a way of persisting their state into the document and a way of displaying their content on the document surface. OLE 2.0 was a frenzy of complexity, with a hairy menu-merging mechanism to combine the UI of the host container and object as well as many additional APIs to more tightly integrate the container and object content. In-place activation and drag-and-drop support were among the features. That version of the container-object interface essentially crashed on the shoals of complexity from the start, with few new containers or objects willing to implement the full API. Despite the limited number of new object types in the ecosystem, the feature added ongoing overhead and container constraints for new feature development for all subsequent versions of the Office applications.

OLE 2.0 is a useful cautionary tale for another reason. The high-level scenario, “insert an editable Excel sheet in a Word document”, was simple to state but required an explosion of underlying complexity. Within Microsoft, most pushback on feature ideas would come because of the amount of development time required, not because of the additional complexity introduced. These are often linked but not always.

In fact, ActiveX, supported in Internet Explorer as an extension mechanism for the web, was a significant simplification and trimming of the OLE APIs. It got non-trivial usage for internal line-of-business applications but the lack of a cross-platform solution or a secure distribution mechanism (essentially an app store) doomed it in the longer term.

I’m being a little unfair to OLE. There is no successful example of an object-embedding technology for editable documents. Just achieving sufficient baseline functionality to give the appearance of a consistent holistic document results in an astounding degree of complexity in the container-object interface. This dooms the effort early on. The battle between extensible compound documents and monolithic documents has been won by the monolith exactly because of these complexity challenges.

HTML has seen success with a very simple embedding approach using IFRAMEs, but this omits all the complexity of editing, which would add an immense number of additional feature requirements.

I have no “solution” for the tensions described here; they are inherent to the structure of the design challenge. Knowing the tension exists can help you balance where you want to create tighter integration as you add new features and where you want to avoid that binding and the future constraints it implies, perhaps by saying “no” to some feature request or at least recognizing that the bar for approving these types of features should be high. For any specific feature, there are often ways to minimize the assumptions made across this interface while still achieving the feature goals. That is obviously the goal for any API design but plays an even bigger role in problems of this form.
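One common way to keep such assumptions small is to make a new feature an optional capability that the container probes for, with a fallback when it is absent. The Python sketch below is only an illustration under that assumption; `SupportsSummary`, `Basic`, `Rich` and `tab_title` are hypothetical names, not part of any system mentioned in this post.

```python
from __future__ import annotations

from typing import Protocol, runtime_checkable


@runtime_checkable
class SupportsSummary(Protocol):
    """Hypothetical optional capability an object may choose to offer."""

    def summary(self) -> str: ...


class Basic:
    """An older object type that only knows how to render itself."""

    def render(self) -> str:
        return "basic"


class Rich:
    """A newer object type that also offers the optional capability."""

    def render(self) -> str:
        return "a long rich rendering"

    def summary(self) -> str:
        return "rich"


def tab_title(obj) -> str:
    """Container-side feature: use the capability if present, else fall back."""
    if isinstance(obj, SupportsSummary):
        return obj.summary()
    # Fallback keeps older objects working: truncate the normal rendering.
    return obj.render()[:12]
```

Here `tab_title(Basic())` falls back to the truncated rendering while `tab_title(Rich())` uses the richer path. The container gets its feature, existing objects keep working unchanged, and the required part of the interface does not grow. The optional capability is still a commitment the container has to honor in future versions, so this softens the tension without removing it.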