Conceptual Compression and Deeper Patterns

by Terry Crowley, June 2nd, 2018

This <a href="https://m.signalvnoise.com/conceptual-compression-means-beginners-dont-need-to-know-sql-hallelujah-661c1eaed983" target="_blank">post</a> by <a href="https://m.signalvnoise.com/@dhh" target="_blank">@DHH</a> of Basecamp and Ruby on Rails fame sparked a lot of discussion on the web and some thinking on my part. The author argues that storage abstractions are now powerful enough that many developers of database-backed applications can be successful and effective without understanding the details of SQL syntax or precisely how the storage system works. As a concrete example he gives Basecamp 3, which serves millions of people yet has no fully formed SQL statement in its entire code base and instead leans on the “Active Record” abstraction exposed by Rails. Active Record is an example of an object-relational mapper: it lets a programmer work with objects in their native programming language, and those objects are then persisted (relatively) transparently to the underlying storage system.
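
To make the idea of an object-relational mapper concrete, here is a minimal sketch in Python using the standard `sqlite3` module. It is not Rails' Active Record and does not reproduce its API; the `Record` base class, the `Project` model and the `save` and `find` methods are hypothetical names chosen for illustration. The point is only that the calling code at the bottom never spells out a SQL statement.

```python
import sqlite3


class Record:
    """Hypothetical base class: maps one Python object to one table row."""

    table = ""      # set by each subclass
    fields = ()     # column names, excluding the primary key
    db = sqlite3.connect(":memory:")  # shared in-memory database for the sketch

    def __init__(self, **values):
        self.id = None  # assigned on first save
        for name in self.fields:
            setattr(self, name, values.get(name))

    def save(self):
        """Insert the object as a new row, or update its existing row."""
        values = [getattr(self, name) for name in self.fields]
        if self.id is None:
            columns = ", ".join(self.fields)
            marks = ", ".join("?" for _ in self.fields)
            cursor = self.db.execute(
                f"INSERT INTO {self.table} ({columns}) VALUES ({marks})", values)
            self.id = cursor.lastrowid
        else:
            assignments = ", ".join(f"{name} = ?" for name in self.fields)
            self.db.execute(
                f"UPDATE {self.table} SET {assignments} WHERE id = ?",
                values + [self.id])
        return self

    @classmethod
    def find(cls, record_id):
        """Load one object by primary key, or return None if it is missing."""
        columns = ", ".join(cls.fields)
        row = cls.db.execute(
            f"SELECT {columns} FROM {cls.table} WHERE id = ?",
            (record_id,)).fetchone()
        if row is None:
            return None
        obj = cls(**dict(zip(cls.fields, row)))
        obj.id = record_id
        return obj


class Project(Record):
    table = "projects"
    fields = ("name", "owner")


# Schema setup is the one place this sketch still needs raw SQL.
Record.db.execute(
    "CREATE TABLE projects (id INTEGER PRIMARY KEY, name TEXT, owner TEXT)")

# Application code: plain objects, no SQL in sight.
project = Project(name="Launch", owner="Ann").save()
project.owner = "Bo"
project.save()
loaded = Project.find(project.id)
```

The SQL has not gone anywhere. It has moved into `save` and `find`, where the application author no longer has to look at it.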

He uses this as an example of “Conceptual Compression” — where a programmer no longer needs to know the messy details of some particular technology to make effective use of the capabilities it provides. The consequence of building on these more powerful — and perhaps simpler — abstractions is both that a programmer can be more productive and that programming in general can be opened up to a wider class of developers. There is no longer as much arcane knowledge and secret sauce needed to be effective and to get real work done. This opens up programming to experts in the application domain rather than only experts in programming.

It took me a while to unpack why I found the article so irritating.

This whole process of creating a layer, component or framework that encapsulates a more basic capability is an incredibly common pattern. User interface and storage frameworks are some of the most common examples, but the pattern shows up anywhere you look in software. The challenges that arise over time are also very common, which is what makes this feel less like the breakthrough he touts in the blog post and more like a continuous cycle.

A lower level system often provides a great deal of flexibility — and with flexibility comes complexity and challenging choices for a user of that underlying technology. A layering framework can simplify by recognizing that many applications follow common patterns. By making choices, the framework simplifies the development of the class of applications that matches that pattern.

In the richest examples, a simplifying framework creates its own ecosystem of additional components. For example, in the Node.js world, the Express framework provides an easy way to get a web application up and running and also defines a plug-in middleware mechanism that lets a lot of additional functionality be incorporated in a very simple way.
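
The shape of a plug-in middleware mechanism can be sketched in a few lines. What follows is a Python analogy, not Express code and not Express's API: the `App` class, its `use` and `handle` methods and the three middleware functions are all hypothetical. Each piece of middleware receives the request plus a callback that passes control to the next piece, which is what lets independent components be stacked without knowing about each other.

```python
from typing import Any, Callable, Dict, List

Request = Dict[str, Any]
Next = Callable[[], None]
Middleware = Callable[[Request, Next], None]


class App:
    """Hypothetical application object holding an ordered middleware stack."""

    def __init__(self) -> None:
        self.stack: List[Middleware] = []

    def use(self, middleware: Middleware) -> None:
        """Register a middleware; it runs in registration order."""
        self.stack.append(middleware)

    def handle(self, request: Request) -> Request:
        """Run the request through the stack until a middleware stops the chain."""
        def run(index: int) -> None:
            if index < len(self.stack):
                self.stack[index](request, lambda: run(index + 1))
        run(0)
        return request


def logger(request: Request, next_step: Next) -> None:
    # Record the path, then always continue down the chain.
    request.setdefault("log", []).append(f"saw {request['path']}")
    next_step()


def auth(request: Request, next_step: Next) -> None:
    # Continue only when the (made-up) token matches; otherwise stop here.
    if request.get("token") == "secret":
        request["user"] = "ann"
        next_step()
    else:
        request["status"] = 401


def hello(request: Request, next_step: Next) -> None:
    # Final handler: produces the response and does not call next_step.
    request["status"] = 200
    request["body"] = f"hello {request['user']}"


app = App()
app.use(logger)
app.use(auth)
app.use(hello)

ok = app.handle({"path": "/", "token": "secret"})
denied = app.handle({"path": "/"})
```

Adding a capability means adding one more `use` call, which is a large part of why an ecosystem can grow around such a mechanism.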

Microsoft Foundation Classes (to pick an aging example) provided a C++ object-oriented idiom over the underlying Windows UI components. It made C++ applications easier to build because it gracefully layered those C++ idioms over the more basic APIs and message passing of the Windows environment.

There are a variety of challenges in any layering approach — I talked about them in Leaky by Design.

The first is that the layering is not complete. Although the layer attempts to simplify, inevitably some of the underlying complexity leaks through. The programmer is told they are dealing with a simple system, but then problems arise that come from the constraints and limitations (or perversely, the flexibility) of the underlying system. If the developer has not built up a mental model of those underlying characteristics, they have a difficult time recognizing the source of these leaks, the challenges they introduce, and the possible solutions as their application grows more complex.
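
One common way a storage layer leaks is that an innocent-looking attribute access hides a round trip to the database. The sketch below is my own illustration of that kind of leak, not something taken from the post under discussion. It uses Python's `sqlite3`, and the `Post` class, the `query` helper and the table layout are all assumptions made up for the example. A counter records how many statements each approach sends.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE authors (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE posts (id INTEGER PRIMARY KEY, author_id INTEGER, title TEXT);
    INSERT INTO authors VALUES (1, 'Ann'), (2, 'Bo');
    INSERT INTO posts VALUES (1, 1, 'First'), (2, 2, 'Second'), (3, 1, 'Third');
""")

stats = {"queries": 0}  # counts statements sent through query()


def query(sql, params=()):
    """Run one statement and count it."""
    stats["queries"] += 1
    return db.execute(sql, params).fetchall()


class Post:
    """Hypothetical model object of the kind a mapping layer hands back."""

    def __init__(self, title, author_id):
        self.title = title
        self.author_id = author_id

    @property
    def author_name(self):
        # Reads like a field access, but issues a query on every call.
        rows = query("SELECT name FROM authors WHERE id = ?", (self.author_id,))
        return rows[0][0]


def titles_with_authors_naive():
    # One query for the posts, then one more per post for its author.
    rows = query("SELECT title, author_id FROM posts ORDER BY id")
    posts = [Post(title, author_id) for title, author_id in rows]
    return [(post.title, post.author_name) for post in posts]


def titles_with_authors_joined():
    # The same answer in a single statement, for someone who knows to ask.
    return query(
        "SELECT p.title, a.name FROM posts p "
        "JOIN authors a ON a.id = p.author_id ORDER BY p.id")


naive = titles_with_authors_naive()
naive_queries = stats["queries"]

stats["queries"] = 0
joined = titles_with_authors_joined()
joined_queries = stats["queries"]
```

Both functions return the same rows. The first sends one statement plus one per post; the second sends one. Nothing in the object-level code signals the difference, which is exactly why a mental model of the layer underneath still matters.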

Alternatively, the layering is too complete. The application runs into challenges because capabilities that are available in the lower level system are obscured or interact poorly with assumptions made by the higher level framework. So even though some solution should be possible, it is blocked by the layer and the application needs to do somersaults to address the problems. Often these challenges are performance related. Since performance is eternally changing due to the rapid evolution of the underlying hardware environment, simplifications that were unacceptable or overly constraining for one generation become acceptable for another generation. Using “X” is crazy! Until it’s not — but because some underlying technology has changed, not because the original evaluation was wrong.

An additional layering problem is that the layer becomes overly functional. This is really a combination of an organizational and a technical problem. The layer becomes a locus of innovation on its own, with a team or community invested in its success. So a capability that really should be built into the lowest level gets constructed as part of the layering component. Over time this turns an organizational/ownership problem into a technical problem, as new capabilities and requirements are not built and propagated orthogonally through the system. This makes the overall system look baroque and ad hoc, which translates into unapproachable and hard to learn.

The other challenge I had with the premise of the post is that often the most interesting and important part of “compression” isn’t what has been removed (in this case a lot of details around SQL syntax and administration) but rather what remains. For any storage system, what always remains is the importance of locality.

Without looking at a line of code, I can guarantee that embedded deeply in Basecamp’s design and its use of the Active Record abstraction is a rich understanding of locality and its importance in delivering good performance. In fact, the challenge in almost every non-trivial application design is trying to balance locality and complexity.

The principle of normalization is enshrined in database schema design as some deep insight. But the intuition is obvious to the greenest programmer (or file clerk). Don’t store the same information twice. The challenge is that normalization is almost the antithesis of locality. The reason you were led to store it twice is that you needed it in more than one place (classically, the customer list and the invoice). So the developer denormalizes and adds “business logic” (rather than schema design) to ensure semantic consistency. Over time the business logic layer grows more complex and maintaining consistency becomes harder.
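
The customer list and invoice case can be sketched directly. This is a minimal illustration under assumed names: the tables, columns and the `rename_customer` function are made up for the example, and it uses Python's `sqlite3` with an in-memory database. The normalized invoice stores only a customer id and pays for a join on every read. The denormalized invoice keeps the name next to the data that needs it, and application code takes on the job of keeping the two copies in step.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);

    -- Normalized: the invoice refers to the customer by id only.
    CREATE TABLE invoices (
        id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);

    -- Denormalized: the invoice also carries its own copy of the name.
    CREATE TABLE invoices_denorm (
        id INTEGER PRIMARY KEY, customer_id INTEGER,
        customer_name TEXT, total REAL);

    INSERT INTO customers VALUES (1, 'Acme');
    INSERT INTO invoices VALUES (10, 1, 250.0);
    INSERT INTO invoices_denorm VALUES (10, 1, 'Acme', 250.0);
""")


def invoice_normalized(invoice_id):
    """The name lives in one place, so every read needs a join."""
    return db.execute(
        "SELECT c.name, i.total FROM invoices i "
        "JOIN customers c ON c.id = i.customer_id WHERE i.id = ?",
        (invoice_id,)).fetchone()


def invoice_denormalized(invoice_id):
    """The name sits beside the invoice: one table, no join."""
    return db.execute(
        "SELECT customer_name, total FROM invoices_denorm WHERE id = ?",
        (invoice_id,)).fetchone()


def rename_customer(customer_id, new_name):
    """The 'business logic': every copy of the name must be updated together."""
    db.execute(
        "UPDATE customers SET name = ? WHERE id = ?", (new_name, customer_id))
    db.execute(
        "UPDATE invoices_denorm SET customer_name = ? WHERE customer_id = ?",
        (new_name, customer_id))


rename_customer(1, "Acme Corp")
```

Any code path that updates `customers` without going through `rename_customer` leaves the copy in `invoices_denorm` stale, and the schema will not complain. Every additional copy adds another line to that function and another way to forget one.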

These problems don’t go away with a simpler storage layer. Whether or not a programmer understands the syntax of a SELECT statement or how to define a database view, they had better understand locality. There are deeper patterns to these technologies that will not compress away.