This is a technical sequel to an
To understand why traditional Infrastructure as Code is no longer adequate for the new serverless world, we need to return to the very basics. Since the topic is vast and the amount of available material, tools, and solutions is large, opinions differ widely on exactly what IaC includes and where it is headed. First, we need to agree on a deductive methodology and a structural analysis of the technology available today, then proceed to the most plausible scenarios for its future evolution and the needs or constraints emerging from it.
In this publication, I intend to come up with a coherent and not-so-complicated train of thought suitable to serve as a conceptual foundation for the practical work ahead. The article’s flow is as follows:
Technology never stays still. Whenever something new and useful is introduced, it shortly thereafter embarks on the process of commoditization. More and more people learn to do it with higher efficiency. Commoditized and standardized technological capabilities beget new technology innovations [1].
Technology and practices co-evolve [2]. New technology renders some practices obsolete (muleteers are less in demand today) and creates a need for others (truck drivers are essential, at least for now). While randomness and sheer luck play a significant role in the co-evolution of technology and practices, it is unlikely for things to jump arbitrarily from one point to another. Some things are more likely to happen than others. Using slightly more formal language, we may argue that the co-evolution of technology and practices constitutes a Semantic Spacetime [3] fabric.
The DevOpsFinSec concept is often confused with writing and maintaining automation scripts for infrastructure resource allocation. At times, it is also linked to monitoring. At least, this is what is meant in the hiring advertisements for DevOps Engineers.
This is a narrow view of the subject matter and contradicts the original definition of DevOps. As explained by Patrick Debois, who coined the term, DevOps is “to bridge the gap between developers, operations and sysadmins in an Agile way.”
I think this diagram [4] best explains the whole concept of DevOps (Recommended: print this on A3 and stick it on your board in front of you). In this model of DevOps, there are four key areas of interaction (knowledge exchange and feedback) between dev and ops.
If we dig a bit deeper into Patrick’s article, we find another interesting diagram that describes the layers at which the interaction happens.
What do we learn from these two diagrams? That the DevOps (or, more expansively, DevOpsFinSec) approach assumes four types of cooperation between the disciplines involved in delivering software services to end users: two originate from development and two from operations. Each area requires tools for automation, processes (preferably lightweight ones), and people with an open mindset.
Here, I want to make it abundantly clear that without human cooperation across multiple disciplines, no tool or process can help.
Disorder + Automation = Automated Disorder
Now, Infrastructure as Code (IaC) can be defined more accurately as a set of tools for automated infrastructure resource allocation (Area 1: Extend delivery to production; Area 3: Embed project knowledge into operation). Notice that, by itself, IaC does not address observability (Area 2: Extend operations feedback to project) or automatic adjustment (Area 4: Embed operations knowledge into the project). It does, however, take part in instrumentation for monitoring or alerting, and its automation scripts implicitly take operational specifics and requirements into account.
Although these distinctions are widely adopted for multiple reasons, they are far from precise. In 1993, Mark Burgess created the pioneering
To sum up this preliminary analysis, IaC is about creating executable documents, which describe computer resource allocation and resource interconnection aiming at a particular goal.
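To make the notion of an "executable document" a little more tangible, here is a minimal sketch, not taken from any real tool. The `Resource` class, the `plan` function, and the resource names are all hypothetical; the only point is that the document declares resources and their interconnections, and a tool derives what to allocate and in which order.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resource:
    """One declared resource; names and kinds are purely illustrative."""
    name: str
    kind: str
    depends_on: tuple[str, ...] = ()


def plan(desired: list[Resource], existing: set[str]) -> list[str]:
    """Return the names of resources to create, dependencies first."""
    by_name = {r.name: r for r in desired}
    ordered: list[str] = []
    visiting: set[str] = set()

    def visit(name: str) -> None:
        if name in existing or name in ordered:
            return  # already allocated or already planned
        if name in visiting:
            raise ValueError(f"dependency cycle at {name}")
        visiting.add(name)
        for dep in by_name[name].depends_on:
            visit(dep)
        visiting.discard(name)
        ordered.append(name)

    for resource in desired:
        visit(resource.name)
    return ordered


# The "document": what we want, and how the pieces are interconnected.
document = [
    Resource("web", "virtual_machine", depends_on=("net", "disk")),
    Resource("net", "network"),
    Resource("disk", "storage"),
]

print(plan(document, existing={"net"}))
```

Real IaC tools do far more (updates, deletions, rollbacks), but the separation between the declared goal and the derived allocation steps is the part that matters for the rest of this discussion.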
As mentioned above, technology, tools, and practices co-evolve, but they do not move completely arbitrarily. Some forces determine the probability of each move. Some things are easier than others, and, like water, things move towards the lowest possible energy expenditure.
In the context of automation in general and IaC in particular, the key driving forces are:
These factors influence each other. As spacetime and matter influence each other (thank you, Mr. Einstein), evolution and innovation, capabilities and needs beget each other.
Let’s see how this dynamic works in the IaC space.
In the beginning, in the 1990s, the number of computers per typical data center was small, networks were still relatively slow and unreliable, scaling up was the prevailing preference, the number of applications and services was modest, and updates were released relatively infrequently.
In such an environment, manual cobbling together of the components by skillful system engineers was the norm. I started working for such a company in 1995. The company management was convinced that paying a high salary to a bunch of experienced system engineers would cost less than developing a highly automated delivery system, which would distract the best talents from the company’s core business (in that case, conditional access to Pay TV). Of course, these system engineers used only some basic installation scripts and nothing more.
On the other hand, a trend emerged with the new generation of web applications and services (Google, eBay, Amazon, and Facebook, to name a few): these companies preferred scaling out, had a larger number of computers and software components to manage, and needed a higher frequency of updates. For them, the smart system engineering cowboys pattern did not work. Still, managing physical compute, storage, and network boxes in an organization’s own data centers was the norm.
In 1993,
managing physical boxes (computers, storage disks, network switches, and routers)
installing, patching, and upgrading basic system software (operating systems, database, or messaging servers)
installing, patching, upgrading, and removing applications and custom services
Following
Since the whole process, although fully automated, was slow, expensive, and error-prone, the preference was to keep infrastructure resource allocation automation relatively decoupled from application development. For that purpose, a special term, service density, was coined to measure how many services or applications could run on the same underlying infrastructure.
With the introduction of virtualization (VMware, etc.) at data centers, it was suddenly possible to significantly improve service density by reallocating physical boxes to different software needs. This publication is not a treatise on software industry history, so I won’t get into further details here. What is important to notice is that virtualization was a kind of lifting-and-shifting of bare-metal data centers into the virtualized world. Conceptually, though, very little changed.
With advances in Internet technology, renting out virtualized environments to be accessed over a network was the next logical step. It started with AWS S3 storage and quickly expanded to other areas of compute and networking. Indeed, in many cases, it didn’t matter where this virtualized environment was physically located (pun intended). IaaS offerings promised fewer and fewer headaches with managing physical boxes on-prem. But conceptually, IaC tools remained at the same level, lifting and shifting bare-metal data center configurations into the virtualized world.
Ironically, AWS S3 was probably the first cloud service, and from the very beginning, it was fully managed and serverless. Yet, with the introduction of the AWS EC2 service, the trend took a different route.
Once we learned how to allocate basic virtual machines, attach them to storage devices and connect them via a network, it was quite natural to package a cluster of such resources running, say a database, and offer this resource bundle as a new service.
To support the new Platform as a Service offerings, existing IaC tools were extended to allow the specification of such a bundle. So, along with the specification of a virtual machine auto-scaling group, it was possible to ask for a MySQL database cluster. Conceptually, this packaging of high-level and low-level resources was still lifting-and-shifting the bare-metal data center into the virtualized world.
With the cloud-native architecture pioneered by Netflix and advances in container technology, this simplified picture conformed less and less to actual architectures and practices. Yet, very little changed at the conceptual level of IaC tools.
And then came the serverless paradigm, where developers did not have to bother about capacity. How did we get there? Quite simply: in technology evolution, as in the physical world, proximity determines the possible moves at any point in time. Autoscaling was prevalent in cloud computing services from the very beginning. It meant that when the server load reaches a certain level, another server is spun up; when it drops below a threshold level, one is killed. But at the beginning, cloud users had to specify these numbers manually, and more often than not, they had no clue what these thresholds should be. It was only a matter of time before cloud vendors asked themselves if they could guess these numbers from the vast usage statistics they had collected over time.
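The threshold rule described above fits in a few lines of Python. This is only an illustrative sketch: the function name, the default thresholds, and the server limits are assumptions of mine, not values used by any cloud vendor.

```python
def desired_server_count(
    current: int,
    load: float,
    scale_out_at: float = 0.75,  # assumed threshold, chosen by the user
    scale_in_at: float = 0.25,   # assumed threshold, chosen by the user
    minimum: int = 1,
    maximum: int = 10,
) -> int:
    """Classic threshold autoscaling: add or remove one server at a time."""
    if load >= scale_out_at:
        return min(current + 1, maximum)  # load too high: spin up a server
    if load <= scale_in_at:
        return max(current - 1, minimum)  # load low: kill a server
    return current                        # inside the band: do nothing


print(desired_server_count(current=2, load=0.9))
print(desired_server_count(current=2, load=0.1))
```

Every default in the signature is a number the user originally had to guess. Serverless, in this reading, is what remains once the vendor takes over all of these parameters.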
A simple answer to the question,
Originally, IaC solutions were about creating executable documents capturing configuration of general-purpose infrastructure resources (physical or virtual compute, network, storage boxes) to serve an application(s) needs. Later, it was extended to support packaged platform services such as database clusters. But, conceptually, it remained the same.
Since the whole process, even when fully automated, was slow, expensive, and error-prone, the preference was to keep infrastructure resources allocation automation relatively decoupled from application development. (I intentionally repeat this sentence).
But, what if there are no low-level infrastructure resources like VM instances, disks, clusters, or routers to worry about anymore? True, you still need to allocate serverless or fully managed resources, but it is qualitatively different.
Are the same tools and practices still a perfect fit for the new environment? As S. Wardley argues in the article mentioned above, the answer is ‘No’. In a new environment, we need to think afresh and act anew. Indeed,
Manually crafting IaC templates is a tedious and error-prone process. For even a modest serverless cloud application, the IaC template can easily grow to hundreds of lines of JSON or YAML. Another challenge is that the configuration itself can be confusing: what is an AWS IAM Role, how is it different from an IAM Policy, and why is either relevant if the application only needs to calculate the total number of vacations per employee per year?
Things do not stop here, though. While it is always critical to implement correct domain logic, a real production deployment raises additional concerns: whether the APIs are protected against unauthorized access or DDoS attacks, whether the database is exposed to the internet, whether data in transit and data at rest are properly encrypted, and so on. All of these concerns can be dealt with through the proper allocation of cloud resources. Another significant concern is that the resource requirements vary between development, test, and staging environments.
For a development environment, allocating all the resources can be a costly affair. Similarly, for testing and staging environments, we may need only a few of the resources. If IaC templates are crafted manually, the result is likely to be a one-size-fits-all template, which may be inefficient and wasteful. Alternatively, it can morph into a generic, overcomplicated, and parametrized structure. So, despite the serverless breakthrough, we are still at square zero of DevOpsFinSec complexity dating from the previous century.
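To get a feel for the overhead, the sketch below builds what I believe is close to the smallest CloudFormation template for a single Lambda function: the function itself, plus an IAM Role (the identity the function assumes) carrying one managed IAM Policy (the permissions document attached to that identity). The helper name, the logical IDs, and the inline handler are illustrative assumptions; a real application would add API, storage, and security resources on top.

```python
import json


def minimal_function_template(function_name: str) -> dict:
    """Build a bare-bones CloudFormation template for one Lambda function."""
    role_id = f"{function_name}Role"  # hypothetical logical ID
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Resources": {
            role_id: {
                "Type": "AWS::IAM::Role",
                "Properties": {
                    # Who may assume the role: the Lambda service.
                    "AssumeRolePolicyDocument": {
                        "Version": "2012-10-17",
                        "Statement": [{
                            "Effect": "Allow",
                            "Principal": {"Service": "lambda.amazonaws.com"},
                            "Action": "sts:AssumeRole",
                        }],
                    },
                    # What the role may do: write logs, nothing else.
                    "ManagedPolicyArns": [
                        "arn:aws:iam::aws:policy/service-role/"
                        "AWSLambdaBasicExecutionRole"
                    ],
                },
            },
            function_name: {
                "Type": "AWS::Lambda::Function",
                "Properties": {
                    "Runtime": "python3.12",
                    "Handler": "index.handler",
                    "Role": {"Fn::GetAtt": [role_id, "Arn"]},
                    "Code": {
                        "ZipFile": "def handler(event, context):\n    return 0\n"
                    },
                },
            },
        },
    }


template = minimal_function_template("VacationCounter")
rendered = json.dumps(template, indent=2)
print(len(rendered.splitlines()), "lines of JSON, before any trigger or database")
```

None of these lines says anything about vacations; all of them exist only to satisfy the platform.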
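The "parametrized structure" alternative could look roughly like the sketch below. Every resource name and setting here is made up for illustration; the point is only that environment-specific differences have to be encoded somewhere by hand, and this table tends to grow with every new concern.

```python
import copy

# Resources every environment needs (hypothetical names and settings).
BASELINE = {
    "api": {"throttling_rps": 10},
    "function": {"memory_mb": 256},
    "table": {"billing": "on_demand"},
}

# Extra resources per environment (again, purely illustrative).
ADDITIONS = {
    "dev": {},
    "test": {"alarms": {"error_rate_percent": 5}},
    "prod": {
        "alarms": {"error_rate_percent": 1},
        "firewall": {"rate_limit_rps": 1000},
        "backups": {"retention_days": 35},
    },
}


def resources_for(environment: str) -> dict:
    """Return the resource set to allocate for the given environment."""
    if environment not in ADDITIONS:
        raise ValueError(f"unknown environment: {environment}")
    resources = copy.deepcopy(BASELINE)
    resources.update(copy.deepcopy(ADDITIONS[environment]))
    return resources


for env in ("dev", "test", "prod"):
    print(env, sorted(resources_for(env)))
```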
To sum up, the modern IaC stack is a patchwork reflecting the evolution of virtualization and cloud technologies incorporating at least three different concepts: traditional Infrastructure as a Service, elements of Platform as a Service, and Serverless Cloud.
There have been some attempts to alleviate the pain, but I would argue that none of them brings a satisfactory solution because I believe the root cause is yet to be analyzed. To understand the challenges with the IaC stack and what it means in practice, let’s look at Cloud IaC ingredients.
To understand how to ideally implement the automatic allocation of serverless cloud application resources, we need to map existing components and their relationships. For this article, I will use a combination of AWS and Python back-end ecosystems. The real landscape map is too complex to be presented in one publication.
Let’s first look at AWS Serverless Ecosystem:
At the bottom of the stack, there is an ever-growing list of AWS Cloud Services (over 200 at the moment). Each service exposes a
Some cloud services, managed or not, have a 3rd party drop-in replacement (e.g.
All cloud orchestration tools derive their basic concepts from
Also, it is important to notice that this cloud orchestration service does not distinguish between fully managed, partially managed, or unmanaged services. The same language is used for all the services and one template can contain an arbitrary mix of different types of services.
The originally conceived pure declarative style never worked well. So,
With regards to serverless,
Cloud orchestration services, such as AWS CloudFormation, are not the only way to access cloud services. Cloud service REST APIs are wrapped in language-specific SDK libraries. For accessing AWS services using Python, there are two types of interfaces: a low-level interface,
As with the cloud orchestration service mentioned above, SDKs make conditional promises depending on the original REST APIs and can introduce a lag, usually small. Also, the capabilities vary with languages. E.g., Python and Go SDK capabilities may or may not be identical.
Needless to say, a cloud orchestration service has its own
Other than cloud SDK, there are two additional ways to interact with the cloud services: via a Command Line Interface, such as
The IaC concept has its own challenges and problem areas. Many frameworks and tools have tried to reduce or address the IaC challenges to some extent.
Somebody within AWS realized quite early that the
It was a nice try and many people used it, including yours truly, at least at the beginning. But IMHO, it did not and could not go very far. If
As a typical American business, AWS is a large organization where different people keep trying their ideas. It looks like somebody within AWS (or they just acquired a startup) realized that irrespective of the tricks applied to JSON/YAML, it would never be as powerful as a mainstream programming language. Somebody came up with a strong preference to embed CloudFormation
Initially, it was available in
And indeed, in terms of flexibility, modularity, and test automation
My biggest problem with
AWS Chalice was a half-step forward from IaC.
If all we want to do is to develop a bunch of
As said above, it was a big half-step forward (pun intended). Why half-step? Because it swept a lot of traditional IaC garbage under a carpet of separate
Personally, I’m not in favour of using Python Function Decorators for protocol and event trigger routing [11]. In my view, this obscures too much business logic and brings up many irrelevant low-level details of, say, HTTP protocol.
Vendor/platform lock-in is a more serious question to be addressed. If we start talking about IfC, should we also strive towards cloud platform neutrality, such that instead of s3_trigger
we should talk about a cloud_storage_object_created trigger
regardless of whether it is
AWS is not alone in its attempts to address cloud IaC challenges. Many 3rd party players, commercial or not, developed complementary or alternative solutions on top of
A few of these 3rd party vendors are listed below (see the references for even more). A detailed analysis of each of them is far beyond the scope of this publication. I will offer some general observations here:
Based on this brief analysis, it would be safe to conclude that none of the existing 3rd party serverless frameworks provides an adequate solution for implementing Infrastructure from Code to its full extent.
Infrastructure from Code (IfC) requires changes in the existing technologies by the cloud vendors, programming language vendors, and tools & library vendors. Let’s look at the wish list for each of them one by one.
This is not directly related to converting plain application code (in a programming language) into serverless cloud resources specifications. However, if implemented one day, it would dramatically simplify the whole task.
At the moment, no mainstream programming language provides a cloud-native run-time environment. New languages, such as
db_connection
object with
This topic falls somewhere between cloud platform, compiler, tools, and libraries vendors. To maintain cloud neutrality, there is a need to use standard interfaces as much as possible. For example, instead of relying on send
function and let the service specify whether it's a Queue, PubSub, Email or any composition of them? Such specification would be completely cloud-neutral and translatable into underlying cloud resources specification with a modest programming effort.
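A minimal sketch of this idea follows, assuming nothing beyond the Python standard library. The `Channel` protocol, the in-memory queue, and the `Broadcast` composition are hypothetical names of mine; in a real IfC toolchain, the concrete channel would be resolved to a queue, a pub/sub topic, or an email service at deployment time.

```python
from typing import Protocol


class Channel(Protocol):
    """Anything the application can send a message to."""
    def send(self, message: str) -> None: ...


class InMemoryQueue:
    """Stand-in for a cloud queue; useful for local runs and tests."""
    def __init__(self) -> None:
        self.messages: list[str] = []

    def send(self, message: str) -> None:
        self.messages.append(message)


class Broadcast:
    """Composition: forwards each message to several channels."""
    def __init__(self, *channels: Channel) -> None:
        self.channels = channels

    def send(self, message: str) -> None:
        for channel in self.channels:
            channel.send(message)


def notify_vacation_approved(channel: Channel, employee: str) -> None:
    # The application code only knows about send(); which cloud resource
    # stands behind the channel is decided elsewhere.
    channel.send(f"vacation approved for {employee}")


audit, inbox = InMemoryQueue(), InMemoryQueue()
notify_vacation_approved(Broadcast(audit, inbox), "alice")
print(audit.messages, inbox.messages)
```

The application function never names a cloud service, so the same code could in principle be mapped onto any vendor's messaging resources.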
Another example would be translating standard data structure APIs, such as
Lastly, what about access to cloud resources via standard interfaces? For example, why do I need to use
Who should implement such mappings is unclear at the moment. Ideally, cloud service vendors in cooperation with language compiler vendors should take care of it. In reality, it would be a community-driven open project bringing together multiple players who think this is the right thing to do to boost cloud application development productivity and quality.
In the serverless and Infrastructure from Code world, it gets progressively harder to justify traditional Integrated Development Environment systems running on local computers or cloud virtual machines. Indeed, modern browsers are powerful enough to run a decent editor inside, while it should be possible to build a serverless backend for storage, computing, IntelliSense, etc. Why should I clone code from git to my local disk only to upload it later to cloud storage? Why can’t cloud storage do the whole job? Why can’t I run my unit or integration tests in a Cloud Function? Why
A short answer to these questions is: “it is all feasible and within reach too.” We do not have the tools yet, although they were conceptually demonstrated in 1968 at
Recently, a number of solutions have come up which are trying to promote Infrastructure from Code concepts, at times under different names.
The serverless world is a fast-moving space where Infrastructure from Code is the next advance that can make it move even faster. The history and evolution of cloud technologies and the challenges with the current technologies presented above clearly indicate the likelihood of maturation of IfC in the coming years. IfC has the potential to optimize and bring much-needed gains in agility to cloud applications.
The author, Asher Sterkin, is GM at BST LABS. BST LABS is breaking the cloud barrier — making it easier for organizations to realize the full potential of cloud computing through a range of open source and commercial offerings. We are best known for