Ten years ago, the idea of DevOps sprang from the minds of Andrew Shafer and Patrick Debois. A year later they actually named it “DevOps”. This tenth anniversary marks a good time to look at where DevOps has taken us as an industry and where we are heading.
This write-up is not about vendors, commercial, open source or otherwise. At least not exclusively. This post looks back at the history of DevOps and CI/CD practices and digs into which patterns have evolved over the years and which we see coming over the horizon.
At the Agile 2008 conference in Toronto, Andrew Shafer gave a talk that became the seed of the whole DevOps movement. This is now “ancient history” in the tech world, but it is worth reminding ourselves that DevOps is an idea that sprouted from, or at least in the context of, the Agile movement.
Now, the 2001 Agile Manifesto is a bit light on specifics but it is pretty clear on…
It is no surprise, then, that DevOps and the practice of CI/CD struck a chord with the Agile crowd.
The principal idea that caught the imagination of almost everyone in IT and related professions was the idea of codifying not just business logic, but also the supporting infrastructure around it.
Without codification, the other big promises of DevOps and Agile are pie in the sky. Codification has two main applications:
Reduced lead times. Testing an application, setting up a server or installing an application can all be pretty time-consuming activities. Abstracting the process into code and re-using that code is almost always a net gain in productivity and a drop in lead times.
Reduced risk. Even if you are not the quickest and most Agile of the bunch, codifying helps address the inherent danger of “changing things” and amps up the predictability of releasing quality software.
Not a goal in itself, but certainly a huge side benefit: code creates a shared language, platform and understanding between developers and sysadmins. The now-famous 2009 presentation by John Allspaw and Paul Hammond from Flickr captures that spirit completely.
From “codifying all the things” sprang forth an entire industry of products and services, hosted, SaaS, open source, commercial and everything in between: Heroku, Cloud Foundry, AWS Elastic Beanstalk, Travis CI, Jenkins, CodeShip, Bamboo, Puppet, Ansible, Terraform. The list goes on and on.
The first wave of DevOps practices ticked the “continuous”, “frequent” and “working software” boxes of the Agile Manifesto. However, it turns out that the other box, “harnessing change … as a competitive advantage”, is a bit of an enigma. What changes are actually to our advantage? How do we know when they are not? What do customers actually want?
These types of questions have always been the domain of marketing, UX and product management people. A/B and multi-variate testing, test audiences and short-lived experimental features are typical weapons found in the field. The base assumption is “we don’t know the answers. Let’s experiment and find out.”
This experimentation mindset is now making its way into the domain of the back end, and it was high time. Software engineers, architects and sysadmins are by nature more predictors than adapters. We predict the future and build our code base and infrastructure to get that future job done as well as possible. We call it “specification” or “estimation.”
There’s a snag though. Raise your hand if you have made these kinds of predictions and it turned out you were wildly off target after things went live. ✋
Ditching predictions and estimates can be a profoundly powerful thing. But the story doesn’t end there of course. Over the last few years many patterns, techniques, tools and services have popped up to assist engineers in turning “I don’t know” into “I have learned. I know now”.
The granddaddy of them all, and by far the crudest but most widespread pattern, is the blue/green deployment. Arguably, many engineers implemented this pattern long before DevOps saw the light of day.
source: https://martinfowler.com/bliki/BlueGreenDeployment.html
A typical blue/green deployment scenario involves two identical environments and some form of switch to route traffic to either environment. Come release time, you deploy to one environment and direct traffic at it. You keep the other environment for fallback in case things go south.
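To make the mechanics concrete, here is a minimal Python sketch of such a switch. The environment names and URLs are purely illustrative; in practice the “switch” is usually a load balancer or router configuration rather than application code.

```python
# Minimal blue/green sketch: all traffic goes to the "live" environment,
# a new version is deployed to the idle one, and the switch only flips
# once the new version checks out. Names and URLs are illustrative.

ENVIRONMENTS = {
    "blue": "http://blue.internal.example.com",
    "green": "http://green.internal.example.com",
}

live = "blue"  # environment currently serving production traffic


def idle() -> str:
    """The environment we can safely deploy the new version to."""
    return "green" if live == "blue" else "blue"


def route(request_path: str) -> str:
    """All production traffic is sent to the live environment."""
    return f"{ENVIRONMENTS[live]}{request_path}"


def release(smoke_test_passed: bool) -> None:
    """Flip the switch after deploying to the idle environment.

    The previous environment stays untouched as an instant fallback.
    """
    global live
    if smoke_test_passed:
        live = idle()
```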
A lot less crude than blue/green deployment is canary releasing. The principle is the same, but the granularity is much finer. With canary releasing you can actually start running proper experiments without putting your production environment at risk.
source: https://martinfowler.com/bliki/CanaryRelease.html
Your routing component needs to be “smart” to do canary releasing because you want to target only a sliver of your total traffic and direct that traffic to the new version of your app.
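As a rough illustration of what such a “smart” router does, here is a hedged Python sketch; the weight, the targeting rules and the URLs are all made up for the example:

```python
import random

# Canary routing sketch: only a small, targeted sliver of traffic reaches
# the new version of the app. Everything else goes to the stable version.

STABLE = "http://app-v1.internal.example.com"
CANARY = "http://app-v2.internal.example.com"

CANARY_WEIGHT = 0.05             # 5% of eligible traffic
TARGET_COUNTRIES = {"NL", "DE"}  # only experiment in these geo locations


def route(headers: dict) -> str:
    is_mobile = "Mobile" in headers.get("User-Agent", "")
    in_target_geo = headers.get("X-Geo-Country") in TARGET_COUNTRIES

    # Only mobile users in the target regions are eligible for the canary,
    # and even then only a weighted fraction of them.
    if is_mobile and in_target_geo and random.random() < CANARY_WEIGHT:
        return CANARY
    return STABLE
```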
Canary releasing is still very much a pattern: the implementation and execution are left as an exercise for the reader. There are, however, products that already offer it as a first-class citizen. With https://vamp.io we allow targeting on many different traffic aspects like user agents, devices, geo location etc., with support built into the UI. AWS offers canary releasing in its API Gateway service.
AWS API Gateway canary releasing
A/B testing is canary releasing with a statistical framework around it. By and large it offers the same targeting options as canary releasing, but it adds goals to the mix. This helps determine whether, and by how much, a different or new version of your app contributes to a specific goal.
source: https://www.optimizely.com/optimization-glossary/ab-testing/
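To show what that statistical framework boils down to, here is a deliberately simple Python sketch (not any vendor’s actual method) that compares a goal, conversions in this case, between two variants using a two-proportion z-test. The numbers are made up for illustration:

```python
from math import sqrt

# A/B sketch: each variant tracks visitors and conversions (the "goal"),
# and a two-proportion z-test indicates whether the difference between
# the variants is likely real or just noise.


def z_score(conversions_a, visitors_a, conversions_b, visitors_b):
    p_a = conversions_a / visitors_a
    p_b = conversions_b / visitors_b
    pooled = (conversions_a + conversions_b) / (visitors_a + visitors_b)
    se = sqrt(pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b))
    return (p_b - p_a) / se


# Variant B converts at 5.5% versus 5.0% for variant A: signal or noise?
z = z_score(conversions_a=500, visitors_a=10_000,
            conversions_b=550, visitors_b=10_000)
print(f"z = {z:.2f}")  # |z| > 1.96 would be significant at the 95% level
```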
A/B testing is offered by vendors like Optimizely, Visual Website Optimizer (VWO) and Google Optimize, and is implemented as a feature in many other services like Unbounce, Mailchimp and Kissmetrics. Companies like Pinterest and Booking.com also build it themselves.
However, all of these services and implementations work strictly on the front end (web pages, email etc.). This means you can mostly test visual and textual variations of an existing application. What if you need to experiment on other parts of your application stack? Products and services are appearing in this arena.
Optimizely X Full Stack is a full service solution, covering many languages and runtimes. In the open source world we have PlanOut by Facebook, Wasabi by Intuit and Petri by Wix.
Feature toggling is more of a parallel pattern. It integrates aspects of canary releasing and A/B testing into a pattern focused on toggles. These toggles turn application features on and off, either globally or for specific target audiences.
Feature toggling’s main virtue is that it decouples the moment of deployment from the moment of release. Services offering feature toggling include, for instance, LaunchDarkly and Split.io. Open source alternatives are Togglz for the Java platform and Flipper for Ruby.
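A minimal Python sketch of the idea, not the API of LaunchDarkly, Split.io, Togglz or Flipper, might look like this; the feature names and audiences are made up:

```python
# Feature toggle sketch: the code for a feature ships with a deployment,
# but the feature is only "released" once its toggle is switched on,
# either globally or for a specific audience.

TOGGLES = {
    # feature name     -> (enabled globally, audiences it is enabled for)
    "new-checkout":       (False, {"beta-testers"}),
    "redesigned-search":  (True, set()),
}


def is_enabled(feature: str, user_groups: set) -> bool:
    enabled_globally, audiences = TOGGLES.get(feature, (False, set()))
    return enabled_globally or bool(audiences & user_groups)


def checkout_page(user_groups: set) -> str:
    # Deployed code branches on the toggle; flipping the toggle is the release.
    if is_enabled("new-checkout", user_groups):
        return "render the new checkout flow"
    return "render the existing checkout flow"
```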
Also known as “dark traffic”, shadow traffic is the practice of siphoning off traffic from the production flow and exposing it to an experimental version of your app, without the end user being directly involved.
This pattern is firmly rooted in the back end and can be used for load and performance testing or as a general sanity check. Implementations at this moment seem to be completely custom; as of the time of writing I could not find any SaaS service that offers shadow traffic. Open source, middleware-type projects like Istio (for the Kubernetes stack) and GoReplay are relatively new.
All implementations need to overcome some thorny issues with how network traffic works at the lowest level with regard to client responses, encryption etc. However, the potential, specifically as part of an integrated platform like Kubernetes, is high.
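Tools like Istio and GoReplay handle this at the proxy or network layer; purely to illustrate the principle, here is an application-level Python sketch with made-up URLs (it assumes the requests package is installed):

```python
import threading
import requests

# Shadow traffic sketch: every production request is also replayed against
# an experimental version of the app. The shadow response is thrown away,
# so the end user only ever sees the production result.

PRODUCTION = "http://app-v1.internal.example.com"
SHADOW = "http://app-v2-experimental.internal.example.com"


def _mirror(path: str, headers: dict) -> None:
    try:
        requests.get(f"{SHADOW}{path}", headers=headers, timeout=2)
    except requests.RequestException:
        pass  # failures in the shadow environment must never affect users


def handle(path: str, headers: dict) -> requests.Response:
    # Fire-and-forget copy to the experimental version.
    threading.Thread(target=_mirror, args=(path, headers), daemon=True).start()

    # The user's response always comes from production.
    return requests.get(f"{PRODUCTION}{path}", headers=headers, timeout=2)
```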
As experimentation patterns are adopted across the full IT stack, the granularity of experiments and the scale at which they are deployed both increase. This gives rise to a whole new set of problems.
Many of these problems are automation problems and fall within the scope of DevOps practices. We can see the first products and services appearing that aim to tackle these issues.
Using machine learning and artificial intelligence to make sense of the deluge of logs and metrics a typical modern IT stack generates is still in its infancy, but looks really promising.
Pattern recognition, filtering and predictive analysis are all well understood mathematical problems. We can see these techniques appear in DevOps-oriented products and services like Signifai and Logz. Both services promise to cut down the noise inherent in data streams and help teams focus on finding the nuggets of gold buried among the scraps.
Elasticsearch, a popular open source search engine used by many to analyse log data, just added machine learning to its feature set. This gives users a relatively easy path into using machine learning to grok their data.
predictive analysis in Elasticsearch ML, source: https://www.elastic.co/guide/en/x-pack/current/ml-overview.html
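The techniques these products apply are far more sophisticated, but a deliberately naive Python sketch shows the underlying idea of cutting noise: only flag data points that deviate strongly from the recent baseline instead of alerting on every blip. The thresholds and data below are made up:

```python
from statistics import mean, stdev

# Toy anomaly detection: compare each sample against a rolling baseline and
# flag it only when it is several standard deviations away from that baseline.

WINDOW = 30       # number of recent samples that form the baseline
THRESHOLD = 3.0   # how many standard deviations counts as "anomalous"


def find_anomalies(samples: list[float]) -> list[int]:
    anomalies = []
    for i in range(WINDOW, len(samples)):
        baseline = samples[i - WINDOW:i]
        mu = mean(baseline)
        sigma = stdev(baseline) or 1e-9  # avoid division by zero on a flat baseline
        if abs(samples[i] - mu) / sigma > THRESHOLD:
            anomalies.append(i)
    return anomalies


# e.g. error rates per minute: a flat baseline with one obvious spike
error_rates = [0.01] * 60 + [0.35] + [0.01] * 10
print(find_anomalies(error_rates))  # -> [60]
```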
Leveraging ML and AI to assist in decision making with regard to deployments, scaling, feature toggling and running experiments in general is still largely unexplored terrain.
Modern container platforms like Kubernetes, Docker Swarm and Mesosphere’s DC/OS fit neatly into the new and emerging patterns of DevOps. Dubbed “Cloud Native”, these platforms provide portability, scalability and rapid, standardized deployment patterns.
In this context, they are the ideal base infrastructure for running an experiment-based IT setup, as they tick almost all of the boxes needed for such a system.
OCI, CSI and CNI aim to standardise the container, storage and network layers of cloud native platforms.
The move to cloud native infrastructures is already happening, and we can already see new applications enabled by the cloud native paradigm appear.
Products and frameworks like Istio and Vamp take smart routing to a level where routing decisions become malleable, real-time application attributes. Coupled with a monitoring solution, smart routing solutions are becoming the backbone of A/B testing and shadow traffic patterns implemented on DC/OS and Kubernetes.
Bin packing is a technique where many application processes share a physical host, filling it right up to its capacity so as not to let unused capacity go to waste. Both Kubernetes and DC/OS already do this.
Combining this with something like AWS Spot Instances takes it to another level of cost efficiency. Interesting advances can be made here by tying that cost level to higher-level business goals set, for example, in an A/B test. Many companies use internal cost accounting and attribution to structure their IT costs; this could help tremendously in making the business case for an experimental application or service.
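For the intuition behind bin packing, here is a toy first-fit sketch in Python. Real schedulers weigh far more dimensions (memory, affinity, spot pricing and so on); the host size and workloads below are made up:

```python
# First-fit-decreasing bin packing: place workloads on as few hosts as
# possible without exceeding a host's CPU capacity.

HOST_CPU = 4.0  # CPU cores per host


def pack(cpu_requests: list[float]) -> list[list[float]]:
    hosts: list[list[float]] = []
    for request in sorted(cpu_requests, reverse=True):  # biggest first
        for host in hosts:
            if sum(host) + request <= HOST_CPU:
                host.append(request)
                break
        else:
            hosts.append([request])  # no host had room: provision a new one
    return hosts


workloads = [2.5, 1.0, 0.5, 0.5, 3.0, 1.5, 0.5]
print(pack(workloads))
# -> [[3.0, 1.0], [2.5, 1.5], [0.5, 0.5, 0.5]]  (3 hosts instead of 7)
```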
Want to increase innovation? Lower the cost of failure
— Joi Ito, Director MIT Media Lab
Tim is a product advocate for https://vamp.io, smart and stress-free application releasing for modern cloud platforms.