This article is a continuation of the “Democratization of Container Technologies” piece, in which I discussed Docker’s meteoric rise to fame.
In this article I’ll focus on the pain points & challenges that organizations face while trying to implement container-based solutions, and in the next article I’ll elucidate the best practices for adopting containers with minimum friction and future readiness in mind. These insights are based on my experiences & learnings while working on multiple Docker / Kubernetes implementations. I am also an active part of the vibrant Docker community, which has always been there to help whenever needed.
To better understand these challenges, I have segregated them into 7 buckets. The buckets are not mutually exclusive; they can depend on or be closely related to one another.
Skilled Resources:
One of the major challenges that organizations face is attracting & retaining the right talent. With the technology only 5 years old & going through an unprecedented rate of adoption, demand outstrips supply many times over. While developers need to upskill themselves to build containerized apps, the real challenge lies in managing operations. The skill sets needed to operate container-based platforms are very different from those that traditional operations professionals possess. Very good scripting & coding skills are needed to manage operations effectively, and a good grasp of the application architecture, data flow and application logging is also a must. The complexity is very high because applications are packaged along with their dependencies and a base operating system image in a lightweight container. On top of that, this container runs on an orchestration engine designed to ensure that containers adhere to the rules predefined for them.
To cite an example, we had planned a full-day recruitment drive to hire engineers with hands-on Kubernetes / Docker skills and a good understanding of the surrounding ecosystem. Although we received a lot of well-written resumes, at the end of the day we were unable to find even a single technologist who understood the technology in depth.
Fast-Evolving Technology Ecosystem:
The Docker ecosystem is evolving at an extremely fast pace. Being an open-source technology with immense potential, Docker has attracted a deluge of third-party tools & services. Tools to help developers write Docker images, tools to deploy, configure and manage containerized workflows, and tools to bring automation across the various stages of a containerized app’s lifecycle are emerging by the dozen every week. While this might sound like a great thing in an emerging platform’s journey, it makes shortlisting the right set of technologies / tools for different types of applications an extremely arduous task. Devising an upgrade strategy for these tools throws additional complexity into the mix.
It also compounds the challenge of acquiring & retaining skilled resources with know-how of multiple upcoming technologies, as discussed under Skilled Resources.
In one of my previous assignments, we were containerizing a value-added services application for a large telecom provider. We had a requirement to whitelist a set of predefined IP addresses from which the billing requests had to originate for the transactions to be considered valid. The decision to go ahead with a cloud-hosted managed container platform had already been taken, but we then realized that most cloud platforms at that point in time could not provide this functionality. We had to work with the cloud provider to come up with an innovative solution to get the setup working, and this involved bringing in yet another tool.
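The whitelisting requirement boils down to a source-address check on the billing side. Below is a minimal Python sketch, assuming the billing system holds a static allowlist; the `is_allowed` helper and all addresses (taken from the IPv4 documentation ranges) are hypothetical. It illustrates why containers that can be scheduled onto any node, each with its own address, presumably fail such a check unless their outbound traffic leaves through a fixed, predefined address.

```python
import ipaddress

# Hypothetical allowlist kept by the billing system. The addresses come from the
# IPv4 documentation ranges and stand in for the operator's real values.
ALLOWED_SOURCES = [
    ipaddress.ip_network("203.0.113.10/32"),  # e.g. a fixed egress/NAT address
    ipaddress.ip_network("198.51.100.0/28"),  # e.g. a small reserved block (.0 to .15)
]


def is_allowed(source_ip: str, allowlist=ALLOWED_SOURCES) -> bool:
    """Return True if the request's source address falls inside any allowed network."""
    addr = ipaddress.ip_address(source_ip)
    return any(addr in net for net in allowlist)


if __name__ == "__main__":
    # Requests that leave through the fixed egress address pass the check...
    print(is_allowed("203.0.113.10"))  # True
    # ...but a container rescheduled onto a node with an arbitrary address is rejected.
    print(is_allowed("192.0.2.77"))  # False
```

The check itself is trivial; the hard part on a managed platform is guaranteeing which source address the containers’ traffic actually carries.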
Organizational Inertia:
Implementing containers efficiently is a big shift from the traditional ways of working that organizations are used to; it touches every aspect of an application’s lifecycle across dev & ops. It requires a rapid pace of innovation as well as empowering developers & substantially reskilling the ops teams. DevOps principles have to be followed, and CI / CD automation has to be built, to succeed in a containerized environment.
The existing ways of working, especially the change management process, access restrictions for developers & the operations teams’ lack of product knowledge, are big impediments in the adoption process. For example, a change management process that includes a weekly CAB (Change Advisory Board) meeting & multiple approval steps from various stakeholders cannot support the velocity that containerized ecosystems need. Similarly, restricting developers to just writing & committing code reduces the effectiveness & velocity of such implementations. Developers need to take responsibility for ensuring that the application logic they create runs in production, and for fixing it quickly if need be.
While developing a containerized application for a large hospitality player, it was a challenge to get developers access to the containerized platform, & the operations engineers found it extremely hard to debug application issues.
Choice of Technologies:
With the ecosystem evolving rapidly and a plethora of options available, it becomes extremely difficult to choose the right set of technologies. Unlike the enterprise software market of the past, there are no outright leaders; a startup might have a better & more relevant container-oriented tool than a technology giant.
To name a few moving parts within this jigsaw, there are multiple options for cloud-based managed container solutions, cloud-based & on-premises managed Kubernetes solutions, self-managed Kubernetes solutions, persistent storage, orchestration, service discovery, networking, secrets management, tracing, monitoring and an unending list of technologies that each have some unique selling point. To top it all off, better options are constantly being built.
Organizations find it very hard to select the right tools for their use cases & workloads while ensuring that those tools will also meet their future needs. Organizations also want to keep using their existing licensed tools wherever possible, but with limited knowledge about how well existing tools interoperate with the container ecosystem, the decision becomes even more difficult.
I was involved in a project for an online video streaming company that had moved to microservices using Spring Boot & wanted to deploy them to a cloud-hosted Kubernetes platform; we were tasked with creating a service discovery engine. Within the team itself there were differences of opinion on whether to choose Consul, a combination of etcd, Registrator & confd, ZooKeeper, or the built-in service discovery of Docker Swarm, Kubernetes, ECS, etc. Choosing the right tool itself became an uphill task.
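Whichever tool wins that debate, the underlying idea is broadly similar: service instances register themselves, keep proving they are alive, and clients look up healthy endpoints by name. The sketch below is a toy, in-memory Python model of that idea, assuming TTL-based heartbeats; `ServiceRegistry` and its methods are hypothetical and do not mirror the API of Consul, etcd, ZooKeeper or any orchestrator.

```python
import time
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Instance:
    service: str
    address: str
    port: int
    last_heartbeat: float


class ServiceRegistry:
    """Toy in-memory registry: instances register, send heartbeats and expire after a TTL.

    Hypothetical illustration only, not the API of any real service discovery tool.
    """

    def __init__(self, ttl_seconds: float = 10.0, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock  # injectable so expiry can be tested deterministically
        self._instances: Dict[Tuple[str, str, int], Instance] = {}

    def register(self, service: str, address: str, port: int) -> None:
        self._instances[(service, address, port)] = Instance(service, address, port, self.clock())

    def heartbeat(self, service: str, address: str, port: int) -> None:
        inst = self._instances.get((service, address, port))
        if inst is not None:
            inst.last_heartbeat = self.clock()

    def deregister(self, service: str, address: str, port: int) -> None:
        self._instances.pop((service, address, port), None)

    def lookup(self, service: str) -> List[str]:
        """Return 'address:port' for every instance whose last heartbeat is within the TTL."""
        now = self.clock()
        return [
            f"{i.address}:{i.port}"
            for i in self._instances.values()
            if i.service == service and now - i.last_heartbeat <= self.ttl
        ]


if __name__ == "__main__":
    fake_now = [0.0]
    registry = ServiceRegistry(ttl_seconds=10, clock=lambda: fake_now[0])
    registry.register("billing", "10.0.0.5", 8080)
    registry.register("billing", "10.0.0.6", 8080)
    fake_now[0] = 8.0
    registry.heartbeat("billing", "10.0.0.5", 8080)  # only one instance keeps reporting
    fake_now[0] = 15.0
    print(registry.lookup("billing"))  # ['10.0.0.5:8080']
```

The real tools differ mainly in where this state lives (a replicated store versus the orchestrator itself) and how liveness is proven (TTLs, sessions, readiness probes), which is exactly what made the choice contentious.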
Implementation Strategy:
Another roadblock that organizations face in container adoption is the lack of a proper implementation strategy. Container adoption has gained the most traction at the developer & operations engineer level, but a successful production implementation needs a proper strategy in place. In most organizations adoption is ad hoc, with no cohesive plan and different teams choosing their own strategies. No short-term or long-term goals are defined, and there are no effective checks and balances for the transition phase. The metrics to be tracked are not defined, let alone benchmarked.
Applications are not chosen with the advantages / disadvantages of moving to containers in mind, and as a result the outcomes of the move are not as anticipated. Expected workloads and the need for auto scaling, auto healing, data persistence, service discovery, circuit breaking, canary deployments etc. are not taken into consideration.
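Circuit breaking is a good example of a concern that should be designed in up front rather than discovered after the move. Below is a minimal Python sketch of the pattern, assuming a simple consecutive-failure threshold and a fixed cooldown; the `CircuitBreaker` class and its parameters are hypothetical, and in practice this behaviour often comes from a library or the platform rather than hand-written code.

```python
import time
from typing import Callable, Optional


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    """Opens after `failure_threshold` consecutive failures, fails fast while open,
    and lets a trial call through once `reset_timeout` seconds have passed."""

    def __init__(self, failure_threshold: int = 3, reset_timeout: float = 30.0,
                 clock: Callable[[], float] = time.monotonic):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.clock = clock
        self.failures = 0
        self.opened_at: Optional[float] = None  # None means the circuit is closed

    @property
    def state(self) -> str:
        if self.opened_at is None:
            return "closed"
        if self.clock() - self.opened_at < self.reset_timeout:
            return "open"
        return "half-open"

    def call(self, func, *args, **kwargs):
        if self.state == "open":
            raise CircuitOpenError("downstream marked unhealthy; failing fast")
        try:
            result = func(*args, **kwargs)
        except Exception:
            self.failures += 1
            # A failed trial call (half-open) or too many failures (closed) opens the circuit.
            if self.opened_at is not None or self.failures >= self.failure_threshold:
                self.opened_at = self.clock()
            raise
        # Any success closes the circuit and resets the failure count.
        self.failures = 0
        self.opened_at = None
        return result
```

Failing fast while the circuit is open keeps a struggling downstream service from being hammered by retries, which matters more once auto scaling can multiply the number of callers.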
Orchestration engines, if not chosen / set up properly, can also become pain points. For example, both Kubernetes and Docker Swarm use a master / worker architecture and support multi-master setups for high availability. In these setups a leader is elected using a consensus algorithm (Raft, both among Swarm managers and in etcd, the data store behind Kubernetes), and this algorithm needs to be well understood to ensure that there are no issues.
For example, the Raft consensus algorithm requires a majority, or quorum, of $\lfloor N/2 \rfloor + 1$ members of an $N$-member cluster to agree on values proposed to the cluster.
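The quorum arithmetic is worth spelling out, because it determines how many master nodes are worth running. A short Python sketch, assuming all $N$ members are voting members; the function names are illustrative only.

```python
def quorum(n: int) -> int:
    """Votes needed for a Raft majority in a cluster of n voting members: floor(n/2) + 1."""
    if n < 1:
        raise ValueError("a cluster needs at least one voting member")
    return n // 2 + 1


def fault_tolerance(n: int) -> int:
    """How many members can fail while the remaining ones can still form a quorum."""
    return n - quorum(n)


if __name__ == "__main__":
    for n in range(1, 8):
        print(f"{n} members: quorum {quorum(n)}, tolerates {fault_tolerance(n)} failure(s)")
```

Running it shows that going from 3 to 4 members, or from 5 to 6, raises the quorum without adding any fault tolerance, which is why odd-sized master / manager groups such as 3 or 5 are the usual recommendation.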
Container Monitoring:
Container monitoring is a different beast altogether, and very few organizations have been able to get a good handle on how to monitor ephemeral, super-fast and lightweight containers. This has to be achieved without putting undue load on the containers. The complexities of orchestrated container environments are huge, and regular monitoring solutions simply do not cut it in these conditions.
Consider this use case: we are running a container orchestrator with a multi-master setup, wherein the master nodes have to stay in sync and continuously communicate with each other. On top of this, the master nodes run multiple control activities: maintaining the state of the containers, doing health checks and dynamically creating new containers in case of a failure. The orchestrator also creates an overlay network, which is a distributed network among multiple Docker daemon hosts. This network sits on top of the host-specific networks, allowing containers connected to it to communicate securely when encryption is enabled.
All this is just the tip of the iceberg; every one of these factors has to be taken into account, and monitoring has to be enabled for each of them.
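To make the masters’ control activities a little more concrete, here is a toy Python model of a reconcile loop: compare the desired replica count with what is observed, drop containers that fail health checks, and start replacements. It is a hypothetical sketch (the `ReplicaController` class is made up and ignores scheduling, networking and consensus), but it illustrates why monitoring is hard: every failure quietly produces a new container with a new ID, so anything keyed on individual containers quickly goes stale.

```python
import itertools
from dataclasses import dataclass
from typing import List


@dataclass
class Container:
    id: str
    healthy: bool = True


class ReplicaController:
    """Toy control loop that keeps `desired` healthy replicas of one service running.

    Hypothetical illustration of the reconcile idea only; real orchestrators are far more involved.
    """

    def __init__(self, service: str, desired: int):
        self.service = service
        self.desired = desired
        self.containers: List[Container] = []
        self._next_id = itertools.count(1)

    def _start(self) -> Container:
        container = Container(id=f"{self.service}-{next(self._next_id)}")
        self.containers.append(container)
        return container

    def reconcile(self) -> List[str]:
        """Compare observed state with desired state and return the actions taken."""
        actions = []
        # Health checks: drop containers whose probe has failed.
        for c in [c for c in self.containers if not c.healthy]:
            self.containers.remove(c)
            actions.append(f"removed {c.id}")
        # Replace missing replicas with fresh containers (a new ID every time).
        while len(self.containers) < self.desired:
            actions.append(f"started {self._start().id}")
        # Scale down if more replicas are running than desired.
        while len(self.containers) > self.desired:
            actions.append(f"stopped {self.containers.pop().id}")
        return actions


if __name__ == "__main__":
    rc = ReplicaController("api", desired=3)
    print(rc.reconcile())  # ['started api-1', 'started api-2', 'started api-3']
    rc.containers[1].healthy = False  # simulate a failed health check on api-2
    print(rc.reconcile())  # ['removed api-2', 'started api-4']
```

The list of actions returned by each pass is the kind of event stream a monitoring setup for orchestrated containers has to capture, on top of the usual resource metrics.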
Container Security:
Container security is another aspect that gets neglected while containerizing applications and developing the orchestration strategy & scripts. A security breach of a container is as big a concern as an OS-level breach, more so if we are running privileged containers that have root access to the host. Developers and operations engineers do not give enough thought to examining the potential vulnerabilities of the platforms. More often than not, interactions between services are not properly tracked, which makes it difficult to identify erroneous interactions, security violations and potential latency. Most Docker images are built on top of lightweight open-source base OS images, and because Docker uses a layered, copy-on-write filesystem, these base layers are shared among the images and containers built on them. This can expose a large attack surface if a vulnerability creeps in.
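Because layers are identified by digest and reused, one vulnerable base layer is inherited by every image built on it, and in turn by every container started from those images. A small Python sketch, assuming image manifests are already available as ordered lists of layer digests (all image names and digests here are made-up placeholders), shows how quickly the blast radius of a single layer can be worked out; in practice this data would come from a registry or an image scanner.

```python
from collections import Counter
from typing import Dict, List

# Hypothetical image manifests: each image is an ordered list of layer digests,
# base layers first. Names and digests are made-up placeholders.
IMAGES: Dict[str, List[str]] = {
    "payments-api:1.4": ["sha256:base-os", "sha256:jre", "sha256:payments-app"],
    "orders-api:2.0": ["sha256:base-os", "sha256:jre", "sha256:orders-app"],
    "frontend:5.1": ["sha256:base-os", "sha256:node", "sha256:frontend-app"],
    "batch-job:0.9": ["sha256:other-base", "sha256:python", "sha256:batch-app"],
}


def affected_images(vulnerable_layer: str, images: Dict[str, List[str]] = IMAGES) -> List[str]:
    """Every image that contains the vulnerable layer inherits the vulnerability."""
    return sorted(name for name, layers in images.items() if vulnerable_layer in layers)


def most_shared_layers(images: Dict[str, List[str]] = IMAGES, top: int = 3):
    """Count how many images reuse each layer; widely shared layers have the largest blast radius."""
    return Counter(layer for layers in images.values() for layer in set(layers)).most_common(top)


if __name__ == "__main__":
    print(affected_images("sha256:base-os"))
    # ['frontend:5.1', 'orders-api:2.0', 'payments-api:1.4']
    print(most_shared_layers(top=2))
    # [('sha256:base-os', 3), ('sha256:jre', 2)]
```

The same sharing that makes images small and fast to pull is what turns a single unpatched base layer into a fleet-wide issue.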
While implementing a container orchestration solution for a large financial client, we discovered that etcd, the primary data store, was keeping secret data unencrypted, as plain text on disk. Such misses can lead to major issues if they slip into the production environment.
While the list of challenges might seem daunting at first, most of these pitfalls can be avoided if we prepare well for the transition and go in with a sound strategy. In the next article I will discuss the best practices for introducing & implementing containers in an organization.