Kubernetes Day-2 Operations – Part III

Written by asadfaizi | Published 2022/05/18
Tech Story Tags: cloud-computing | devops | gitops | docker | containers | service-mesh | kubernetes | cloud

TL;DR: Part III of this Kubernetes Day-2 operations series looks at the performance and operability bottlenecks clusters hit at scale: manual network and traffic management, autoscaling that doesn't work out of the box, the toil of associating pods with the right nodes, and the complexity of integrating containers with legacy VM services. It closes with how CloudPlex automates each of these pain points.

Kubernetes has revolutionized the way developers run their workloads by abstracting away much of the underlying infrastructure. Major organizations are already past the planning, configuration, and installation phases of Kubernetes adoption. Kubernetes Day-2 operations present some pressing challenges as well as unique opportunities to deliver value to your customers.

Part I – Manifest Management, Application Lifecycle Update, and Volume Management – and Part II – Dynamic Parameters, Kubernetes Cluster Bootstrapping, and Kubernetes RBAC – of this series have already discussed at length some of the inherent management and security pain points of using Kubernetes in production. High performance and unparalleled scalability are two major reasons enterprises are jumping ship to Kubernetes.

Nonetheless, ease of management and strong security don't always translate into uncompromising performance at any scale. Part III of this series covers some major performance bottlenecks Kubernetes clusters experience when operating at scale.

A bad network or a sudden onslaught of traffic is, of course, bad for performance. But at times, even associating pods with nodes can become a major headache for an inexperienced developer. And who told you Kubernetes auto-scales out of the box?

Perhaps, we should discuss these pain points in more detail.

Network & Traffic Management

The operability and performance of a cloud application are subject to various network and traffic parameters. These parameters are sometimes unpredictable and go out of sync more often than not. That shouldn't be a problem when you're Netflix and the new season of The Walking Dead has just dropped.

You will probably get away with a few bad ratings on the App Store. However, when you’re running business-critical applications, there is more at stake. A transient fault can result in an undesirable action if the application encounters it in the midst of a critical transaction. A financial transaction running into a transient fault may result in a loss of trust between the parties.

To avoid such undesirables, developers implement transparent network and traffic policies in their cloud applications in the form of route rules, traffic policies, retry policies, etc. When working with Kubernetes, developers are supposed to configure traffic management policies for each Node, Service, or Pod manually.

There could be hundreds of those. In addition, they need to add service information to service mesh resources (VirtualService, DestinationRule, etc.).
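To make this concrete, here is a minimal sketch of the per-service mesh configuration this entails, assuming Istio as the service mesh. The service name (reviews) and the retry and circuit-breaker thresholds are illustrative placeholders, not values from this article:

```yaml
# Hypothetical Istio traffic policy for a single service ("reviews").
# Every service in the mesh needs its own pair of resources like this.
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: reviews
spec:
  hosts:
    - reviews
  http:
    - route:
        - destination:
            host: reviews
            subset: v1
      retries:                      # retry policy for transient faults
        attempts: 3
        perTryTimeout: 2s
        retryOn: 5xx,connect-failure
---
apiVersion: networking.istio.io/v1beta1
kind: DestinationRule
metadata:
  name: reviews
spec:
  host: reviews
  subsets:
    - name: v1
      labels:
        version: v1
  trafficPolicy:
    outlierDetection:               # a simple circuit breaker
      consecutive5xxErrors: 5
      interval: 30s
      baseEjectionTime: 60s
```

Multiply this by every service in the application, and "hundreds of those" stops sounding like an exaggeration.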

Network and traffic management is critical to a cloud application, yet Kubernetes makes it a long and tedious job for developers.

Kubernetes Autoscaling

Containerization allows developers to scale an individual service in their application up and down on demand. Autoscaling is a major reason behind the rising popularity of Kubernetes. The growing adoption of containers is only mounting pressure on the developer community to get familiar with Kubernetes autoscaling as soon as possible.

Kubernetes doesn't support autoscaling out of the box; developers have to install a Kubernetes add-on called Metrics Server (and probably other tools). Configuring Metrics Server shouldn't be a big deal, except that each public cloud provider has its own set of configuration settings.

That means the developer must configure the Kubernetes autoscaler for each public cloud. If learning the Kubernetes autoscaler wasn’t enough, now you must also acquaint yourself with each cloud provider’s scaling group abstractions.
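Once Metrics Server is running, pod autoscaling itself is declared with a HorizontalPodAutoscaler. Here is a minimal sketch; the Deployment name (web) and the thresholds are illustrative assumptions, not values from this article:

```yaml
# Hypothetical HPA: scales the "web" Deployment between 2 and 10
# replicas, targeting 70% average CPU utilization. Metrics Server must
# be installed, since that is where the CPU numbers come from.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: web
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: web
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70
```

Note that this only scales pods. Adding and removing nodes is the Cluster Autoscaler's job, and that is the piece that has to be wired to each cloud provider's scaling group abstraction.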

Hint: Multi-cloud deployment can be a bad idea if you're autoscaling Kubernetes and lack the technical skills to work through each cloud provider's complexity and the way autoscaling behaves on each.

Did I mention that after starting a node, you must manually bootstrap it to join the Kubernetes cluster? Being a developer was never this challenging.

Associating Pods to Nodes

The great strength of Kubernetes lies in its scheduler. The Kubernetes scheduler places pods on nodes in a way that keeps the cluster from running out of resources due to inefficient placement.

The Kubernetes scheduler is a great tool, but not when you need control over the automation. If you're running an AI workload, you probably want those pods placed on specific nodes backed by hardware with parallel computing capabilities.

Compute-intensive containers must land on compute-optimized nodes. As you might expect by now, Kubernetes does nothing to make that association happen on its own. A developer must label the nodes, add the matching selectors to the deployments, and repeat the exercise for every other node.

They must configure convoluted affinity rules to associate a group of deployments with a group of nodes. This exercise grows in complexity as the pool of nodes becomes more diverse.

To give you a hint, this is the configuration required to associate a pod with a GPU-capable node:

1. Label every node object that has a GPU.
2. Use those labels in the pods' node selectors.
3. Repeat the labeling for each new GPU node that joins the cluster.
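In raw Kubernetes terms, that looks roughly like the sketch below. The gpu=true label, the node name, and the Deployment are illustrative assumptions; the nodes would first be labeled with something like kubectl label nodes worker-7 gpu=true:

```yaml
# Hypothetical Deployment pinned to GPU nodes via a node label.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: trainer
spec:
  replicas: 1
  selector:
    matchLabels:
      app: trainer
  template:
    metadata:
      labels:
        app: trainer
    spec:
      # Simplest form of pod-to-node association: an exact label match.
      nodeSelector:
        gpu: "true"
      containers:
        - name: trainer
          image: example.com/trainer:latest   # placeholder image
```

When the rules outgrow a single exact-match label, nodeSelector gives way to the more expressive (and more convoluted) affinity syntax, such as requiredDuringSchedulingIgnoredDuringExecution with matchExpressions, which is where the complexity mentioned above really starts to bite.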

As said, while Kubernetes is a boon for organizations, it is not very friendly to developers. Not only do they have to climb a new technology's steep learning curve, they also have to take on far more manual tasks than ever before. Coding is laborious enough; they shouldn't be spending the whole day associating pods with specific nodes or writing policies.

Integration with VM (Legacy) Services

When you migrate your workloads to the cloud and decide to use Kubernetes, you will certainly use one of the migration strategies known as the 6 R's. Whether you are rehosting, replatforming, refactoring, or following another strategy, you may at some point decide to keep some workloads on your VMs and run others in containers.

When deciding on Kubernetes as a platform and an architectural shift to Docker containers, there is a critical need to manage and secure the communication between services, including integration with VM (legacy) services.

Integrating your legacy VM applications with Docker containers can be very complex. It depends on your use case, but it is rarely an easy, straightforward process. When your Docker containers need to call one or more VM services running behind a firewall, each service requires its own, usually complex, manual configuration.
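With a service mesh like Istio, for example, that manual configuration typically means hand-writing a ServiceEntry for every VM service so the mesh knows how to reach it. A minimal, hypothetical sketch; the host name and port are placeholders:

```yaml
# Hypothetical ServiceEntry exposing a legacy VM-hosted service
# (billing.legacy.internal) to workloads inside the mesh. One such
# resource is needed for every VM service the containers call.
apiVersion: networking.istio.io/v1beta1
kind: ServiceEntry
metadata:
  name: legacy-billing
spec:
  hosts:
    - billing.legacy.internal
  location: MESH_EXTERNAL
  ports:
    - number: 8443
      name: tls-billing
      protocol: TLS
  resolution: DNS
```

Firewall rules, TLS settings, and (for VMs joined to the mesh) WorkloadEntry resources come on top of this, per service.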

CloudPlex Addresses the Challenges of Kubernetes Day-2 Operations

CloudPlex is on a mission to make Kubernetes a lot friendlier for developers by bringing automation into the mix. At the center of CloudPlex is its drag-and-drop tool. The visual interface allows developers to automate, wherever possible, the manual chores Kubernetes subjects them to.

When automation is impossible, or would amount to over-engineering, the drag-and-drop tool lets developers work around the obstacle without drowning in an endless number of configuration files.

When it comes to network and traffic management, developers no longer have to worry about writing configurations for retry policies, circuit breakers, and fault injection. CloudPlex automates the process of creating and configuring the required resources (VirtualService, DestinationRule) for each container. In addition, CloudPlex fills in the service information automatically.

CloudPlex automates the configuration of Metrics Server on each public cloud without subjecting developers to unnecessary vendor-specific details. When it comes to pod autoscaling, developers just have to enter minimum and maximum values for resource quotas and replicas.

When node autoscaling is in question, developers can select one of the node templates provided by CloudPlex. CloudPlex also offers a VM service in which a developer needs to provide only the service information – nothing more.

All the configurations required to integrate the VM services and containers are created automatically by the platform. The visual process is simple, easy, and fast.

CloudPlex allows developers to visualize their nodes and deployments in a single window. They can attach deployments to nodes via the visual interface while the platform automates the handling of node labels and selectors. By fixing these shortcomings of Kubernetes, CloudPlex makes the life of developers a lot easier.

You can check how CloudPlex makes these pain points easier to deal with here.

Asad Faizi

Founder CEO

CloudPlex.io, Inc

[email protected]


