While debugging in an IDE or with simple command-line tools is relatively straightforward, the real challenge lies in production debugging. Modern production environments enable sophisticated self-healing deployments, yet they also make troubleshooting more complex. Kubernetes (aka k8s) is probably the best-known container orchestration platform for production. To teach debugging in Kubernetes effectively, I first need to introduce its fundamental principles.
This part of the debugging series is designed for developers looking to effectively tackle application issues within Kubernetes environments without delving deeply into the complex DevOps aspects typically associated with its operations. Kubernetes is a big subject. It took me two videos just to explain the basic concepts and background.
Kubernetes, while often discussed in the context of cloud computing and large-scale operations, is not just a tool for managing containers. Its principles apply broadly to all large-scale distributed systems. In this post, I want to explore Kubernetes from the ground up, emphasizing its role in solving real-world problems faced by developers in production environments.
Before Kubernetes, the deployment landscape was markedly different. Understanding this evolution helps us appreciate the challenges Kubernetes aims to solve. The image below represents the road to Kubernetes and the technologies we passed along the way.
In the image, we can see that initially, applications were deployed directly onto physical servers. This process was manual, error-prone, and difficult to replicate across multiple environments. For instance, if a company needed to scale its application, it involved procuring new hardware, installing operating systems, and configuring the application from scratch. This could take weeks or even months, leading to significant downtime and operational inefficiencies.
Imagine a retail company preparing for the holiday season surge. Each time they needed to handle increased traffic, they would manually set up additional servers. This was not only time-consuming but also prone to human error. Scaling down after the peak period was equally cumbersome, leading to wasted resources.
Virtualization technology introduced a layer that emulates the hardware, allowing for easier replication and migration of environments, but at the cost of performance. However, fast virtualization enabled the cloud revolution. It lets companies like Amazon lease their servers at scale without compromising their own workloads.
Virtualization involves running multiple operating systems on a single physical host. Each virtual machine (VM) includes a full copy of an operating system, the application, and the necessary binaries and libraries, which can take up tens of GBs. VMs are managed by a hypervisor, such as VMware's ESXi or Microsoft's Hyper-V, which sits between the hardware and the guest operating systems and distributes hardware resources among the VMs. This extra layer adds overhead and can reduce performance because of the need to emulate hardware.
Note that virtualization is often referred to as "virtual machines." I chose to avoid that terminology due to the focus of this blog on Java and the JVM, where a virtual machine is typically a reference to the Java Virtual Machine (JVM).
Containers emerged as a lightweight alternative to full virtualization. Tools like Docker standardized container formats, making it easier to create and manage containers without the overhead associated with traditional virtual machines. Containers encapsulate an application’s runtime environment, making them portable and efficient.
Unlike virtualization, containerization encapsulates an application in a container with its own operating environment, but it shares the host system’s kernel with other containers. Containers are thus much more lightweight, as they do not require a full OS instance; instead, they include only the application and its dependencies, such as libraries and binaries. This setup reduces the size of each container and improves boot times and performance by removing the hypervisor layer.
Containers operate using several key Linux kernel features.
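Namespaces, which isolate what a process can see, and control groups (cgroups), which limit the resources it can use, are two such kernel features. On a Linux host, both are exposed under /proc, so a short shell sketch can make them visible. The function name show_isolation is my own, and the exact entries printed will differ between kernels and container runtimes.

```shell
#!/bin/sh
# Print the namespaces and cgroup membership of the current shell.
# Linux only: /proc/<pid>/ns and /proc/<pid>/cgroup are provided by the kernel.
# show_isolation is a hypothetical helper name, not a standard tool.
show_isolation() {
    if [ -d "/proc/$$/ns" ]; then
        echo "namespaces:"
        # Each entry is a symlink whose target looks like "pid:[4026531836]".
        # Two processes share a namespace when the bracketed IDs match.
        for ns in /proc/$$/ns/*; do
            printf '  %s -> %s\n' "$(basename "$ns")" "$(readlink "$ns")"
        done
    else
        echo "namespaces: not available (probably not a Linux host)"
    fi

    if [ -r "/proc/$$/cgroup" ]; then
        echo "cgroups:"
        # Indent each cgroup line to match the namespace listing.
        sed 's/^/  /' "/proc/$$/cgroup"
    else
        echo "cgroups: not available"
    fi
}

show_isolation
```

Running this on the host and again inside a container typically shows different namespace IDs, which is the isolation that containers rely on.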
As containers began to replace virtualization due to their efficiency and speed, developers and organizations rapidly adopted them for a wide range of applications. However, this surge in container usage brought with it a new set of challenges, primarily related to managing large numbers of containers at scale.
While containers are incredibly efficient and portable, they introduce complexities when used extensively, particularly in large-scale, dynamic environments:
Management Overhead: Manually managing hundreds or even thousands of containers quickly becomes unfeasible. This includes deployment, networking, scaling, and ensuring availability and security.
Resource Allocation: Containers must be efficiently scheduled and managed to optimally use physical resources, avoiding underutilization or overloading of host machines.
Service Discovery and Load Balancing: As the number of containers grows, keeping track of which container offers which service and how to balance the load between them becomes critical.
Updates and Rollbacks: Implementing rolling updates, managing version control, and handling rollbacks in a containerized environment require robust automation tools.
To address these challenges, the concept of container orchestration was developed. Orchestration automates the scheduling, deployment, scaling, networking, and lifecycle management of containers, which are often organized into microservices. Efficient orchestration tools help ensure that the entire container ecosystem is healthy and that applications are running as expected.
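A common way to think about orchestration is as a reconciliation loop: compare the desired state with the actual state and act on the difference. The sketch below is a deliberately simplified illustration of that idea, not how Kubernetes is implemented. The reconcile function and its messages are hypothetical.

```shell
#!/bin/sh
# reconcile DESIRED ACTUAL
# Print the actions needed to move from ACTUAL running containers to DESIRED.
# A toy model: a real orchestrator also handles placement, health, and networking.
reconcile() {
    desired=$1
    actual=$2

    # Too few containers: start new ones until the counts match.
    while [ "$actual" -lt "$desired" ]; do
        actual=$((actual + 1))
        echo "start container $actual"
    done

    # Too many containers: stop the extras.
    while [ "$actual" -gt "$desired" ]; do
        echo "stop container $actual"
        actual=$((actual - 1))
    done

    echo "converged at $actual"
}

# One container is running (say two crashed) and three are desired.
reconcile 3 1
```

Self-healing falls out of the same loop: when a container dies, the actual count drops below the desired count and the next pass starts a replacement.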
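Among the orchestration tools, Kubernetes emerged as a frontrunner due to its robust capabilities, flexibility, and strong community support. Kubernetes offers several features that address the core challenges of managing containers described above, including automated scheduling, service discovery and load balancing, and automated rollouts and rollbacks.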
Kubernetes not only solves practical, operational problems associated with running containers but also integrates with the broader technology ecosystem, supporting continuous integration and continuous deployment (CI/CD) practices. It is backed by the Cloud Native Computing Foundation (CNCF), ensuring it remains cutting-edge and community-focused.
Understanding Kubernetes architecture is crucial for debugging and troubleshooting. The following image shows the high-level view of a Kubernetes deployment. There are far more details in most tutorials geared towards DevOps engineers, but for a developer, the point that matters is just "Your Code," which is that tiny corner at the edge.
In the image above, we can see:
Master Node (represented by the blue Kubernetes logo on the left): The control plane of Kubernetes, responsible for managing the state of the cluster, scheduling applications, and handling replication.
Worker Nodes: These nodes contain the pods that run the containerized applications. Each worker node is managed by the master.
Pods: The smallest deployable units created and managed by Kubernetes, usually containing one or more containers that need to work together.
These components work together to ensure that an application runs smoothly and efficiently across the cluster.
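To relate this picture to a real cluster, kubectl can list the nodes and show which node each pod landed on. This is a small sketch that assumes kubectl is installed and already configured for a cluster. The show_topology wrapper is a hypothetical helper, while the two kubectl commands inside it are standard.

```shell
#!/bin/sh
# show_topology: list cluster nodes, then pods with the node each one runs on.
# Assumes kubectl is on PATH and points at a reachable cluster.
show_topology() {
    if ! command -v kubectl >/dev/null 2>&1; then
        echo "kubectl not found on PATH" >&2
        return 1
    fi

    # Nodes in the cluster, with their roles and status.
    kubectl get nodes

    # "-o wide" adds extra columns, including the pod IP and the NODE it runs on.
    kubectl get pods -o wide
}
```

Calling show_topology against a cluster gives a quick answer to "where is my code actually running?", which is usually the first question when a single pod misbehaves.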
Up until now, this post has been theory-heavy, so let's review some commands we can use to work with a Kubernetes cluster. First, we want to list the pods in the cluster, which we can do with the kubectl get pods command:
$ kubectl get pods
NAME                    READY   STATUS    RESTARTS   AGE
my-first-pod-id-xxxx    1/1     Running   0          13s
my-second-pod-id-xxxx   1/1     Running   0          13s
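Once a cluster has more than a handful of pods, scanning this table by eye gets tedious. The sketch below filters the same output for pods that are not fully ready or not in the Running state. The unhealthy_pods name is hypothetical, and the sample table is made up for illustration (the second pod is shown in CrashLoopBackOff, which is not what the listing above reported).

```shell
#!/bin/sh
# Read "kubectl get pods" output on stdin and print the names of pods that
# are not fully ready or whose STATUS is not "Running".
# Note: pods of finished jobs (STATUS "Completed") are also reported.
unhealthy_pods() {
    awk 'NR > 1 {                        # skip the header row
        split($2, ready, "/")            # READY column looks like "1/1"
        if (ready[1] != ready[2] || $3 != "Running") print $1
    }'
}

# Hypothetical sample input, used here instead of a live cluster.
unhealthy_pods <<'EOF'
NAME                    READY   STATUS             RESTARTS   AGE
my-first-pod-id-xxxx    1/1     Running            0          13s
my-second-pod-id-xxxx   0/1     CrashLoopBackOff   4          2m
EOF
# prints: my-second-pod-id-xxxx
```

Against a live cluster the same filter would be used as kubectl get pods | unhealthy_pods.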
A command such as kubectl describe pod <pod> returns a high-level description of the pod, such as its name, the node it runs on, and so on. Many problems in production pods can be diagnosed by looking at the pod's log output, which we can print with the kubectl logs command:
$ kubectl logs -f <pod>
[2022-11-29 04:12:17,262] INFO log data
...
In most large-scale deployments, application logs are ingested by tools such as Elastic or Loki. As such, the logs command isn't as useful in production, except for debugging edge cases.
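For those edge cases, a plain text filter over the log stream is often enough. The sketch below assumes the log format shown above ([timestamp] LEVEL message). The warnings_and_errors name and the sample ERROR line are hypothetical.

```shell
#!/bin/sh
# Keep only WARN and ERROR lines from logs shaped like
# "[2022-11-29 04:12:17,262] INFO log data".
warnings_and_errors() {
    grep -E '^\[[^]]+\] (WARN|ERROR) '
}

# Hypothetical sample input, used here instead of a live pod.
printf '%s\n' \
    '[2022-11-29 04:12:17,262] INFO log data' \
    '[2022-11-29 04:12:18,003] ERROR hypothetical failure' \
    | warnings_and_errors
# prints: [2022-11-29 04:12:18,003] ERROR hypothetical failure
```

Against a real pod, this could be combined with standard kubectl logs flags, for example kubectl logs --since=10m <pod> | warnings_and_errors. After a crash and restart, kubectl logs --previous <pod> shows the output of the previous container instance.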
This introduction to Kubernetes has set the stage for deeper exploration into specific debugging and troubleshooting techniques, which we will cover in the upcoming posts. The complexity of Kubernetes makes applications running on it much harder to debug, but there are facilities in place to work around some of that complexity.
While this article (and its follow-ups) focuses on Kubernetes, future posts will delve into observability and related tools, which are crucial for effective debugging in production environments.