In this article, we are going to consider the two most common methods for autoscaling in an EKS cluster:
Horizontal Pod Autoscaler (HPA)
Cluster Autoscaler (CA)
The Horizontal Pod Autoscaler, or HPA, is a Kubernetes component that automatically scales your service based on metrics such as CPU utilization, as reported through the Kubernetes Metrics Server. The HPA scales the number of pods in either a deployment or a replica set, and is implemented as a Kubernetes API resource and a controller. The controller manager periodically compares resource utilization against the metrics specified in each Horizontal Pod Autoscaler definition. It obtains the metrics from either the resource metrics API (for per-pod resource metrics) or the custom metrics API (for any other metrics).
To see this in action, we are going to configure the HPA and then apply some load to our system.
Let us start by installing Helm as a package manager for Kubernetes.
curl https://raw.githubusercontent.com/kubernetes/helm/master/scripts/get > helm.sh
chmod +x helm.sh
./helm.sh
Now, we are going to set up the server-side component of Helm, called Tiller. This requires a service account, which we define in a file named tiller.yml:
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: tiller
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: tiller
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
  - kind: ServiceAccount
    name: tiller
    namespace: kube-system
The above defines a Tiller service account to which we have assigned the cluster admin role. Now let's go ahead and apply the configuration:
kubectl apply -f tiller.yml
Next, run helm init using the Tiller service account we have just created:
helm init --service-account tiller
With this, we have installed Tiller onto the cluster, which gives Helm the access it needs to manage resources within it.
With Helm installed, we can now deploy the Metrics Server. The Metrics Server is a cluster-wide aggregator of resource usage data: the metrics are collected by the kubelet on each worker node and are used to drive the scaling behavior of deployments. So let's go ahead and install that now:
helm install stable/metrics-server --name metrics-server --version 2.0.4 --namespace metrics
Once the Metrics Server is up and running, we are ready to scale the application.
For the purpose of this article, we will deploy a special build of Apache and PHP designed to generate CPU utilization:
kubectl run php-apache --image=k8s.gcr.io/hpa-example --requests=cpu=200m --expose --port=80
Here, --requests=cpu=200m requests 200 millicores of CPU for each pod.
Now, let us autoscale our deployment:
kubectl autoscale deployment php-apache --cpu-percent=50 --min=1 --max=10
The above specifies that the HPA will increase or decrease the number of replicas (between 1 and 10) to maintain an average CPU utilization of 50% across all pods. Since each pod requests 200 millicores (as specified in the previous command), this corresponds to an average CPU usage of 100 millicores per pod.
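For reference, the same policy can also be written declaratively. The sketch below writes a manifest that should be roughly equivalent to the kubectl autoscale command above, using the autoscaling/v1 API. The file name hpa.yaml is an assumption chosen for illustration, and the manifest assumes php-apache exists as a Deployment.

```shell
# Write a declarative equivalent of the `kubectl autoscale` command.
# The file name hpa.yaml is an arbitrary choice for this sketch.
cat > hpa.yaml <<'EOF'
apiVersion: autoscaling/v1
kind: HorizontalPodAutoscaler
metadata:
  name: php-apache
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
  minReplicas: 1
  maxReplicas: 10
  targetCPUUtilizationPercentage: 50
EOF

# Apply it instead of (not in addition to) running `kubectl autoscale`:
# kubectl apply -f hpa.yaml
```

Keeping the policy in a file makes it easier to review and version alongside the rest of the application's manifests.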
Let's check the status:
kubectl get hpa
Review the Targets column. If it shows <unknown>/50%, the HPA has not yet received CPU metrics for the pod. It can take a couple of minutes for the correct value to appear, so let us grab a cup of coffee and come back when we have got some data. Rerun the last command and confirm that the Targets column now shows 0%/50%; the current CPU consumption is 0% because we are not yet sending any requests to the server. Now, let's generate some load in order to trigger scaling by running the following in a separate terminal:
kubectl run -i --tty load-generator --image=busybox /bin/sh
Inside this container, we are going to send an infinite number of requests to our service. If we flip back over to the other terminal, we can watch the autoscaler in action:
kubectl get hpa -w
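The request loop itself, run inside the load-generator container, is not spelled out above. A minimal sketch follows, assuming the service is reachable as http://php-apache from the same namespace; the function name generate_load and its optional request limit are hypothetical helpers for illustration.

```shell
# generate_load URL [MAX_REQUESTS]
# Sends requests in a tight loop. With no MAX_REQUESTS (or 0) it runs
# until interrupted with Ctrl+C; otherwise it stops after that many
# requests and prints how many were sent.
generate_load() {
  url="$1"
  max="${2:-0}"
  i=0
  while [ "$max" -eq 0 ] || [ "$i" -lt "$max" ]; do
    # Discard the response body; ignore individual request failures.
    wget -q -O- "$url" > /dev/null 2>&1 || true
    i=$((i + 1))
  done
  echo "$i"
}

# Inside the busybox shell (assumed service URL):
# generate_load http://php-apache
```

Pressing Ctrl+C stops the loop, which is how the load test is terminated later on.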
We can watch the HPA scale the pods up from 1 toward our configured maximum of 10, until the average CPU utilization falls below our target of 50%. This takes about 10 minutes, after which you should see 10 replicas running. If we flip back to the other terminal to terminate the load test, and then return to the HPA terminal, we can see the HPA reduce the replica count back to the minimum.
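To get a feel for how the replica count is chosen, the Kubernetes documentation describes the core rule as desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue). The sketch below works through that arithmetic only; it leaves out the tolerance band, the stabilization behavior, and the min/max clamping that the real controller also applies, and the function name desired_replicas is hypothetical.

```shell
# desired_replicas CURRENT_REPLICAS CURRENT_UTILIZATION TARGET_UTILIZATION
# Utilization values are whole percentages of the pods' CPU request.
desired_replicas() {
  current="$1"
  usage="$2"
  target="$3"
  # Integer ceiling of current * usage / target.
  echo $(( (current * usage + target - 1) / target ))
}

# One pod running at 250% of its 200m request, with a 50% target:
desired_replicas 1 250 50   # prints 5
# Four pods averaging 90% utilization, with a 50% target:
desired_replicas 4 90 50    # prints 8
```

Because the calculation is repeated as new metrics arrive, the replica count climbs in steps rather than jumping straight to its final value.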
The Cluster Autoscaler is the standard Kubernetes component for scaling the number of nodes in a cluster. It automatically increases the size of an Auto Scaling group when pods cannot be scheduled for lack of capacity, so that pods can continue to be placed successfully. It also tries to remove unneeded worker nodes from the Auto Scaling group (for example, the ones with no pods running).
The following eksctl command will create a managed node group backed by an Auto Scaling group, with a minimum of one, a desired count of five, and a maximum of ten nodes:
eksctl create nodegroup --cluster <CLUSTER_NAME> --node-zones <AVAILABILITY_ZONE> --name <NODEGROUP_NAME> --asg-access --nodes-min 1 --nodes 5 --nodes-max 10 --managed
Now, we need to apply an inline IAM policy to our worker nodes:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Action": [
        "autoscaling:DescribeAutoScalingGroups",
        "autoscaling:DescribeAutoScalingInstances",
        "autoscaling:DescribeLaunchConfigurations",
        "autoscaling:DescribeTags",
        "autoscaling:SetDesiredCapacity",
        "autoscaling:TerminateInstanceInAutoScalingGroup",
        "ec2:DescribeLaunchTemplateVersions"
      ],
      "Resource": "*",
      "Effect": "Allow"
    }
  ]
}
This allows the EC2 worker nodes hosting the Cluster Autoscaler to inspect and modify the Auto Scaling group. Copy it and add it to the IAM role of your EC2 worker nodes.
Next, download the following file:
wget https://raw.githubusercontent.com/kubernetes/autoscaler/master/cluster-autoscaler/cloudprovider/aws/examples/cluster-autoscaler-autodiscover.yaml
And update the following line with your cluster name:
- --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/<YOUR CLUSTER NAME>
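If you prefer not to edit the file by hand, the placeholder can be substituted from the command line. This is only a sketch: the helper name set_cluster_name and the cluster name my-cluster are hypothetical, and it assumes the downloaded file still contains the literal text <YOUR CLUSTER NAME> shown above.

```shell
# set_cluster_name FILE CLUSTER_NAME
# Replaces the <YOUR CLUSTER NAME> placeholder in FILE, keeping a
# backup copy as FILE.bak (the -i.bak form works with GNU and BSD sed).
set_cluster_name() {
  file="$1"
  cluster="$2"
  sed -i.bak "s|<YOUR CLUSTER NAME>|${cluster}|" "$file"
}

# Example (hypothetical cluster name):
# set_cluster_name cluster-autoscaler-autodiscover.yaml my-cluster
# grep -- '--node-group-auto-discovery' cluster-autoscaler-autodiscover.yaml
```

The grep at the end is a quick way to confirm that the flag now ends in your cluster's name before deploying.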
Finally, we can deploy our Autoscaler:
kubectl apply -f cluster-autoscaler-autodiscover.yaml
Of course, we should wait for the autoscaler pod to finish starting. Once done, we can scale our cluster out. We will consider a simple nginx application with the following YAML file, nginx.yaml:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-scale
spec:
  selector:
    matchLabels:
      app: nginx
  replicas: 1
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
        - name: nginx
          image: nginx:1.14.2
          ports:
            - containerPort: 80
          resources:
            limits:
              cpu: 500m
              memory: 512Mi
            requests:
              cpu: 500m
              memory: 512Mi
Let's go ahead and deploy the application:
kubectl apply -f nginx.yaml
And check the deployment:
kubectl get deployment/nginx-scale
Now, let's scale the deployment up to 10 replicas:
kubectl scale --replicas=10 deployment/nginx-scale
We can see some of our pods in the Pending state, which is the trigger that the Cluster Autoscaler uses to scale out our fleet of EC2 instances.
kubectl get pods -o wide --watch
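To see why some replicas end up Pending, it can help to estimate how many nodes the deployment needs from its CPU requests alone. The sketch below is a rough back-of-the-envelope calculation: the function name nodes_needed and the figure of 1900 millicores of allocatable CPU per node are hypothetical, and it ignores memory, system pods, and any other workloads on the nodes.

```shell
# nodes_needed REPLICAS CPU_REQUEST_MILLICORES NODE_ALLOCATABLE_MILLICORES
nodes_needed() {
  replicas="$1"
  request="$2"
  allocatable="$3"
  # Whole pods that fit on one node, by CPU request only.
  per_node=$(( allocatable / request ))
  if [ "$per_node" -eq 0 ]; then
    echo "pod does not fit on a node" >&2
    return 1
  fi
  # Integer ceiling of replicas / per_node.
  echo $(( (replicas + per_node - 1) / per_node ))
}

# 10 replicas requesting 500m each, on hypothetical nodes
# with 1900m of allocatable CPU:
nodes_needed 10 500 1900   # prints 4
```

If the estimate exceeds the number of nodes currently in the group, the surplus pods stay Pending until the Cluster Autoscaler has added capacity, up to the group's configured maximum.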
In this article, we considered both types of EKS cluster autoscaling. We learnt how the Horizontal Pod Autoscaler adjusts the number of pod replicas to track a CPU utilization target, and how the Cluster Autoscaler initiates scale-in and scale-out operations each time it detects under-utilized instances or pending pods. Both are essential features of Kubernetes when it comes to scaling a microservice application. We hope you found this article useful; there is more to come. Till then, happy scaling!
About the author - Sudip is a Solution Architect with more than 15 years of working experience, and is the founder of Javelynn. He likes sharing his knowledge through writing, and while he is not doing that, he must be fishing or playing chess.
Previously posted at https://appfleet.com/.