In this article we are going to consider the two most common methods for autoscaling an EKS cluster: the Horizontal Pod Autoscaler (HPA) and the Cluster Autoscaler (CA).

The Horizontal Pod Autoscaler, or HPA, is a Kubernetes component that automatically scales your service based on metrics such as CPU utilization, as exposed through the Kubernetes metrics server. The HPA scales the pods in either a deployment or a replica set, and is implemented as a Kubernetes API resource and a controller. The Controller Manager queries the resource utilization against the metrics specified in each horizontal pod autoscaler definition. It obtains the metrics from either the resource metrics API, for per-pod metrics, or the custom metrics API, for any other metrics.

To see this in action, we are going to configure the HPA and then apply some load to our system. Let us start by installing Helm, a package manager for Kubernetes:

curl https://raw.githubusercontent.com/kubernetes/helm/master/scripts/get > helm.sh
chmod +x helm.sh
./helm.sh

Now, we are going to set up the server-side portion of Helm, called Tiller (a Helm 2 component). This requires a service account:

---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: tiller
  namespace: kube-system
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: tiller
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
  - kind: ServiceAccount
    name: tiller
    namespace: kube-system

The above defines a Tiller service account and binds it to the cluster-admin role. Now let's go ahead and apply the configuration:

kubectl apply -f tiller.yml

Run helm init using the Tiller service account we have just created:

helm init --service-account tiller

With this, we have installed Tiller onto the cluster, which gives it access to manage the resources within it.

With Helm installed, we can now deploy the metrics server. The metrics server is a cluster-wide aggregator of resource usage data: metrics are collected by the kubelet on each worker node and are used to drive the scaling behavior of deployments. So let's go ahead and install it now:

helm install stable/metrics-server --name metrics-server --version 2.0.4 --namespace metrics

Once all checks have passed, we are ready to scale the application. For the purpose of this article, we will deploy a special build of Apache and PHP designed to generate CPU utilization:

kubectl run php-apache --image=k8s.gcr.io/hpa-example --requests=cpu=200m --expose --port=80

Here, --requests=cpu=200m requests 200 millicores of CPU for the pod.

Now, let us autoscale our deployment:

kubectl autoscale deployment php-apache --cpu-percent=50 --min=1 --max=10

The above specifies that the HPA will increase or decrease the number of replicas to maintain an average CPU utilization of 50% across all pods. Since each pod requests 200 millicores (as specified in the previous command), the HPA aims for an average CPU usage of 100 millicores per pod. Let's check the status:

kubectl get hpa

Review the Targets column. If it says <unknown>/50%, the metrics server has not yet reported CPU usage for the pod. It takes a couple of minutes to show the correct value, so let us grab a cup of coffee and come back when we have got some data. Rerun the last command and confirm that the Targets column now shows 0%/50%: the current CPU consumption is 0% because we are not yet sending any requests to the server.
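To make the 50% target more concrete, here is a minimal Python sketch of the scaling rule documented for the HPA controller: the desired replica count is the current count multiplied by the ratio of observed to target utilization, rounded up, then clamped to the configured minimum and maximum. The function name and its parameters are illustrative assumptions, not part of any Kubernetes API; the 10% tolerance mirrors the controller's documented default, and the min/max defaults mirror the kubectl autoscale command above.

```python
import math


def desired_replicas(current_replicas, current_utilization, target_utilization,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Illustrative sketch of the HPA rule; not the controller's actual code.

    Utilization values are percentages of the pods' CPU request.
    """
    ratio = current_utilization / target_utilization
    # Close enough to the target: keep the current replica count.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    # Scale proportionally to the ratio, rounding up.
    wanted = math.ceil(current_replicas * ratio)
    # Never go below --min or above --max.
    return max(min_replicas, min(max_replicas, wanted))


# One pod running at 250% of its CPU request against a 50% target.
print(desired_replicas(1, 250, 50))  # prints 5
```

With no load (0% utilization), the same rule drives the count back down to the minimum of 1, which matches the scale-in behavior described later in this article.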
Now, let's generate some load in order to trigger scaling by running the following:

kubectl run -i --tty load-generator --image=busybox /bin/sh

Inside this container, we are going to send an infinite number of requests to our service, for example with a loop such as:

while true; do wget -q -O - http://php-apache; done

If we flip back over to the other terminal, we can watch the autoscaler in action:

kubectl get hpa -w

We can watch the HPA scale the pods up from 1 toward our configured maximum of 10, until the average CPU utilization falls to our target of 50% or below. It takes about 10 minutes to run, after which we have 10 replicas. If we flip back to the load-generator terminal to terminate the load test, and then return to the HPA terminal, we can see the HPA reduce the replica count back to the minimum.

Cluster Autoscaler

The Cluster Autoscaler is the standard Kubernetes component for scaling the nodes in a cluster, not the pods. It automatically increases the size of an Auto Scaling group so that pods can continue to be placed successfully. It also tries to remove unused worker nodes from the Auto Scaling group (the ones with no pods running).

The following eksctl command will create a node group backed by an Auto Scaling group with a minimum of one node, a desired count of five, and a maximum of ten:

eksctl create nodegroup --cluster <CLUSTER_NAME> --node-zones <REGION_CODE> --name <REGION_CODE> --asg-access --nodes-min 1 --nodes 5 --nodes-max 10 --managed

Now, we need to apply an inline IAM policy to our worker nodes:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": [
                "autoscaling:DescribeAutoScalingGroups",
                "autoscaling:DescribeAutoScalingInstances",
                "autoscaling:DescribeLaunchConfigurations",
                "autoscaling:DescribeTags",
                "autoscaling:SetDesiredCapacity",
                "autoscaling:TerminateInstanceInAutoScalingGroup",
                "ec2:DescribeLaunchTemplateVersions"
            ],
            "Resource": "*",
            "Effect": "Allow"
        }
    ]
}

This allows the EC2 worker nodes hosting the Cluster Autoscaler to manipulate Auto Scaling groups. Copy it and add it to your EC2 IAM role. Next, download the following file:

wget https://raw.githubusercontent.com/kubernetes/autoscaler/master/cluster-autoscaler/cloudprovider/aws/examples/cluster-autoscaler-autodiscover.yaml

And update the following line with your cluster name:

- --node-group-auto-discovery=asg:tag=k8s.io/cluster-autoscaler/enabled,k8s.io/cluster-autoscaler/<YOUR CLUSTER NAME>

Finally, we can deploy our autoscaler:

kubectl apply -f cluster-autoscaler-autodiscover.yaml

Of course, we should wait for the pods to finish creating. Once done, we can scale our cluster out. We will consider a simple nginx application defined by the following yaml file:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-scale
spec:
  selector:
    matchLabels:
      app: nginx
  replicas: 1
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx:1.14.2
        ports:
        - containerPort: 80
        resources: 
          limits:
            cpu: 500m
            memory: 512Mi
          requests:
            cpu: 500m
            memory: 512Mi

Let's go ahead and deploy the application:

kubectl apply -f nginx.yaml

And check the deployment:

kubectl get deployment/nginx-scale

Now, let's scale the deployment up to 10 replicas:

kubectl scale --replicas=10 deployment/nginx-scale

We can see some of our pods in the Pending state, which is the trigger the Cluster Autoscaler uses to scale out our fleet of EC2 instances:

kubectl get pods -o wide --watch

Conclusion

In this article, we considered both types of EKS cluster autoscaling. We learnt how the Cluster Autoscaler initiates scale-in and scale-out operations each time it detects under-utilized instances or pending pods. The Horizontal Pod Autoscaler and the Cluster Autoscaler are essential features of Kubernetes when it comes to scaling a microservice application. Hope you found this article useful, but there is more to come. Till then, happy scaling!

About the author - Sudip is a Solution Architect with more than 15 years of working experience, and is the founder of Javelynn. He likes sharing his knowledge through writing, and when he is not doing that, he is probably fishing or playing chess.

Previously posted at https://appfleet.com/.

How to Autoscale an Amazon Elastic Kubernetes Service Cluster
