If you have been using Kubernetes for a while, you know what resource quotas are. But do you know them well enough? Do you know what mechanisms they are built on? If not, you soon will.
First of all, Kubernetes is a container management platform, so we will start by digging into the mechanisms behind containers.
cgroups (control groups) are a Linux kernel mechanism that lets you place processes into hierarchical groups whose use of system resources can be limited. For example, the "memory" controller limits RAM usage, and the "cpuacct" controller accounts for CPU time.
There are two versions of cgroup: v1 and v2.
cgroup v1:
cgroup v2 has several improvements over cgroup v1, for example:
System requirements for using cgroup v2
Capabilities are a means of managing privileges that in traditional Unix-like systems were available only to privileged (root) processes.
They are permissions for a process to make certain privileged system calls. There are only a few dozen of them.
Examples:
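For instance, in Kubernetes a container's capabilities can be adjusted through its securityContext. A minimal sketch (the pod name, image, and chosen capabilities are purely illustrative):
apiVersion: v1
kind: Pod
metadata:
  name: cap-demo
spec:
  containers:
  - name: app
    image: busybox
    command: ["sleep", "3600"]
    securityContext:
      capabilities:
        drop: ["ALL"]        # start from an empty capability set
        add: ["NET_ADMIN"]   # add back only what the workload needs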
Finally, on to quotas: the mechanisms that let you limit resource usage for a container (not for a Pod).
Limit - defines the memory limit for the cgroup. If the container tries to use more memory than its limit, the OOM killer terminates one of its processes.
Requests - with cgroups v1, they only affect pod scheduling. With cgroups v2 there are dedicated memory.min and memory.low controls: memory reserved exclusively for the container, which no one else can use.
tmpfs volumes (memory-backed ephemeral storage, e.g. an emptyDir with medium: Memory) count toward the memory consumed by the container.
#Container resources example
apiVersion: v1
kind: Pod
metadata:
  name: frontend
spec:
  containers:
  - name: app
    image: images.my-company.example/app:v4
    resources:
      requests:
        memory: "64Mi"
        cpu: "250m"
        ephemeral-storage: "2Gi"
      limits:
        memory: "128Mi"
        cpu: "500m"
        ephemeral-storage: "4Gi"
Requests - implemented with cpu.shares. The root cgroup holds (number of CPUs * 1024) shares, and child cgroups receive CPU time in proportion to their cpu.shares, and so on down the hierarchy.
Shares are not a hard cap: even if all shares are allocated but nobody is actually using the CPU, a container can go beyond its share.
Limits - implemented with cfs_period_us and cfs_quota_us ("us" stands for microseconds, µs). Unlike requests, limits are enforced over time periods.
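As a rough sketch of how the values from the resources example above translate (assuming cgroups v1 and the default CFS period of 100 ms): a request of cpu: "250m" becomes cpu.shares = 250 * 1024 / 1000 = 256, and a limit of cpu: "500m" becomes cfs_quota_us = 500 * 100000 / 1000 = 50000 out of cfs_period_us = 100000.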
Scenario 1 (left figure): 2 threads and a 200 ms quota. No throttling.
Scenario 2 (right figure): 10 threads and a 200 ms quota. Throttling starts after 20 ms, and the threads only get CPU time again after the remaining 80 ms of the period.
Let’s say you have configured a CPU limit of 2 cores; Kubernetes translates this into 200 ms of CPU time per 100 ms period. That means the container can use at most 200 ms of CPU time in each period without getting throttled.
Here is where the misunderstanding starts. As noted above, the allowed quota is 200 ms per period, which means that if you run 10 parallel threads on a 12-core machine (see the second figure) while all other pods are idle, the quota is exhausted after 20 ms (10 * 20 ms = 200 ms), and all threads in that pod are throttled for the next 80 ms. To make the situation worse, the CFS scheduler has a bug that causes unnecessary throttling and prevents the container from reaching its allowed quota.
The CPU Manager policy is set with the --cpu-manager-policy kubelet flag or the cpuManagerPolicy field in the kubelet configuration file.
vim /etc/systemd/system/kubelet.service
And add the following lines:
--cpu-manager-policy=static \
--kube-reserved=cpu=1,memory=2Gi,ephemeral-storage=1Gi \
--system-reserved=cpu=1,memory=2Gi,ephemeral-storage=1Gi \
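With the static policy, exclusive CPUs are assigned only to containers of Guaranteed QoS pods that request whole CPUs. A minimal sketch of such a pod (the name and image are illustrative):
apiVersion: v1
kind: Pod
metadata:
  name: pinned-app
spec:
  containers:
  - name: app
    image: images.my-company.example/app:v4
    resources:
      requests:
        cpu: "2"        # whole CPUs, eligible for exclusive cores
        memory: "1Gi"
      limits:
        cpu: "2"        # requests == limits -> Guaranteed QoS class
        memory: "1Gi"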
The scheduler is responsible for placing pods on cluster nodes. It works in two stages: filtering (which nodes can run the pod at all) and scoring (which of the remaining nodes fits best).
There are three scoring strategies to choose from: LeastAllocated, MostAllocated, and RequestedToCapacityRatio.
#Example of using scoringStrategy
apiVersion: kubescheduler.config.k8s.io/v1beta2
kind: KubeSchedulerConfiguration
profiles:
- pluginConfig:
  - args:
      scoringStrategy:
        resources:
        - name: cpu
          weight: 1
        type: MostAllocated
    name: NodeResourcesFit
For example, if an operator wants to quota storage in the gold storage class separately from the bronze storage class, they can define a quota as follows:
gold.storageclass.storage.k8s.io/requests.storage: 500Gi
bronze.storageclass.storage.k8s.io/requests.storage: 100Gi
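These keys live in an ordinary ResourceQuota object; a minimal sketch (the object name and namespace are illustrative):
apiVersion: v1
kind: ResourceQuota
metadata:
  name: storage-quota
  namespace: team-a
spec:
  hard:
    gold.storageclass.storage.k8s.io/requests.storage: 500Gi
    bronze.storageclass.storage.k8s.io/requests.storage: 100Gi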
In release 1.8, quota support for local ephemeral storage was added as an alpha feature: requests.ephemeral-storage and limits.ephemeral-storage.
It is also possible to limit the total number of objects of a given kind that can exist in a namespace.
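Syntactically this is just another ResourceQuota; a minimal sketch (the kinds and counts are illustrative):
apiVersion: v1
kind: ResourceQuota
metadata:
  name: object-counts
spec:
  hard:
    pods: "20"
    count/deployments.apps: "10"
    count/configmaps: "30"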
Reasons to use object count quotas:
Limits on the number of PIDs. If a container creates too many processes, the node can run out of PIDs.
This is a kubelet-level setting, so nodes may behave differently if their settings differ.
It also makes it possible to contain a "fork bomb":
:(){ :|:& };:
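One way to cap this is through the kubelet configuration file; a minimal sketch (the value is illustrative):
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
podPidsLimit: 1024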
But for extended resources you cannot quota limits, only requests. Example of an extended resources quota:
#correct:
requests.nvidia.com/gpu: "4"
#not correct:
limits.nvidia.com/gpu: "4"
You can set the network bandwidth for the pod in spec.template.metadata.annotations to limit the network traffic of the container.
If the parameters are not specified, the network bandwidth is not limited by default.
The following is an example:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx
spec:
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
      annotations:
        # Ingress bandwidth
        kubernetes.io/ingress-bandwidth: 100M
        # Egress bandwidth
        kubernetes.io/egress-bandwidth: 1G
    spec:
      containers:
      - image: nginx
        imagePullPolicy: Always
        name: nginx
Why is all of this necessary? Because it: