Resource Management in Kubernetes
One of the proposed advantages of a microservice orchestration system like Kubernetes is better resource allocation. How does Kubernetes do this?
- The underlying Linux mechanism used by Kubernetes for resource management is called control groups, a.k.a. `cgroups`. `cgroups` allow limiting, accounting for, and isolating resource usage such as CPU, memory, disk I/O, network, etc.
- There are two resource types you can manage: CPU and memory.
  - You can `request` a certain amount of each resource. This mainly dictates the behavior of the Kubernetes scheduler for your `pod`, i.e. your `pod` will only be scheduled to a `node` whose `allocatable` resources are higher than the `request`.
  - You can `limit` the usage of each resource. This dictates how the operating system deals with your container at runtime.
  - You can set defaults or acceptable `min` or `max` values using a `LimitRange` object. (A short sketch of both follows this list.)
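A minimal sketch of what this looks like in manifests; the names, images, and numbers are made up for illustration. The first object requests and limits both resources, the second is a `LimitRange` that sets namespace defaults and min/max bounds:

```yaml
# Hypothetical pod: requests drive scheduling, limits drive runtime enforcement.
apiVersion: v1
kind: Pod
metadata:
  name: demo-app            # made-up name
spec:
  containers:
    - name: app
      image: nginx          # any image works for the illustration
      resources:
        requests:
          cpu: 250m         # scheduler only places the pod on a node with >= 250m allocatable
          memory: 128Mi
        limits:
          cpu: 500m         # enforced at runtime via cgroups
          memory: 256Mi
---
# Hypothetical LimitRange: defaults and bounds for containers in this namespace.
apiVersion: v1
kind: LimitRange
metadata:
  name: demo-limits         # made-up name
spec:
  limits:
    - type: Container
      default:              # applied as limit when a container sets none
        cpu: 500m
        memory: 256Mi
      defaultRequest:       # applied as request when a container sets none
        cpu: 250m
        memory: 128Mi
      min:
        cpu: 100m
        memory: 64Mi
      max:
        cpu: "1"
        memory: 1Gi
```

The `LimitRange` only takes effect at admission time: defaults are filled in for containers that specify nothing, and pods outside the min/max bounds are rejected.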
- CPU is a compressible resource.
  - CPU `request` is guaranteed via the `cpu.shares` cgroup property. If your container doesn't consume as much CPU as it requested at some point, the spare capacity can be used by other containers, proportional to what they have requested.
  - CPU `limit` is controlled via `cpu.cfs_period_us` and `cpu.cfs_quota_us`. This is known as CFS Bandwidth Control. You can read it as: within a given `period` (some number of microseconds), you are allowed to consume up to `quota` microseconds of CPU time. If you use up your quota within the period, you get throttled and wait until the next period.
  - Kubernetes expresses a core as 1000 millicores, while cgroups use 1024 shares per core, so you may see slightly different values if you connect to the pod and inspect the cgroup files. (See the sketch below.)
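A rough sketch of that translation, assuming cgroup v1 and the default 100ms CFS period; exact paths and rounding vary by node setup, so treat the numbers as illustrative:

```yaml
# Hypothetical container resources and the cgroup v1 values they roughly translate to.
resources:
  requests:
    cpu: 250m    # -> cpu.shares ~= 256 (250/1000 of a core * 1024 shares per core)
  limits:
    cpu: 500m    # -> cpu.cfs_quota_us = 50000 with cpu.cfs_period_us = 100000,
                 #    i.e. at most 50ms of CPU time per 100ms period before throttling
```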
- Memory is a non-compressible resource.
  - Memory `request` is only used by the Kubernetes scheduler; no control is done via `cgroups`.
  - Memory `limit` is controlled via `memory.limit_in_bytes` (sketched below).
  - If a node is running out of memory, the kernel will start killing processes. If a process is using more memory than its set limit, it moves towards the top of the list of candidates. The `oom_killer` is responsible for terminating processes when the system runs out of memory; `oom_score_adj` helps the `oom_killer` decide, and it is set by the `kubelet` based on QoS classes.
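A small sketch of the memory side, again assuming cgroup v1 and made-up values:

```yaml
# Hypothetical container resources on the memory side.
resources:
  requests:
    memory: 128Mi   # used only by the scheduler; no cgroup setting is written for it
  limits:
    memory: 256Mi   # -> memory.limit_in_bytes = 268435456; a container exceeding this
                    #    is a prime OOM-kill candidate (the pod reports OOMKilled)
```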
- Kubernetes puts pods into different QoS classes based on their resource allocation demands.
  - This is currently calculated from the values set in `request` and `limit`. However, technically the QoS system and the "requests and limits" system are orthogonal to each other, i.e. this could change in the future.
  - If `request` and `limit` are the same for a `pod`, it ends up in the Guaranteed class.
  - If `limit` is not equal to `request`, it ends up in the Burstable class.
  - If neither is defined, it ends up in the Best-Effort class.
  - The main influence of QoS classes seems to be on the OOM score; in short, higher classes are less likely to be killed. (A sketch of the three classes follows this list.)
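A sketch of how the three classes fall out of the `resources` stanza; the class names are the real ones, the container specs are abridged and illustrative:

```yaml
# Guaranteed: requests == limits for both CPU and memory in every container.
resources:
  requests: { cpu: 500m, memory: 256Mi }
  limits:   { cpu: 500m, memory: 256Mi }
---
# Burstable: some requests/limits are set, but they are not all equal.
resources:
  requests: { cpu: 250m, memory: 128Mi }
  limits:   { cpu: 500m, memory: 256Mi }
---
# BestEffort: no requests or limits at all.
resources: {}
```

You can check which class a running pod landed in with `kubectl get pod <name> -o jsonpath='{.status.qosClass}'`.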
- The Kubernetes scheduler tracks the `Allocatable` property of each node. This is basically what is available for your `pods` to be scheduled on. `Allocatable` + `eviction-threshold` + `system-reserved` + `kube-reserved` equals what is physically available on the node.
  - These can be controlled via parameters passed to the `kubelet`.
  - For AKS, the following were observed on the `kubelet`:
    - `--kube-reserved`: reserves some resources for Kubernetes services such as the `kubelet`, the container runtime, etc. On one instance this was observed as `cpu=60m,memory=896Mi`.
    - `--enforce-node-allocatable`: controls whether and on what the `kubelet` should enforce allocatable values (by eviction etc.). For AKS this is set to `pods`.
    - `--eviction-hard`: creates safety margins so that the `kubelet` will start evicting `pods` before all resources are drained. On a random AKS node this was set to `memory.available<100Mi,nodefs.available<10%,nodefs.inodesFree<5%`.
    - AKS does not seem to pass `--system-reserved`; the only hope for the system to survive seems to be the `eviction-threshold`.
  - With aks-engine you can configure some of the `kubelet` options related to resource allocation. (A configuration sketch follows below.)
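Roughly the same settings can be expressed in a kubelet configuration file; a sketch assuming the `kubelet.config.k8s.io/v1beta1` `KubeletConfiguration` format and the AKS values observed above (the `systemReserved` block is commented out because AKS does not appear to set it):

```yaml
# Sketch of a KubeletConfiguration mirroring the flags observed on an AKS node.
# Allocatable = node capacity - kubeReserved - systemReserved - hard eviction thresholds.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
enforceNodeAllocatable:
  - pods                    # enforce allocatable only at the pod level
kubeReserved:               # reserved for the kubelet, container runtime, etc.
  cpu: 60m
  memory: 896Mi
# systemReserved:           # not set on the observed AKS node
#   cpu: ...
#   memory: ...
evictionHard:               # kubelet starts evicting pods past these thresholds
  memory.available: "100Mi"
  nodefs.available: "10%"
  nodefs.inodesFree: "5%"
```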