Resource Management in Kubernetes
One of the claimed advantages of an orchestration system for microservices such as Kubernetes is better resource allocation. How does Kubernetes achieve this?
- The underlying Linux mechanism Kubernetes uses for resource management is called control groups, a.k.a. cgroups.
  - cgroups allow limiting, accounting, and isolation of resource usage such as CPU, memory, disk I/O, network, etc.
- There are two resource types you can manage: CPU and memory.
  - You can request a certain amount of each resource. This mainly dictates the behavior of the Kubernetes scheduler for your pod, i.e. your pod will only be scheduled to a node if the node's Allocatable resources, minus the requests of the pods already scheduled there, can cover the request.
  - You can limit the usage of each resource. This dictates how the operating system deals with your container at runtime.
  - You can set defaults or acceptable min or max values using a LimitRange object.
  - CPU is a compressible resource.
    - CPU request is guaranteed via the cpu.shares cgroup property. If your container doesn't consume as much CPU as it requested at any point, the unused CPU time can be used by other containers, proportional to what they have requested.
    - CPU limit is controlled via cpu.cfs_period_us and cpu.cfs_quota_us. This is known as CFS Bandwidth Control. You can read it as: within a given period (some number of microseconds, 100000 by default), you're allowed to consume up to quota microseconds of CPU time, summed across all CPUs. If you use up your quota within that period, you get throttled and wait until the next period.
    - Kubernetes divides a CPU into 1000 units (millicores), while cpu.shares counts 1024 per CPU. So you might see slightly different values if you connect to the pod and look at the cgroup values; for example, a request of 250m shows up as 256 shares.
  - Memory is a non-compressible resource.
    - Memory request is only used by the Kubernetes scheduler; it is not enforced via cgroups.
    - Memory limit is controlled via memory.limit_in_bytes.
    - If a container tries to use more memory than its limit, the kernel OOM-kills a process in that container's cgroup, even if the node itself still has free memory. If the whole node runs out of memory, the kernel starts killing processes, picking candidates by their OOM score.
      - oom_killer is responsible for terminating processes when the system runs out of memory. oom_score_adj helps oom_killer decide, and kubelet sets it based on the pod's QoS class.
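To make the unit conversions concrete, here is a minimal Python sketch of how a container's CPU request and limit could map onto cpu.shares and cpu.cfs_quota_us, and a memory limit onto memory.limit_in_bytes. It assumes cgroup v1 and the default 100000 µs CFS period; the helper names, the simplified quantity parser and the minimum values (2 shares, 1000 µs quota) are our own approximation of kubelet's behavior, not its actual code.

```python
"""Sketch: how Kubernetes CPU/memory quantities map onto cgroup v1 files.

Assumptions: cgroup v1, default CFS period of 100000 us; helper names are
hypothetical and only loosely modelled on kubelet; the quantity parser
handles just a few common suffixes.
"""

SHARES_PER_CPU = 1024          # cgroup's unit: 1 CPU == 1024 shares
MILLICPU_PER_CPU = 1000        # Kubernetes' unit: 1 CPU == 1000m
MIN_SHARES = 2                 # kernel minimum for cpu.shares
DEFAULT_CFS_PERIOD_US = 100_000
MIN_QUOTA_US = 1_000           # assumed floor: never set a quota below 1 ms

# Binary and decimal suffixes accepted by this simplified parser.
_MEMORY_SUFFIXES = {
    "Ki": 1024, "Mi": 1024**2, "Gi": 1024**3,
    "K": 1000, "M": 1000**2, "G": 1000**3,
}


def parse_cpu_millis(quantity: str) -> int:
    """Parse "250m", "0.5" or "2" into millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return round(float(quantity) * MILLICPU_PER_CPU)


def parse_memory_bytes(quantity: str) -> int:
    """Parse "512Mi", "1G" or a plain byte count into bytes."""
    # Check two-letter suffixes first so "Mi" is not mistaken for "M".
    for suffix in sorted(_MEMORY_SUFFIXES, key=len, reverse=True):
        if quantity.endswith(suffix):
            return int(quantity[: -len(suffix)]) * _MEMORY_SUFFIXES[suffix]
    return int(quantity)


def cpu_shares(request_millis: int) -> int:
    """CPU request -> cpu.shares (a relative weight, not a hard cap)."""
    shares = request_millis * SHARES_PER_CPU // MILLICPU_PER_CPU
    return max(shares, MIN_SHARES)


def cfs_quota_us(limit_millis: int, period_us: int = DEFAULT_CFS_PERIOD_US) -> int:
    """CPU limit -> cpu.cfs_quota_us for the given cpu.cfs_period_us."""
    quota = limit_millis * period_us // MILLICPU_PER_CPU
    return max(quota, MIN_QUOTA_US)


if __name__ == "__main__":
    print("cpu.shares            =", cpu_shares(parse_cpu_millis("250m")))
    print("cpu.cfs_quota_us      =", cfs_quota_us(parse_cpu_millis("1.5")))
    print("memory.limit_in_bytes =", parse_memory_bytes("512Mi"))
```

Under these assumptions a 250m request becomes 256 shares, and a limit of 1.5 CPUs becomes a quota of 150000 µs per 100000 µs period, i.e. up to one and a half CPUs' worth of time in each period before throttling kicks in.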
- Kubernetes puts pods into different QoS classes based on their resource demands.
  - The class is currently calculated from the values set in request and limit. However, technically the QoS system and the "Requests and Limits" system are orthogonal to each other, i.e. this may change in the future.
    - If every container in the pod has CPU and memory limits set and its requests equal its limits, the pod ends up in the Guaranteed class.
    - If the pod doesn't qualify as Guaranteed but at least one container sets a request or limit (e.g. limit is not equal to request), it ends up in the Burstable class.
    - If no requests or limits are defined at all, it ends up in the BestEffort class.
  - The main influence of QoS classes seems to be on the OOM score. In short, pods in higher classes are less likely to be killed.
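The QoS rules and their link to the OOM score can be sketched in Python as follows. This is a simplified model under stated assumptions: the pod representation (a list of dicts, CPU in millicores, memory in bytes) is hypothetical, only CPU and memory are considered, and the constants (-997 for Guaranteed, 1000 for BestEffort, and a Burstable score derived from the memory request relative to node memory) reflect our understanding of recent kubelet releases; older releases used different values.

```python
"""Sketch: a pod's QoS class and the oom_score_adj kubelet derives from it.

Assumptions (hypothetical data model): a pod is a list of containers, each a
dict with optional "requests"/"limits" maps; CPU is in millicores, memory in
bytes. The constants follow recent kubelet releases; older ones differ.
"""
from typing import Dict, List

Container = Dict[str, Dict[str, int]]

GUARANTEED_OOM_SCORE_ADJ = -997
BESTEFFORT_OOM_SCORE_ADJ = 1000


def qos_class(pod: List[Container]) -> str:
    """Simplified QoS classification considering only cpu and memory."""
    any_set = False
    all_guaranteed = True
    for container in pod:
        requests = container.get("requests", {})
        limits = container.get("limits", {})
        for resource in ("cpu", "memory"):
            limit = limits.get(resource)
            # An unset request defaults to the limit when a limit is given.
            request = requests.get(resource, limit)
            if request is not None or limit is not None:
                any_set = True
            if limit is None or request != limit:
                all_guaranteed = False
    if not any_set:
        return "BestEffort"
    return "Guaranteed" if all_guaranteed else "Burstable"


def oom_score_adj(pod: List[Container], container: Container,
                  node_memory_bytes: int) -> int:
    """oom_score_adj for one container of the pod."""
    qos = qos_class(pod)
    if qos == "Guaranteed":
        return GUARANTEED_OOM_SCORE_ADJ
    if qos == "BestEffort":
        return BESTEFFORT_OOM_SCORE_ADJ
    # Burstable: the larger the memory request relative to node memory,
    # the lower (safer) the score.
    limits = container.get("limits", {})
    memory_request = container.get("requests", {}).get(
        "memory", limits.get("memory", 0))
    adj = 1000 - (1000 * memory_request) // node_memory_bytes
    # Keep Burstable strictly above Guaranteed ...
    adj = max(adj, 1000 + GUARANTEED_OOM_SCORE_ADJ)
    # ... and strictly below BestEffort.
    if adj == BESTEFFORT_OOM_SCORE_ADJ:
        adj -= 1
    return adj
```

For example, on a hypothetical 8 GiB node a Burstable container requesting 2 GiB gets 1000 - 1000 * 2 / 8 = 750: all else being equal, it is a likelier OOM victim than a Guaranteed container (-997) and a less likely one than a BestEffort container (1000).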
- The Kubernetes scheduler tracks the Allocatable property of each node. This is basically whatever is available for your pods to be scheduled on. Allocatable + eviction-threshold + system-reserved + kube-reserved equals the node's capacity.
  - These can be controlled via parameters passed to kubelet.
  - For AKS, the following were observed on kubelet:
    - --kube-reserved: reserves resources for Kubernetes system daemons such as kubelet, the container runtime, etc. On one instance this was observed as cpu=60m,memory=896Mi.
    - --enforce-node-allocatable: controls whether, and on which levels, kubelet enforces the allocatable values (by eviction, etc.). For AKS this is set to pods.
    - --eviction-hard: sets safety thresholds so that kubelet starts evicting pods before all resources are drained. On a random AKS node this was set to memory.available<100Mi,nodefs.available<10%,nodefs.inodesFree<5%.
  - AKS does not seem to pass in --system-reserved, so the only hope for the system to survive seems to be the eviction threshold.
  - With aks-engine you can configure some of the kubelet options related to resource allocation.
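The Allocatable arithmetic and the scheduler's request-based fit check can be illustrated with a small Python sketch. The 7168 MiB node capacity and the already-scheduled pod requests are hypothetical; the 896Mi kube-reserved and 100Mi memory eviction threshold are the AKS observations quoted above, with system-reserved taken as zero since AKS does not seem to pass it. The fit check is a simplification of what the scheduler does for a single resource.

```python
"""Sketch: node Allocatable and the scheduler's request-based fit check.

Assumptions: all values are in MiB; the 7168 MiB node capacity and the
existing pod requests are hypothetical, while the reservation and eviction
values are the AKS observations (no --system-reserved, so it is 0).
"""
from typing import List


def allocatable(capacity: int, kube_reserved: int, system_reserved: int,
                eviction_threshold: int) -> int:
    """Allocatable = Capacity - kube-reserved - system-reserved - eviction-threshold."""
    return capacity - kube_reserved - system_reserved - eviction_threshold


def fits(pod_request: int, scheduled_requests: List[int],
         node_allocatable: int) -> bool:
    """A pod fits only if its request plus the requests of pods already on
    the node do not exceed Allocatable; actual usage is not considered."""
    return sum(scheduled_requests) + pod_request <= node_allocatable


if __name__ == "__main__":
    node_alloc = allocatable(capacity=7168, kube_reserved=896,
                             system_reserved=0, eviction_threshold=100)
    print("allocatable MiB:", node_alloc)                         # 6172
    print("fits 1024 MiB?", fits(1024, [4096, 1024], node_alloc))  # True
    print("fits 2048 MiB?", fits(2048, [4096, 1024], node_alloc))  # False
```

With these numbers Allocatable is 7168 - 896 - 0 - 100 = 6172 MiB, so a 1024 MiB request still fits next to 5120 MiB of existing requests while a 2048 MiB request does not, regardless of how much memory those pods actually use.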
