Symptom
Pods in Evicted status were found in the k8s cluster and had never been cleaned up:
# kubectl get pod -o wide -A | grep Evicted
simulation-prod cloud-simulation-dead-letter-worker-d96bdcf98-dxt7h 0/1 Evicted 0 42d <none> cn-shanghai.172.22.0.194 <none> <none>
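grep Evicted matches the word anywhere on the line, so a pod or namespace whose name contains "Evicted" would also match. A slightly stricter sketch checks only the STATUS column. It assumes the default column layout of kubectl get pod -A (NAMESPACE, NAME, READY, STATUS, ...), and the function name evicted_pods is hypothetical.

```shell
#!/bin/sh
# Print "<namespace>/<pod>" for every row whose STATUS column (4th field) is
# exactly "Evicted". Expects the output of `kubectl get pod -A` on stdin.
evicted_pods() {
  awk '$4 == "Evicted" { print $1 "/" $2 }'
}
```

Usage: kubectl get pod -o wide -A --no-headers | evicted_pods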
Investigation
Describing the pod shows Status: Failed and Reason: Evicted, and the Message field explains that the pod was evicted because the node ran low on disk (ephemeral-storage):
# kubectl -n simulation-prod describe pod cloud-simulation-dead-letter-worker-d96bdcf98-dxt7h
Name:         cloud-simulation-dead-letter-worker-d96bdcf98-dxt7h
Namespace:    simulation-prod
Priority:     0
Node:         cn-shanghai.172.22.0.194/
Start Time:   Mon, 29 Nov 2021 15:48:25 +0800
Labels:       app.kubernetes.io/instance=cloud-simulation-dead-letter-worker
              app.kubernetes.io/name=cloud-simulation-dead-letter-worker
              pod-template-hash=d96bdcf98
Annotations:  kubernetes.io/psp: ack.privileged
Status:       Failed
Reason:       Evicted
Message:      The node was low on resource: ephemeral-storage. Container cloud-simulation-dead-letter-worker was using 291599484Ki, which exceeds its request of 0.
IP:
IPs:          <none>
Controlled By:  ReplicaSet/cloud-simulation-dead-letter-worker-d96bdcf98
Containers:
  cloud-simulation-dead-letter-worker:
    Image:      registry-vpc.cn-shanghai.aliyuncs.com/xxx/cloud_sim:1.1.2111290718.f0cfa04
    Port:       <none>
    Host Port:  <none>
    Command:
      /root/entry/dead_letter_worker.py
    Environment:
      DEPLOYMENT:  prod
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from cloud-simulation-dead-letter-worker-token-4z2xv (ro)
Volumes:
  cloud-simulation-dead-letter-worker-token-4z2xv:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  cloud-simulation-dead-letter-worker-token-4z2xv
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  node-type=simulation-prod
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:          <none>
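The Message reports usage in Ki (KiB). As a quick sanity check, converting it to GiB (1 GiB = 1048576 KiB) shows the container had consumed roughly 278 GiB of the node's ephemeral storage (for example its writable layer and logs) against a request of 0. The helper name ki_to_gib is hypothetical.

```shell
#!/bin/sh
# Convert a Kubernetes "Ki" quantity (KiB) to GiB, rounded to one decimal.
ki_to_gib() {
  awk -v ki="$1" 'BEGIN { printf "%.1f\n", ki / 1048576 }'
}
```

Usage: ki_to_gib 291599484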
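Root cause
Node-pressure eviction is the process by which the kubelet proactively terminates pods to reclaim resources on a node.
The kubelet monitors node resources such as memory, disk space, and filesystem inodes. When one or more of these resources reach specific consumption levels, the kubelet can proactively fail one or more pods on the node to reclaim resources and prevent starvation.
During a node-pressure eviction, the kubelet sets the PodPhase of the selected pods to Failed, which terminates them.
Node-pressure eviction is different from API-initiated eviction: the kubelet does not respect your configured PodDisruptionBudget or the pod's terminationGracePeriodSeconds.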
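The node-pressure eviction documentation lists the factors the kubelet uses to rank pods for eviction: whether the pod's usage exceeds its request, then pod priority, then usage relative to the request. The toy ranking below is a simplified model of that order for a single resource, not kubelet code; the input format and the function name rank_for_eviction are hypothetical. It illustrates why this pod (BestEffort, priority 0, ephemeral-storage request 0) is an early candidate.

```shell
#!/bin/sh
# Simplified model of the documented kubelet ranking for ONE resource:
#   1. pods whose usage exceeds their request come first
#   2. then lower priority first
#   3. then larger (usage - request) first
# Input lines (stdin): <pod> <usage> <request> <priority>, same unit for usage/request.
# Output: pod names in eviction order, one per line.
rank_for_eviction() {
  awk '{
    exceeds = ($2 > $3) ? 1 : 0
    # %.0f instead of %d avoids integer clamping in some awk builds
    printf "%d %.0f %.0f %s\n", exceeds, $4, $2 - $3, $1
  }' | sort -k1,1nr -k2,2n -k3,3nr | awk '{ print $4 }'
}
```

With a BestEffort pod using 291599484 Ki against a request of 0, the usage-exceeds-request flag and priority 0 put it ahead of a higher-priority pod and of a pod still within its request.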
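Solution
Kubernetes does not delete pods with Status: Failed and Reason: Evicted on its own: the kubelet leaves them in place, and the pod garbage collector in kube-controller-manager only removes terminated pods once their number exceeds --terminated-pod-gc-threshold (12500 by default). We therefore use a Kubernetes CronJob to delete these pods on a schedule.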
$ vim 01-sa.yaml
apiVersion: v1
kind: Namespace
metadata:
  name: delete-evicted-pods
---
apiVersion: v1
kind: ServiceAccount
metadata:
  name: delete-evicted-pods
  namespace: delete-evicted-pods
$ vim 02-cr.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: delete-evicted-pods
rules:
- apiGroups: [""]
  resources: ["pods"]
  verbs: ["get", "watch", "list", "delete"]
$ vim 03-crb.yaml
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: delete-evicted-pods
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: delete-evicted-pods
subjects:
- kind: ServiceAccount
  name: delete-evicted-pods
  namespace: delete-evicted-pods
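Once these three manifests are applied, one way to confirm the binding works is to impersonate the ServiceAccount with kubectl auth can-i. This is a minimal sketch, assuming your own kubeconfig user is allowed to impersonate (cluster-admin is); the function names are hypothetical. A ServiceAccount authenticates as system:serviceaccount:<namespace>:<name>.

```shell
#!/bin/sh
# Build the username Kubernetes assigns to a ServiceAccount.
sa_username() {
  printf 'system:serviceaccount:%s:%s\n' "$1" "$2"
}

# Ask the API server, impersonating the ServiceAccount, whether it holds each
# verb granted by the ClusterRole on pods across all namespaces.
check_sa_pod_access() {
  user=$(sa_username "$1" "$2")
  for verb in get list watch delete; do
    printf '%s pods: ' "$verb"
    kubectl auth can-i "$verb" pods --all-namespaces --as="$user"
  done
}
```

Usage: check_sa_pod_access delete-evicted-pods delete-evicted-pods should print yes for all four verbs after 03-crb.yaml is applied.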
$ vim 04-cj.yaml
# batch/v1 is available from Kubernetes 1.21; batch/v1beta1 is no longer served from 1.25
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: delete-evicted-pods
  namespace: delete-evicted-pods
spec:
  schedule: "*/30 * * * *"   # every 30 minutes
  jobTemplate:
    spec:
      template:
        spec:
          serviceAccountName: delete-evicted-pods
          containers:
          - name: kubectl-runner
            image: bitnami/kubectl:1.21.8
            imagePullPolicy: IfNotPresent
            command:
            - /bin/sh
            - -c
            - |
              # List Failed pods as "<pod> <namespace> <creationTimestamp> <reason>"
              kubectl get pods --all-namespaces --field-selector=status.phase=Failed \
                -o go-template='{{range .items}}{{.metadata.name}} {{.metadata.namespace}} {{.metadata.creationTimestamp}} {{.status.reason}}{{"\n"}}{{end}}' |
              while read -r epod namespace ct reason; do
                # Delete Evicted pods created more than 259200 s (3 days) ago.
                # Age is measured from pod creation, not from the eviction; date -d needs GNU date.
                if [ "$reason" = "Evicted" ] && [ $(( $(date +%s) - $(date -d "$ct" +%s) )) -gt 259200 ]; then
                  echo "$(date "+%Y-%m-%d %H:%M:%S") delete $namespace $reason $epod"
                  kubectl -n "$namespace" delete pod "$epod"
                fi
              done
          restartPolicy: OnFailure
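Before letting the CronJob delete anything, the selection logic can be exercised locally as a dry run. This sketch mirrors the script's filter, but takes the current time and the age threshold as arguments so it can be tested without a cluster; the function name select_stale_evicted is hypothetical, and like the CronJob it assumes GNU date.

```shell
#!/bin/sh
# Dry-run filter: read "<pod> <namespace> <creationTimestamp> <reason>" lines
# (the go-template output used by the CronJob) and print "<namespace> <pod>"
# for every Evicted pod created more than max_age seconds before now.
select_stale_evicted() {
  now=$1       # reference time, seconds since the epoch
  max_age=$2   # threshold in seconds, e.g. 259200 for 3 days
  while read -r epod namespace ct reason; do
    [ "$reason" = "Evicted" ] || continue
    created=$(date -d "$ct" +%s) || continue   # skip unparsable timestamps
    if [ $((now - created)) -gt "$max_age" ]; then
      printf '%s %s\n' "$namespace" "$epod"
    fi
  done
}
```

To preview what would be deleted, pipe the CronJob's kubectl get pods ... -o go-template=... command into select_stale_evicted "$(date +%s)" 259200 instead of running kubectl delete.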
References:
- Pod Lifecycle: https://kubernetes.io/zh/docs/concepts/workloads/pods/pod-lifecycle/#pod-termination
- Node-pressure Eviction: https://kubernetes.io/zh/docs/concepts/scheduling-eviction/node-pressure-eviction/
- Pod selection for kubelet eviction: https://kubernetes.io/zh/docs/concepts/scheduling-eviction/node-pressure-eviction/#kubelet-%E9%A9%B1%E9%80%90%E6%97%B6-pod-%E7%9A%84%E9%80%89%E6%8B%A9
- Kubelet does not delete evicted pods: https://github.com/kubernetes/kubernetes/issues/55051
- Field selectors, chained selectors: https://kubernetes.io/zh/docs/concepts/overview/working-with-objects/field-selectors/#chained-selectors
- Using RBAC Authorization: https://kubernetes.io/zh/docs/reference/access-authn-authz/rbac/