Bone-Forging Stage, Level 6: K8s Cluster Data Backup and Restore

Backing up and restoring a Kubernetes cluster's data essentially comes down to backing up and restoring the etcd cluster.

Two things this level covers:

  • The etcd cluster is restored from one and the same snapshot on every member.
  • How Kubernetes data is actually stored inside etcd.

etcd is a critically important service in a Kubernetes cluster: it holds all of the cluster's data. If a disaster strikes or the etcd data is lost, recovery of the cluster is directly affected. This level therefore focuses on how to back up and restore that data.


Appetizer: etcd Basics

Continuing from the previous level: since the etcd service was set up with CA certificates, the etcdctl client must now be given the certificate parameters, like this:

./etcdctl --cacert=/etc/kubernetes/ssl/ca.pem --cert=/etc/kubernetes/ssl/kubernetes.pem --key=/etc/kubernetes/ssl/kubernetes-key.pem --endpoints=https://192.168.10.133:2379,https://192.168.10.134:2379 <your command>
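To avoid repeating the TLS flags on every call, etcdctl (v3) also reads them from environment variables. A minimal sketch, assuming the same certificate paths and endpoints as above:

export ETCDCTL_API=3
export ETCDCTL_CACERT=/etc/kubernetes/ssl/ca.pem
export ETCDCTL_CERT=/etc/kubernetes/ssl/kubernetes.pem
export ETCDCTL_KEY=/etc/kubernetes/ssl/kubernetes-key.pem
export ETCDCTL_ENDPOINTS=https://192.168.10.133:2379,https://192.168.10.134:2379

# With these exported, the health check below shortens to:
./etcdctl endpoint health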

A few quick queries to look at the Kubernetes data stored inside:

Check the cluster health:

[root@k8s-master etcd]# ./etcdctl   --cacert=/etc/kubernetes/ssl/ca.pem --cert=/etc/kubernetes/ssl/kubernetes.pem --key=/etc/kubernetes/ssl/kubernetes-key.pem   --endpoints=https://192.168.10.133:2379,https://192.168.10.134:2379  endpoint health
https://192.168.10.133:2379 is healthy: successfully committed proposal: took = 14.251562ms
https://192.168.10.134:2379 is healthy: successfully committed proposal: took = 14.420846ms

Get the value of a specific key:

[root@k8s-master etcd]# ./etcdctl   --cacert=/etc/kubernetes/ssl/ca.pem --cert=/etc/kubernetes/ssl/kubernetes.pem --key=/etc/kubernetes/ssl/kubernetes-key.pem --endpoints=https://192.168.10.133:2379,https://192.168.10.134:2379     get /registry/apiregistration.k8s.io/apiservices/v1.apps 
/registry/apiregistration.k8s.io/apiservices/v1.apps
{"kind":"APIService","apiVersion":"apiregistration.k8s.io/v1beta1","metadata":{"name":"v1.apps","uid":"93eb024b-fad5-11e9-a51a-000c29a4e4b2","creationTimestamp":"2019-10-30T05:24:42Z","labels":{"kube-aggregator.kubernetes.io/automanaged":"onstart"}},"spec":{"service":null,"group":"apps","version":"v1","groupPriorityMinimum":17800,"versionPriority":15},"status":{"conditions":[{"type":"Available","status":"True","lastTransitionTime":"2019-10-30T05:24:42Z","reason":"Local","message":"Local APIServices are always available"}]}}

Get the status of each member:

[root@k8s-master etcd]# ./etcdctl   --cacert=/etc/kubernetes/ssl/ca.pem --cert=/etc/kubernetes/ssl/kubernetes.pem --key=/etc/kubernetes/ssl/kubernetes-key.pem   --endpoints=https://192.168.10.133:2379,https://192.168.10.134:2379  endpoint  status
https://192.168.10.133:2379, 5e881233406036eb, 3.4.3, 3.9 MB, false, false, 1913, 34872, 34872, 
https://192.168.10.134:2379, f3796165363d755a, 3.4.3, 3.9 MB, true, false, 1913, 34872, 34872

Get the etcd version:

[root@k8s-master etcd]# ./etcdctl   --cacert=/etc/kubernetes/ssl/ca.pem --cert=/etc/kubernetes/ssl/kubernetes.pem --key=/etc/kubernetes/ssl/kubernetes-key.pem   --endpoints=https://192.168.10.133:2379,https://192.168.10.134:2379  version 
etcdctl version: 3.4.3
API version: 3.4

List all keys:

[root@k8s-master etcd]# ./etcdctl   --cacert=/etc/kubernetes/ssl/ca.pem --cert=/etc/kubernetes/ssl/kubernetes.pem --key=/etc/kubernetes/ssl/kubernetes-key.pem --endpoints=https://192.168.10.133:2379,https://192.168.10.134:2379     get / --prefix --keys-only
/registry/apiregistration.k8s.io/apiservices/v1.
/registry/apiregistration.k8s.io/apiservices/v1.apps
/registry/secrets/kube-system/node-controller-token-8pbtv
/registry/secrets/kube-system/persistent-volume-binder-token-tjhmn
/registry/secrets/kube-system/pod-garbage-collector-token-9rbvg
/registry/secrets/kube-system/pv-protection-controller-token-zzqkq
/registry/secrets/kube-system/pvc-protection-controller-token-b2vjh
/registry/secrets/kube-system/replicaset-controller-token-xzrrg
/registry/secrets/kube-system/replication-controller-token-7hzqr
/registry/secrets/kube-system/resourcequota-controller-token-jx6zn
/registry/serviceaccounts/kube-system/certificate-controller
/registry/serviceaccounts/kube-system/clusterrole-aggregation-controller
/registry/serviceaccounts/kube-system/coredns
/registry/serviceaccounts/kube-system/cronjob-controller
/registry/serviceaccounts/kube-system/daemon-set-controller
/registry/serviceaccounts/kube-system/default
/registry/serviceaccounts/kube-system/deployment-controller
/registry/serviceaccounts/kube-system/disruption-controller
/registry/serviceaccounts/kube-system/endpoint-controller
/registry/serviceaccounts/kube-system/expand-controller
/registry/serviceaccounts/kube-system/flannel
/registry/serviceaccounts/kube-system/generic-garbage-collector
/registry/serviceaccounts/kube-system/horizontal-pod-autoscaler
/registry/serviceaccounts/kube-system/job-controller
/registry/serviceaccounts/kube-system/kube-proxy
/registry/services/specs/default/kubernetes
/registry/services/specs/kube-system/kube-dns

As you can see, etcd stores essentially all Kubernetes-related data: Pod information, Service definitions, tokens, resource objects, and so on. Since everything lives in etcd, when the cluster suffers a failure we only need to restore the data from an etcd backup to bring the cluster back to its pre-failure state.
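For instance, to get a feel for how many objects of a given type are stored, you can count the keys under a resource prefix. A small sketch (assuming the ETCDCTL_* environment variables shown earlier are exported; /registry/pods is the key prefix Kubernetes uses for Pod objects):

# Count the keys under /registry/pods, i.e. the number of Pod objects in etcd.
./etcdctl get /registry/pods --prefix --count-only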

Backup

Note that the etcdctl command line differs somewhat between versions, but the overall idea is the same. Here we use snapshot save; taking the backup from a single node each time is enough.

[root@k8s-master etcd]# ./etcdctl   --cacert=/etc/kubernetes/ssl/ca.pem --cert=/etc/kubernetes/ssl/kubernetes.pem --key=/etc/kubernetes/ssl/kubernetes-key.pem     snapshot save  /var/lib/etcd_backup/backup_$(date "+%Y%m%d%H%M%S").db  
{"level":"info","ts":1572440676.1362343,"caller":"snapshot/v3_snapshot.go:110","msg":"created temporary db file","path":"/var/lib/etcd_backup/backup_20191030210436.db.part"}
{"level":"warn","ts":"2019-10-30T21:04:36.144+0800","caller":"clientv3/retry_interceptor.go:116","msg":"retry stream intercept"}
{"level":"info","ts":1572440676.1444993,"caller":"snapshot/v3_snapshot.go:121","msg":"fetching snapshot","endpoint":"127.0.0.1:2379"}
{"level":"info","ts":1572440676.3010879,"caller":"snapshot/v3_snapshot.go:134","msg":"fetched snapshot","endpoint":"127.0.0.1:2379","took":0.164486462}
{"level":"info","ts":1572440676.3011765,"caller":"snapshot/v3_snapshot.go:143","msg":"saved","path":"/var/lib/etcd_backup/backup_20191030210436.db"}
Snapshot saved at /var/lib/etcd_backup/backup_20191030210436.db

[root@k8s-master etcd]# cd /var/lib/etcd_backup/
[root@k8s-master etcd_backup]# ls
backup_20191030210436.db
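In practice you will usually want the backup to run on a schedule with some retention. A minimal sketch of a cron-driven script (the script path, etcdctl location, and retention count are assumptions, not part of the original setup):

#!/bin/bash
# Hypothetical /usr/local/bin/etcd-backup.sh, run from cron, e.g.:
#   0 2 * * * root /usr/local/bin/etcd-backup.sh
BACKUP_DIR=/var/lib/etcd_backup
KEEP=7
mkdir -p "$BACKUP_DIR"
ETCDCTL_API=3 etcdctl \
  --cacert=/etc/kubernetes/ssl/ca.pem \
  --cert=/etc/kubernetes/ssl/kubernetes.pem \
  --key=/etc/kubernetes/ssl/kubernetes-key.pem \
  snapshot save "$BACKUP_DIR/backup_$(date +%Y%m%d%H%M%S).db"
# Keep only the newest $KEEP snapshots.
ls -1t "$BACKUP_DIR"/backup_*.db | tail -n +$((KEEP + 1)) | xargs -r rm -f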

Restore

  • Stop the kube-apiserver service and make sure the apiserver is no longer running
# First move the apiserver manifest out of the way
mkdir -p /etc/kubernetes/manifests-backups
mv /etc/kubernetes/manifests/kube-apiserver.yaml /etc/kubernetes/manifests-backups/
# Confirm kube-apiserver has stopped
ps -ef|grep kube-api|grep -v grep |wc -l
0
  • On each master node, stop the etcd service
service etcd stop

Move the existing data directory aside:

mv /var/lib/etcd/data.etcd /var/lib/etcd/data.etcd_bak
  • Restore: every member of the etcd cluster is restored from the same snapshot.
    Restore the data on each node in turn. First copy the snapshot to every master node; here we assume the backup file is /var/lib/etcd_backup/backup_20180107172459.db. (A quick integrity check of each copy follows the scp commands below.)
scp /var/lib/etcd_backup/backup_20180107172459.db root@master1:/var/lib/etcd_backup/
scp /var/lib/etcd_backup/backup_20180107172459.db root@master2:/var/lib/etcd_backup/
scp /var/lib/etcd_backup/backup_20180107172459.db root@master3:/var/lib/etcd_backup/
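Before restoring, it is worth checking each copied snapshot on its node; etcdctl snapshot status reports the hash, revision, key count, and size, so a truncated or corrupted copy fails here rather than mid-restore (assuming the same file path as above):

ETCDCTL_API=3 ./etcdctl snapshot status /var/lib/etcd_backup/backup_20180107172459.db -w table
# All three nodes should report the same hash and total key count.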

Run the restore on each etcd machine,
for example on 192.168.10.133. (This can also be wrapped in a script; see further below.)
Here the restore data directory is nodeName.etcd.restore.

export ETCDCTL_API=3
# --initial-cluster must list each member's peer URL (port 2380) and must contain
# this member's --initial-advertise-peer-urls entry exactly.
etcdctl snapshot restore /var/lib/etcd_backup/backup_20191030210436.db \
  --data-dir="/var/lib/etcd/etcd-0.etcd.restore" \
  --name etcd-0 \
  --initial-cluster "etcd-0=https://192.168.10.133:2380,etcd-1=https://192.168.10.134:2380" \
  --initial-cluster-token etcd-cluster-0 \
  --initial-advertise-peer-urls https://192.168.10.133:2380
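Note that the command above restores into /var/lib/etcd/etcd-0.etcd.restore, while the data directory that was moved aside earlier (and used by the script below) is /var/lib/etcd/data.etcd. Before starting etcd, either point its --data-dir at the restored directory or move it into place; a minimal sketch of the latter, assuming etcd.service uses /var/lib/etcd/data.etcd:

# Put the restored data where etcd.service expects it and hand it back to the etcd user.
mv /var/lib/etcd/etcd-0.etcd.restore /var/lib/etcd/data.etcd
chown -R etcd:etcd /var/lib/etcd/data.etcd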

Script:

#!/bin/bash
set -x
# Pull this member's name and cluster settings out of the local etcd.service unit.
# Note: these grep patterns depend on the exact format of your ExecStart line.
export ETCD_NAME=$(cat /usr/lib/systemd/system/etcd.service|grep ExecStart|grep -Eo "name.*-name-[0-9].*--client"|awk '{print $2}')
export ETCD_CLUSTER=$(cat /usr/lib/systemd/system/etcd.service|grep ExecStart|grep -Eo "initial-cluster.*--initial"|awk '{print $2}')
export ETCD_INITIAL_CLUSTER_TOKEN=$(cat /usr/lib/systemd/system/etcd.service|grep ExecStart|grep -Eo "initial-cluster-token.*"|awk '{print $2}')
export ETCD_INITIAL_ADVERTISE_PEER_URLS=$(cat /usr/lib/systemd/system/etcd.service|grep ExecStart|grep -Eo "initial-advertise-peer-urls.*--listen-peer"|awk '{print $2}')
# Restore the snapshot straight into the configured data directory.
ETCDCTL_API=3 etcdctl snapshot restore /var/lib/etcd_backup/backup_20180107172459.db \
  --cacert=/var/lib/etcd/cert/ca.pem --cert=/var/lib/etcd/cert/etcd-client.pem --key=/var/lib/etcd/cert/etcd-client-key.pem \
  --name $ETCD_NAME \
  --data-dir /var/lib/etcd/data.etcd \
  --initial-cluster $ETCD_CLUSTER \
  --initial-cluster-token $ETCD_INITIAL_CLUSTER_TOKEN \
  --initial-advertise-peer-urls $ETCD_INITIAL_ADVERTISE_PEER_URLS
# Make sure the etcd user owns the restored data.
chown -R etcd:etcd /var/lib/etcd/data.etcd
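A sketch of pushing and running this script on every master from one machine (the script path /tmp/etcd_restore.sh and the host names are hypothetical):

for host in master1 master2 master3; do
  scp /tmp/etcd_restore.sh root@$host:/tmp/
  ssh root@$host "bash /tmp/etcd_restore.sh"
done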

Start etcd on each node, and confirm via the service command that it started successfully:

# service etcd start
# service etcd status

Check the cluster health:

[root@k8s-master etcd]# ./etcdctl   --cacert=/etc/kubernetes/ssl/ca.pem --cert=/etc/kubernetes/ssl/kubernetes.pem --key=/etc/kubernetes/ssl/kubernetes-key.pem   --endpoints=https://192.168.10.133:2379,https://192.168.10.134:2379  endpoint health
https://192.168.10.133:2379 is healthy: successfully committed proposal: took = 14.251562ms
https://192.168.10.134:2379 is healthy: successfully committed proposal: took = 14.420846ms
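Beyond the health check, it is worth confirming that a leader has been elected and that the members are at roughly the same revision. The table output of endpoint status makes this easy to read (assuming the ETCDCTL_* environment variables shown earlier are exported):

./etcdctl endpoint status -w table
# Exactly one member should show IS LEADER = true, and the RAFT INDEX values should be close together.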

If etcd is healthy, restore kube-apiserver on each master:

# mv /etc/kubernetes/manifests-backups/kube-apiserver.yaml /etc/kubernetes/manifests/

Check whether the cluster's kube-apiserver is back to normal:

# kubectl get cs
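A few additional sanity checks once the control plane is back (standard kubectl commands, not specific to this setup):

kubectl get nodes                  # every node should report Ready
kubectl get pods -n kube-system    # system pods should be Running
kubectl get pods --all-namespaces  # workloads recorded in the snapshot should reappear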

Summary

Kubernetes backup is done mainly by backing up etcd. For a restore, what matters most is the overall order: stop kube-apiserver, stop etcd, restore the data, start etcd, and then start kube-apiserver.
