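Containerizing Infrastructure Components
We recently containerized our S3 and pika services. Since an open-source solution already exists for MySQL, this time we containerized MySQL directly with an operator: [presslabs' mysql-operator](https://github.com/presslabs/mysql-operator).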
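Key features
- presslabs/mysql-operator automates setting up a master-replica cluster.
- Uses XtraBackup to back up and restore MySQL data.
- Uses orchestrator to keep the cluster highly available: it discovers the MySQL replication topology and elects a new master automatically.
- Can back up data to S3 and restore a cluster from a backup stored in S3.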
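Installation
A recently submitted PR fixes the mysql-8 branch of mysql-operator, so for now the operator has to be installed from source.
- Download the mysql-operator source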
git clone -b mysql-8 https://github.com/presslabs/mysql-operator.git
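- The bug in mysql-operator's MySQL 8 support was fixed only recently. The fix is not on the master branch yet and has not been officially released, so you have to build the images yourself and then edit the chart's values.yaml: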
# Default values for mysql-operator.
# This is a YAML-formatted file.
# Declare variables to be passed into your templates.
replicas: 1
image: <your self-built mysql-operator image>
sidecarImage: <your self-built mysql-operator-sidecar image>
imagePullPolicy: IfNotPresent
imagePullSecrets:
# - name: "image-pull-secret"
# whether or not to install CRDs
installCRDs: true
# in which namespace to watch for resource, leave empty to watch in all namespaces
watchNamespace:
# The operator will install a ServiceMonitor if you have prometheus-operator installed.
serviceMonitor:
- Install the mysql operator from the local chart
[root@rg1-ceph101 /home/zhangzhifei/mysql-operator/charts/mysql-operator]# helm install .
- Edit cluster.yaml
apiVersion: mysql.presslabs.org/v1alpha1
kind: MysqlCluster
metadata:
  name: mysql-cluster-test
spec:
  replicas: 3
  secretName: my-secret
  mysqlVersion: 8.0.20
  volumeSpec:
    persistentVolumeClaim:
      accessModes: ["ReadWriteOnce"]
      resources:
        requests:
          storage: 20Gi
  #mysqlConf:
  #  query-cache-type: "0"
  #  query-cache-size: "0"
  podSpec:
    resources:
      requests:
        cpu: "8"
        memory: 16Gi
      limits:
        cpu: "8"
        memory: 16Gi
  mysqlConf:
    innodb-buffer-pool-size: 6Gi
---
apiVersion: v1
kind: Secret
metadata:
  name: my-secret
type: Opaque
data:
  # root password is required to be specified (root/mypass)
  ROOT_PASSWORD: bXlwYXNz
  ## application credentials that will be created at cluster bootstrap
  # DATABASE:
  # USER:
  # PASSWORD:
- Create the MySQL cluster
kubectl apply -f cluster.yaml
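The `ROOT_PASSWORD` value in the Secret is base64-encoded, as the `data:` field of a Kubernetes Secret requires. A minimal sketch of producing and checking such a value follows; the helper names `encode_secret` and `decode_secret` are made up for illustration.

```shell
# Encode a plaintext value for the `data:` field of a Kubernetes Secret.
# printf is used instead of echo so no trailing newline ends up in the value.
encode_secret() {
  printf '%s' "$1" | base64
}

# Decode a value taken from a Secret, to check what it really contains.
decode_secret() {
  printf '%s' "$1" | base64 --decode
}

encode_secret 'mypass'      # prints bXlwYXNz
decode_secret 'bXlwYXNz'    # prints mypass
```

Encoding with `echo 'mypass' | base64` would include a newline in the password, which is a common cause of login failures that are hard to spot.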
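Workflow
- After a MysqlCluster CR is created, the operator creates a MySQL StatefulSet, which in turn creates and manages the pods.
- The init container in each pod copies a backup from a healthy node and runs the prepare step on it. When the master starts for the first time there is no data to copy, but it may instead fetch a backup from S3 to restore the cluster.
- The heartbeat container sends heartbeats.
- The sidecar container handles MySQL data backup and transfer. The first replica sends a backup request to the sidecar container in the master pod, which runs "xtrabackup --backup" and then sends the backup data to the replica pod.
- Once the mysql container has started, the node controller detects it and performs operations such as CHANGE MASTER. At this point the cluster is up and running.
- The orch controller registers the MySQL nodes with orchestrator, queries the orchestrator service for the cluster's health, and updates the MysqlCluster resource object accordingly.
- The orchestrator service handles failure recovery (electing a new master after the master goes down).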
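The init container's choice of data source can be sketched as a small decision function. This only illustrates the behavior described in the list, not the operator's actual code; the function name, its arguments, and the action names it prints are all hypothetical.

```shell
# Decide how an init container should populate the MySQL data directory.
# Arguments (hypothetical, for illustration only):
#   $1  "yes" if this pod is the master starting for the first time
#   $2  S3 URL of a backup to restore from, or empty if none is configured
choose_init_action() {
  first_master_start="$1"
  init_bucket_url="$2"
  if [ "$first_master_start" = "yes" ]; then
    if [ -n "$init_bucket_url" ]; then
      # A new cluster seeded from an existing backup in S3.
      echo "restore-from-s3"
    else
      # A brand-new cluster: there is nothing to copy.
      echo "skip"
    fi
  else
    # Every other pod copies a backup from a healthy node, then prepares it.
    echo "clone-from-healthy-node"
  fi
}

choose_init_action yes ''
choose_init_action yes 's3://example-bucket/backup.xbackup.gz'
choose_init_action no ''
```

The three calls print `skip`, `restore-from-s3`, and `clone-from-healthy-node` respectively. The S3 URL is a placeholder.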
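Issues
The current xtrabackup-8.0 appears to have a bug: the sys_operator.status table is filtered out with the --tables-exclude option, yet remnants of it remain in the backup, so a replica fails on startup with the following errors: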
2019-07-04T18:07:07.992788Z 1 [Warning] [MY-012351] [InnoDB] Tablespace 2, name 'sys_operator/status', file './sys_operator/status.ibd' is missing!
2019-07-04T18:07:08.563943Z 6 [ERROR] [MY-012592] [InnoDB] Operating system error number 2 in a file operation.
2019-07-04T18:07:08.563966Z 6 [ERROR] [MY-012593] [InnoDB] The error means the system cannot find the path specified.
2019-07-04T18:07:08.563974Z 6 [ERROR] [MY-012216] [InnoDB] Cannot open datafile for read-only: './sys_operator/status.ibd' OS error: 71
2019-07-04T18:07:08.564165Z 6 [ERROR] [MY-000061] [Server] 1812 Tablespace is missing for table `sys_operator`.`status`..
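We eventually fixed this by modifying the MySQL initialization script: if the node is not the master, the script first drops the status table and then recreates it. (The operator uses the status table to check whether MySQL has started successfully.)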
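A rough sketch of that drop-then-recreate logic is a helper that emits the initialization SQL based on the node's role. The function name, the role values, and the column definitions here are assumptions for illustration and may not match the operator's real script or schema.

```shell
# Emit the SQL an init script could run to (re)create the status table.
# The role names and the table columns are illustrative assumptions.
status_table_init_sql() {
  role="$1"   # "master" or "replica"
  echo 'CREATE DATABASE IF NOT EXISTS sys_operator;'
  if [ "$role" != "master" ]; then
    # On a replica restored from a backup, the table definition may exist
    # without its .ibd file, so drop the leftover definition first.
    echo 'DROP TABLE IF EXISTS sys_operator.status;'
  fi
  echo 'CREATE TABLE IF NOT EXISTS sys_operator.status (name VARCHAR(64) PRIMARY KEY, value VARCHAR(8192) NOT NULL);'
}

status_table_init_sql replica
```

The output could then be piped to the mysql client during initialization. On the master the DROP statement is omitted, so an existing status table is left untouched.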