9 MySQL High Availability
9.1 MHA
9.1.1 Introduction to MHA
MHA monitors the master node and can fail over automatically to one of the slaves by promoting a slave (the one with the newest, most complete data) to be the new master. A priority can also be configured so that a fixed slave is always the one promoted. MHA is built on top of master-slave replication and also needs cooperation from the client side. Currently MHA mainly supports one-master-multi-slave architectures: an MHA-managed replication cluster needs at least three database servers, one master and two slaves, i.e. one acting as the master, one as a standby master, and one as a plain slave.
Advantages: MHA switches master and slaves automatically, and a single MHA manager node can monitor multiple replication clusters.
Disadvantage: although promotion is automatic, there is a time gap during the switchover, and data written in that gap can be lost.
Prerequisite: after the master goes down, its binary logs are still available and its ssh service still works; only the mysql service itself has died.
Workflow: since the slaves are at different replication positions, MHA picks a slave with relatively new data, promotes it to master, and points the other slaves at the new master.
Because replication progress differs across slaves, and even the most up-to-date slave may not yet have received some of the master's binary log events, MHA first makes sure all slaves are fully synchronized before promoting one:
If some slaves (usually the ones with older data) still have relay log events not yet applied to the database, those relay log events are applied first.
The data gap between the remaining slaves and the most up-to-date slave is then filled from the latter's relay logs: the most up-to-date slave sends the missing relay log portions to the other slaves, so all slaves end up with identical data.
Next, the binary log events that the master never managed to send to any slave are copied to all slaves. This step works over ssh: as long as the master's operating system has not crashed and its ssh service is reachable, the not-yet-replicated events can be pulled from the master over ssh.
The most up-to-date slave is promoted to master.
The remaining slaves are repointed at the new master.
This completes the failover.
- If the master's binary logs are no longer available, the unsynchronized data is lost; for this reason MHA should be used together with semi-synchronous replication.
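The recommendation to pair MHA with semi-synchronous replication can be sketched as follows. This is a minimal, hedged sketch assuming MySQL 5.7 with its bundled semisync plugins; `run_sql` and `DRY_RUN` are helpers of mine so the commands print instead of executing:

```shell
# Semi-sync makes the master wait for at least one slave ACK before a commit
# returns, shrinking the window of data MHA cannot recover if the dead
# master's binlog becomes unreachable.
# DRY_RUN=1 (default) only prints the mysql invocations.
: "${DRY_RUN:=1}"

run_sql() {
    if [ "$DRY_RUN" = 1 ]; then
        echo "mysql -e \"$1\""
    else
        mysql -e "$1"
    fi
}

# on the master:
run_sql "INSTALL PLUGIN rpl_semi_sync_master SONAME 'semisync_master.so';"
run_sql "SET GLOBAL rpl_semi_sync_master_enabled=1;"
# on each slave:
run_sql "INSTALL PLUGIN rpl_semi_sync_slave SONAME 'semisync_slave.so';"
run_sql "SET GLOBAL rpl_semi_sync_slave_enabled=1;"
```

Set `DRY_RUN=0` and run the first pair on the master and the second pair on each slave; persisting the settings in my.cnf is still needed to survive restarts.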
MHA software
https://github.com/yoshinorim/mha4mysql-manager/releases
MHA consists of two parts: the Manager toolkit, installed on the MHA monitoring node, and the Node toolkit, installed on all master/slave servers as well as on the Manager monitoring node.
The manager host only supports CentOS 7 and earlier; there is no version requirement for the managed nodes.
9.1.2 MHA hands-on case
Experiment topology:
Implementation steps:
1. Install the manager and node packages on the mha manager node. The manager package does not support CentOS 8, only CentOS 7 and earlier.
Be sure to install the node package first; installing the manager package first leads to perl dependency problems.
yum -y install mha4mysql-node-0.58-0.el7.centos.noarch.rpm
yum -y install mha4mysql-manager-0.58-0.el7.centos.noarch.rpm
2. Install the node package on all MySQL nodes; the node package supports all versions.
yum -y install mha4mysql-node-0.58-0.el7.centos.noarch.rpm
3. Set up bidirectional key-based ssh authentication between all nodes.
[00:34:39 root@mha-manager ~]#ssh-keygen
[00:34:54 root@mha-manager ~]#ls .ssh
id_rsa id_rsa.pub known_hosts
[21:40:24 root@mha-manager ~]#ssh-copy-id 10.0.0.237
[00:34:55 root@mha-manager ~]#rsync -av .ssh 10.0.0.227:/root
[00:34:55 root@mha-manager ~]#rsync -av .ssh 10.0.0.217:/root
[00:34:55 root@mha-manager ~]#rsync -av .ssh 10.0.0.207:/root
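To confirm the key distribution above really gives every node password-less access to every other node, a small loop can print (or run) one probe per direction. `check_pair` is a helper of mine; the IPs are the four lab hosts used here:

```shell
# Emit one non-interactive ssh probe per node pair. BatchMode makes ssh
# fail instead of prompting for a password, which is exactly the behavior
# masterha_check_ssh needs to succeed later.
NODES="10.0.0.237 10.0.0.227 10.0.0.217 10.0.0.207"

check_pair() {  # usage: check_pair <from> <to> -- prints the probe command
    echo "ssh root@$1 ssh -o BatchMode=yes -o ConnectTimeout=3 root@$2 true"
}

for src in $NODES; do
    for dst in $NODES; do
        if [ "$src" != "$dst" ]; then
            check_pair "$src" "$dst"   # pipe the output to sh to actually run the probes
        fi
    done
done
```

Any probe that hangs or asks for a password points at a host whose `.ssh` copy did not land correctly.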
4. Configure the email alert service on the manager node.
[16:57:56 root@mha-manager ~]#cat .mailrc
set from=953260716@qq.com
set smtp=smtp.qq.com
set smtp-auth-user=953260716@qq.com
set smtp-auth-password=dsmgjsbjcfsjbgaf
- Verify that alert mail can be sent
[16:58:02 root@mha-manager ~]#mail -s hello 953260716@qq.com
test
.
EOT
- Prepare the MHA alert script
[16:59:09 root@mha-manager ~]#vim /usr/local/bin/sendmail.sh
echo "MySQL is down!!!" | mail -s "MHA Warning" 953260716@qq.com
[17:03:07 root@mha-manager ~]#chmod +x /usr/local/bin/sendmail.sh
- Prepare the automatic VIP switchover script
[17:03:36 root@mha-manager ~]#cd /usr/local/bin
[17:05:56 root@mha-manager /usr/local/bin]#rz -E
rz waiting to receive.
[17:06:00 root@mha-manager /usr/local/bin]#ls
master_ip_failover sendmail.sh
[17:15:13 root@mha-manager ~]#vim /usr/local/bin/master_ip_failover
#!/usr/bin/env perl
use strict;
use warnings FATAL => 'all';
use Getopt::Long;
my (
$command, $ssh_user, $orig_master_host, $orig_master_ip,
$orig_master_port, $new_master_host, $new_master_ip, $new_master_port
);
my $vip = '10.0.0.100'; # virtual IP
my $gateway = '10.0.0.254'; # gateway IP
my $interface = 'eth0';
my $key = "1";
my $ssh_start_vip = "/sbin/ifconfig $interface:$key $vip;/sbin/arping -I $interface -c 3 -s $vip $gateway >/dev/null 2>&1";
my $ssh_stop_vip = "/sbin/ifconfig $interface:$key down";
GetOptions(
'command=s' => \$command,
'ssh_user=s' => \$ssh_user,
'orig_master_host=s' => \$orig_master_host,
'orig_master_ip=s' => \$orig_master_ip,
'orig_master_port=i' => \$orig_master_port,
'new_master_host=s' => \$new_master_host,
'new_master_ip=s' => \$new_master_ip,
'new_master_port=i' => \$new_master_port,
);
exit &main();
sub main {
print "\n\nIN SCRIPT TEST====$ssh_stop_vip==$ssh_start_vip===\n\n";
if ( $command eq "stop" || $command eq "stopssh" ) {
# $orig_master_host, $orig_master_ip, $orig_master_port are passed.
# If you manage master ip address at global catalog database,
# invalidate orig_master_ip here.
my $exit_code = 1;
eval {
print "Disabling the VIP on old master: $orig_master_host \n";
&stop_vip();
$exit_code = 0;
};
if ($@) {
warn "Got Error: $@\n";
exit $exit_code;
}
exit $exit_code;
}
elsif ( $command eq "start" ) {
# all arguments are passed.
# If you manage master ip address at global catalog database,
# activate new_master_ip here.
# You can also grant write access (create user, set read_only=0, etc) here.
my $exit_code = 10;
eval {
print "Enabling the VIP - $vip on the new master - $new_master_host \n";
&start_vip();
$exit_code = 0;
};
if ($@) {
warn $@;
exit $exit_code;
}
exit $exit_code;
}
elsif ( $command eq "status" ) {
print "Checking the Status of the script.. OK \n";
`ssh $ssh_user\@$orig_master_host \" $ssh_start_vip \"`;
exit 0;
}
else {
&usage();
exit 1;
}
}
# A simple system call that enable the VIP on the new master
sub start_vip() {
`ssh $ssh_user\@$new_master_host \" $ssh_start_vip \"`;
}
# A simple system call that disable the VIP on the old_master
sub stop_vip() {
`ssh $ssh_user\@$orig_master_host \" $ssh_stop_vip \"`;
}
sub usage {
print
"Usage: master_ip_failover --command=start|stop|stopssh|status --orig_master_host=host --orig_master_ip=ip --orig_master_port=port --new_master_host=host --new_master_ip=ip --new_master_port=port\n";
}
[17:07:19 root@mha-manager /usr/local/bin]#chmod +x /usr/local/bin/sendmail.sh
[22:06:08 root@mha-manager ~]#chmod +x /usr/local/bin/master_ip_failover
- Create the configuration file on the manager node
[17:03:57 root@mha-manager ~]#mkdir /etc/mastermha
[17:04:04 root@mha-manager ~]#cd /etc/mastermha/
[17:04:16 root@mha-manager /etc/mastermha]#ls
app1.cnf # each *.cnf file here represents one master-slave cluster monitored by mha
[17:04:30 root@mha-manager /etc/mastermha]#vim app1.cnf
[server default]
user=mhauser # account used to connect to all managed MySQL nodes; needs administrative privileges. The manager uses it to log in to the managed nodes and perform slave promotion. Create it on the master; replication propagates it to the slaves.
password=000000
manager_workdir=/data/mastermha/app1/ # working directory on the manager node for this monitored cluster
manager_log=/data/mastermha/app1/manager.log # MHA log file on the manager node for this monitored cluster
remote_workdir=/data/mastermha/app1/ # MHA working directory on the managed nodes
ssh_user=root # user for key-based ssh connections, used to access the binary logs; must match the user the keys were set up for earlier
master_binlog_dir=/data/mysql # directory holding the master's binary log files
repl_user=repluser # replication account credentials
repl_password=000000
ping_interval=1 # interval, in seconds, between health-check pings
master_ip_failover_script=/usr/local/bin/master_ip_failover # perl script that moves the VIP
# Clients send writes to the master, but the master can fail; although mha promotes a slave to be the new master, client programs would still be connecting to the old master's IP.
# The VIP script binds a virtual IP that lives only on the working master node; after a master switchover the VIP automatically moves to the new master, so clients only ever need to connect to the VIP.
report_script=/usr/local/bin/sendmail.sh # alert script to execute
check_repl_delay=0 # by default, a slave whose relay logs lag more than 100MB behind the master is not chosen as the new master, because recovering it would take too long. This parameter makes mha ignore replication delay when triggering a switchover. It is especially useful for slaves with candidate_master=1 set, guaranteeing that slave can become the new master.
[server1]
hostname=10.0.0.227
[server2]
hostname=10.0.0.217
[server3]
hostname=10.0.0.207
candidate_master=1 # preferred candidate for new master: it is promoted even if it is not the slave with the newest events in the cluster. In practice, if all servers have identical hardware, this option can be omitted.
- Note: when the master dies, who takes over as the new master?
1. If all slaves are at the same log position, the new master is chosen in configuration-file order.
2. If the slaves' positions differ, the slave closest to the master's position is chosen automatically.
3. If a node is weighted (candidate_master=1), it is chosen preferentially; but if its relay logs lag more than 100MB behind the master, it still will not be chosen. Combine with check_repl_delay=0 to disable the lag check and force the candidate to be selected.
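The three selection rules above can be mocked up as a toy selector. This is an illustrative sketch of mine, not MHA's real algorithm; input lines are "host relay_log_pos candidate_master":

```shell
# Rule 3 first: a candidate_master=1 host wins unless it lags the most
# advanced slave by more than 100MB (the check that check_repl_delay=0
# turns off). Otherwise rules 1-2 apply: the most advanced host wins,
# with input (configuration-file) order breaking ties.
pick_master() {
    awk '
    { host[NR]=$1; pos[NR]=$2; cand[NR]=$3; if ($2 > max) max = $2 }
    END {
        for (i = 1; i <= NR; i++)
            if (cand[i] == 1 && max - pos[i] <= 100*1024*1024) { print host[i]; exit }
        for (i = 1; i <= NR; i++)
            if (pos[i] == max) { print host[i]; exit }
    }'
}
```

For example, feeding it `10.0.0.217 500 0` and `10.0.0.207 400 1` picks 10.0.0.207, the weighted candidate, even though 10.0.0.217 is slightly ahead.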
- Set up the master
[16:20:21 root@master ~]#vim /etc/my.cnf
[mysqld]
server-id=1
log-bin
general-log # not required, but recommended: mha keeps issuing "select 1;" against the master to judge its health, and those queries are recorded in the general log
datadir=/data/mysql
skip_name_resolve=1 # disable reverse resolution of client IPs to FQDNs, avoiding problems caused by name lookups on IP-based connections
socket=/data/mysql/mysql.sock
log-error=/data/mysql/mysql.log
pid-file=/data/mysql/mysql.pid
[client]
socket=/data/mysql/mysql.sock
[21:05:32 root@master ~]#service mysqld restart
mysql> show master status;
+-----------------------+----------+--------------+------------------+-------------------+
| File | Position | Binlog_Do_DB | Binlog_Ignore_DB | Executed_Gtid_Set |
+-----------------------+----------+--------------+------------------+-------------------+
| centos-7-2-bin.000001 | 154 | | | |
+-----------------------+----------+--------------+------------------+-------------------+
1 row in set (0.00 sec)
grant replication slave on *.* to repluser@'10.0.0.%' identified by '000000';
grant all on *.* to mhauser@'10.0.0.%' identified by '000000';
Configure the VIP
ifconfig eth0:1 10.0.0.100/24
Test the VIP connection from the manager node
[21:44:38 root@mha-manager ~]#yum -y install mysql
[21:45:17 root@mha-manager ~]#mysql -umhauser -p000000 -h10.0.0.100
Welcome to the MariaDB monitor. Commands end with ; or \g.
Your MySQL connection id is 3
Server version: 5.7.31-log MySQL Community Server (GPL)
Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.
Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.
MySQL [(none)]>
# The connection succeeds, so the VIP works
- Set up the slaves
10.0.0.217
[mysqld]
server-id=53 # must be unique on each node
log-bin
read_only # slaves are read-only
relay_log_purge=0 # keep the slave's relay logs. They are purged by default: when a slave is promoted to master it executes "reset slave all", deleting the old relay logs and creating fresh empty ones. If the relay logs are gone, the other slaves cannot have their missing data filled in from this node once it becomes the new master.
datadir=/data/mysql
skip_name_resolve=1
socket=/data/mysql/mysql.sock
log-error=/data/mysql/mysql.log
pid-file=/data/mysql/mysql.pid
[client]
socket=/data/mysql/mysql.sock
[21:47:21 root@slave-1 ~]#service mysqld restart
10.0.0.207
[mysqld]
server-id=52
log-bin
read_only
datadir=/data/mysql
skip_name_resolve=1
socket=/data/mysql/mysql.sock
log-error=/data/mysql/mysql.log
pid-file=/data/mysql/mysql.pid
[client]
socket=/data/mysql/mysql.sock
[21:48:59 root@slave-2 ~]#service mysqld restart
10.0.0.217
mysql> CHANGE MASTER TO
-> MASTER_HOST='10.0.0.227',
-> MASTER_USER='repluser',
-> MASTER_PASSWORD='000000',
-> MASTER_PORT=3306,
-> MASTER_LOG_FILE='centos-7-2-bin.000001',
-> MASTER_LOG_POS=154;
Query OK, 0 rows affected, 2 warnings
10.0.0.207
mysql> CHANGE MASTER TO
-> MASTER_HOST='10.0.0.227',
-> MASTER_USER='repluser',
-> MASTER_PASSWORD='000000',
-> MASTER_PORT=3306,
-> MASTER_LOG_FILE='centos-7-2-bin.000001',
-> MASTER_LOG_POS=154;
Query OK, 0 rows affected, 2 warnings
10.0.0.217
start slave;
10.0.0.207
start slave;
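Before running the MHA checks, it is worth confirming that both replication threads are up on each slave. `slave_ok` is a small helper of mine that scans SHOW SLAVE STATUS output from stdin:

```shell
# Exits 0 only if both Slave_IO_Running and Slave_SQL_Running are "Yes".
slave_ok() {
    awk -F': ' '
        /Slave_IO_Running:/  { io  = $2 }
        /Slave_SQL_Running:/ { sql = $2 }
        END { exit !(io == "Yes" && sql == "Yes") }
    '
}

# Real use from the manager node, e.g.:
#   ssh 10.0.0.217 "mysql -e 'SHOW SLAVE STATUS\G'" | slave_ok && echo "217 ok"
```

If either thread is not "Yes", masterha_check_repl below will fail as well, so this catches replication mistakes early.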
- Check the MHA environment
[22:02:01 root@mha-manager ~]#masterha_check_ssh --conf=/etc/mastermha/app1.cnf
Wed Jun 16 22:02:02 2021 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Wed Jun 16 22:02:02 2021 - [info] Reading application default configuration from /etc/mastermha/app1.cnf..
Wed Jun 16 22:02:02 2021 - [info] Reading server configuration from /etc/mastermha/app1.cnf..
Wed Jun 16 22:02:02 2021 - [info] Starting SSH connection tests..
Wed Jun 16 22:02:05 2021 - [debug]
Wed Jun 16 22:02:03 2021 - [debug] Connecting via SSH from root@10.0.0.207(10.0.0.207:22) to root@10.0.0.227(10.0.0.227:22)..
Wed Jun 16 22:02:03 2021 - [debug] ok.
Wed Jun 16 22:02:03 2021 - [debug] Connecting via SSH from root@10.0.0.207(10.0.0.207:22) to root@10.0.0.217(10.0.0.217:22)..
Wed Jun 16 22:02:04 2021 - [debug] ok.
Wed Jun 16 22:02:06 2021 - [debug]
Wed Jun 16 22:02:03 2021 - [debug] Connecting via SSH from root@10.0.0.217(10.0.0.217:22) to root@10.0.0.227(10.0.0.227:22)..
Wed Jun 16 22:02:04 2021 - [debug] ok.
Wed Jun 16 22:02:04 2021 - [debug] Connecting via SSH from root@10.0.0.217(10.0.0.217:22) to root@10.0.0.207(10.0.0.207:22)..
Wed Jun 16 22:02:05 2021 - [debug] ok.
Wed Jun 16 22:02:09 2021 - [debug]
Wed Jun 16 22:02:02 2021 - [debug] Connecting via SSH from root@10.0.0.227(10.0.0.227:22) to root@10.0.0.217(10.0.0.217:22)..
Wed Jun 16 22:02:07 2021 - [debug] ok.
Wed Jun 16 22:02:07 2021 - [debug] Connecting via SSH from root@10.0.0.227(10.0.0.227:22) to root@10.0.0.207(10.0.0.207:22)..
Wed Jun 16 22:02:08 2021 - [debug] ok.
Wed Jun 16 22:02:09 2021 - [info] All SSH connection tests passed successfully.
[22:06:29 root@mha-manager ~]#masterha_check_repl --conf=/etc/mastermha/app1.cnf
Wed Jun 16 22:06:31 2021 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Wed Jun 16 22:06:31 2021 - [info] Reading application default configuration from /etc/mastermha/app1.cnf..
Wed Jun 16 22:06:31 2021 - [info] Reading server configuration from /etc/mastermha/app1.cnf..
Wed Jun 16 22:06:31 2021 - [info] MHA::MasterMonitor version 0.58.
Wed Jun 16 22:06:33 2021 - [info] GTID failover mode = 0
Wed Jun 16 22:06:33 2021 - [info] Dead Servers:
Wed Jun 16 22:06:33 2021 - [info] Alive Servers:
Wed Jun 16 22:06:33 2021 - [info] 10.0.0.227(10.0.0.227:3306)
Wed Jun 16 22:06:33 2021 - [info] 10.0.0.217(10.0.0.217:3306)
Wed Jun 16 22:06:33 2021 - [info] 10.0.0.207(10.0.0.207:3306)
Wed Jun 16 22:06:33 2021 - [info] Alive Slaves:
Wed Jun 16 22:06:33 2021 - [info] 10.0.0.217(10.0.0.217:3306) Version=5.7.31-log (oldest major version between slaves) log-bin:enabled
Wed Jun 16 22:06:33 2021 - [info] Replicating from 10.0.0.227(10.0.0.227:3306)
Wed Jun 16 22:06:33 2021 - [info] 10.0.0.207(10.0.0.207:3306) Version=5.7.31-log (oldest major version between slaves) log-bin:enabled
Wed Jun 16 22:06:33 2021 - [info] Replicating from 10.0.0.227(10.0.0.227:3306)
Wed Jun 16 22:06:33 2021 - [info] Primary candidate for the new Master (candidate_master is set)
Wed Jun 16 22:06:33 2021 - [info] Current Alive Master: 10.0.0.227(10.0.0.227:3306)
Wed Jun 16 22:06:33 2021 - [info] Checking slave configurations..
Wed Jun 16 22:06:33 2021 - [warning] relay_log_purge=0 is not set on slave 10.0.0.217(10.0.0.217:3306).
Wed Jun 16 22:06:33 2021 - [warning] relay_log_purge=0 is not set on slave 10.0.0.207(10.0.0.207:3306).
Wed Jun 16 22:06:33 2021 - [info] Checking replication filtering settings..
Wed Jun 16 22:06:33 2021 - [info] binlog_do_db= , binlog_ignore_db=
Wed Jun 16 22:06:33 2021 - [info] Replication filtering check ok.
Wed Jun 16 22:06:33 2021 - [info] GTID (with auto-pos) is not supported
Wed Jun 16 22:06:33 2021 - [info] Starting SSH connection tests..
Wed Jun 16 22:06:35 2021 - [info] All SSH connection tests passed successfully.
Wed Jun 16 22:06:35 2021 - [info] Checking MHA Node version..
Wed Jun 16 22:06:35 2021 - [info] Version check ok.
Wed Jun 16 22:06:35 2021 - [info] Checking SSH publickey authentication settings on the current master..
Wed Jun 16 22:06:40 2021 - [warning] HealthCheck: Got timeout on checking SSH connection to 10.0.0.227! at /usr/share/perl5/vendor_perl/MHA/HealthCheck.pm line 343.
Wed Jun 16 22:06:40 2021 - [info] Checking SSH publickey authentication and checking recovery script configurations on all alive slave servers..
Wed Jun 16 22:06:40 2021 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user='mhauser' --slave_host=10.0.0.217 --slave_ip=10.0.0.217 --slave_port=3306 --workdir=/data/mastermha/app1/ --target_version=5.7.31-log --manager_version=0.58 --relay_log_info=/data/mysql/relay-log.info --relay_dir=/data/mysql/ --slave_pass=xxx
Wed Jun 16 22:06:40 2021 - [info] Connecting to root@10.0.0.217(10.0.0.217:22)..
Checking slave recovery environment settings..
Opening /data/mysql/relay-log.info ... ok.
Relay log found at /data/mysql, up to slave-1-relay-bin.000002
Temporary relay log file is /data/mysql/slave-1-relay-bin.000002
Checking if super_read_only is defined and turned on.. not present or turned off, ignoring.
Testing mysql connection and privileges..
mysql: [Warning] Using a password on the command line interface can be insecure.
done.
Testing mysqlbinlog output.. done.
Cleaning up test file(s).. done.
Wed Jun 16 22:06:41 2021 - [info] Executing command : apply_diff_relay_logs --command=test --slave_user='mhauser' --slave_host=10.0.0.207 --slave_ip=10.0.0.207 --slave_port=3306 --workdir=/data/mastermha/app1/ --target_version=5.7.31-log --manager_version=0.58 --relay_log_info=/data/mysql/relay-log.info --relay_dir=/data/mysql/ --slave_pass=xxx
Wed Jun 16 22:06:41 2021 - [info] Connecting to root@10.0.0.207(10.0.0.207:22)..
Checking slave recovery environment settings..
Opening /data/mysql/relay-log.info ... ok.
Relay log found at /data/mysql, up to slave-2-relay-bin.000002
Temporary relay log file is /data/mysql/slave-2-relay-bin.000002
Checking if super_read_only is defined and turned on.. not present or turned off, ignoring.
Testing mysql connection and privileges..
mysql: [Warning] Using a password on the command line interface can be insecure.
done.
Testing mysqlbinlog output.. done.
Cleaning up test file(s).. done.
Wed Jun 16 22:06:41 2021 - [info] Slaves settings check done.
Wed Jun 16 22:06:41 2021 - [info]
10.0.0.227(10.0.0.227:3306) (current master)
+--10.0.0.217(10.0.0.217:3306)
+--10.0.0.207(10.0.0.207:3306)
Wed Jun 16 22:06:41 2021 - [info] Checking replication health on 10.0.0.217..
Wed Jun 16 22:06:41 2021 - [info] ok.
Wed Jun 16 22:06:41 2021 - [info] Checking replication health on 10.0.0.207..
Wed Jun 16 22:06:41 2021 - [info] ok.
Wed Jun 16 22:06:41 2021 - [info] Checking master_ip_failover_script status:
Wed Jun 16 22:06:41 2021 - [info] /usr/local/bin/master_ip_failover --command=status --ssh_user=root --orig_master_host=10.0.0.227 --orig_master_ip=10.0.0.227 --orig_master_port=3306
IN SCRIPT TEST====/sbin/ifconfig eth0:1 down==/sbin/ifconfig eth0:1 10.0.0.100;/sbin/arping -I eth0 -c 3 -s 10.0.0.100 10.0.0.254 >/dev/null 2>&1===
Checking the Status of the script.. OK
Wed Jun 16 22:06:44 2021 - [info] OK.
Wed Jun 16 22:06:44 2021 - [warning] shutdown_script is not defined.
Wed Jun 16 22:06:44 2021 - [info] Got exit code 0 (Not master dead).
MySQL Replication Health is OK.
- Start MHA
# Start mha; by default it runs in the foreground
[22:08:51 root@mha-manager ~]#masterha_manager --conf=/etc/mastermha/app1.cnf
Wed Jun 16 22:10:55 2021 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Wed Jun 16 22:10:55 2021 - [info] Reading application default configuration from /etc/mastermha/app1.cnf..
Wed Jun 16 22:10:55 2021 - [info] Reading server configuration from /etc/mastermha/app1.cnf..
# To run it in the background, use nohup
nohup masterha_manager --conf=/etc/mastermha/app1.cnf &>/dev/null &
# Check the status
masterha_check_status --conf=/etc/mastermha/app1.cnf
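Note that masterha_manager exits after performing one failover, so production setups typically wrap it in a restart loop or a service unit. `supervise_mha` below is a minimal sketch of mine, with the command parameterized so the loop can be exercised offline:

```shell
# Re-runs the given manager command until max_restarts is reached.
supervise_mha() {  # usage: supervise_mha <command> <max_restarts>
    n=0
    while [ "$n" -lt "$2" ]; do
        "$1" --conf=/etc/mastermha/app1.cnf || true   # returns when mha exits
        n=$((n + 1))
        echo "masterha_manager exited, restart $n/$2" >&2
        sleep 1
    done
}

# Real use (background, ignoring output):
#   supervise_mha masterha_manager 1000000 &>/dev/null &
```

A real deployment would also want backoff and alerting here; this only shows the restart shape.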
- Watch the log after startup
[22:14:12 root@mha-manager ~]#tail -f /data/mastermha/app1/manager.log
...
...
Wed Jun 16 22:14:49 2021 - [info] Set master ping interval 1 seconds.
Wed Jun 16 22:14:49 2021 - [warning] secondary_check_script is not defined. It is highly recommended setting it to check master reachability from two or more routes.
Wed Jun 16 22:14:49 2021 - [info] Starting ping health check on 10.0.0.227(10.0.0.227:3306)..
Wed Jun 16 22:14:49 2021 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..
- Check the status
[22:15:40 root@mha-manager ~]#masterha_check_status --conf=/etc/mastermha/app1.cnf
app1 (pid:27179) is running(0:PING_OK), master:10.0.0.227
# 227 is the master
- The master node's general log
[22:04:44 root@master ~]#tail -f /data/mysql/centos-7-2.log
2021-06-16T14:16:16.093829Z 23 Query SELECT 1 As Value
2021-06-16T14:16:17.098234Z 23 Query SELECT 1 As Value
2021-06-16T14:16:18.099751Z 23 Query SELECT 1 As Value
2021-06-16T14:16:19.102544Z 23 Query SELECT 1 As Value
2021-06-16T14:16:20.103765Z 23 Query SELECT 1 As Value
2021-06-16T14:16:21.105595Z 23 Query SELECT 1 As Value
2021-06-16T14:16:22.106475Z 23 Query SELECT 1 As Value
2021-06-16T14:16:23.109136Z 23 Query SELECT 1 As Value
2021-06-16T14:16:24.110503Z 23 Query SELECT 1 As Value
2021-06-16T14:16:25.111729Z 23 Query SELECT 1 As Value
2021-06-16T14:16:26.111963Z 23 Query SELECT 1 As Value
2021-06-16T14:16:27.112226Z 23 Query SELECT 1 As Value
# A SELECT 1 probe is executed once per second. Only the master is probed; the slaves are not checked.
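The probe seen in the log above can be imitated by hand. `ping_master` is an illustrative helper of mine; on this setup the real probe command would be something like `mysql -umhauser -p000000 -h10.0.0.227 -e 'SELECT 1 As Value'`:

```shell
# Runs the given probe <count> times, <interval> seconds apart, stopping
# as soon as one probe fails -- roughly what ping_interval=1 drives in MHA.
ping_master() {  # usage: ping_master <probe_command> <count> <interval>
    i=0
    while [ "$i" -lt "$2" ]; do
        if ! "$1"; then
            echo "master unreachable" >&2
            return 1
        fi
        i=$((i + 1))
        sleep "$3"
    done
}
```

MHA itself retries a few times (the "Connection failed N time(s)" lines later) before declaring the master dead, rather than failing over on the first miss.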
- Test data replication
Import hellodb on the master
[22:19:43 root@master ~]#mysql < hellodb_innodb.sql
Then verify that the slaves replicated the data
- Test failover
# Import testlog.sql on the master
mysql> use hellodb;
mysql> call sp_testlog;
# Then stop the mysqld service on the master
[22:23:52 root@master ~]#service mysqld stop
Shutting down MySQL............ SUCCESS!
The mha process exits immediately
[22:14:36 root@mha-manager ~]#masterha_manager --conf=/etc/mastermha/app1.cnf
Wed Jun 16 22:14:41 2021 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Wed Jun 16 22:14:41 2021 - [info] Reading application default configuration from /etc/mastermha/app1.cnf..
Wed Jun 16 22:14:41 2021 - [info] Reading server configuration from /etc/mastermha/app1.cnf..
Creating /data/mastermha/app1 if not exists.. ok.
Checking output directory is accessible or not..
ok.
Binlog found at /data/mysql, up to centos-7-2-bin.000001
Wed Jun 16 22:24:13 2021 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Wed Jun 16 22:24:13 2021 - [info] Reading application default configuration from /etc/mastermha/app1.cnf..
Wed Jun 16 22:24:13 2021 - [info] Reading server configuration from /etc/mastermha/app1.cnf..
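After a completed failover the manager not only exits, it also drops a marker file (app1.failover.complete) in manager_workdir, and by default refuses to fail over again within 8 hours unless that marker is removed or --ignore_last_failover is passed. A cleanup sketch, with WORKDIR parameterized by me so it can be tested against a temporary directory:

```shell
# Remove MHA's last-failover markers before restarting masterha_manager.
: "${WORKDIR:=/data/mastermha/app1}"

clean_failover_marker() {
    rm -f "$WORKDIR/app1.failover.complete" "$WORKDIR/app1.failover.error"
}

# After cleaning, the manager can be started again:
#   nohup masterha_manager --conf=/etc/mastermha/app1.cnf &>/dev/null &
```

Restore or replace the dead master (and re-add it as a slave of the new master) before restarting monitoring.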
Observe the manager node's log
Checking the Status of the script.. OK
Wed Jun 16 22:14:49 2021 - [info] OK.
Wed Jun 16 22:14:49 2021 - [warning] shutdown_script is not defined.
Wed Jun 16 22:14:49 2021 - [info] Set master ping interval 1 seconds.
Wed Jun 16 22:14:49 2021 - [warning] secondary_check_script is not defined. It is highly recommended setting it to check master reachability from two or more routes.
Wed Jun 16 22:14:49 2021 - [info] Starting ping health check on 10.0.0.227(10.0.0.227:3306)..
Wed Jun 16 22:14:49 2021 - [info] Ping(SELECT) succeeded, waiting until MySQL doesn't respond..
--The master switchover log follows--
Wed Jun 16 22:24:09 2021 - [warning] Got error on MySQL select ping: 2006 (MySQL server has gone away)
Wed Jun 16 22:24:09 2021 - [info] Executing SSH check script: save_binary_logs --command=test --start_pos=4 --binlog_dir=/data/mysql --output_file=/data/mastermha/app1//save_binary_logs_test --manager_version=0.58 --binlog_prefix=centos-7-2-bin
Wed Jun 16 22:24:10 2021 - [info] HealthCheck: SSH to 10.0.0.227 is reachable.
Wed Jun 16 22:24:10 2021 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '10.0.0.227' (111))
Wed Jun 16 22:24:10 2021 - [warning] Connection failed 2 time(s)..
Wed Jun 16 22:24:11 2021 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '10.0.0.227' (111))
Wed Jun 16 22:24:11 2021 - [warning] Connection failed 3 time(s)..
Wed Jun 16 22:24:12 2021 - [warning] Got error on MySQL connect: 2003 (Can't connect to MySQL server on '10.0.0.227' (111))
Wed Jun 16 22:24:12 2021 - [warning] Connection failed 4 time(s)..
Wed Jun 16 22:24:12 2021 - [warning] Master is not reachable from health checker!
Wed Jun 16 22:24:12 2021 - [warning] Master 10.0.0.227(10.0.0.227:3306) is not reachable!
Wed Jun 16 22:24:12 2021 - [warning] SSH is reachable.
Wed Jun 16 22:24:12 2021 - [info] Connecting to a master server failed. Reading configuration file /etc/masterha_default.cnf and /etc/mastermha/app1.cnf again, and trying to connect to all servers to check server status..
Wed Jun 16 22:24:12 2021 - [warning] Global configuration file /etc/masterha_default.cnf not found. Skipping.
Wed Jun 16 22:24:12 2021 - [info] Reading application default configuration from /etc/mastermha/app1.cnf..
Wed Jun 16 22:24:12 2021 - [info] Reading server configuration from /etc/mastermha/app1.cnf..
Wed Jun 16 22:24:13 2021 - [info] GTID failover mode = 0
Wed Jun 16 22:24:13 2021 - [info] Dead Servers:
Wed Jun 16 22:24:13 2021 - [info] 10.0.0.227(10.0.0.227:3306)
Wed Jun 16 22:24:13 2021 - [info] Alive Servers:
Wed Jun 16 22:24:13 2021 - [info] 10.0.0.217(10.0.0.217:3306)
Wed Jun 16 22:24:13 2021 - [info] 10.0.0.207(10.0.0.207:3306)
Wed Jun 16 22:24:13 2021 - [info] Alive Slaves:
Wed Jun 16 22:24:13 2021 - [info] 10.0.0.217(10.0.0.217:3306) Version=5.7.31-log (oldest major version between slaves) log-bin:enabled
Wed Jun 16 22:24:13 2021 - [info] Replicating from 10.0.0.227(10.0.0.227:3306)
Wed Jun 16 22:24:13 2021 - [info] 10.0.0.207(10.0.0.207:3306) Version=5.7.31-log (oldest major version between slaves) log-bin:enabled
Wed Jun 16 22:24:13 2021 - [info] Replicating from 10.0.0.227(10.0.0.227:3306)
Wed Jun 16 22:24:13 2021 - [info] Primary candidate for the new Master (candidate_master is set)
Wed Jun 16 22:24:13 2021 - [info] Checking slave configurations..
Wed Jun 16 22:24:13 2021 - [warning] relay_log_purge=0 is not set on slave 10.0.0.217(10.0.0.217:3306).
Wed Jun 16 22:24:13 2021 - [warning] relay_log_purge=0 is not set on slave 10.0.0.207(10.0.0.207:3306).
Wed Jun 16 22:24:13 2021 - [info] Checking replication filtering settings..
Wed Jun 16 22:24:13 2021 - [info] Replication filtering check ok.
Wed Jun 16 22:24:13 2021 - [info] Master is down!
Wed Jun 16 22:24:13 2021 - [info] Terminating monitoring script.
Wed Jun 16 22:24:13 2021 - [info] Got exit code 20 (Master dead).
Wed Jun 16 22:24:14 2021 - [info] MHA::MasterFailover version 0.58.
Wed Jun 16 22:24:14 2021 - [info] Starting master failover.
Wed Jun 16 22:24:14 2021 - [info]
Wed Jun 16 22:24:14 2021 - [info] * Phase 1: Configuration Check Phase..
Wed Jun 16 22:24:14 2021 - [info]
Wed Jun 16 22:24:15 2021 - [info] GTID failover mode = 0
Wed Jun 16 22:24:15 2021 - [info] Dead Servers:
Wed Jun 16 22:24:15 2021 - [info] 10.0.0.227(10.0.0.227:3306)
Wed Jun 16 22:24:15 2021 - [info] Checking master reachability via MySQL(double check)...
Wed Jun 16 22:24:15 2021 - [info] ok.
Wed Jun 16 22:24:15 2021 - [info] Alive Servers:
Wed Jun 16 22:24:15 2021 - [info] 10.0.0.217(10.0.0.217:3306)
Wed Jun 16 22:24:15 2021 - [info] 10.0.0.207(10.0.0.207:3306)
Wed Jun 16 22:24:15 2021 - [info] Alive Slaves:
Wed Jun 16 22:24:15 2021 - [info] 10.0.0.217(10.0.0.217:3306) Version=5.7.31-log (oldest major version between slaves) log-bin:enabled
Wed Jun 16 22:24:15 2021 - [info] Replicating from 10.0.0.227(10.0.0.227:3306)
Wed Jun 16 22:24:15 2021 - [info] 10.0.0.207(10.0.0.207:3306) Version=5.7.31-log (oldest major version between slaves) log-bin:enabled
Wed Jun 16 22:24:15 2021 - [info] Replicating from 10.0.0.227(10.0.0.227:3306)
Wed Jun 16 22:24:15 2021 - [info] Primary candidate for the new Master (candidate_master is set)
Wed Jun 16 22:24:15 2021 - [info] Starting Non-GTID based failover.
Wed Jun 16 22:24:15 2021 - [info]
Wed Jun 16 22:24:15 2021 - [info] ** Phase 1: Configuration Check Phase completed.
Wed Jun 16 22:24:15 2021 - [info]
Wed Jun 16 22:24:15 2021 - [info] * Phase 2: Dead Master Shutdown Phase..
Wed Jun 16 22:24:15 2021 - [info]
Wed Jun 16 22:24:15 2021 - [info] Forcing shutdown so that applications never connect to the current master..
Wed Jun 16 22:24:15 2021 - [info] Executing master IP deactivation script:
Wed Jun 16 22:24:15 2021 - [info] /usr/local/bin/master_ip_failover --orig_master_host=10.0.0.227 --orig_master_ip=10.0.0.227 --orig_master_port=3306 --command=stopssh --ssh_user=root
IN SCRIPT TEST====/sbin/ifconfig eth0:1 down==/sbin/ifconfig eth0:1 10.0.0.100;/sbin/arping -I eth0 -c 3 -s 10.0.0.100 10.0.0.254 >/dev/null 2>&1===
Disabling the VIP on old master: 10.0.0.227
Wed Jun 16 22:24:15 2021 - [info] done.
Wed Jun 16 22:24:15 2021 - [warning] shutdown_script is not set. Skipping explicit shutting down of the dead master.
Wed Jun 16 22:24:15 2021 - [info] * Phase 2: Dead Master Shutdown Phase completed.
Wed Jun 16 22:24:15 2021 - [info]
Wed Jun 16 22:24:15 2021 - [info] * Phase 3: Master Recovery Phase..
Wed Jun 16 22:24:15 2021 - [info]
Wed Jun 16 22:24:15 2021 - [info] * Phase 3.1: Getting Latest Slaves Phase..
Wed Jun 16 22:24:15 2021 - [info]
Wed Jun 16 22:24:15 2021 - [info] The latest binary log file/position on all slaves is centos-7-2-bin.000001:23550107
Wed Jun 16 22:24:15 2021 - [info] Latest slaves (Slaves that received relay log files to the latest):
Wed Jun 16 22:24:15 2021 - [info] 10.0.0.217(10.0.0.217:3306) Version=5.7.31-log (oldest major version between slaves) log-bin:enabled
Wed Jun 16 22:24:15 2021 - [info] Replicating from 10.0.0.227(10.0.0.227:3306)
Wed Jun 16 22:24:15 2021 - [info] 10.0.0.207(10.0.0.207:3306) Version=5.7.31-log (oldest major version between slaves) log-bin:enabled
Wed Jun 16 22:24:15 2021 - [info] Replicating from 10.0.0.227(10.0.0.227:3306)
Wed Jun 16 22:24:15 2021 - [info] Primary candidate for the new Master (candidate_master is set)
Wed Jun 16 22:24:15 2021 - [info] The oldest binary log file/position on all slaves is centos-7-2-bin.000001:23550107
Wed Jun 16 22:24:15 2021 - [info] Oldest slaves:
Wed Jun 16 22:24:15 2021 - [info] 10.0.0.217(10.0.0.217:3306) Version=5.7.31-log (oldest major version between slaves) log-bin:enabled
Wed Jun 16 22:24:15 2021 - [info] Replicating from 10.0.0.227(10.0.0.227:3306)
Wed Jun 16 22:24:15 2021 - [info] 10.0.0.207(10.0.0.207:3306) Version=5.7.31-log (oldest major version between slaves) log-bin:enabled
Wed Jun 16 22:24:15 2021 - [info] Replicating from 10.0.0.227(10.0.0.227:3306)
Wed Jun 16 22:24:15 2021 - [info] Primary candidate for the new Master (candidate_master is set)
Wed Jun 16 22:24:15 2021 - [info]
Wed Jun 16 22:24:15 2021 - [info] * Phase 3.2: Saving Dead Master's Binlog Phase..
Wed Jun 16 22:24:15 2021 - [info]
Wed Jun 16 22:24:15 2021 - [info] Fetching dead master's binary logs..
Wed Jun 16 22:24:15 2021 - [info] Executing command on the dead master 10.0.0.227(10.0.0.227:3306): save_binary_logs --command=save --start_file=centos-7-2-bin.000001 --start_pos=23550107 --binlog_dir=/data/mysql --output_file=/data/mastermha/app1//saved_master_binlog_from_10.0.0.227_3306_20210616222413.binlog --handle_raw_binlog=1 --disable_log_bin=0 --manager_version=0.58
Creating /data/mastermha/app1 if not exists.. ok.
Concat binary/relay logs from centos-7-2-bin.000001 pos 23550107 to centos-7-2-bin.000001 EOF into /data/mastermha/app1//saved_master_binlog_from_10.0.0.227_3306_20210616222413.binlog ..
Binlog Checksum enabled
Dumping binlog format description event, from position 0 to 154.. ok.
No need to dump effective binlog data from /data/mysql/centos-7-2-bin.000001 (pos starts 23550107, filesize 23550107). Skipping.
Binlog Checksum enabled
/data/mastermha/app1//saved_master_binlog_from_10.0.0.227_3306_20210616222413.binlog has no effective data events.
Event not exists.
Wed Jun 16 22:24:15 2021 - [info] Additional events were not found from the orig master. No need to save.
Wed Jun 16 22:24:15 2021 - [info]
Wed Jun 16 22:24:15 2021 - [info] * Phase 3.3: Determining New Master Phase..
Wed Jun 16 22:24:15 2021 - [info]
Wed Jun 16 22:24:15 2021 - [info] Finding the latest slave that has all relay logs for recovering other slaves..
Wed Jun 16 22:24:15 2021 - [info] All slaves received relay logs to the same position. No need to resync each other.
Wed Jun 16 22:24:15 2021 - [info] Searching new master from slaves..
Wed Jun 16 22:24:15 2021 - [info] Candidate masters from the configuration file:
Wed Jun 16 22:24:15 2021 - [info] 10.0.0.207(10.0.0.207:3306) Version=5.7.31-log (oldest major version between slaves) log-bin:enabled
Wed Jun 16 22:24:15 2021 - [info] Replicating from 10.0.0.227(10.0.0.227:3306)
Wed Jun 16 22:24:15 2021 - [info] Primary candidate for the new Master (candidate_master is set)
Wed Jun 16 22:24:15 2021 - [info] Non-candidate masters:
Wed Jun 16 22:24:15 2021 - [info] Searching from candidate_master slaves which have received the latest relay log events..
Wed Jun 16 22:24:15 2021 - [info] New master is 10.0.0.207(10.0.0.207:3306)
Wed Jun 16 22:24:15 2021 - [info] Starting master failover..
Wed Jun 16 22:24:15 2021 - [info]
From:
10.0.0.227(10.0.0.227:3306) (current master)
+--10.0.0.217(10.0.0.217:3306)
+--10.0.0.207(10.0.0.207:3306)
To:
10.0.0.207(10.0.0.207:3306) (new master)
+--10.0.0.217(10.0.0.217:3306)
Wed Jun 16 22:24:15 2021 - [info]
Wed Jun 16 22:24:15 2021 - [info] * Phase 3.4: New Master Diff Log Generation Phase..
Wed Jun 16 22:24:15 2021 - [info]
Wed Jun 16 22:24:15 2021 - [info] This server has all relay logs. No need to generate diff files from the latest slave.
Wed Jun 16 22:24:15 2021 - [info]
Wed Jun 16 22:24:15 2021 - [info] * Phase 3.5: Master Log Apply Phase..
Wed Jun 16 22:24:15 2021 - [info]
Wed Jun 16 22:24:15 2021 - [info] *NOTICE: If any error happens from this phase, manual recovery is needed.
Wed Jun 16 22:24:15 2021 - [info] Starting recovery on 10.0.0.207(10.0.0.207:3306)..
Wed Jun 16 22:24:15 2021 - [info] This server has all relay logs. Waiting all logs to be applied..
Wed Jun 16 22:24:15 2021 - [info] done.
Wed Jun 16 22:24:15 2021 - [info] All relay logs were successfully applied.
Wed Jun 16 22:24:15 2021 - [info] Getting new master's binlog name and position..
Wed Jun 16 22:24:15 2021 - [info] slave-2-bin.000002:154
Wed Jun 16 22:24:15 2021 - [info] All other slaves should start replication from here. Statement should be: CHANGE MASTER TO MASTER_HOST='10.0.0.207', MASTER_PORT=3306, MASTER_LOG_FILE='slave-2-bin.000002', MASTER_LOG_POS=154, MASTER_USER='repluser', MASTER_PASSWORD='xxx';
Wed Jun 16 22:24:15 2021 - [info] Executing master IP activate script:
Wed Jun 16 22:24:15 2021 - [info] /usr/local/bin/master_ip_failover --command=start --ssh_user=root --orig_master_host=10.0.0.227 --orig_master_ip=10.0.0.227 --orig_master_port=3306 --new_master_host=10.0.0.207 --new_master_ip=10.0.0.207 --new_master_port=3306 --new_master_user='mhauser' --new_master_password=xxx
Unknown option: new_master_user
Unknown option: new_master_password
IN SCRIPT TEST====/sbin/ifconfig eth0:1 down==/sbin/ifconfig eth0:1 10.0.0.100;/sbin/arping -I eth0 -c 3 -s 10.0.0.100 10.0.0.254 >/dev/null 2>&1===
Enabling the VIP - 10.0.0.100 on the new master - 10.0.0.207
Wed Jun 16 22:24:18 2021 - [info] OK.
Wed Jun 16 22:24:18 2021 - [info] Setting read_only=0 on 10.0.0.207(10.0.0.207:3306)..
Wed Jun 16 22:24:18 2021 - [info] ok.
Wed Jun 16 22:24:18 2021 - [info] ** Finished master recovery successfully.
Wed Jun 16 22:24:18 2021 - [info] * Phase 3: Master Recovery Phase completed.
Wed Jun 16 22:24:18 2021 - [info]
Wed Jun 16 22:24:18 2021 - [info] * Phase 4: Slaves Recovery Phase..
Wed Jun 16 22:24:18 2021 - [info]
Wed Jun 16 22:24:18 2021 - [info] * Phase 4.1: Starting Parallel Slave Diff Log Generation Phase..
Wed Jun 16 22:24:18 2021 - [info]
Wed Jun 16 22:24:18 2021 - [info] -- Slave diff file generation on host 10.0.0.217(10.0.0.217:3306) started, pid: 27827. Check tmp log /data/mastermha/app1//10.0.0.217_3306_20210616222413.log if it takes time..
Wed Jun 16 22:24:19 2021 - [info]
Wed Jun 16 22:24:19 2021 - [info] Log messages from 10.0.0.217 ...
Wed Jun 16 22:24:19 2021 - [info]
Wed Jun 16 22:24:18 2021 - [info] This server has all relay logs. No need to generate diff files from the latest slave.
Wed Jun 16 22:24:19 2021 - [info] End of log messages from 10.0.0.217.
Wed Jun 16 22:24:19 2021 - [info] -- 10.0.0.217(10.0.0.217:3306) has the latest relay log events.
Wed Jun 16 22:24:19 2021 - [info] Generating relay diff files from the latest slave succeeded.
Wed Jun 16 22:24:19 2021 - [info]
Wed Jun 16 22:24:19 2021 - [info] * Phase 4.2: Starting Parallel Slave Log Apply Phase..
Wed Jun 16 22:24:19 2021 - [info]
Wed Jun 16 22:24:19 2021 - [info] -- Slave recovery on host 10.0.0.217(10.0.0.217:3306) started, pid: 27829. Check tmp log /data/mastermha/app1//10.0.0.217_3306_20210616222413.log if it takes time..
Wed Jun 16 22:24:20 2021 - [info]
Wed Jun 16 22:24:20 2021 - [info] Log messages from 10.0.0.217 ...
Wed Jun 16 22:24:20 2021 - [info]
Wed Jun 16 22:24:19 2021 - [info] Starting recovery on 10.0.0.217(10.0.0.217:3306)..
Wed Jun 16 22:24:19 2021 - [info] This server has all relay logs. Waiting all logs to be applied..
Wed Jun 16 22:24:19 2021 - [info] done.
Wed Jun 16 22:24:19 2021 - [info] All relay logs were successfully applied.
Wed Jun 16 22:24:19 2021 - [info] Resetting slave 10.0.0.217(10.0.0.217:3306) and starting replication from the new master 10.0.0.207(10.0.0.207:3306)..
Wed Jun 16 22:24:19 2021 - [info] Executed CHANGE MASTER.
Wed Jun 16 22:24:19 2021 - [info] Slave started.
Wed Jun 16 22:24:20 2021 - [info] End of log messages from 10.0.0.217.
Wed Jun 16 22:24:20 2021 - [info] -- Slave recovery on host 10.0.0.217(10.0.0.217:3306) succeeded.
Wed Jun 16 22:24:20 2021 - [info] All new slave servers recovered successfully.
Wed Jun 16 22:24:20 2021 - [info]
Wed Jun 16 22:24:20 2021 - [info] * Phase 5: New master cleanup phase..
Wed Jun 16 22:24:20 2021 - [info]
Wed Jun 16 22:24:20 2021 - [info] Resetting slave info on the new master..
Wed Jun 16 22:24:20 2021 - [info] 10.0.0.207: Resetting slave info succeeded.
Wed Jun 16 22:24:20 2021 - [info] Master failover to 10.0.0.207(10.0.0.207:3306) completed successfully.
Wed Jun 16 22:24:20 2021 - [info]
----- Failover Report -----
app1: MySQL Master failover 10.0.0.227(10.0.0.227:3306) to 10.0.0.207(10.0.0.207:3306) succeeded
Master 10.0.0.227(10.0.0.227:3306) is down!
Check MHA Manager logs at mha-manager:/data/mastermha/app1/manager.log for details.
Started automated(non-interactive) failover.
Invalidated master IP address on 10.0.0.227(10.0.0.227:3306)
The latest slave 10.0.0.217(10.0.0.217:3306) has all relay logs for recovery.
# Here the log shows that 10.0.0.217 holds all the relay logs, but because we gave 10.0.0.207 a higher priority (candidate_master), 207 is still selected as the new master.
Selected 10.0.0.207(10.0.0.207:3306) as a new master.
10.0.0.207(10.0.0.207:3306): OK: Applying all logs succeeded.
10.0.0.207(10.0.0.207:3306): OK: Activated master IP address.
10.0.0.217(10.0.0.217:3306): This host has the latest relay log events.
Generating relay diff files from the latest slave succeeded. # Although 207's relay log is behind 217's, 217 ships the missing events to 207, so in the end both hold the same data
10.0.0.217(10.0.0.217:3306): OK: Applying all logs succeeded. Slave started, replicating from 10.0.0.207(10.0.0.207:3306)
10.0.0.207(10.0.0.207:3306): Resetting slave info succeeded.
Master failover to 10.0.0.207(10.0.0.207:3306) completed successfully.
Wed Jun 16 22:24:20 2021 - [info] Sending mail..
Check the master change in 10.0.0.217's general query log
2021-06-16T14:24:19.846924Z 8 Query STOP SLAVE
2021-06-16T14:24:19.847139Z 8 Query SHOW SLAVE STATUS
2021-06-16T14:24:19.847762Z 8 Query RESET SLAVE
2021-06-16T14:24:19.852183Z 8 Query CHANGE MASTER TO MASTER_HOST = '10.0.0.207' MASTER_USER = 'repluser' MASTER_PASSWORD = <secret> MASTER_PORT = 3306 MASTER_LOG_FILE = 'slave-2-bin.000002' MASTER_LOG_POS = 154
2021-06-16T14:24:19.855042Z 8 Query START SLAVE
Check the slave status on 10.0.0.207
mysql> show slave status\G
Empty set (0.00 sec)
Alert email
Check the VIP migration
The VIP on the failed master 10.0.0.227 has been removed
[22:24:20 root@mha-manager ~]#ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 00:0c:29:54:b7:78 brd ff:ff:ff:ff:ff:ff
inet 10.0.0.237/24 brd 10.0.0.255 scope global noprefixroute eth0
valid_lft forever preferred_lft forever
inet6 fe80::20c:29ff:fe54:b778/64 scope link
valid_lft forever preferred_lft forever
The VIP has moved to the new master 10.0.0.207
[22:40:18 root@slave-2 ~]#ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 00:0c:29:54:b7:92 brd ff:ff:ff:ff:ff:ff
inet 10.0.0.207/24 brd 10.0.0.255 scope global noprefixroute eth0
valid_lft forever preferred_lft forever
inet 10.0.0.100/8 brd 10.255.255.255 scope global eth0:1
valid_lft forever preferred_lft forever
inet6 fe80::20c:29ff:fe54:b792/64 scope link
valid_lft forever preferred_lft forever
# Client applications simply keep connecting to the 10.0.0.100 VIP at all times
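The VIP handover seen above is performed by the `master_ip_failover` script: as Phase 3 of the log shows, it just brings the alias interface down on the old master, brings it up on the new one, and announces the address with `arping`. Below is a minimal sketch of that logic; it only prints the commands for safe inspection instead of executing them, and the interface label `eth0:1`, VIP `10.0.0.100`, and gateway `10.0.0.254` are this lab's values.

```shell
# Minimal sketch of the VIP start/stop logic inside master_ip_failover.
# eth0:1, 10.0.0.100, and gateway 10.0.0.254 are this lab's values; the
# command is printed rather than run, so this is a safe dry run.
VIP=10.0.0.100
DEV=eth0
GATEWAY=10.0.0.254

vip_cmd() {
  case "$1" in
    start) echo "/sbin/ifconfig ${DEV}:1 ${VIP};/sbin/arping -I ${DEV} -c 3 -s ${VIP} ${GATEWAY}" ;;
    stop)  echo "/sbin/ifconfig ${DEV}:1 down" ;;
  esac
}

vip_cmd stop    # what runs on the failed old master
vip_cmd start   # what runs on the newly promoted master
```

The `arping` broadcast sends gratuitous ARP so the gateway learns the VIP's new MAC address immediately, instead of waiting for its ARP cache to expire.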
Check the data written during the failover test
10.0.0.217
mysql> select count(*) from testlog;
+----------+
| count(*) |
+----------+
| 83511 |
+----------+
1 row in set (0.01 sec)
10.0.0.207
mysql> select count(*) from testlog;
+----------+
| count(*) |
+----------+
| 83511 |
+----------+
1 row in set (0.01 sec)
# The new master and the slave hold identical data
Restore the old master and check its data
[22:24:20 root@master ~]#service mysqld start
mysql> select count(*) from testlog;
+----------+
| count(*) |
+----------+
| 83511 | # data consistent with the new master
+----------+
1 row in set (0.01 sec)
However, the old master must be reconfigured as a slave by hand before it can rejoin the new replication topology
mysql> show slave status\G
Empty set (0.00 sec)
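Rejoining is the standard CHANGE MASTER procedure. The sketch below only builds the SQL to run on the old master, using the new master's coordinates reported in the failover log (`slave-2-bin.000002:154`); the `repluser` password is a placeholder, substitute your own.

```shell
# Build the SQL that re-points the old master (10.0.0.227) at the new master.
# Coordinates come from the failover log; '<password>' is a placeholder.
NEW_MASTER=10.0.0.207
LOG_FILE=slave-2-bin.000002
LOG_POS=154

rejoin_sql() {
  cat <<EOF
CHANGE MASTER TO MASTER_HOST='${NEW_MASTER}', MASTER_PORT=3306,
  MASTER_USER='repluser', MASTER_PASSWORD='<password>',
  MASTER_LOG_FILE='${LOG_FILE}', MASTER_LOG_POS=${LOG_POS};
START SLAVE;
EOF
}

rejoin_sql    # pipe into: mysql -uroot -p
```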
Note: as soon as the new master is promoted, remove its read-only setting, because the new master used to be a slave. MHA automatically sets the read_only variable to 0 on the new master, but it does not modify the configuration file. So after the promotion, be sure to comment out or delete read-only in the config file; otherwise, once the new master's service restarts, ordinary users (everyone except root) will be unable to write data
#read_only
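A small self-contained demo of that config fix (it edits a temporary copy rather than a live /etc/my.cnf; clearing the variable on the running instance is done separately with `SET GLOBAL read_only=0;`):

```shell
# Comment out the read-only line so a restart of the promoted master stays
# writable. Demonstrated on a temp copy instead of the real /etc/my.cnf.
cnf=$(mktemp)
printf '[mysqld]\nread-only\n' > "$cnf"
sed -i 's/^read[-_]only/#&/' "$cnf"   # matches read-only or read_only
grep '^#read-only' "$cnf" && echo 'read-only commented out'
```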
MHA drawbacks:
1. Once the master fails, the mha process exits and has to be started again by hand
2. MHA requires at least one master and two slaves, so with a one-master-two-slave topology, once the master goes down and failover completes, only a single slave remains; running MHA again at that point will not work. You have to re-add the failed master to the replication topology by hand, then start mha again.
3. Moreover, after mha exits, if the master of the remaining one-master-one-slave pair fails again, no failover is possible; if clients do both reads and writes through the VIP, neither reads nor writes will work
4. An MHA switchover takes roughly 10 seconds, during which no data can be written
For example, restarting the manager while the recovered old master has not yet been reconfigured as a slave fails the configuration check with "There are 2 non-slave servers":
Wed Jun 16 22:52:02 2021 - [info] MHA::MasterMonitor version 0.58.
Wed Jun 16 22:52:03 2021 - [error][/usr/share/perl5/vendor_perl/MHA/ServerManager.pm, ln653] There are 2 non-slave servers! MHA manages at most one non-slave server. Check configurations.
Wed Jun 16 22:52:03 2021 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln427] Error happened on checking configurations. at /usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm line 329.
Wed Jun 16 22:52:03 2021 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln525] Error happened on monitoring servers.
Wed Jun 16 22:52:03 2021 - [info] Got exit code 1 (Not master dead).
Wed Jun 16 22:52:16 2021 - [info] MHA::MasterMonitor version 0.58.
Wed Jun 16 22:52:17 2021 - [error][/usr/share/perl5/vendor_perl/MHA/ServerManager.pm, ln653] There are 2 non-slave servers! MHA manages at most one non-slave server. Check configurations.
Wed Jun 16 22:52:17 2021 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln427] Error happened on checking configurations. at /usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm line 329.
Wed Jun 16 22:52:17 2021 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln525] Error happened on monitoring servers.
Wed Jun 16 22:52:17 2021 - [info] Got exit code 1 (Not master dead).
Wed Jun 16 22:52:26 2021 - [info] MHA::MasterMonitor version 0.58.
Wed Jun 16 22:52:27 2021 - [error][/usr/share/perl5/vendor_perl/MHA/ServerManager.pm, ln653] There are 2 non-slave servers! MHA manages at most one non-slave server. Check configurations.
Wed Jun 16 22:52:27 2021 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln427] Error happened on checking configurations. at /usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm line 329.
Wed Jun 16 22:52:27 2021 - [error][/usr/share/perl5/vendor_perl/MHA/MasterMonitor.pm, ln525] Error happened on monitoring servers.
Wed Jun 16 22:52:27 2021 - [info] Got exit code 1 (Not master dead).
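Once the old master has been turned back into a slave, the manager must be started again by hand (it exits after every failover, per drawback 1). A dry-run sketch of the restart commands; the conf file path `/etc/mastermha/app1.cnf` is an assumption for this lab, adjust it to your own layout:

```shell
# Print (rather than run) the commands that bring the manager back after a
# failover. The conf path is an assumption -- match it to your own setup.
CONF=/etc/mastermha/app1.cnf
printf '%s\n' \
  "nohup masterha_manager --conf=${CONF} &> /dev/null &" \
  "masterha_check_status --conf=${CONF}"
```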