Orchestrator + VIP for High Availability
Environment
OS: CentOS 7.6
Servers:
192.168.20.101 node1
192.168.20.102 node2
192.168.20.103 node3
Database: MySQL 5.7.27, already installed on all three nodes
GTID enabled
log-slave-updates = ON
report_host should preferably be configured on every instance
Orchestrator version: v3.1.4
MySQL master/replica setup is omitted here. Note the extra parameters below when configuring replication; they shorten failure detection and switchover time.
change master to
master_host='192.168.20.101',
master_port=3306,
master_user='rpl',
master_password='123456',
master_auto_position=1,
MASTER_HEARTBEAT_PERIOD=2,
MASTER_CONNECT_RETRY=1,
MASTER_RETRY_COUNT=86400;
set global slave_net_timeout=8;
set global read_only=1;
set global super_read_only=1;
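The heartbeat, retry, and slave_net_timeout values above are what let a replica notice a dead master within a few seconds instead of the default of one minute. A minimal check (assuming a local root login; adjust credentials as needed) that the settings took effect on each replica:
# confirm the global variables and the replication connection settings
mysql -uroot -p -e "SELECT @@slave_net_timeout, @@read_only, @@super_read_only;"
mysql -uroot -p -e "SHOW SLAVE STATUS\G" | grep -E 'Master_Host|Connect_Retry|Retry_Count|Auto_Position'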
1-Basic Configuration
1.1-Configure hosts
Add the host entries on every machine:
echo '192.168.20.101 node1' >> /etc/hosts
echo '192.168.20.102 node2' >> /etc/hosts
echo '192.168.20.103 node3' >> /etc/hosts
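An optional sanity check that all three names resolve locally on each node:
# should print the three entries that were just added
getent hosts node1 node2 node3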
1.2-Configure passwordless SSH
Run on all machines:
# press Enter through all prompts
ssh-keygen
ssh-copy-id node1
ssh-copy-id node2
ssh-copy-id node3
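To verify that passwordless SSH works from each node to every other node, a quick loop (run on every machine) might look like this:
# should print the three hostnames without asking for a password
for h in node1 node2 node3; do ssh -o BatchMode=yes $h hostname; done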
2-Deploy orchestrator
2.1-Download
Download from: https://github.com/openark/orchestrator/releases
wget https://github.com/openark/orchestrator/releases/download/v3.1.4/orchestrator-3.1.4-1.x86_64.rpm
wget https://github.com/openark/orchestrator/releases/download/v3.1.4/orchestrator-client-3.1.4-1.x86_64.rpm
2.2-Install
Install both the orchestrator server and client on all three machines.
# install dependency
yum -y install jq
# install
[root@node1 ~]# rpm -ivh orchestrator-3.1.4-1.x86_64.rpm
Preparing... ################################# [100%]
Updating / installing...
1:orchestrator-1:3.1.4-1 ################################# [100%]
[root@node1 ~]# rpm -ivh orchestrator-client-3.1.4-1.x86_64.rpm
Preparing... ################################# [100%]
Updating / installing...
1:orchestrator-client-1:3.1.4-1 ################################# [100%]
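The RPMs place the binaries and sample configuration files under /usr/local/orchestrator. A quick way to confirm both packages landed on every node:
# both the server and client packages should be listed
rpm -qa | grep orchestrator
ls /usr/local/orchestrator/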
2.3-Create the orchestrator management user
# run on the MySQL master
mysql> CREATE USER 'orchestrator'@'192.168.20.%' IDENTIFIED BY '123456';
mysql> GRANT SUPER, PROCESS, REPLICATION SLAVE, RELOAD ON *.* TO 'orchestrator'@'192.168.20.%';
mysql> GRANT SELECT ON mysql.slave_master_info TO 'orchestrator'@'192.168.20.%';
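Before continuing, it is worth confirming from each of the three nodes that the topology user can actually reach the master; a minimal sketch using the credentials created above:
# should return the hostname/report_host of the master
mysql -h192.168.20.101 -P3306 -uorchestrator -p123456 -e "SELECT @@hostname, @@report_host;"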
2.4-Set up the configuration file
cp /usr/local/orchestrator/orchestrator-sample-sqlite.conf.json /etc/orchestrator.conf.json
vi /etc/orchestrator.conf.json
Modify the following parameters:
- MySQLTopologyUser / MySQLTopologyPassword
  user name / password orchestrator uses to probe the MySQL cluster
- SQLite3DataFile
  path of the SQLite backend database; the directory must be writable
- DefaultInstancePort
  MySQL instance port
- FailureDetectionPeriodBlockMinutes
  a failure detected again within this window is not reported a second time
- RecoveryPeriodBlockSeconds
  a failure occurring again within this window does not trigger another failover, avoiding concurrent recoveries and instability
- RecoverMasterClusterFilters
  perform master recovery only for clusters matching these regex patterns ("*" matches all)
- RecoverIntermediateMasterClusterFilters
  perform intermediate-master recovery only for clusters matching these regex patterns ("*" matches all)
- PostFailoverProcesses
  change to the actual path where the hook script is stored
- Raft settings (added at the end)
  used to make orchestrator itself highly available
- RaftBind
  change to the local machine's IP
{
"Debug": true,
"EnableSyslog": false,
"ListenAddress": ":3000",
"MySQLTopologyUser": "orchestrator",
"MySQLTopologyPassword": "123456",
"MySQLTopologyCredentialsConfigFile": "",
"MySQLTopologySSLPrivateKeyFile": "",
"MySQLTopologySSLCertFile": "",
"MySQLTopologySSLCAFile": "",
"MySQLTopologySSLSkipVerify": true,
"MySQLTopologyUseMutualTLS": false,
"BackendDB": "sqlite",
"SQLite3DataFile": "/usr/local/orchestrator/orchestrator.db",
"MySQLConnectTimeoutSeconds": 1,
"DefaultInstancePort": 3306,
"DiscoverByShowSlaveHosts": true,
"InstancePollSeconds": 5,
"DiscoveryIgnoreReplicaHostnameFilters": [
"a_host_i_want_to_ignore[.]example[.]com",
".*[.]ignore_all_hosts_from_this_domain[.]example[.]com",
"a_host_with_extra_port_i_want_to_ignore[.]example[.]com:3307"
],
"UnseenInstanceForgetHours": 240,
"SnapshotTopologiesIntervalHours": 0,
"InstanceBulkOperationsWaitTimeoutSeconds": 10,
"HostnameResolveMethod": "default",
"MySQLHostnameResolveMethod": "@@hostname",
"SkipBinlogServerUnresolveCheck": true,
"ExpiryHostnameResolvesMinutes": 60,
"RejectHostnameResolvePattern": "",
"ReasonableReplicationLagSeconds": 10,
"ProblemIgnoreHostnameFilters": [],
"VerifyReplicationFilters": false,
"ReasonableMaintenanceReplicationLagSeconds": 20,
"CandidateInstanceExpireMinutes": 60,
"AuditLogFile": "",
"AuditToSyslog": false,
"RemoveTextFromHostnameDisplay": ".mydomain.com:3306",
"ReadOnly": false,
"AuthenticationMethod": "",
"HTTPAuthUser": "",
"HTTPAuthPassword": "",
"AuthUserHeader": "",
"PowerAuthUsers": [
"*"
],
"ClusterNameToAlias": {
"127.0.0.1": "test suite"
},
"ReplicationLagQuery": "",
"DetectClusterAliasQuery": "SELECT SUBSTRING_INDEX(@@hostname, '.', 1)",
"DetectClusterDomainQuery": "",
"DetectInstanceAliasQuery": "",
"DetectPromotionRuleQuery": "",
"DataCenterPattern": "[.]([^.]+)[.][^.]+[.]mydomain[.]com",
"PhysicalEnvironmentPattern": "[.]([^.]+[.][^.]+)[.]mydomain[.]com",
"PromotionIgnoreHostnameFilters": [],
"DetectSemiSyncEnforcedQuery": "",
"ServeAgentsHttp": false,
"AgentsServerPort": ":3001",
"AgentsUseSSL": false,
"AgentsUseMutualTLS": false,
"AgentSSLSkipVerify": false,
"AgentSSLPrivateKeyFile": "",
"AgentSSLCertFile": "",
"AgentSSLCAFile": "",
"AgentSSLValidOUs": [],
"UseSSL": false,
"UseMutualTLS": false,
"SSLSkipVerify": false,
"SSLPrivateKeyFile": "",
"SSLCertFile": "",
"SSLCAFile": "",
"SSLValidOUs": [],
"URLPrefix": "",
"StatusEndpoint": "/api/status",
"StatusSimpleHealth": true,
"StatusOUVerify": false,
"AgentPollMinutes": 60,
"UnseenAgentForgetHours": 6,
"StaleSeedFailMinutes": 60,
"SeedAcceptableBytesDiff": 8192,
"PseudoGTIDPattern": "",
"PseudoGTIDPatternIsFixedSubstring": false,
"PseudoGTIDMonotonicHint": "asc:",
"DetectPseudoGTIDQuery": "",
"BinlogEventsChunkSize": 10000,
"SkipBinlogEventsContaining": [],
"ReduceReplicationAnalysisCount": true,
"FailureDetectionPeriodBlockMinutes": 5,
"RecoveryPeriodBlockSeconds": 30,
"RecoveryIgnoreHostnameFilters": [],
"RecoverMasterClusterFilters": [
"*"
],
"RecoverIntermediateMasterClusterFilters": [
"*"
],
"OnFailureDetectionProcesses": [
"echo '`date +'%Y-%m-%d %T'` Detected {failureType} on {failureCluster}. Affected replicas: {countSlaves}' >> /tmp/recovery.log"
],
"PreGracefulTakeoverProcesses": [
"echo '`date +'%Y-%m-%d %T'` Planned takeover about to take place on {failureCluster}. Master will switch to read_only' >> /tmp/recovery.log"
],
"PreFailoverProcesses": [
"echo '`date +'%Y-%m-%d %T'` Will recover from {failureType} on {failureCluster}' >> /tmp/recovery.log"
],
"PostFailoverProcesses": [
"echo '`date +'%Y-%m-%d %T'` (for all types) Recovered from {failureType} on {failureCluster}. Failed: {failedHost}:{failedPort}; Successor: {successorHost}:{successorPort}; failureClusterAlias:{failureClusterAlias}' >> /tmp/recovery.log",
"/usr/local/orchestrator/orch_hook.sh {failureType} {failureClusterAlias} {failedHost} {successorHost} >> /tmp/orch.log"
],
"PostUnsuccessfulFailoverProcesses": [ "echo '`date +'%Y-%m-%d %T'` Unsuccessful Failover ' >> /tmp/recovery.log"],
"PostMasterFailoverProcesses": [
"echo '`date +'%Y-%m-%d %T'` Recovered from {failureType} on {failureCluster}. Failed: {failedHost}:{failedPort}; Promoted: {successorHost}:{successorPort}' >> /tmp/recovery.log"
],
"PostIntermediateMasterFailoverProcesses": [
"echo '`date +'%Y-%m-%d %T'` Recovered from {failureType} on {failureCluster}. Failed: {failedHost}:{failedPort}; Successor: {successorHost}:{successorPort}' >> /tmp/recovery.log"
],
"PostGracefulTakeoverProcesses": [
"echo '`date +'%Y-%m-%d %T'` Planned takeover complete' >> /tmp/recovery.log"
],
"CoMasterRecoveryMustPromoteOtherCoMaster": true,
"DetachLostSlavesAfterMasterFailover": true,
"ApplyMySQLPromotionAfterMasterFailover": true,
"PreventCrossDataCenterMasterFailover": false,
"PreventCrossRegionMasterFailover": false,
"MasterFailoverDetachReplicaMasterHost": false,
"MasterFailoverLostInstancesDowntimeMinutes": 0,
"PostponeReplicaRecoveryOnLagMinutes": 0,
"OSCIgnoreHostnameFilters": [],
"GraphiteAddr": "",
"GraphitePath": "",
"GraphiteConvertHostnameDotsToUnderscores": true,
"ConsulAddress": "",
"ConsulAclToken": "",
"RaftEnabled":true,
"RaftDataDir":"/usr/local/orchestrator",
"RaftBind":"192.168.20.101",
"DefaultRaftPort":10008,
"RaftNodes":[
"192.168.20.101",
"192.168.20.102",
"192.168.20.103"
]
}
2.5-Copy the configuration file to the other machines
scp /etc/orchestrator.conf.json root@node2:/etc/orchestrator.conf.json
scp /etc/orchestrator.conf.json root@node3:/etc/orchestrator.conf.json
On node2 and node3, change the "RaftBind":"192.168.20.101" entry at the end to the local machine's own IP address.
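If you prefer not to edit the copies by hand, a one-liner along these lines (adjust the target IP for each host) rewrites RaftBind in place:
# run on node2, for example
sed -i 's/"RaftBind":"192.168.20.101"/"RaftBind":"192.168.20.102"/' /etc/orchestrator.conf.json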
2.6-Create the VIP switchover scripts
Create both scripts on every server; it is easiest to create them once, copy them to the other nodes, and adjust per host:
/usr/local/orchestrator/orch_hook.sh
/usr/local/orchestrator/orch_vip.sh
orch_hook.sh
What you may need to modify:
- Note that the script path is currently /usr/local/orchestrator/orch_vip.sh
- array=( ens32 "192.168.20.111" root "192.168.20.101")
  The values are: network interface name, VIP address, SSH user, local IP
- MYSQL_PWD can be ignored
[root@node1 orchestrator]# cat orch_hook.sh
#!/bin/bash
isitdead=$1
cluster=$2
oldmaster=$3
newmaster=$4
mysqluser="orchestrator"
export MYSQL_PWD="xxxpassxxx"
logfile="/var/log/orch_hook.log"
# list of clusternames
#clusternames=(rep blea lajos)
# clustername=( interface IP user Inter_IP)
#rep=( ens32 "192.168.56.121" root "192.168.56.125")
if [[ $isitdead == "DeadMaster" ]]; then
array=( ens32 "192.168.20.111" root "192.168.20.101")
interface=${array[0]}
IP=${array[1]}
user=${array[2]}
if [ ! -z ${IP} ] ; then
echo $(date)
echo "Revocering from: $isitdead"
echo "New master is: $newmaster"
echo "/usr/local/orchestrator/orch_vip.sh -d 1 -n $newmaster -i ${interface} -I ${IP} -u ${user} -o $oldmaster" | tee $logfile
/usr/local/orchestrator/orch_vip.sh -d 1 -n $newmaster -i ${interface} -I ${IP} -u ${user} -o $oldmaster
#mysql -h$newmaster -u$mysqluser < /usr/local/bin/orch_event.sql
else
echo "Cluster does not exist!" | tee $logfile
fi
elif [[ $isitdead == "DeadIntermediateMasterWithSingleSlaveFailingToConnect" ]]; then
array=( ens32 "192.168.20.111" root "192.168.20.101")
interface=${array[0]}
IP=${array[3]}
user=${array[2]}
slavehost=`echo $5 | cut -d":" -f1`
echo $(date)
echo "Revocering from: $isitdead"
echo "New intermediate master is: $slavehost"
echo "/usr/local/orchestrator/orch_vip.sh -d 1 -n $slavehost -i ${interface} -I ${IP} -u ${user} -o $oldmaster" | tee $logfile
/usr/local/orchestrator/orch_vip.sh -d 1 -n $slavehost -i ${interface} -I ${IP} -u ${user} -o $oldmaster
elif [[ $isitdead == "DeadIntermediateMaster" ]]; then
array=( ens32 "192.168.20.111" root "192.168.20.101")
interface=${array[0]}
IP=${array[3]}
user=${array[2]}
slavehost=`echo $5 | sed -E "s/:[0-9]+//g" | sed -E "s/,/ /g"`
showslave=`mysql -h$newmaster -u$mysqluser -sN -e "SHOW SLAVE HOSTS;" | awk '{print $2}'`
newintermediatemaster=`echo $slavehost $showslave | tr ' ' '\n' | sort | uniq -d`
echo $(date)
echo "Revocering from: $isitdead"
echo "New intermediate master is: $newintermediatemaster"
echo "/usr/local/orchestrator/orch_vip.sh -d 1 -n $newintermediatemaster -i ${interface} -I ${IP} -u ${user} -o $oldmaster" | tee $logfile
/usr/local/orchestrator/orch_vip.sh -d 1 -n $newintermediatemaster -i ${interface} -I ${IP} -u ${user} -o $oldmaster
fi
orch_vip.sh
If you want email notifications, change emailaddress to your address and set sendmail to 1.
[root@node1 orchestrator]# cat orch_vip.sh
#!/bin/bash
emailaddress="email@example.com"
sendmail=0
function usage {
cat << EOF
usage: $0 [-h] [-d master is dead] [-o old master ] [-s ssh options] [-n new master] [-i interface] [-I] [-u SSH user]
OPTIONS:
-h Show this message
-o string Old master hostname or IP address
-d int If master is dead should be 1 otherwise it is 0
-s string SSH options
-n string New master hostname or IP address
-i string Interface example eth0:1
-I string Virtual IP
-u string SSH user
EOF
}
while getopts ho:d:s:n:i:I:u: flag; do
case $flag in
o)
orig_master="$OPTARG";
;;
d)
isitdead="${OPTARG}";
;;
s)
ssh_options="${OPTARG}";
;;
n)
new_master="$OPTARG";
;;
i)
interface="$OPTARG";
;;
I)
vip="$OPTARG";
;;
u)
ssh_user="$OPTARG";
;;
h)
usage;
exit 0;
;;
*)
usage;
exit 1;
;;
esac
done
if [ $OPTIND -eq 1 ]; then
echo "No options were passed";
usage;
fi
shift $(( OPTIND - 1 ));
# discover commands from our path
ssh=$(which ssh)
arping=$(which arping)
ip2util=$(which ip)
# command for adding our vip
cmd_vip_add="sudo -n $ip2util address add ${vip} dev ${interface}"
# command for deleting our vip
cmd_vip_del="sudo -n $ip2util address del ${vip}/32 dev ${interface}"
# command for discovering if our vip is enabled
cmd_vip_chk="sudo -n $ip2util address show dev ${interface} to ${vip%/*}/32"
# command for sending gratuitous arp to announce ip move
cmd_arp_fix="sudo -n $arping -c 1 -I ${interface} ${vip%/*} "
# command for sending gratuitous arp to announce ip move on current server
cmd_local_arp_fix="sudo -n $arping -c 1 -I ${interface} ${vip%/*} "
vip_stop() {
rc=0
# ensure the vip is removed
$ssh ${ssh_options} -tt ${ssh_user}@${orig_master} \
"[ -n \"\$(${cmd_vip_chk})\" ] && ${cmd_vip_del} && sudo ${ip2util} route flush cache || [ -z \"\$(${cmd_vip_chk})\" ]"
rc=$?
return $rc
}
vip_start() {
rc=0
# ensure the vip is added
# this command should exit with failure if we are unable to add the vip
# if the vip already exists always exit 0 (whether or not we added it)
$ssh ${ssh_options} -tt ${ssh_user}@${new_master} \
"[ -z \"\$(${cmd_vip_chk})\" ] && ${cmd_vip_add} && ${cmd_arp_fix} || [ -n \"\$(${cmd_vip_chk})\" ]"
rc=$?
$cmd_local_arp_fix
return $rc
}
vip_status() {
$arping -c 1 -I ${interface} ${vip%/*}
if ping -c 1 -W 1 "$vip"; then
return 0
else
return 1
fi
}
if [[ $isitdead == 0 ]]; then
echo "Online failover"
if vip_stop; then
if vip_start; then
echo "$vip is moved to $new_master."
if [ $sendmail -eq 1 ]; then mail -s "$vip is moved to $new_master." "$emailaddress" < /dev/null &> /dev/null ; fi
else
echo "Can't add $vip on $new_master!"
if [ $sendmail -eq 1 ]; then mail -s "Can't add $vip on $new_master!" "$emailaddress" < /dev/null &> /dev/null ; fi
exit 1
fi
else
echo $rc
echo "Can't remove the $vip from orig_master!"
if [ $sendmail -eq 1 ]; then mail -s "Can't remove the $vip from orig_master!" "$emailaddress" < /dev/null &> /dev/null ; fi
exit 1
fi
elif [[ $isitdead == 1 ]]; then
echo "Master is dead, failover"
# make sure the vip is not available
if vip_status; then
if vip_stop; then
if [ $sendmail -eq 1 ]; then mail -s "$vip is removed from orig_master." "$emailaddress" < /dev/null &> /dev/null ; fi
else
if [ $sendmail -eq 1 ]; then mail -s "Couldn't remove $vip from orig_master." "$emailaddress" < /dev/null &> /dev/null ; fi
exit 1
fi
fi
if vip_start; then
echo "$vip is moved to $new_master."
if [ $sendmail -eq 1 ]; then mail -s "$vip is moved to $new_master." "$emailaddress" < /dev/null &> /dev/null ; fi
else
echo "Can't add $vip on $new_master!"
if [ $sendmail -eq 1 ]; then mail -s "Can't add $vip on $new_master!" "$emailaddress" < /dev/null &> /dev/null ; fi
exit 1
fi
else
echo "Wrong argument, the master is dead or live?"
fi
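Both scripts must be executable and, as noted in 2.6, present on every node. A minimal sketch for distributing them from node1:
chmod +x /usr/local/orchestrator/orch_hook.sh /usr/local/orchestrator/orch_vip.sh
for h in node2 node3; do
  scp /usr/local/orchestrator/orch_hook.sh /usr/local/orchestrator/orch_vip.sh root@$h:/usr/local/orchestrator/
  ssh root@$h 'chmod +x /usr/local/orchestrator/orch_hook.sh /usr/local/orchestrator/orch_vip.sh'
done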
2.7-Set up the VIP
Create the VIP on the master node only:
ip addr add 192.168.20.111 dev ens32
# to remove it: ip addr del 192.168.20.111 dev ens32
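To confirm where the VIP is currently bound (it should appear only on the master):
ip addr show dev ens32 | grep 192.168.20.111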
2.8-Start orchestrator
Start it on all machines:
cd /usr/local/orchestrator && nohup ./orchestrator --config=/etc/orchestrator.conf.json http &
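Once all three instances are up, the Raft cluster takes a moment to elect a leader. The status endpoint configured above ("StatusEndpoint": "/api/status") can be used to check each node; a quick sketch:
# run on each node; a healthy instance responds on port 3000
curl -s http://127.0.0.1:3000/api/status
# the nohup output in /usr/local/orchestrator can also be checked for startup errors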
2.9-Set the environment variable
Because Raft is enabled there are multiple orchestrator instances, so the ORCHESTRATOR_API environment variable is needed; orchestrator-client will automatically pick the leader from the list.
On all machines, run vi /etc/profile
# append at the end
export ORCHESTRATOR_API="node1:3000/api node2:3000/api node3:3000/api"
# then run source for it to take effect
source /etc/profile
2.10-Discover the topology
# run on any server to find which node is the leader
[root@node2 ~]# orchestrator-client -c which-api
node1:3000/api
# run on any server
[root@node2 ~]# orchestrator-client -c discover -i 192.168.20.101:3306
node1:3306
3-Testing
3.1-Kill or shut down MySQL
Result: failover happens automatically and the VIP floats to the new master.
3.2-Power off the master server directly
Result: failover happens automatically and the VIP floats to the new master.
3.3-Recovery
Once the failed node is back, re-attach it as a replica of the current master (adjust master_host to the current master's IP if the topology has changed):
change master to master_host='192.168.20.101',master_port=3306,master_user='orchestrator',master_password='123456',master_auto_position=1,MASTER_HEARTBEAT_PERIOD=2,MASTER_CONNECT_RETRY=1,MASTER_RETRY_COUNT=86400;
set global slave_net_timeout=8;
start slave;
show slave status\G
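Before re-attaching the recovered node it is also worth making sure the VIP is no longer bound on it; if it still is, remove it as in section 2.7:
# run on the recovered old master
ip addr show dev ens32 | grep 192.168.20.111 && ip addr del 192.168.20.111/32 dev ens32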
4.1-View the topology
# on any node
[root@node1 orchestrator]# orchestrator-client -c topology -i node2
node1:3306 [0s,ok,5.7.27-log,rw,ROW,>>,GTID]
+ node2:3306 [0s,ok,5.7.27-log,ro,ROW,>>,GTID]
+ node3:3306 [0s,ok,5.7.27-log,ro,ROW,>>,GTID]
4.2-Set read-only / read-write mode
# set node2 to read-only
orchestrator-client -c set-read-only -i node2
# set node2 to read-write
orchestrator-client -c set-writeable -i node2
4.3-Graceful takeover between the master and a designated replica
# promote node2 to master; on the old master node1 you must run START SLAVE manually afterwards
orchestrator-client -c graceful-master-takeover -a node1:3306 -d node2:3306
Topology before:
node1:3306 [0s,ok,5.7.27-log,rw,ROW,>>,GTID]
+ node2:3306 [0s,ok,5.7.27-log,ro,ROW,>>,GTID]
+ node3:3306 [0s,ok,5.7.27-log,ro,ROW,>>,GTID]
Topology after:
node2:3306 [0s,ok,5.7.27-log,rw,ROW,>>,GTID]
- node1:3306 [null,nonreplicating,5.7.27-log,ro,ROW,>>,GTID,downtimed] # replication must be started manually
+ node3:3306 [0s,ok,5.7.27-log,ro,ROW,>>,GTID]
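As the downtimed/nonreplicating annotation above indicates, the demoted node1 is left with replication stopped. A minimal sketch of bringing it back as a replica (credentials assumed from the examples above):
# on node1
mysql -uroot -p -e "START SLAVE;"
# then, from any node, confirm the topology again
orchestrator-client -c topology -i node2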
References:
https://www.cnblogs.com/zhoujinyi/p/10394389.html
https://github.com/openark/orchestrator
https://github.com/theTibi/orchestrator_vip
Appendix: annotated configuration parameters
{
"Debug": true, #debug模式,输出详细信息
"EnableSyslog": false, #是否输出到系统日志里
"ListenAddress": ":3000", #orch监听的端口,web端口
"MySQLTopologyUser": "orchestrator", #后端被管理的集群账号:所有实例都要有
"MySQLTopologyPassword": "123456", #后端被管理的集群密码
"MySQLTopologyCredentialsConfigFile": "", #后端集群的用户密码配置文件,账号密码可以直接写入文件,读取
"MySQLTopologySSLPrivateKeyFile": "", #SSL相关
"MySQLTopologySSLCertFile": "",
"MySQLTopologySSLCAFile": "", #证书相关
"MySQLTopologySSLSkipVerify": true, #跳过SSL验证
"MySQLTopologyUseMutualTLS": false, #是否使用TLS验证
"BackendDB": "sqlite", # 后台库类型
"SQLite3DataFile": "/usr/local/orchestrator/orchestrator.db", # 数据文件存放路径
"MySQLConnectTimeoutSeconds": 1, #orch连接MySQL超时秒数
"DefaultInstancePort": 3306, #mysql 集群实例端口
"DiscoverByShowSlaveHosts": true, #是否使用show slave hosts自动发现集群
"InstancePollSeconds": 5, #使用show slave hosts 探测间隔秒数
"DiscoveryIgnoreReplicaHostnameFilters": [
"a_host_i_want_to_ignore[.]example[.]com",
".*[.]ignore_all_hosts_from_this_domain[.]example[.]com",
"a_host_with_extra_port_i_want_to_ignore[.]example[.]com:3307"
],
"UnseenInstanceForgetHours": 240, #忽略不可见的实例的小时数
"SnapshotTopologiesIntervalHours": 0, #快照拓扑调用之间的小时间隔。默认:0(表示禁用)
"InstanceBulkOperationsWaitTimeoutSeconds": 10, #执行批量操作时,在单个实例上等待的时间
"HostnameResolveMethod": "default", #解析主机名,默认使用主机名:default;不解析为none,直接用IP
"MySQLHostnameResolveMethod": "@@hostname", #解析主机名,发出select @@hostname或发出select @@report_host(需要配置report_host);不解析用"",直接用IP。
"SkipBinlogServerUnresolveCheck": true, #跳过检查 将未解析的主机名解析为和binlog服务器相同的主机名
"ExpiryHostnameResolvesMinutes": 60, #域名检测过期周期(分钟)
"RejectHostnameResolvePattern": "", #禁止使用正则表达式表示域名
"ReasonableReplicationLagSeconds": 10, #复制延迟高于10S表示异常
"ProblemIgnoreHostnameFilters": [], #将主机做正则匹配筛选成最小化
"VerifyReplicationFilters": false, #在拓扑重构之前检查复制筛选器
"ReasonableMaintenanceReplicationLagSeconds": 20, #复制延迟高于该值会上下移动调整MySQL拓扑
"CandidateInstanceExpireMinutes": 60, #该时间之后,使用实例作为候选从库(在主故障转移时提升)的建议到期
"AuditLogFile": "", #审计日志,空的时候禁用
"AuditToSyslog": false, #审计日志是否写入到系统日志
"RemoveTextFromHostnameDisplay": ".mydomain.com:3306", #去除集群的文本
"ReadOnly": false, #全局只读
"AuthenticationMethod": "", #身份验证类型
"HTTPAuthUser": "", #http验证用户名
"HTTPAuthPassword": "",
"AuthUserHeader": "",
"PowerAuthUsers": [
"*"
],
"ClusterNameToAlias": { #正则表达式匹配集群名称和别名之间的映射
"127.0.0.1": "test suite"
},
"ReplicationLagQuery": "",
"DetectClusterAliasQuery": "SELECT SUBSTRING_INDEX(@@hostname, '.', 1)", #查询集群别名
"DetectClusterDomainQuery": "", #可选查询,返回集群主服务器的VIP/别名/域名。
"DetectInstanceAliasQuery": "", #可选查询,返回实例的别名
"DetectPromotionRuleQuery": "", #可选查询,返回实例的提升规则
"DataCenterPattern": "[.]([^.]+)[.][^.]+[.]mydomain[.]com", #从主机名称中提取数据中心名称
"PhysicalEnvironmentPattern": "[.]([^.]+[.][^.]+)[.]mydomain[.]com", #从主机名中提取物理环境信息
"PromotionIgnoreHostnameFilters": [], #不使用主机名匹配模式来提升副本
"DetectSemiSyncEnforcedQuery": "", #检测是否强制半同步
"ServeAgentsHttp": false, #产生一个专用于orche-client的HTTP端口
"AgentsServerPort": ":3001", #可选,对于raft设置,此节点将向其他节点通告HTTP的地址
"AgentsUseSSL": false, #当为true时,orch将使用SSL侦听代理端口已经通过SSL连接的代理
"AgentsUseMutualTLS": false, #当为true时,使用TLS服务器与代理通信
"AgentSSLSkipVerify": false, #为代理使用SSL
"AgentSSLPrivateKeyFile": "",
"AgentSSLCertFile": "",
"AgentSSLCAFile": "",
"AgentSSLValidOUs": [],
"UseSSL": false, #在服务器WEB端口上使用SSL
"UseMutualTLS": false, #true时使用TLS作为服务器的WEB和API连接
"SSLSkipVerify": false,
"SSLPrivateKeyFile": "",
"SSLCertFile": "",
"SSLCAFile": "",
"SSLValidOUs": [],
"URLPrefix": "", #在非跟web路径上运行orch的URL前缀
"StatusEndpoint": "/api/status", #状态查看
"StatusSimpleHealth": true,
"StatusOUVerify": false,
"AgentPollMinutes": 60, #代理之间轮询的分钟数
"UnseenAgentForgetHours": 6, #忘记不可见代理的小时数
"StaleSeedFailMinutes": 60, #无进展60分钟后被认为失败
"SeedAcceptableBytesDiff": 8192, #种子源和目标源数据大小的字节差异仍被视为成功复制
"PseudoGTIDPattern": "", #为空时,禁用基于伪GTID的重构
"PseudoGTIDPatternIsFixedSubstring": false, #如为TRUE,则上个参数不被视为正则表达式而被视为固定子字符串
"PseudoGTIDMonotonicHint": "asc:",
"DetectPseudoGTIDQuery": "", #可选查询,用于决定是否在实例上启用伪GTID
"BinlogEventsChunkSize": 10000, #show binlog events 块的大小。较小意味着更少的锁定和工作要做
"SkipBinlogEventsContaining": [], #扫描/比较Pseudo-GTID的binlog 时,跳过包含给定文本的条目。这些不是正则表达式(扫描binlog时会消耗太多的CPU),只需查找子字符串
"ReduceReplicationAnalysisCount": true, #如果为true,则复制分析将报告可能首先处理问题的可能性的实例。 如果为false,则为每个已知实例提供一个条目
"FailureDetectionPeriodBlockMinutes": 10, #在该时间内再次出现故障,不会被再次发现
"RecoveryPeriodBlockSeconds": 30, #在该时间内再次出现故障,不会进行failover,避免出现并发恢复和不稳定
"RecoveryIgnoreHostnameFilters": [], #恢复会忽略的主机
"RecoverMasterClusterFilters": [ #只对能匹配这些正则表达式模式的集群进行主故障恢复
"*"
],
"RecoverIntermediateMasterClusterFilters": [ #只对能匹配这些正则表达式模式的集群进行主故障恢复(“*”模式匹配所有)
"*"
],
"OnFailureDetectionProcesses": [ #检测到主故障时执行的命令和脚本
"echo 'Detected {failureType} on {failureCluster}. Affected replicas: {countSlaves}' >> /tmp/recovery.log"
],
"PreGracefulTakeoverProcesses": [ #在执行故障转移之前执行的命令和脚本
"echo 'Planned takeover about to take place on {failureCluster}. Master will switch to read_only' >> /tmp/recovery.log"
],
"PreFailoverProcesses": [ #执行恢复操作前执行
"echo 'Will recover from {failureType} on {failureCluster}' >> /tmp/recovery.log"
],
"PostFailoverProcesses": [ #在failover全部成功后执行
"echo '(for all types) Recovered from {failureType} on {failureCluster}. Failed: {failedHost}:{failedPort}; Successor: {successorHost}:{successorPort}' >> /tmp/recovery.log"
],
"PostUnsuccessfulFailoverProcesses": [], #在failover失败后执行
"PostMasterFailoverProcesses": [ #在主恢复成功结束时执行
"echo 'Recovered from {failureType} on {failureCluster}. Failed: {failedHost}:{failedPort}; Promoted: {successorHost}:{successorPort}' >> /tmp/recovery.log"
],
"PostIntermediateMasterFailoverProcesses": [ #在中间主成功恢复结束时执行
"echo 'Recovered from {failureType} on {failureCluster}. Failed: {failedHost}:{failedPort}; Successor: {successorHost}:{successorPort}' >> /tmp/recovery.log"
],
"PostGracefulTakeoverProcesses": [ #在新主晋升之后执行
"echo 'Planned takeover complete' >> /tmp/recovery.log"
],
"CoMasterRecoveryMustPromoteOtherCoMaster": true, #当false时,任何实例都可以得到提升;当true时,将提升共同主人,否则失败
"DetachLostSlavesAfterMasterFailover": true, #恢复过程中可能会丢失一些副本。如果为true,将通过detach-replica命令强制中断其复制,并认为它们不正常运行。
"ApplyMySQLPromotionAfterMasterFailover": true, #在主上执行reset slave all,并设置read_only=0
"PreventCrossDataCenterMasterFailover": false, #如果为true(默认值:false),则不允许跨DC主故障转移,orchestrator将尽其所能仅在同一DC内进行故障转移,否则不进行故障转移。
"PreventCrossRegionMasterFailover": false,
"MasterFailoverDetachReplicaMasterHost": false,
"MasterFailoverLostInstancesDowntimeMinutes": 0, #主故障转移后丢失的服务器停机的分钟数(包括失败的主和丢失的从)。0表示禁用
"PostponeReplicaRecoveryOnLagMinutes": 0, #在崩溃恢复时,延迟超过给定分钟的从库在主被选出后才复活。 值为0将禁用此功能。
"OSCIgnoreHostnameFilters": [],
"GraphiteAddr": "",
"GraphitePath": "",
"GraphiteConvertHostnameDotsToUnderscores": true,
"ConsulAddress": "",
"ConsulAclToken": "", # 这里的逗号需要注意
# 在最后添加上,此配置用于高可用,注意上面一行要加上逗号
"RaftEnabled":true,
"RaftDataDir":"/usr/local/orchestrator",
"RaftBind":"192.168.20.101", # 根据本机IP更改
"DefaultRaftPort":10008, # 端口所有机器上得保持一致
"RaftNodes":[
"192.168.20.101",
"192.168.20.102",
"192.168.20.103"
]