主从半同步降级异步分析

一、问题

这个问题大概是这样。当前主从状态如下有2个半同步从库,如果停掉1个半同步从库是是否会堵塞主库,堵塞时间大概是多久?

mysql> show global status like '%semi%';
+--------------------------------------------+-----------+
| Variable_name                              | Value     |
+--------------------------------------------+-----------+
| Rpl_semi_sync_master_clients               | 2         |
| Rpl_semi_sync_master_net_avg_wait_time     | 0         |
| Rpl_semi_sync_master_net_wait_time         | 0         |
| Rpl_semi_sync_master_net_waits             | 8454      |
| Rpl_semi_sync_master_no_times              | 9         |
| Rpl_semi_sync_master_no_tx                 | 1597      |
| Rpl_semi_sync_master_status                | ON        |
| Rpl_semi_sync_master_timefunc_failures     | 0         |
| Rpl_semi_sync_master_tx_avg_wait_time      | 77630     |
| Rpl_semi_sync_master_tx_wait_time          | 169312495 |
| Rpl_semi_sync_master_tx_waits              | 2181      |
| Rpl_semi_sync_master_wait_pos_backtraverse | 0         |
| Rpl_semi_sync_master_wait_sessions         | 0         |
| Rpl_semi_sync_master_yes_tx                | 2181      |
| Rpl_semi_sync_slave_status                 | OFF       |
+--------------------------------------------+-----------+
15 rows in set (0.01 sec)

mysql> show variables like '%semi%';
+-------------------------------------------+---------------------+
| Variable_name                             | Value               |
+-------------------------------------------+---------------------+
| rpl_semi_sync_master_enabled              | ON                  |
| rpl_semi_sync_master_timeout              | 1000000000000000000 |
| rpl_semi_sync_master_trace_level          | 32                  |
| rpl_semi_sync_master_wait_for_slave_count | 2                   |
| rpl_semi_sync_master_wait_no_slave        | OFF                 |
| rpl_semi_sync_master_wait_point           | AFTER_SYNC          |
| rpl_semi_sync_slave_enabled               | ON                  |
| rpl_semi_sync_slave_trace_level           | 32                  |
+-------------------------------------------+---------------------+

二、分析

首先从设置中这里,

  • rpl_semi_sync_master_timeout:1000000000000000000 设置为几乎无限大
  • rpl_semi_sync_master_wait_for_slave_count:2 设置为2代表需要2个半同步落盘才返回ACK。
  • rpl_semi_sync_master_wait_no_slave:OFF 设置为了OFF(默认为ON)

而从当前的状态来看,

  • Rpl_semi_sync_master_clients 存在2个半同步从库
  • Rpl_semi_sync_master_status 为ON,处于半同步状态。

这种情况下虽然rpl_semi_sync_master_timeout设置了很大值,但是如果2个半同步中一个关闭后,是否需要等待rpl_semi_sync_master_timeout设置的值呢?其实这就和rpl_semi_sync_master_wait_no_slave 参数有关,因为设置为off,那么当半同步的dump线程退出的时候那么会判断当前半同步的数量是否满足rpl_semi_sync_master_wait_for_slave_count 参数设置,如果不满足则马上降级为异步复制(回调函数repl_semi_binlog_dump_end) 。

但是dump线程的退出并不是及时的,需要等到一次心跳超时再次发送心跳的时候发现无法发送才会停止dump线程,而线上繁忙的系统,通常DUMP线程不会处于等待心跳发送的情况下的,但是一但回话线程在等待ACK上堵塞,那么binlog就不会再更改,dump线程也就没有事可做,只能触发心跳发送,但是发送心跳前会等待一个心跳周期为slave_net_timeout/2,如下,

Binlog_sender::wait_with_heartbeat
do
  {
    set_timespec_nsec(&ts, m_heartbeat_period); //心跳周期
    ret= mysql_bin_log.wait_for_update_bin_log(m_thd, &ts);//如果没有binlog生成,也会不会被提前唤醒,等待到超时
    if (ret != ETIMEDOUT && ret != ETIME) //如果是正常收到 发送信号则返回 否则如果是超时则发送心跳event,进行心跳检测
      break; //正常返回0 是超时返回ETIMEDOUT 继续循环

...
      if (send_heartbeat_event(log_pos)) //发送心跳
        return 1;
  } while (!m_thd->killed);

slave_net_timeout默认为60秒,因此心跳周期默认为是30秒的时间,也就是参数slave_net_timeout/2,也就是这种情况下,半同步降级可能来自2个方面,

  • 通过repl_semi_report_binlog_sync 回调函数,用户线程ACK超时直接降级,这依赖rpl_semi_sync_master_timeout参数
  • 心跳发送超时未接收到,导致dump线程退出,退出触发回调函数repl_semi_binlog_dump_end,判断当前剩余半桶数量是否小于rpl_semi_sync_master_wait_for_slave_count 参数设置降级。

因此此处降级的时间大概应该是min(slave_net_timeout/2,rpl_semi_sync_master_timeout(微秒)),取一个小值。

当然如果rpl_semi_sync_master_wait_no_slave保持默认值ON,则必须等待rpl_semi_sync_master_timeout参数设置的大小才降级。

三、结论和测试截图

因此这里我们看到这个配置下(rpl_semi_sync_master_wait_no_slave=OFF,rpl_semi_sync_master_wait_for_slave_count =2),停掉一个从库实际上会影响到主库但是在slave_net_timeout/2超时后会降级,默认是30秒,也就是30秒降级,当然如果rpl_semi_sync_master_timeout是默认的10秒,那么优先使用rpl_semi_sync_master_timeout的时间作为降级标准。这种情况下,5.7 测试截图如下,


image.png

8.0测试如下,


image.png

四、超时栈

(gdb) bt
#0  0x00007ffff7bcade2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x00000000018a9cb7 in native_cond_timedwait (cond=0x2dc1100 <mysql_bin_log+1824>, mutex=0x2dc1048 <mysql_bin_log+1640>, abstime=0x7fffec165e10)
    at /opt/percona-server-locks-detail-5.7.22/include/thr_cond.h:129
#2  0x00000000018aa01c in safe_cond_timedwait (cond=0x2dc1100 <mysql_bin_log+1824>, mp=0x2dc1020 <mysql_bin_log+1600>, abstime=0x7fffec165e10, 
    file=0x2221bd8 "/opt/percona-server-locks-detail-5.7.22/sql/binlog.cc", line=8362) at /opt/percona-server-locks-detail-5.7.22/mysys/thr_cond.c:88
#3  0x0000000001802a65 in my_cond_timedwait (cond=0x2dc1100 <mysql_bin_log+1824>, mp=0x2dc1020 <mysql_bin_log+1600>, abstime=0x7fffec165e10, 
    file=0x2221bd8 "/opt/percona-server-locks-detail-5.7.22/sql/binlog.cc", line=8362) at /opt/percona-server-locks-detail-5.7.22/include/thr_cond.h:180
#4  0x0000000001802f61 in inline_mysql_cond_timedwait (that=0x2dc1100 <mysql_bin_log+1824>, mutex=0x2dc1020 <mysql_bin_log+1600>, abstime=0x7fffec165e10, 
    src_file=0x2221bd8 "/opt/percona-server-locks-detail-5.7.22/sql/binlog.cc", src_line=8362) at /opt/percona-server-locks-detail-5.7.22/include/mysql/psi/mysql_thread.h:1229
#5  0x0000000001814919 in MYSQL_BIN_LOG::wait_for_update_bin_log (this=0x2dc09e0 <mysql_bin_log>, thd=0x7ffee40008c0, timeout=0x7fffec165e10)
    at /opt/percona-server-locks-detail-5.7.22/sql/binlog.cc:8362
#6  0x0000000001839283 in Binlog_sender::wait_with_heartbeat (this=0x7fffec1665b0, log_pos=194) at /opt/percona-server-locks-detail-5.7.22/sql/rpl_binlog_sender.cc:652
#7  0x00000000018391d2 in Binlog_sender::wait_new_events (this=0x7fffec1665b0, log_pos=194) at /opt/percona-server-locks-detail-5.7.22/sql/rpl_binlog_sender.cc:631
#8  0x0000000001838e87 in Binlog_sender::get_binlog_end_pos (this=0x7fffec1665b0, log_cache=0x7fffec165fc0) at /opt/percona-server-locks-detail-5.7.22/sql/rpl_binlog_sender.cc:397
#9  0x00000000018369c5 in Binlog_sender::send_binlog (this=0x7fffec1665b0, log_cache=0x7fffec165fc0, start_pos=123) at /opt/percona-server-locks-detail-5.7.22/sql/rpl_binlog_sender.cc:345
#10 0x0000000001836383 in Binlog_sender::run (this=0x7fffec1665b0) at /opt/percona-server-locks-detail-5.7.22/sql/rpl_binlog_sender.cc:226
#11 0x0000000001834074 in mysql_binlog_send (thd=0x7ffee40008c0, log_ident=0x7fffec1671e0 "", pos=4, slave_gtid_executed=0x7fffec166fb0, flags=0)
    at /opt/percona-server-locks-detail-5.7.22/sql/rpl_master.cc:414
#12 0x0000000001833f40 in com_binlog_dump_gtid (thd=0x7ffee40008c0, packet=0x7ffee40063f1 "", packet_length=124) at /opt/percona-server-locks-detail-5.7.22/sql/rpl_master.cc:398
#13 0x00000000015671d6 in dispatch_command (thd=0x7ffee40008c0, com_data=0x7fffec167d90, command=COM_BINLOG_DUMP_GTID) at /opt/percona-server-locks-detail-5.7.22/sql/sql_parse.cc:1705
#14 0x00000000015655c5 in do_command (thd=0x7ffee40008c0) at /opt/percona-server-locks-detail-5.7.22/sql/sql_parse.cc:1021
#15 0x00000000016a635c in handle_connection (arg=0x64b1d50) at /opt/percona-server-locks-detail-5.7.22/sql/conn_handler/connection_handler_per_thread.cc:312
#16 0x00000000018ce0f6 in pfs_spawn_thread (arg=0x5d3bc20) at /opt/percona-server-locks-detail-5.7.22/storage/perfschema/pfs.cc:2190
#17 0x00007ffff7bc6ea5 in start_thread () from /lib64/libpthread.so.0
#18 0x00007ffff66008dd in clone () from /lib64/libc.so.6
(gdb) c
Continuing.
2024-10-21T08:24:34.739778Z 0 [Note] InnoDB: page_cleaner: 1000ms intended loop took 8700ms. The settings might not be optimal. (flushed=0, during the time.)

Breakpoint 3, repl_semi_binlog_dump_end (param=0x7fffec165ee0) at /opt/percona-server-locks-detail-5.7.22/plugin/semisync/semisync_master_plugin.cc:164
164       bool semi_sync_slave= is_semi_sync_dump();
(gdb) bt
#0  repl_semi_binlog_dump_end (param=0x7fffec165ee0) at /opt/percona-server-locks-detail-5.7.22/plugin/semisync/semisync_master_plugin.cc:164
#1  0x000000000148c6f6 in Binlog_transmit_delegate::transmit_stop (this=0x2d89fc0 <delegates_init()::transmit_mem>, thd=0x7ffee40008c0, flags=0)
    at /opt/percona-server-locks-detail-5.7.22/sql/rpl_handler.cc:729
#2  0x0000000001836169 in Binlog_sender::cleanup (this=0x7fffec1665b0) at /opt/percona-server-locks-detail-5.7.22/sql/rpl_binlog_sender.cc:176
#3  0x00000000018368b9 in Binlog_sender::run (this=0x7fffec1665b0) at /opt/percona-server-locks-detail-5.7.22/sql/rpl_binlog_sender.cc:312
#4  0x0000000001834074 in mysql_binlog_send (thd=0x7ffee40008c0, log_ident=0x7fffec1671e0 "", pos=4, slave_gtid_executed=0x7fffec166fb0, flags=0)
    at /opt/percona-server-locks-detail-5.7.22/sql/rpl_master.cc:414
#5  0x0000000001833f40 in com_binlog_dump_gtid (thd=0x7ffee40008c0, packet=0x7ffee40063f1 "", packet_length=124) at /opt/percona-server-locks-detail-5.7.22/sql/rpl_master.cc:398
#6  0x00000000015671d6 in dispatch_command (thd=0x7ffee40008c0, com_data=0x7fffec167d90, command=COM_BINLOG_DUMP_GTID) at /opt/percona-server-locks-detail-5.7.22/sql/sql_parse.cc:1705
#7  0x00000000015655c5 in do_command (thd=0x7ffee40008c0) at /opt/percona-server-locks-detail-5.7.22/sql/sql_parse.cc:1021
#8  0x00000000016a635c in handle_connection (arg=0x64b1d50) at /opt/percona-server-locks-detail-5.7.22/sql/conn_handler/connection_handler_per_thread.cc:312
#9  0x00000000018ce0f6 in pfs_spawn_thread (arg=0x5d3bc20) at /opt/percona-server-locks-detail-5.7.22/storage/perfschema/pfs.cc:2190
#10 0x00007ffff7bc6ea5 in start_thread () from /lib64/libpthread.so.0
#11 0x00007ffff66008dd in clone () from /lib64/libc.so.6
(gdb) thread 73
[Switching to thread 73 (Thread 0x7fffd8161700 (LWP 1887))]
#0  0x00007ffff7bcade2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
(gdb) bt
#0  0x00007ffff7bcade2 in pthread_cond_timedwait@@GLIBC_2.3.2 () from /lib64/libpthread.so.0
#1  0x0000000004802623 in native_cond_timedwait (cond=0x7e0efa8 <mysql_bin_log+1096>, mutex=0xa2d21d8, abstime=0x7fffd815e280) at /newdata/mysql-8.0.23/include/thr_cond.h:100
#2  0x000000000480298e in safe_cond_timedwait (cond=0x7e0efa8 <mysql_bin_log+1096>, mp=0xa2d21b0, abstime=0x7fffd815e280, file=0x61b1de8 "/newdata/mysql-8.0.23/sql/binlog.cc", line=7557)
    at /newdata/mysql-8.0.23/mysys/thr_cond.cc:113
#3  0x00000000043b8259 in my_cond_timedwait (cond=0x7e0efa8 <mysql_bin_log+1096>, mp=0x7e0ef48 <mysql_bin_log+1000>, abstime=0x7fffd815e280, file=0x61b1de8 "/newdata/mysql-8.0.23/sql/binlog.cc", 
    line=7557) at /newdata/mysql-8.0.23/include/thr_cond.h:147
#4  0x00000000043b8496 in inline_mysql_cond_timedwait (that=0x7e0efa8 <mysql_bin_log+1096>, mutex=0x7e0ef48 <mysql_bin_log+1000>, abstime=0x7fffd815e280, 
    src_file=0x61b1de8 "/newdata/mysql-8.0.23/sql/binlog.cc", src_line=7557) at /newdata/mysql-8.0.23/include/mysql/psi/mysql_cond.h:241
#5  0x00000000043cb6a6 in MYSQL_BIN_LOG::wait_for_update (this=0x7e0eb60 <mysql_bin_log>, timeout=0x7fffd815e280) at /newdata/mysql-8.0.23/sql/binlog.cc:7557
#6  0x000000000448acaf in Binlog_sender::wait_with_heartbeat (this=0x7fffd815eb60, log_pos=316133) at /newdata/mysql-8.0.23/sql/rpl_binlog_sender.cc:796
#7  0x0000000004489023 in Binlog_sender::wait_new_events (this=0x7fffd815eb60, log_pos=316133) at /newdata/mysql-8.0.23/sql/rpl_binlog_sender.cc:778
#8  0x0000000004488502 in Binlog_sender::get_binlog_end_pos (this=0x7fffd815eb60, reader=0x7fffd815e970, end_pos=0x7fffd815e440) at /newdata/mysql-8.0.23/sql/rpl_binlog_sender.cc:557
#9  0x0000000004488201 in Binlog_sender::send_binlog (this=0x7fffd815eb60, reader=0x7fffd815e970, start_pos=125) at /newdata/mysql-8.0.23/sql/rpl_binlog_sender.cc:511
#10 0x0000000004487b95 in Binlog_sender::run (this=0x7fffd815eb60) at /newdata/mysql-8.0.23/sql/rpl_binlog_sender.cc:412
#11 0x0000000004483604 in mysql_binlog_send (thd=0x7fff20015540, log_ident=0x7fffd815f700 "", pos=4, slave_gtid_executed=0x7fffd815f540, flags=2) at /newdata/mysql-8.0.23/sql/rpl_master.cc:1017
#12 0x00000000044834c9 in com_binlog_dump_gtid (thd=0x7fff20015540, packet=0x7fff20006f71 "", packet_length=70) at /newdata/mysql-8.0.23/sql/rpl_master.cc:1002
#13 0x000000000323dcc9 in dispatch_command (thd=0x7fff20015540, com_data=0x7fffd8160b40, command=COM_BINLOG_DUMP_GTID) at /newdata/mysql-8.0.23/sql/sql_parse.cc:2048
#14 0x000000000323b513 in do_command (thd=0x7fff20015540) at /newdata/mysql-8.0.23/sql/sql_parse.cc:1320
#15 0x000000000340dfb1 in handle_connection (arg=0xa2d4880) at /newdata/mysql-8.0.23/sql/conn_handler/connection_handler_per_thread.cc:301
#16 0x000000000500dc16 in pfs_spawn_thread (arg=0xa524270) at /newdata/mysql-8.0.23/storage/perfschema/pfs.cc:2900
#17 0x00007ffff7bc6ea5 in start_thread () from /lib64/libpthread.so.0
#18 0x00007ffff61418dd in clone () from /lib64/libc.so.6
(gdb) f 5
#5  0x00000000043cb6a6 in MYSQL_BIN_LOG::wait_for_update (this=0x7e0eb60 <mysql_bin_log>, timeout=0x7fffd815e280) at /newdata/mysql-8.0.23/sql/binlog.cc:7557
7557        ret = mysql_cond_timedwait(&update_cond, &LOCK_binlog_end_pos,
(gdb) p timeout
$3 = (const timespec *) 0x7fffd815e280
(gdb) p *timeout
$4 = {tv_sec = 1729517575, tv_nsec = 413257724}
(gdb) f 6
#6  0x000000000448acaf in Binlog_sender::wait_with_heartbeat (this=0x7fffd815eb60, log_pos=316133) at /newdata/mysql-8.0.23/sql/rpl_binlog_sender.cc:796
796         ret = mysql_bin_log.wait_for_update(&ts);
(gdb)  p ts
$5 = {tv_sec = 1729517575, tv_nsec = 413257724}
(gdb) p m_heartbeat_period.count()
$6 = 30000001024
最后编辑于
©著作权归作者所有,转载或内容合作请联系作者
  • 序言:七十年代末,一起剥皮案震惊了整个滨河市,随后出现的几起案子,更是在滨河造成了极大的恐慌,老刑警刘岩,带你破解...
    沈念sama阅读 219,539评论 6 508
  • 序言:滨河连续发生了三起死亡事件,死亡现场离奇诡异,居然都是意外死亡,警方通过查阅死者的电脑和手机,发现死者居然都...
    沈念sama阅读 93,594评论 3 396
  • 文/潘晓璐 我一进店门,熙熙楼的掌柜王于贵愁眉苦脸地迎上来,“玉大人,你说我怎么就摊上这事。” “怎么了?”我有些...
    开封第一讲书人阅读 165,871评论 0 356
  • 文/不坏的土叔 我叫张陵,是天一观的道长。 经常有香客问我,道长,这世上最难降的妖魔是什么? 我笑而不...
    开封第一讲书人阅读 58,963评论 1 295
  • 正文 为了忘掉前任,我火速办了婚礼,结果婚礼上,老公的妹妹穿的比我还像新娘。我一直安慰自己,他们只是感情好,可当我...
    茶点故事阅读 67,984评论 6 393
  • 文/花漫 我一把揭开白布。 她就那样静静地躺着,像睡着了一般。 火红的嫁衣衬着肌肤如雪。 梳的纹丝不乱的头发上,一...
    开封第一讲书人阅读 51,763评论 1 307
  • 那天,我揣着相机与录音,去河边找鬼。 笑死,一个胖子当着我的面吹牛,可吹牛的内容都是我干的。 我是一名探鬼主播,决...
    沈念sama阅读 40,468评论 3 420
  • 文/苍兰香墨 我猛地睁开眼,长吁一口气:“原来是场噩梦啊……” “哼!你这毒妇竟也来了?” 一声冷哼从身侧响起,我...
    开封第一讲书人阅读 39,357评论 0 276
  • 序言:老挝万荣一对情侣失踪,失踪者是张志新(化名)和其女友刘颖,没想到半个月后,有当地人在树林里发现了一具尸体,经...
    沈念sama阅读 45,850评论 1 317
  • 正文 独居荒郊野岭守林人离奇死亡,尸身上长有42处带血的脓包…… 初始之章·张勋 以下内容为张勋视角 年9月15日...
    茶点故事阅读 38,002评论 3 338
  • 正文 我和宋清朗相恋三年,在试婚纱的时候发现自己被绿了。 大学时的朋友给我发了我未婚夫和他白月光在一起吃饭的照片。...
    茶点故事阅读 40,144评论 1 351
  • 序言:一个原本活蹦乱跳的男人离奇死亡,死状恐怖,灵堂内的尸体忽然破棺而出,到底是诈尸还是另有隐情,我是刑警宁泽,带...
    沈念sama阅读 35,823评论 5 346
  • 正文 年R本政府宣布,位于F岛的核电站,受9级特大地震影响,放射性物质发生泄漏。R本人自食恶果不足惜,却给世界环境...
    茶点故事阅读 41,483评论 3 331
  • 文/蒙蒙 一、第九天 我趴在偏房一处隐蔽的房顶上张望。 院中可真热闹,春花似锦、人声如沸。这庄子的主人今日做“春日...
    开封第一讲书人阅读 32,026评论 0 22
  • 文/苍兰香墨 我抬头看了看天上的太阳。三九已至,却和暖如春,着一层夹袄步出监牢的瞬间,已是汗流浃背。 一阵脚步声响...
    开封第一讲书人阅读 33,150评论 1 272
  • 我被黑心中介骗来泰国打工, 没想到刚下飞机就差点儿被人妖公主榨干…… 1. 我叫王不留,地道东北人。 一个月前我还...
    沈念sama阅读 48,415评论 3 373
  • 正文 我出身青楼,却偏偏与公主长得像,于是被迫代替她去往敌国和亲。 传闻我的和亲对象是个残疾皇子,可洞房花烛夜当晚...
    茶点故事阅读 45,092评论 2 355