GTID下复制问题和解决

news/2024/12/24 9:15:54/

环境介绍

数据库1主2从,mysql版本是v5.19

表结构

一、主库新增记录,从库提示主键冲突

模拟故障

1, master上关闭 sql_log_bin,删除id =103 后打开

2, 确认此时从库有id=103,主库没有

3, master insert id=103 主从异常

             Slave_IO_Running: YesSlave_SQL_Running: NoReplicate_Do_DB: Replicate_Ignore_DB: Replicate_Do_Table: Replicate_Ignore_Table: Replicate_Wild_Do_Table: Replicate_Wild_Ignore_Table: Last_Errno: 1062Last_Error: Could not execute Write_rows event on table test.hero3; Duplicate entry '103' for key 'PRIMARY', Error_code: 1062; handler error HA_ERR_FOUND_DUPP_KEY; the event's master log mysql-bin.000045, end_log_pos 3056

修复故障

根据报错test.hero3; Duplicate entry '103' for key 'PRIMARY,去slave中找到对应记录,删除即可

mysql> delete from test.hero3 where id=103;
Query OK, 1 row affected (0.00 sec)mysql> stop slave;start slave;
Query OK, 0 rows affected (0.00 sec)Query OK, 0 rows affected (0.00 sec)mysql> show slave status\G
*************************** 1. row ***************************Slave_IO_State: Waiting for master to send eventMaster_Host: 192.168.10.129Master_User: replMaster_Port: 3358Connect_Retry: 60Master_Log_File: mysql-bin.000046Read_Master_Log_Pos: 194Relay_Log_File: mha3-relay-bin.000012Relay_Log_Pos: 407Relay_Master_Log_File: mysql-bin.000046Slave_IO_Running: YesSlave_SQL_Running: Yes

二、主库UPDATE,从库找不到记录

模拟故障

1.1, slave上将id=1 的记录delete

1.2, master上执行update xxx where id = 1;

1.3,    show slave status\G查看主从同步异常

# slave:
mysql> delete from hero3 where id = 1;
Query OK, 1 row affected (0.01 sec)# master
mysql> update hero3 set age = 1 where id = 1;
Query OK, 1 row affected (0.03 sec)
Rows matched: 1  Changed: 1  Warnings: 0# slave;
mysql>  show slave status\G
*************************** 1. row ***************************Slave_IO_State: Waiting for master to send eventMaster_Host: 192.168.10.129Master_User: replMaster_Port: 3358Connect_Retry: 60Master_Log_File: mysql-bin.000045Read_Master_Log_Pos: 2819Relay_Log_File: mha2-relay-bin.000009Relay_Log_Pos: 866Relay_Master_Log_File: mysql-bin.000045Slave_IO_Running: YesSlave_SQL_Running: NoReplicate_Do_DB: Replicate_Ignore_DB: Replicate_Do_Table: Replicate_Ignore_Table: Replicate_Wild_Do_Table: Replicate_Wild_Ignore_Table: Last_Errno: 1032Last_Error: Could not execute Update_rows event on table test.hero3; Can't find record in 'hero3', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log mysql-bin.000045, end_log_pos 2788

修复故障

从库中找到对应的记录,在slave上将该记录插入;

1,根据同步异常错误mysql-bin.000045, end_log_pos 2788,去主库对应sql

2, mysqlbinlog --no-defaults -vv --base64-output=decode-rows mysql-bin.000045 | grep -A 20 'end_log_pos 2788' 命令查找执行的sql

[root@mha1 mysql]# mysqlbinlog --no-defaults -vv --base64-output=decode-rows mysql-bin.000045 | grep -A 20 'end_log_pos 2788'
#241222 10:38:37 server id 10129  end_log_pos 2788 CRC32 0x1bfc92fc     Update_rows: table id 224 flags: STMT_END_F
### UPDATE `test`.`hero3`
### WHERE
###   @1=1 /* INT meta=0 nullable=0 is_null=0 */
###   @2='mao1' /* VARSTRING(60) meta=60 nullable=0 is_null=0 */
###   @3=110 /* INT meta=0 nullable=0 is_null=0 */
### SET
###   @1=1 /* INT meta=0 nullable=0 is_null=0 */
###   @2='mao1' /* VARSTRING(60) meta=60 nullable=0 is_null=0 */
###   @3=1 /* INT meta=0 nullable=0 is_null=0 */

3,根据 UPDATE `test`.`hero3` WHERE @1=1 @2='mao1' @3=110 拼接insert并插入slave

mysql>    insert into `test`.`hero3` values (1,'mao1',110);
Query OK, 1 row affected (0.03 sec)mysql> stop slave; start slave;
Query OK, 0 rows affected (0.01 sec)Query OK, 0 rows affected (0.00 sec)mysql> show slave status\G
*************************** 1. row ***************************Slave_IO_State: Waiting for master to send eventMaster_Host: 192.168.10.129Master_User: replMaster_Port: 3358Connect_Retry: 60Master_Log_File: mysql-bin.000045Read_Master_Log_Pos: 2819Relay_Log_File: mha2-relay-bin.000010Relay_Log_Pos: 454Relay_Master_Log_File: mysql-bin.000045Slave_IO_Running: YesSlave_SQL_Running: Yes

三、主库DELETE,从库找不到记录

模拟故障

主库在关闭binlog的情况下insert一条数据,然后打开binlog并删除该数据

mysql> set sql_log_bin = 0; 
Query OK, 0 rows affected (0.00 sec)mysql> insert into hero3(name,age) values('delete_test',22);
Query OK, 1 row affected (0.01 sec)mysql> set sql_log_bin = 1;
Query OK, 0 rows affected (0.00 sec)mysql> select * from hero3 where id >= 102;
+-----+-------------+-----+
| id  | name        | age |
+-----+-------------+-----+
| 102 | mao2        | 111 |
| 103 | mao2        | 109 |
| 104 | delete_test |  22 |
+-----+-------------+-----+
3 rows in set (0.00 sec)mysql> delete from hero3 where id = 104;
Query OK, 1 row affected (0.03 sec)

此时从库报错信息如下

             Slave_IO_Running: YesSlave_SQL_Running: NoReplicate_Do_DB: Replicate_Ignore_DB: Replicate_Do_Table: Replicate_Ignore_Table: Replicate_Wild_Do_Table: Replicate_Wild_Ignore_Table: Last_Errno: 1032Last_Error: Could not execute Delete_rows event on table test.hero3; Can't find record in 'hero3', Error_code: 1032; handler error HA_ERR_KEY_NOT_FOUND; the event's master log mysql-bin.000045, end_log_pos 2092

 修复故障

由于在执行delete的时候,slave找不到该记录,那么我们就可以跳过该事务

Retrieved_Gtid_Set 表示slave已经收到的gtid集合,Executed_Gtid_Set表示slave已经执行的gtid集合,从中可以看到 4855c186-ba02-11ef-8bf4-000c297511da:11没有执行

由于删除的数据在slave中找不到,所以我们就可以跳过该事务; 

mysql> stop slave;
Query OK, 0 rows affected (0.00 sec)mysql> set gtid_next = '4855c186-ba02-11ef-8bf4-000c297511da:11'; #设定下一个要执行的GTID:11
Query OK, 0 rows affected (0.02 sec)mysql> begin;commit; #产生一个空事务
Query OK, 0 rows affected (0.00 sec)Query OK, 0 rows affected (0.00 sec)mysql> set gtid_next='AUTOMATIC'; # 将next设为自动
Query OK, 0 rows affected (0.00 sec)mysql> start slave;
Query OK, 0 rows affected (0.02 sec)mysql> show slave status\G      # 可以看到主从状态已恢复                                                                                                                                             *************************** 1. row ***************************Slave_IO_State: Waiting for master to send eventMaster_Host: 192.168.10.129Master_User: replMaster_Port: 3358Connect_Retry: 60Master_Log_File: mysql-bin.000045Read_Master_Log_Pos: 2123Relay_Log_File: mha3-relay-bin.000008Relay_Log_Pos: 454Relay_Master_Log_File: mysql-bin.000045Slave_IO_Running: YesSlave_SQL_Running: Yes

四、主库日志被purged,主从同步失败

模拟故障

4.1.1 stop slave断开主从同步

4.1.2 主库上执行一些事务,使GTID持续增长

mysql> flush logs;
Query OK, 0 rows affected (0.00 sec)mysql> insert into hero3(name,age) select name,age from hero3;
Query OK, 28 rows affected (0.00 sec)
Records: 28  Duplicates: 0  Warnings: 0mysql> flush logs;
Query OK, 0 rows affected (0.00 sec)mysql> insert into hero3(name,age) select name,age from hero3;
Query OK, 56 rows affected (0.00 sec)
Records: 56  Duplicates: 0  Warnings: 0mysql> flush logs;
Query OK, 0 rows affected (0.00 sec)mysql> insert into hero3(name,age) select name,age from hero3;
Query OK, 112 rows affected (0.00 sec)
Records: 112  Duplicates: 0  Warnings: 0mysql> flush logs;
Query OK, 0 rows affected (0.00 sec)mysql> insert into hero3(name,age) select name,age from hero3;
Query OK, 224 rows affected (0.00 sec)
Records: 224  Duplicates: 0  Warnings: 0mysql> show master logs;
+------------------+-----------+
| Log_name         | File_size |
+------------------+-----------+
| mysql-bin.000053 |       734 |
| mysql-bin.000054 |       972 |
| mysql-bin.000055 |      1448 |
| mysql-bin.000056 |      2400 |
| mysql-bin.000057 |      4257 |
+------------------+-----------+

4.1.3 清理binlog,造成master上事务缺失

mysql> purge binary logs to 'mysql-bin.000058';
Query OK, 0 rows affected (0.00 sec)mysql> show master logs;
+------------------+-----------+
| Log_name         | File_size |
+------------------+-----------+
| mysql-bin.000058 |       194 |
+------------------+-----------+
1 row in set (0.00 sec)

4.1.4 slave上执行start slave,观察主从异常,提示“but the master has purged binary logs containing GTIDs that the slave requires”

mysql> show slave status\G
*************************** 1. row ***************************Slave_IO_State: Master_Host: 192.168.10.129Master_User: replMaster_Port: 3358Connect_Retry: 60Master_Log_File: mysql-bin.000050Read_Master_Log_Pos: 194Relay_Log_File: mha3-relay-bin.000021Relay_Log_Pos: 407Relay_Master_Log_File: mysql-bin.000050Slave_IO_Running: NoSlave_SQL_Running: Yes
Master_SSL_Verify_Server_Cert: NoLast_IO_Errno: 1236Last_IO_Error: Got fatal error 1236 from master when reading data from binary log: 'The slave is connecting using CHANGE MASTER TO MASTER_AUTO_POSITION = 1, but the master has purged binary logs containing GTIDs that the slave requires.'

修复故障

由于master上这部分binlog已经被清理掉了,所以这部分数据已经无法同步到从库。 及时恢复了主从同步状态,这部分数据也是丢失的; 工作的时候遇到这种情况有两种处理方法

4.2.1        master上根据select @@gtid_purged;可以查看主库GTID执行了1-27

mysql> select @@gtid_purged;
+-------------------------------------------+
| @@gtid_purged                             |
+-------------------------------------------+
| 4855c186-ba02-11ef-8bf4-000c297511da:1-27 |
+-------------------------------------------+

从库show slave status命令可以看到Retrieved_Gtid_Set 和Executed_Gtid_Set 中显示从库已经获取并执行了GTID:1-20

           Retrieved_Gtid_Set: 4855c186-ba02-11ef-8bf4-000c297511da:6-20Executed_Gtid_Set: 090b3bdd-ba02-11ef-9ce1-000c292443d0:1-4,
4855c186-ba02-11ef-8bf4-000c297511da:1-20

证明中间有21-27,一共7个事务丢失;

4.2.2  尝试将gtid_purged设置为1-27

mysql>  set global gtid_purged = '4855c186-ba02-11ef-8bf4-000c297511da:1-27';
ERROR 1840 (HY000): @@GLOBAL.GTID_PURGED can only be set when @@GLOBAL.GTID_EXECUTED is empty.
mysql> 
mysql> reset slve^C
mysql> set @@GLOBAL.GTID_EXECUTED = '';
ERROR 1238 (HY000): Variable 'gtid_executed' is a read only variable

发现无法设置,我们可以reset master,清空从库的binlog和gtid_executed

mysql> reset master;
Query OK, 0 rows affected (0.00 sec)mysql> set global gtid_purged='4855c186-ba02-11ef-8bf4-000c297511da:1-27';
Query OK, 0 rows affected (0.00 sec)mysql> start slave;
Query OK, 0 rows affected (0.00 sec)mysql> show slave status\G
*************************** 1. row ***************************Slave_IO_State: Waiting for master to send eventMaster_Host: 192.168.10.129Master_User: replMaster_Port: 3358Connect_Retry: 60Master_Log_File: mysql-bin.000058Read_Master_Log_Pos: 194Relay_Log_File: mha3-relay-bin.000022Relay_Log_Pos: 407Relay_Master_Log_File: mysql-bin.000058Slave_IO_Running: YesSlave_SQL_Running: Yes

4.2.3 虽然此时同步恢复正常,但是主从是不一致的,后面执行其他事务的时候是正常的,但是当执行到丢失的7个事务中的数据的时候还是会报错

# master
mysql>  select count(1) from hero3;
+----------+
| count(1) |
+----------+
|      448 |
+----------+
#slave
mysql>   select count(1) from hero3;
+----------+
| count(1) |
+----------+
|      208 |
+----------+
1 row in set (0.00 sec)

后面等异常的时候可以按照上三种情况进行修复,发现一例修复一例;

4.2.4  我们不知道上面7个事务究竟影响了多少数据量,如果条件允许,可以从主库上dump一份全量数据,重新搭建一个新的从库,这样可以保证数据完全一致,不用每次都手动修复


http://www.ppmy.cn/news/1557703.html

相关文章

蓝桥杯——竞赛省赛国赛题分享

目录 一.[蓝桥杯 2013 省 AB] 错误票据 代码如下: 二.[蓝桥杯 2024 省 Java B] 报数游戏 代码如下: 讲解: 三.[蓝桥杯 2014 国 C] 拼接平方数 代码如下: 四.三步问题(递归,上台阶) 代码…

推荐一款免费且好用的 国产 NAS 系统 ——FnOS

一、系统基础信息 开发基础:基于最新的Linux内核(Debian发行版)深度开发,兼容主流x86硬件(ARM还没适配),自由组装NAS,灵活扩展外部存储。 使用情况:官方支持功能较多&am…

【华为OD-E卷-寻找密码 100分(python、java、c++、js、c)】

【华为OD-E卷-寻找密码 100分(python、java、c、js、c)】 题目 小王在进行游戏大闯关,有一个关卡需要输入一个密码才能通过,密码获得的条件如下: 在一个密码本中,每一页都有一个由 26 个小写字母组成的若…

ue5 pcg(程序内容生成)真的简单方便,就5个节点

总结: 前情提示 鼠标单击右键平移节点 1.编辑-》插件-》procedural->勾选两个插件 2.右键-》pcg图表-》拖拽进入场景 3.先看点point 右键-》调试(快捷键d)->右侧设置粒子数 3.1调整粒子数 可以在右侧输入框,使用加减乘除 4.1 表面采样器 …

【国产NI替代】32振动/电压(配置复合型)高精度终端采集板卡,应用于复杂的大型测量场景

32振动/电压(配置复合型)高精度终端采集板卡 采用 EP4CE115F29I7 型号的 FPGA ,是一款 高精度,多通道动态信号采集器,主要应用 在复杂的大型测量并对成本要求不敏感的场 合,默认具备 8 个测量板卡&#…

electron-vite【实战】登录/注册页

效果预览 项目搭建 https://blog.csdn.net/weixin_41192489/article/details/144611858 技术要点 路由默认跳转到登录页 src/renderer/src/router/index.ts routes: [// 默认跳转到登录页{path: /,redirect: /login},...routes]登录窗口的必要配置 src/main/index.ts 中 cons…

Spring Boot 中实现自定义注解记录接口日志功能

👨🏻‍💻 热爱摄影的程序员 👨🏻‍🎨 喜欢编码的设计师 🧕🏻 擅长设计的剪辑师 🧑🏻‍🏫 一位高冷无情的全栈工程师 欢迎分享 / 收藏 / 赞 / 在看…

Suno Api V4模型无水印开发「高清音频WAV下载」 —— 「Suno Api系列」第6篇

历史文章 Suno AI API接入 - 将AI音乐接入到自己的产品中,支持120并发任务 Suno Api V4模型无水印开发「灵感模式」 —— 「Suno Api系列」第1篇 Suno Api V4模型无水印开发「自定义模式」 —— 「Suno Api系列」第2篇 Suno Api V4模型无水印开发「AI生成歌词」…