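canal reads the binlog. When canal crashes or is restarted, it can run into a number of pitfalls, most of them caused by inconsistent metadata. Common problems are listed below:
Problem 1: after canal restarts, its position does not match the position recorded before it stopped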
[destination = 02reserve , address = /10.130.208.242:3306 , EventParser] ERROR c.a.o.c.p.inbound.mysql.rds.RdsBinlogEventParserProxy - dump address /10.130.208.242:3306 has an error, retrying. caused by
java.io.IOException: Received error packet: errno = 1236, sqlstate = HY000 errmsg = Client requested master to start replication from position > file size; the first event 'mysql-bin.000091' at 797647452, the last event read from '/data/mysql/mysql-bin.000091' at 4, the last byte read from '/data/mysql/mysql-bin.000091' at 4.
at com.alibaba.otter.canal.parse.inbound.mysql.dbsync.DirectLogFetcher.fetch(DirectLogFetcher.java:102) ~[canal.parse-1.1.5-SNAPSHOT.jar:na]
at com.alibaba.otter.canal.parse.inbound.mysql.MysqlConnection.dump(MysqlConnection.java:235) ~[canal.parse-1.1.5-SNAPSHOT.jar:na]
at com.alibaba.otter.canal.parse.inbound.AbstractEventParser$3.run(AbstractEventParser.java:265) ~[canal.parse-1.1.5-SNAPSHOT.jar:na]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_111]
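Cause: the position saved in meta.dat is inconsistent with the binlog position on the database, so canal cannot fetch the database's changes.
Solution 1: delete meta.dat, then restart canal.
Solution 2: if tsdb mode is in use, delete the h2.mv.db file in the same directory as instance.properties, then restart canal.
Solution 3: if canal runs as a cluster, connect to the zookeeper cluster used by canal and delete the node /otter/canal/destinations/xxxxx/1001/cursor, then restart canal.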
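Before deleting anything, it can help to confirm that the saved position really is invalid. The sketch below is only an illustration: it assumes you have copied the file name and position that canal requested (as shown in the error message) and the file sizes reported by MySQL's SHOW BINARY LOGS into plain Python values. The function name position_is_valid and the sample file size are hypothetical.

```python
def position_is_valid(journal_name, position, binary_logs):
    """Check a saved binlog position against the master's binlog files.

    binary_logs maps binlog file name -> file size in bytes, as reported
    by SHOW BINARY LOGS (copied in by hand for this sketch).
    """
    size = binary_logs.get(journal_name)
    if size is None:
        # The file no longer exists on the master (purged or renamed).
        return False
    # Binlog events start at offset 4, right after the file header.
    return 4 <= position <= size


# Hypothetical file size; the position is the one from the error above.
logs = {"mysql-bin.000091": 154}
print(position_is_valid("mysql-bin.000091", 797647452, logs))
```

If the check fails, the saved position points past the end of the file (or at a missing file), which is the situation errno 1236 reports.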
Problem 2:
java.lang.OutOfMemoryError: Java heap space
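The canal consumer had been down for too long, so the position stored in the zk node /otter/canal/destinations/test_db/1001/cursor was very old. On restart, canal resumed consuming from that old position, and the canal server ran out of memory.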
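One possible guard is to check how old the stored cursor is before restarting. This is a minimal sketch, not part of canal: it assumes you have read the timestamp (in milliseconds) of the position stored in the cursor node yourself, and the 24-hour threshold is an arbitrary example value.

```python
import time


def cursor_age_hours(cursor_timestamp_ms, now_ms=None):
    """Return the age of the stored cursor in hours."""
    if now_ms is None:
        now_ms = int(time.time() * 1000)
    return (now_ms - cursor_timestamp_ms) / 3_600_000


def cursor_is_too_old(cursor_timestamp_ms, max_age_hours=24, now_ms=None):
    """True if the cursor is older than the (hypothetical) threshold."""
    return cursor_age_hours(cursor_timestamp_ms, now_ms) > max_age_hours
```

What to do with a cursor that is too old (give canal more heap, or accept skipping the backlog) is an operational decision this sketch does not make.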
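Symptom: when listening for database changes, only TransactionBegin/TransactionEnd entries arrive; no entries carrying a row-data EventType are received.
A possible cause is canal.instance.filter.black.regex=.*\..* ; change it to an empty value (canal.instance.filter.black.regex=) and restart canal to see whether that helps.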
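To see why such a blacklist hides all row data, the following sketch approximates the matching with Python's re module. It assumes the blacklist is a comma-separated list of patterns matched against the full schema.table name; canal itself uses Java regular expressions, so this is only an approximation, and the function name is hypothetical.

```python
import re


def is_blacklisted(black_regex, schema, table):
    """True if schema.table fully matches any comma-separated pattern."""
    full_name = f"{schema}.{table}"
    patterns = [p.strip() for p in black_regex.split(",") if p.strip()]
    return any(re.fullmatch(p, full_name) for p in patterns)


# A match-everything blacklist filters out every table.
print(is_blacklisted(r".*\..*", "test_db", "orders"))
# An empty blacklist filters out nothing.
print(is_blacklisted("", "test_db", "orders"))
```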
Problem 3:
ERROR com.alibaba.otter.canal.common.alarm.LogAlarmHandler - destination:fdyb_db[com.alibaba.otter.canal.parse.exception.CanalParseException: com.alibaba.otter.canal.parse.exception.CanalParseException: parse row data failed.
Caused by: com.alibaba.otter.canal.parse.exception.CanalParseException: parse row data failed.
Caused by: com.alibaba.otter.canal.parse.exception.CanalParseException: com.google.common.collect.ComputationException: com.alibaba.otter.canal.parse.exception.CanalParseException: fetch failed by table meta:`mysql`.`pds_4490277`
Caused by: com.google.common.collect.ComputationException: com.alibaba.otter.canal.parse.exception.CanalParseException: fetch failed by table meta:`mysql`.`pds_4490277`
Caused by: com.alibaba.otter.canal.parse.exception.CanalParseException: fetch failed by table meta:`mysql`.`pds_4490277`
Caused by: java.io.IOException: ErrorPacket [errorNumber=1142, fieldCount=-1, message=SELECT command denied to user 'cy_canal'@'11.217.0.224' for table 'pds_4490277', sqlState=42000, sqlStateMarker=#]
with command: desc `mysql`.`pds_4490277`
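Analysis: tables in the mysql system schema require higher privileges. The canal account has no SELECT privilege on this table, so fetching its table metadata (the desc command) fails, parsing the table's binlog events fails, and the position cannot advance.
Solution: add every table under mysql to the blacklist in the configuration: canal.instance.filter.black.regex = mysql\..* . After this change, the canal cluster recovers without a restart.
Other note: check whether CanalConnector calls subscribe(filter). If it does, the filter must match canal.instance.filter.regex in instance.properties; otherwise the subscribe filter overrides the instance configuration. If the subscribe filter is .*\..*, you are effectively consuming every change.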
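The override described above can be modelled in a few lines. This is a simplified sketch of the behaviour as stated in this note, not canal's actual code: the function names are hypothetical, and matching is approximated with Python's re module rather than Java regular expressions.

```python
import re


def effective_filter(instance_filter, subscribe_filter=""):
    """A non-empty subscribe filter replaces the instance-level filter."""
    return subscribe_filter if subscribe_filter else instance_filter


def is_consumed(filter_regex, schema, table):
    """True if schema.table fully matches any comma-separated pattern."""
    name = f"{schema}.{table}"
    patterns = [p.strip() for p in filter_regex.split(",") if p.strip()]
    return any(re.fullmatch(p, name) for p in patterns)


# The instance only wants fdyb_db, but the client subscribes to everything.
active = effective_filter(r"fdyb_db\..*", r".*\..*")
print(active, is_consumed(active, "mysql", "pds_4490277"))
```

With the match-everything subscribe filter in effect, tables outside fdyb_db are consumed as well, which is why the two filters should be kept identical.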
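Problem 4:
Symptom: after a schema change (adding or dropping a column, etc.), the canal application does not see the binlog events, and data is no longer consumed and processed.
Diagnosis: 1. Check the logs of the canal server, the canal application, and the zk server: no errors. 2. Check the mysql and es servers: no errors. 3. Check the configuration of the canal server and the canal application: canal.properties on the canal server turns out to be misconfigured.
Cause: canal.properties sets both canal.ip and canal.zkServers. If canal in zk cluster mode also has canal.ip configured, the canal server is reached by IP first, which disables the zk functionality, and the position file is saved locally instead. Once the local position file goes bad, none of the components logs an error, so the problem is very hard to trace.
Solution: set canal.ip to empty, stop the canal server, stop the canal application, delete the node on zk, restart the canal server, then restart the canal application.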
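Because this misconfiguration produces no error logs, a small check over canal.properties can catch it early. The sketch below is an assumption-laden illustration: it uses a simplified key=value parser (no line continuations or escapes), and the function names and sample values are hypothetical.

```python
def parse_properties(text):
    """Parse simple key=value lines, skipping blanks and comments."""
    props = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


def has_ip_and_zk_conflict(props):
    """True if both canal.ip and canal.zkServers are non-empty."""
    return bool(props.get("canal.ip")) and bool(props.get("canal.zkServers"))


# Hypothetical sample values for illustration only.
sample = "canal.ip = 10.0.0.1\ncanal.zkServers = zk1:2181,zk2:2181\n"
print(has_ip_and_zk_conflict(parse_properties(sample)))
```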
Problem 5:
Error log
ERROR c.a.o.c.p.inbound.mysql.rds.RdsBinlogEventParserProxy - dump address ip:3306 has an error, retrying. caused by
java.net.SocketTimeoutException: Timeout occurred, failed to read total 4 bytes in 25000 milliseconds, actual read only 0 bytes
at com.alibaba.otter.canal.parse.driver.mysql.socket.BioSocketChannel.read(BioSocketChannel.java:124) ~[canal.parse.driver-1.1.5-SNAPSHOT.jar:na]
at com.alibaba.otter.canal.parse.inbound.mysql.dbsync.DirectLogFetcher.fetch0(DirectLogFetcher.java:174) ~[canal.parse-1.1.5-SNAPSHOT.jar:na]
at com.alibaba.otter.canal.parse.inbound.mysql.dbsync.DirectLogFetcher.fetch(DirectLogFetcher.java:77) ~[canal.parse-1.1.5-SNAPSHOT.jar:na]
at com.alibaba.otter.canal.parse.inbound.mysql.MysqlConnection.dump(MysqlConnection.java:235) ~[canal.parse-1.1.5-SNAPSHOT.jar:na]
at com.alibaba.otter.canal.parse.inbound.AbstractEventParser$3.run(AbstractEventParser.java:265) ~[canal.parse-1.1.5-SNAPSHOT.jar:na]
at java.lang.Thread.run(Thread.java:745) [na:1.8.0_111]
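Solution: first check whether the host running canal lacks access to the MySQL server. If access is fine, check the version of each component next. MySQL 5.6 or later is recommended for compatibility with canal and to avoid exceptions. 5.1.x may produce errors (5.1.7 was tested and did). This test used 5.7.29, and no bugs have been found so far.
If this is not dual-active (active-active) synchronization, one-way synchronization with a single node is sufficient; for dual-active synchronization, use two nodes and two channels.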
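The first step, checking whether the canal host can reach MySQL at all, can be sketched as a plain TCP probe. This only tests network reachability of the port; it does not verify the MySQL account or its grants. The function name and the default timeout are hypothetical choices.

```python
import socket


def can_reach(host, port=3306, timeout=5.0):
    """True if a TCP connection to host:port opens within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False
```

Run it on the machine that hosts canal, for example can_reach("ip", 3306) with the MySQL address from the error log.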