Testing real-time synchronization from MySQL to Greenplum
I. Test server environment preparation
1. Cluster server IPs and hostnames
192.168.10.225 artemis.hadoop.com
192.168.10.226 uranus.hadoop.com
192.168.10.227 ares.hadoop.com
192.168.10.228 bird.hadoop.com
192.168.10.229 lover.hadoop.com
-
Command to modify the hosts file
vim /etc/hosts
2. ZooKeeper environment
-
Node servers
server.1=uranus.hadoop.com:2888:3888
server.2=ares.hadoop.com:2888:3888
server.3=bird.hadoop.com:2888:3888
(server.1, server.2, and server.3 correspond to 226, 227, and 228 respectively)
-
Installation location
/usr/local/zookeeper-3.4.6/bin/
-
Configuration file location
/usr/local/zookeeper-3.4.6/bin/../conf/zoo.cfg
-
Installation source archive location
/app/zookeeper-3.4.5.tar.gz
-
Environment configuration: vim /etc/profile
#zookeeper
ZOOKEEPER_HOME=/usr/local/zookeeper-3.4.6/
export PATH=$ZOOKEEPER_HOME/bin:$PATH
-
Start, stop, and status commands
zkServer.sh start
ZooKeeper JMX enabled by default
Using config: /ryyf/apache-zookeeper-3.5.9-bin/bin/../conf/zoo.cfg
Starting zookeeper ... STARTED

zkServer.sh stop
ZooKeeper JMX enabled by default
Using config: /ryyf/apache-zookeeper-3.5.9-bin/bin/../conf/zoo.cfg
Stopping zookeeper ... STOPPED

zkServer.sh status
ZooKeeper JMX enabled by default
Using config: /ryyf/apache-zookeeper-3.5.9-bin/bin/../conf/zoo.cfg
Client port found: 2181. Client address: localhost. Client SSL: false.
Mode: follower
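Besides the server.N entries in zoo.cfg, each ZooKeeper node needs a myid file whose number matches its own server.N line; a minimal sketch (the dataDir path /usr/local/zookeeper-3.4.6/data is an assumption, use whatever dataDir is actually set in zoo.cfg):
# same server list in zoo.cfg on all three nodes
cat >> /usr/local/zookeeper-3.4.6/conf/zoo.cfg <<'EOF'
server.1=uranus.hadoop.com:2888:3888
server.2=ares.hadoop.com:2888:3888
server.3=bird.hadoop.com:2888:3888
EOF
# on uranus (server.1); write 2 on ares and 3 on bird
echo 1 > /usr/local/zookeeper-3.4.6/data/myid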
3. Java environment
Install multiple JDK versions side by side and keep JDK 11 as a non-default version.
-
Java paths
/app/jdk1.8.0_144/bin/java
/app/jdk1.8.0_201/bin/java
/app/jdk-11.0.12/bin/java
alternatives --install /usr/bin/java java /app/jdk-11.0.12/bin/java 300
alternatives --install /usr/bin/java java /app/jdk1.8.0_201/bin/java 1
alternatives --install /usr/bin/java java /app/jdk1.8.0_144/bin/java 100
alternatives --config java
There are 3 programs which provide 'java'.
Selection Command
-----------------------------------------------
+ 1 /app/jdk1.8.0_201/bin/java
* 2 /app/jdk-11.0.12/bin/java
3 /app/jdk1.8.0_144/bin/java
-
Java version
java version "1.8.0_144"
Java(TM) SE Runtime Environment (build 1.8.0_144-b01)
Java HotSpot(TM) 64-Bit Server VM (build 25.144-b01, mixed mode)
-
Installation source archive location
/app/jdk-8u144-linux-x64.tar.gz
-
Profile location and changes
vim /etc/profile
JAVA_HOME=/app/jdk1.8.0_201
JRE_HOME=$JAVA_HOME/jre
PATH=$PATH:$JAVA_HOME/bin:$JRE_HOME/bin:
CLASSPATH=.:$JRE_HOME:/lib/dt.jar:$JAVA_HOME/lib/tools.jar:$JRE_HOME/lib:
export PATH JAVA_HOME JRE_HOME CLASSPATH
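JDK 8 stays the system default here, but Maxwell (sections I.5 and III.3) needs JDK 11, so it is useful to switch a single shell to JDK 11 without touching the default; a sketch using the JDK 11 path listed above (assumes the tool's launcher picks up java from PATH or JAVA_HOME):
export JAVA_HOME=/app/jdk-11.0.12
export PATH=$JAVA_HOME/bin:$PATH
java -version   # this shell now reports 11.0.12; other shells keep the default JDK 8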
4. Kafka environment
- Kafka version: kafka_2.12-2.1.0
- Kafka location: 225 /app/kafka_2.12-2.1.0.jar
- Kafka cluster hosts: 225, 226, 228
- Kafka configuration file: /usr/local/kafka/config/server.properties
- Kafka download page: http://archive.apache.org/dist/kafka/
- Download command: wget http://archive.apache.org/dist/kafka/2.1.0/kafka_2.12-2.1.0.tgz
- Start command (a daemon-mode variant is sketched below): /usr/local/kafka/bin/kafka-server-start.sh /usr/local/kafka/config/server.properties &
- Stop command: /usr/local/kafka/bin/kafka-server-stop.sh stop
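Instead of backgrounding the broker with &, kafka-server-start.sh also supports a daemon mode; a sketch:
/usr/local/kafka/bin/kafka-server-start.sh -daemon /usr/local/kafka/config/server.properties
# and to stop:
/usr/local/kafka/bin/kafka-server-stop.sh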
5. Maxwell environment
GitHub project: https://github.com/zendesk/maxwell
Download: https://github.com/zendesk/maxwell/releases/download/v1.35.5/maxwell-1.35.5.tar.gz
curl -sLo - https://github.com/zendesk/maxwell/releases/download/v1.35.5/maxwell-1.35.5.tar.gz \
  | tar zxvf -
cd maxwell-1.35.5
Install location: /app/maxwell-1.35.5 (requires JDK 11)
-
Documentation: http://maxwells-daemon.io/quickstart/#google-cloud-pubsub
6. Canal
-
Download:
https://github.com/alibaba/canal/releases
https://github.com/alibaba/canal/releases/download/canal-1.1.5/canal.deployer-1.1.5.tar.gz
Online tutorial: https://blog.csdn.net/yehongzhi1994/article/details/107880162
Version: 1.1.5
Archive location: /app/canal.deployer-1.1.5.tar.gz
mkdir /app/canal
tar zxvf /app/canal.deployer-1.1.5.tar.gz -C /app/canal
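A quick sanity check after extraction; the directory names follow the standard canal.deployer layout (the logs directory may only appear after the first startup):
ls /app/canal
# expected: bin  conf  lib  (plus logs after the first start)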
II. Configuration files and detailed notes
1. Multi-version JDK coexistence installation (JDK 11)
2. ZooKeeper
3. Kafka: vim /usr/local/kafka/config/server.properties
# see kafka.server.KafkaConfig for additional details and defaults
############################# Server Basics #############################
# The id of the broker. This must be set to a unique integer for each broker.
broker.id=1
# host.name=uranus.hadoop.com
############################# Socket Server Settings #############################
# The address the socket server listens on. It will get the value returned from
# java.net.InetAddress.getCanonicalHostName() if not configured.
# FORMAT:
# listeners = listener_name://host_name:port
# EXAMPLE:
# listeners = PLAINTEXT://your.host.name:9092
# listeners=PLAINTEXT://172.16.2.226:9092
# Hostname and port the broker will advertise to producers and consumers. If not set,
# it uses the value for "listeners" if configured. Otherwise, it will use the value
# returned from java.net.InetAddress.getCanonicalHostName().
# advertised.listeners=PLAINTEXT://172.16.2.226:9092
advertised.host.name=uranus.hadoop.com
advertised.port=9092
zookeeper.connect=uranus.hadoop.com:2181,ares.hadoop.com:2181,bird.hadoop.com:2181
# Maps listener names to security protocols, the default is for them to be the same. See the config documentation for more details
#listener.security.protocol.map=PLAINTEXT:PLAINTEXT,SSL:SSL,SASL_PLAINTEXT:SASL_PLAINTEXT,SASL_SSL:SASL_SSL
# The number of threads that the server uses for receiving requests from the network and sending responses to the network
num.network.threads=3
# The number of threads that the server uses for processing requests, which may include disk I/O
num.io.threads=8
# The send buffer (SO_SNDBUF) used by the socket server
socket.send.buffer.bytes=102400
# The receive buffer (SO_RCVBUF) used by the socket server
socket.receive.buffer.bytes=102400
# The maximum size of a request that the socket server will accept (protection against OOM)
socket.request.max.bytes=104857600
############################# Log Basics #############################
# A comma separated list of directories under which to store log files
log.dirs=/tmp/kafka-logs
# The default number of log partitions per topic. More partitions allow greater
# parallelism for consumption, but this will also result in more files across
# the brokers.
num.partitions=40
# The number of threads per data directory to be used for log recovery at startup and flushing at shutdown.
# This value is recommended to be increased for installations with data dirs located in RAID array.
num.recovery.threads.per.data.dir=1
############################# Internal Topic Settings #############################
# The replication factor for the group metadata internal topics "__consumer_offsets" and "__transaction_state"
# For anything other than development testing, a value greater than 1 is recommended for to ensure availability such as 3.
offsets.topic.replication.factor=1
transaction.state.log.replication.factor=1
transaction.state.log.min.isr=1
############################# Log Flush Policy #############################
# Messages are immediately written to the filesystem but by default we only fsync() to sync
# the OS cache lazily. The following configurations control the flush of data to disk.
# There are a few important trade-offs here:
# 1. Durability: Unflushed data may be lost if you are not using replication.
# 2. Latency: Very large flush intervals may lead to latency spikes when the flush does occur as there will be a lot of data to flush.
# 3. Throughput: The flush is generally the most expensive operation, and a small flush interval may lead to excessive seeks.
# The settings below allow one to configure the flush policy to flush data after a period of time or
# every N messages (or both). This can be done globally and overridden on a per-topic basis.
# The number of messages to accept before forcing a flush of data to disk
#log.flush.interval.messages=10000
# The maximum amount of time a message can sit in a log before we force a flush
#log.flush.interval.ms=1000
############################# Log Retention Policy #############################
# The following configurations control the disposal of log segments. The policy can
# be set to delete segments after a period of time, or after a given size has accumulated.
# A segment will be deleted whenever *either* of these criteria are met. Deletion always happens
# from the end of the log.
# The minimum age of a log file to be eligible for deletion due to age
log.retention.hours=24
# A size-based retention policy for logs. Segments are pruned from the log unless the remaining
# segments drop below log.retention.bytes. Functions independently of log.retention.hours.
#log.retention.bytes=1073741824
# The maximum size of a log segment file. When this size is reached a new log segment will be created.
log.segment.bytes=1073741824
# The interval at which log segments are checked to see if they can be deleted according
# to the retention policies
log.retention.check.interval.ms=300000
############################# Zookeeper #############################
# Zookeeper connection string (see zookeeper docs for details).
# This is a comma separated host:port pairs, each corresponding to a zk
# server. e.g. "127.0.0.1:3000,127.0.0.1:3001,127.0.0.1:3002".
# You can also append an optional chroot string to the urls to specify the
# root directory for all kafka znodes.
# Timeout in ms for connecting to zookeeper
zookeeper.connection.timeout.ms=6000
############################# Group Coordinator Settings #############################
# The following configuration specifies the time, in milliseconds, that the GroupCoordinator will delay the initial consumer rebalance.
# The rebalance will be further delayed by the value of group.initial.rebalance.delay.ms as new members join the group, up to a maximum of max.poll.interval.ms.
# The default value for this is 3 seconds.
# We override this to 0 here as it makes for a better out-of-the-box experience for development and testing.
# However, in production environments the default value of 3 seconds is more suitable as this will help to avoid unnecessary, and potentially expensive, rebalances during application startup.
group.initial.rebalance.delay.ms=0
Parameter | Value | Notes |
---|---|---|
broker.id | 1 | Must be unique per broker; set to 0, 1, 2 on the three nodes respectively |
advertised.host.name | uranus.hadoop.com | Configure the kafka1 domain in the hosts file; the other two nodes use kafka2.sd.cn and kafka3.sd.cn respectively |
advertised.port | 9092 | Default port, no change needed |
log.dirs | /tmp/kafka-logs | |
num.partitions | 40 | Number of partitions; adjust as needed |
log.retention.hours | 24 | Log retention time |
zookeeper.connect | uranus.hadoop.com:2181,ares.hadoop.com:2181,bird.hadoop.com:2181 | ZooKeeper connection addresses, comma-separated |
listeners | PLAINTEXT://172.16.2.226:9092 | The main setting that defines the Kafka broker's listeners; left unset in this setup |
advertised.listeners | PLAINTEXT://172.16.2.226:9092 | Publishes the broker's listener info to ZooKeeper; if unset, the listeners value is used; left unset in this setup |
hostname | | Left unset in this setup |
4. Canal configuration
-
Create the Canal user in MySQL
CREATE USER 'canal'@'localhost' IDENTIFIED BY 'canal';
GRANT ALL PRIVILEGES ON *.* TO 'canal'@'localhost' WITH GRANT OPTION;
CREATE USER 'canal'@'%' IDENTIFIED BY 'canal';
GRANT ALL PRIVILEGES ON *.* TO 'canal'@'%' WITH GRANT OPTION;
flush privileges;
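Both Canal and Maxwell read the MySQL binlog, so the source MySQL (192.168.10.216 in this setup) must have row-format binlogging and a server id enabled; Maxwell additionally creates its own maxwell metadata database on first run. A minimal sketch of the prerequisites (the server-id value is an arbitrary assumption, and the narrower grant is an optional alternative to the ALL grant above, since replication privileges are all Canal strictly needs):
# my.cnf on the source MySQL, then restart mysqld
# [mysqld]
# log-bin=mysql-bin
# binlog-format=ROW
# server-id=1
mysql -uroot -p -e "GRANT SELECT, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'canal'@'%'; FLUSH PRIVILEGES;"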
-
Modify the instance configuration file (if Canal connects to the local machine and both the username and password are canal, no changes are needed):
vi /app/canal/conf/example/instance.properties
#################################################
## mysql serverId , v1.0.26+ will autoGen
canal.instance.mysql.slaveId=225
# enable gtid use true/false
canal.instance.gtidon=true
# position info
canal.instance.master.address=192.168.10.216:3306
canal.instance.master.journal.name=
canal.instance.master.position=
canal.instance.master.timestamp=
canal.instance.master.gtid=
# rds oss binlog
canal.instance.rds.accesskey=
canal.instance.rds.secretkey=
canal.instance.rds.instanceId=
# table meta tsdb info
canal.instance.tsdb.enable=true
#canal.instance.tsdb.url=jdbc:mysql://127.0.0.1:3306/canal_tsdb
#canal.instance.tsdb.dbUsername=canal
#canal.instance.tsdb.dbPassword=canal
#canal.instance.standby.address =
#canal.instance.standby.journal.name =
#canal.instance.standby.position =
#canal.instance.standby.timestamp =
#canal.instance.standby.gtid=
# username/password
canal.instance.dbUsername=canal
canal.instance.dbPassword=canal
canal.instance.connectionCharset = UTF-8
# enable druid Decrypt database password
canal.instance.enableDruid=false
#canal.instance.pwdPublicKey=MFwwDQYJKoZIhvcNAQEBBQADSwAwSAJBALK4BUxdDltRRE5/zXpVEVPUgunvscYFtEip3pmLlhrWpacX7y7GCMo2/JM6LeHmiiNdH1FWgGCpUfircSwlWKUCAwEAAQ==
# table regex
canal.instance.filter.regex=.*\\..*
# table black regex
canal.instance.filter.black.regex=mysql\\.slave_.*
# table field filter(format: schema1.tableName1:field1/field2,schema2.tableName2:field1/field2)
#canal.instance.filter.field=test1.t_product:id/subject/keywords,test2.t_company:id/name/contact/ch
# table field black filter(format: schema1.tableName1:field1/field2,schema2.tableName2:field1/field2)
#canal.instance.filter.black.field=test1.t_product:subject/product_image,test2.t_company:id/name/contact/ch
# mq config
canal.mq.topic=mysql_binlog
# dynamic topic route by schema or table regex
#canal.mq.dynamicTopic=mytest1.user,mytest2\\..*,.*\\..*
canal.mq.partition=0
# hash partition config
#canal.mq.partitionsNum=3
#canal.mq.partitionHash=test.table:id^name,.*\\..*
#canal.mq.dynamicTopicPartitionNum=test.*:4,mycanal:6
#################################################
Modify the main Canal configuration file
vi /app/canal/conf/canal.properties
#################################################
######### common argument #############
#################################################
# tcp bind ip
canal.ip =
# register ip to zookeeper
canal.register.ip =
canal.port = 11111
canal.metrics.pull.port = 11112
# canal instance user/passwd
# canal.user = canal
# canal.passwd = E3619321C1A937C46A0D8BD1DAC39F93B27D4458
# canal admin config
#canal.admin.manager = 127.0.0.1:8089
canal.admin.port = 11110
canal.admin.user = admin
canal.admin.passwd = 4ACFE3202A5FF5CF467898FC58AAB1D615029441
# admin auto register
#canal.admin.register.auto = true
#canal.admin.register.cluster =
#canal.admin.register.name =
canal.zkServers =
# flush data to zk
canal.zookeeper.flush.period = 1000
canal.withoutNetty = false
# tcp, kafka, rocketMQ, rabbitMQ
canal.serverMode = kafka
# flush meta cursor/parse position to file
canal.file.data.dir = ${canal.conf.dir}
canal.file.flush.period = 1000
## memory store RingBuffer size, should be Math.pow(2,n)
canal.instance.memory.buffer.size = 16384
## memory store RingBuffer used memory unit size , default 1kb
canal.instance.memory.buffer.memunit = 1024
## meory store gets mode used MEMSIZE or ITEMSIZE
canal.instance.memory.batch.mode = MEMSIZE
canal.instance.memory.rawEntry = true
## detecing config
canal.instance.detecting.enable = false
#canal.instance.detecting.sql = insert into retl.xdual values(1,now()) on duplicate key update x=now()
canal.instance.detecting.sql = select 1
canal.instance.detecting.interval.time = 3
canal.instance.detecting.retry.threshold = 3
canal.instance.detecting.heartbeatHaEnable = false
# support maximum transaction size, more than the size of the transaction will be cut into multiple transactions delivery
canal.instance.transaction.size = 1024
# mysql fallback connected to new master should fallback times
canal.instance.fallbackIntervalInSeconds = 60
# network config
canal.instance.network.receiveBufferSize = 16384
canal.instance.network.sendBufferSize = 16384
canal.instance.network.soTimeout = 30
# binlog filter config
canal.instance.filter.druid.ddl = true
canal.instance.filter.query.dcl = false
canal.instance.filter.query.dml = false
canal.instance.filter.query.ddl = false
canal.instance.filter.table.error = false
canal.instance.filter.rows = false
canal.instance.filter.transaction.entry = false
canal.instance.filter.dml.insert = false
canal.instance.filter.dml.update = false
canal.instance.filter.dml.delete = false
# binlog format/image check
canal.instance.binlog.format = ROW,STATEMENT,MIXED
canal.instance.binlog.image = FULL,MINIMAL,NOBLOB
# binlog ddl isolation
canal.instance.get.ddl.isolation = false
# parallel parser config
canal.instance.parser.parallel = true
## concurrent thread number, default 60% available processors, suggest not to exceed Runtime.getRuntime().availableProcessors()
#canal.instance.parser.parallelThreadSize = 16
## disruptor ringbuffer size, must be power of 2
canal.instance.parser.parallelBufferSize = 256
# table meta tsdb info
canal.instance.tsdb.enable = true
canal.instance.tsdb.dir = ${canal.file.data.dir:../conf}/${canal.instance.destination:}
canal.instance.tsdb.url = jdbc:h2:${canal.instance.tsdb.dir}/h2;CACHE_SIZE=1000;MODE=MYSQL;
canal.instance.tsdb.dbUsername = canal
canal.instance.tsdb.dbPassword = canal
# dump snapshot interval, default 24 hour
canal.instance.tsdb.snapshot.interval = 24
# purge snapshot expire , default 360 hour(15 days)
canal.instance.tsdb.snapshot.expire = 360
#################################################
######### destinations #############
#################################################
canal.destinations = example
# conf root dir
canal.conf.dir = ../conf
# auto scan instance dir add/remove and start/stop instance
canal.auto.scan = true
canal.auto.scan.interval = 5
# set this value to 'true' means that when binlog pos not found, skip to latest.
# WARN: pls keep 'false' in production env, or if you know what you want.
canal.auto.reset.latest.pos.mode = false
canal.instance.tsdb.spring.xml = classpath:spring/tsdb/h2-tsdb.xml
#canal.instance.tsdb.spring.xml = classpath:spring/tsdb/mysql-tsdb.xml
canal.instance.global.mode = spring
canal.instance.global.lazy = false
canal.instance.global.manager.address = ${canal.admin.manager}
#canal.instance.global.spring.xml = classpath:spring/memory-instance.xml
canal.instance.global.spring.xml = classpath:spring/file-instance.xml
#canal.instance.global.spring.xml = classpath:spring/default-instance.xml
##################################################
######### MQ Properties #############
##################################################
# aliyun ak/sk , support rds/mq
canal.aliyun.accessKey =
canal.aliyun.secretKey =
canal.aliyun.uid=
canal.mq.flatMessage = true
canal.mq.canalBatchSize = 50
canal.mq.canalGetTimeout = 100
# Set this value to "cloud", if you want open message trace feature in aliyun.
canal.mq.accessChannel = local
canal.mq.database.hash = true
canal.mq.send.thread.size = 30
canal.mq.build.thread.size = 8
##################################################
######### Kafka #############
##################################################
kafka.bootstrap.servers = 192.168.10.225:9092,192.168.10.226:9092,192.168.10.228:9092
kafka.acks = all
kafka.compression.type = none
kafka.batch.size = 16384
kafka.linger.ms = 1
kafka.max.request.size = 1048576
kafka.buffer.memory = 33554432
kafka.max.in.flight.requests.per.connection = 1
kafka.retries = 0
kafka.kerberos.enable = false
kafka.kerberos.krb5.file = "../conf/kerberos/krb5.conf"
kafka.kerberos.jaas.file = "../conf/kerberos/jaas.conf"
##################################################
######### RocketMQ #############
##################################################
rocketmq.producer.group = test
rocketmq.enable.message.trace = false
rocketmq.customized.trace.topic =
rocketmq.namespace =
rocketmq.namesrv.addr = 127.0.0.1:9876
rocketmq.retry.times.when.send.failed = 0
rocketmq.vip.channel.enabled = false
rocketmq.tag =
##################################################
######### RabbitMQ #############
##################################################
rabbitmq.host =
rabbitmq.virtual.host =
rabbitmq.exchange =
rabbitmq.username =
rabbitmq.password =
rabbitmq.deliveryMode =
III. Testing each environment
1. ZooKeeper test
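The ZooKeeper check was not written down; a minimal sketch (zkServer.sh and zkCli.sh ship with ZooKeeper; the ruok four-letter word is enabled by default on 3.4.x but may need whitelisting on 3.5+):
# confirm the role of each node (one leader, the rest followers)
/usr/local/zookeeper-3.4.6/bin/zkServer.sh status
# connect with the CLI, then run: ls /
/usr/local/zookeeper-3.4.6/bin/zkCli.sh -server uranus.hadoop.com:2181
# quick liveness probe
echo ruok | nc uranus.hadoop.com 2181   # expect: imok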
2. Kafka environment test
1) Create topic test:
/usr/local/kafka/bin/kafka-topics.sh --create --zookeeper uranus.hadoop.com:2181,ares.hadoop.com:2181,bird.hadoop.com:2181 --replication-factor 1 --partitions 1 --topic test
2) List the topics that have been created:
/usr/local/kafka/bin/kafka-topics.sh --list --zookeeper uranus.hadoop.com:2181,ares.hadoop.com:2181,bird.hadoop.com:2181
3) Simulate a client producing messages (run on any one of the nodes):
/usr/local/kafka/bin/kafka-console-producer.sh --broker-list uranus.hadoop.com:9092,artemis.hadoop.com:9092,bird.hadoop.com:9092 --topic test
4) Simulate a client consuming messages. Note: newer versions use --bootstrap-server here, not --zookeeper uranus.hadoop.com:2181,ares.hadoop.com:2181,bird.hadoop.com:2181:
/usr/local/kafka/bin/kafka-console-consumer.sh --bootstrap-server uranus.hadoop.com:9092,artemis.hadoop.com:9092,bird.hadoop.com:9092 --topic test --from-beginning
Delete a topic:
/usr/local/kafka/bin/kafka-topics.sh --delete --zookeeper uranus.hadoop.com:2181,ares.hadoop.com:2181,bird.hadoop.com:2181 --topic mysql_binlog
(1) Log into the ZooKeeper client: /usr/local/zookeeper-3.4.6/bin/zookeeper-client
(2) Find the directory holding the topics: ls /brokers/topics
(3) Locate the topic to delete and run rmr /brokers/topics/<topic name>; the topic is then removed completely.
Also delete the topic directories under the Kafka storage directory (log.dirs in server.properties, default "/tmp/kafka-logs").
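To confirm a topic's partition and replica layout after creating it, the same script offers --describe; for example:
/usr/local/kafka/bin/kafka-topics.sh --describe --zookeeper uranus.hadoop.com:2181,ares.hadoop.com:2181,bird.hadoop.com:2181 --topic test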
3. Maxwell test (note: Maxwell requires JDK 11!)
1) Create topic mysql_binlog:
/usr/local/kafka/bin/kafka-topics.sh --create --zookeeper uranus.hadoop.com:2181,ares.hadoop.com:2181,bird.hadoop.com:2181 --replication-factor 3 --partitions 5 --topic mysql_binlog
2) Start Maxwell (connected to MySQL) in the background:
/app/maxwell-1.35.5/bin/maxwell --user='root' --password='1234@Abcd' --port=3306 --host='192.168.10.216' --producer=kafka \
--kafka.bootstrap.servers=uranus.hadoop.com:9092,ares.hadoop.com:9092,bird.hadoop.com:9092 --kafka_topic=mysql_binlog &
Or, using a configuration file instead of command-line flags:
/app/maxwell-1.35.5/bin/maxwell --config config.properties
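A minimal config.properties sketch mirroring the command-line flags above (the file location is an assumption; the property names follow the Maxwell documentation):
cat > /app/maxwell-1.35.5/config.properties <<'EOF'
# source MySQL connection
host=192.168.10.216
port=3306
user=root
password=1234@Abcd
# produce change events to Kafka
producer=kafka
kafka.bootstrap.servers=uranus.hadoop.com:9092,ares.hadoop.com:9092,bird.hadoop.com:9092
kafka_topic=mysql_binlog
EOF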
3) Monitor the data arriving in Kafka:
/usr/local/kafka/bin/kafka-console-consumer.sh --bootstrap-server uranus.hadoop.com:9092,artemis.hadoop.com:9092,bird.hadoop.com:9092 --topic mysql_binlog --from-beginning
[2022-01-11 13:40:12,523] WARN [Consumer clientId=consumer-1, groupId=console-consumer-81729] Connection to node -2 (ares.hadoop.com/192.168.10.227:9092) could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)
{"database":"test","table":"home","type":"insert","ts":1641870563,"xid":76127446,"commit":true,"data":{"id":31223,"tab":"123","values":"131231"}}
{"database":"test","table":"home","type":"insert","ts":1641871080,"xid":76127964,"commit":true,"data":{"id":666,"tab":"666","values":"66666"}}
{"database":"test","table":"home","type":"update","ts":1641871200,"xid":76128473,"commit":true,"data":{"id":66644,"tab":"666","values":"66666"},"old":{"id":666}}
{"database":"test","table":"home","type":"update","ts":1641871967,"xid":76129546,"commit":true,"data":{"id":4234234,"tab":"123","values":"123"},"old":{"id":88881}}
{"database":"test","table":"home","type":"update","ts":1641872049,"xid":76129898,"commit":true,"data":{"id":6666667,"tab":"5555","values":"555555"},"old":{"id":5555}}
{"database":"test","table":"home","type":"update","ts":1641872403,"xid":76130973,"commit":true,"data":{"id":243324,"tab":"123","values":"123"},"old":{"id":4234234}}
4. gpkafka configuration and test (Maxwell version)
DATABASE: test
USER: gpadmin
HOST: 192.168.10.227
PORT: 5432
VERSION: 2
KAFKA:
  INPUT:
    SOURCE:
      BROKERS: 192.168.10.225:9092,192.168.10.226:9092,192.168.10.228:9092
      TOPIC: mysql_binlog
    VALUE:
      COLUMNS:
        - NAME: c1
          TYPE: json
      FORMAT: json
    ERROR_LIMIT: 100
  OUTPUT:
    SCHEMA: test
    TABLE: home
    MAPPING:
      - NAME: id
        EXPRESSION: (c1->'data'->>'id')::decimal
      - NAME: tab
        EXPRESSION: (c1->'data'->>'tab')::text
      - NAME: values
        EXPRESSION: (c1->'data'->>'values')::text
  COMMIT:
    MINIMAL_INTERVAL: 2000
- Pay attention to the JSON format!
The JSON data format in Kafka:
Insert:
{"database":"test","table":"home","type":"insert","ts":1641885332,"xid":76159057,"commit":true,"data":{"id":5543223,"tab":"23","values":"2234"}}
Update:
{"database":"test","table":"home","type":"update","ts":1641871200,"xid":76128473,"commit":true,"data":{"id":66644,"tab":"666","values":"66666"},"old":{"id":666}}
5. Canal test
-
Start Canal:
sh /app/canal/bin/startup.sh
cd to /root/canal/bin for workaround relative path
LOG CONFIGURATION : /root/canal/bin/../conf/logback.xml
canal conf : /root/canal/bin/../conf/canal.properties
CLASSPATH :/root/canal/bin/../conf:/root/canal/bin/../lib/zookeeper-3.4.5.jar:/root/canal/bin/../lib/zkclient-0.1.jar:/root/canal/bin/../lib/spring-2.5.6.jar:/root/canal/bin/../lib/slf4j-api-1.7.12.jar:/root/canal/bin/../lib/protobuf-java-2.6.1.jar:/root/canal/bin/../lib/oro-2.0.8.jar:/root/canal/bin/../lib/netty-3.2.5.Final.jar:/root/canal/bin/../lib/logback-core-1.1.3.jar:/root/canal/bin/../lib/logback-classic-1.1.3.jar:/root/canal/bin/../lib/log4j-1.2.14.jar:/root/canal/bin/../lib/jcl-over-slf4j-1.7.12.jar:/root/canal/bin/../lib/guava-18.0.jar:/root/canal/bin/../lib/fastjson-1.1.35.jar:/root/canal/bin/../lib/commons-logging-1.1.1.jar:/root/canal/bin/../lib/commons-lang-2.6.jar:/root/canal/bin/../lib/commons-io-2.4.jar:/root/canal/bin/../lib/commons-beanutils-1.8.2.jar:/root/canal/bin/../lib/canal.store-1.0.23.jar:/root/canal/bin/../lib/canal.sink-1.0.23.jar:/root/canal/bin/../lib/canal.server-1.0.23.jar:/root/canal/bin/../lib/canal.protocol-1.0.23.jar:/root/canal/bin/../lib/canal.parse.driver-1.0.23.jar:/root/canal/bin/../lib/canal.parse.dbsync-1.0.23.jar:/root/canal/bin/../lib/canal.parse-1.0.23.jar:/root/canal/bin/../lib/canal.meta-1.0.23.jar:/root/canal/bin/../lib/canal.instance.spring-1.0.23.jar:/root/canal/bin/../lib/canal.instance.manager-1.0.23.jar:/root/canal/bin/../lib/canal.instance.core-1.0.23.jar:/root/canal/bin/../lib/canal.filter-1.0.23.jar:/root/canal/bin/../lib/canal.deployer-1.0.23.jar:/root/canal/bin/../lib/canal.common-1.0.23.jar:/root/canal/bin/../lib/aviator-2.2.1.jar:.:/usr/java/jdk1.8.0_121/lib
cd to /root/canal for continue
Stop Canal:
sh /app/canal/bin/stop.sh
master1: stopping canal 16062 ...
Oook! cost:1
- Relevant log locations:
cat /app/canal/logs/canal/canal.log
cat /app/canal/logs/example/example.log
- JSON samples:
{"data":[{"id":"13","tab":"123","values":"1231231"}],"database":"test","es":1642058435000,"id":10,"isDdl":false,"mysqlType":{"id":"int(14)","tab":"varchar(20)","values":"varchar(60)"},"old":null,"pkNames":["id"],"sql":"","sqlType":{"id":4,"tab":12,"values":12},"table":"home","ts":1642058440150,"type":"DELETE"}
{"data":[{"id":"32","tab":"12","values":"213"}],"database":"test","es":1642058451000,"id":11,"isDdl":false,"mysqlType":{"id":"int(14)","tab":"varchar(20)","values":"varchar(60)"},"old":null,"pkNames":["id"],"sql":"","sqlType":{"id":4,"tab":12,"values":12},"table":"home","ts":1642058455906,"type":"INSERT"}
6. gpkafka configuration and test (Canal version)
For now only the insert case is handled.
-
The JSON format of a Canal insert event:
{"data":[{"id":"444","tab":"444","values":"4444","rrr":null}],"database":"test","es":1642062004000,"id":5,"isDdl":false,"mysqlType":{"id":"int(14)","tab":"varchar(21)","values":"varchar(60)","rrr":"varchar(50)"},"old":null,"pkNames":["id"],"sql":"","sqlType":{"id":4,"tab":12,"values":12,"rrr":12},"table":"home","ts":1642062008970,"type":"INSERT"}
The row values are nested inside the data array rather than sitting at the top level of the JSON, so the mapping expressions have to unwrap them:
DATABASE: test
USER: gpadmin
HOST: 192.168.10.227
PORT: 5432
VERSION: 2
KAFKA:
  INPUT:
    SOURCE:
      BROKERS: 192.168.10.225:9092,192.168.10.226:9092,192.168.10.228:9092
      TOPIC: test_home
    VALUE:
      COLUMNS:
        - NAME: c1
          TYPE: json
      FORMAT: json
    ERROR_LIMIT: 100
  OUTPUT:
    SCHEMA: test
    TABLE: home
    MAPPING:
      - NAME: id
        EXPRESSION: ((c1#>>'{data,0}')::json->>'id')::decimal
      - NAME: tab
        EXPRESSION: ((c1#>>'{data,0}')::json->>'tab')::text
      - NAME: values
        EXPRESSION: ((c1#>>'{data,0}')::json->>'values')::text
  COMMIT:
    MINIMAL_INTERVAL: 2000
gpkafka load test_home.yaml
Success! Next is a MERGE-mode version of the YAML that also applies updates and marks deletes:
DATABASE: test
USER: gpadmin
HOST: 192.168.10.227
PORT: 5432
VERSION: 2
KAFKA:
  INPUT:
    SOURCE:
      BROKERS: 192.168.10.225:9092,192.168.10.226:9092,192.168.10.228:9092
      TOPIC: test_home
    VALUE:
      COLUMNS:
        - NAME: c1
          TYPE: json
      FORMAT: json
    ERROR_LIMIT: 100
  OUTPUT:
    SCHEMA: test
    MODE: MERGE
    MATCH_COLUMNS:
      - id
    UPDATE_COLUMNS:
      - tab
      - values
    ORDER_COLUMNS:
      - ts
      - xid
      - del_mark
      - ddl_type
    TABLE: home
    MAPPING:
      - NAME: id
        EXPRESSION: ((c1#>>'{data,0}')::json->>'id')::decimal
      - NAME: tab
        EXPRESSION: ((c1#>>'{data,0}')::json->>'tab')::text
      - NAME: values
        EXPRESSION: ((c1#>>'{data,0}')::json->>'values')::text
      - NAME: ts
        EXPRESSION: (c1->>'es')::decimal
      - NAME: xid
        EXPRESSION: (c1->>'id')::decimal
      - NAME: ddl_type
        EXPRESSION: (c1->>'type')::text
      - NAME: del_mark
        EXPRESSION: CASE WHEN ((c1->>'type')::text= 'DELETE') then true else false end
  COMMIT:
    MINIMAL_INTERVAL: 2000
Done, following https://segmentfault.com/a/1190000022567264; the earlier failure was because the DELETE_CONDITION: (c1->>'type')::text = 'DELETE' condition had been written incorrectly.
Achieved: this gives a soft (virtual) delete record, where deletes are marked rather than physically removed.
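For reference, a sketch of the Greenplum target table implied by the MERGE mapping above; the column types are assumptions inferred from the mapping expressions and the MySQL types shown in the Canal JSON, and "values" must be quoted because it is a reserved word:
psql -d test <<'EOF'
-- hypothetical definition of the merge target table; adjust types as needed
CREATE TABLE test.home (
    id        numeric,   -- match column (primary key on the MySQL side)
    tab       text,
    "values"  text,
    ts        numeric,   -- Canal "es" timestamp, used in ORDER_COLUMNS
    xid       numeric,   -- Canal batch id
    ddl_type  text,      -- INSERT / UPDATE / DELETE
    del_mark  boolean    -- soft-delete flag
) DISTRIBUTED BY (id);
EOF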