Flink的容错机制将代码从错误中恢复过来并继续执行,这些错误包括机器硬件故障、网络故障、程序故障等。
当数据源参与了快照机制后,Flink就能保证用户自定义状态的唯一性状态修改了,下表列出了Flink与捆绑的连接器的状态更新保证。
请阅读每个connector的文档来了解容错性保证的详细信息。
Source | Guarantees | Notes |
---|---|---|
Apache Kafka | exactly once | 为你的版本使用适当的连接器 |
AWS Kinesis Streams | exactly once | |
RabbitMQ | at most once (v 0.10) / exactly once (v 1.0) | |
Twitter Streaming AP | at most once | |
Collections | exactly once | |
Files | exactly once | |
Sockets | at most once |
为了保证端到端的精确记录分发(除了精确的状态语义之外),数据sink需要引入checkpoint机制。下面表格列出了Flink结合捆绑的sink的分发保证(假设状态更新是精确的)
Source | Guarantees | Notes |
---|---|---|
HDFS rolling sink | exactly once | 实现依赖于Hadoop的版本 |
Elasticsearch | at least once | |
Kafka producer | at least once | |
Cassandra sink | at least once / exactly once | exactly once 仅针对幂等更新 |
AWS Kinesis Streams | at least once | |
File sinks | at least once | |
Socket sinks | at least once | |
Standard output | at least once | |
Redis sink | at least once |