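[Tech Share] Kafka Pause Pitfall Notes
When using Kafka, a business requirement may call for pausing consumption on purpose and resuming only after some signal arrives. In practice, however, the pause feature offered by the client may not behave the way you expect.
If it is used carelessly, the pause can silently stop taking effect and consumption resumes on its own, so take extra care!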
[Conditions for hitting the pitfall]
- The pause feature provided by the Kafka client is used
- The Kafka consumer group goes through a rebalance
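[Reproducing the pitfall]
1. Create a topic named test with 1 partition
2. Start process A, subscribe to test, and explicitly call KafkaConsumer pause
3. Start process B and subscribe to test in the same consumer group, which triggers a consumer rebalance
4. Observe process A: it goes on consuming messages from the test topic, so the pause is no longer in effect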
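[How the Kafka client implements pause]
1. pause is implemented in the Kafka client; the broker keeps no state about whether consumption of a partition is paused
2. The Kafka client keeps a local state map for the partitions assigned to the consumer: the key is the partition, the value is that partition's consumer-side state, which includes a paused attribute that is true or false
3. Whenever a new Kafka consumer joins the consumer group or an existing one leaves it, the group rebalances
4. After the rebalance, the Kafka client reassigns the partitions it consumes and the earlier state flags are lost, so the pause stops working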
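The four points above can be modelled in a few lines of plain Java. This is only a sketch of the mechanism, not Kafka's real code: LocalSubscriptionState, PartitionState and assignFromRebalance are hypothetical names, and partitions are represented as plain strings such as "test-0" so the example has no Kafka dependency.

```java
import java.util.Collection;
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the client's local bookkeeping; not Kafka's real class.
class LocalSubscriptionState {
    // Per-partition state; only the paused flag is modelled here.
    static final class PartitionState {
        boolean paused = false;
    }

    private Map<String, PartitionState> assignment = new HashMap<>();

    // Models a rebalance: every assigned partition gets a brand-new state object,
    // so nothing from the previous assignment survives.
    void assignFromRebalance(Collection<String> partitions) {
        Map<String, PartitionState> fresh = new HashMap<>();
        for (String tp : partitions) {
            fresh.put(tp, new PartitionState());
        }
        assignment = fresh;
    }

    // Models pause(): flips a flag that exists only inside this process.
    void pause(String tp) {
        state(tp).paused = true;
    }

    boolean isPaused(String tp) {
        return state(tp).paused;
    }

    private PartitionState state(String tp) {
        PartitionState s = assignment.get(tp);
        if (s == null) {
            throw new IllegalStateException("No current assignment for partition " + tp);
        }
        return s;
    }
}
```

In this model, calling assignFromRebalance, then pause("test-0"), then assignFromRebalance again for the same partition leaves isPaused("test-0") returning false, which matches the behaviour seen in process A.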
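[Source code analysis]
The client tracks the state of each partition with TopicPartitionState, which includes a paused attribute; it becomes true after an explicit pause call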
private static class TopicPartitionState {
    private Long position; // last consumed position
    private Long highWatermark; // the high watermark from last fetch
    private Long lastStableOffset;
    private OffsetAndMetadata committed; // last committed position
    private boolean paused; // whether this partition has been paused by the user
    private OffsetResetStrategy resetStrategy; // the strategy to use if the offset needs resetting
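The Kafka consumer relies on the AbstractCoordinator class, which takes care of group-membership concerns such as heartbeats and partition assignment
The analysis here focuses on the onJoinComplete method, which is called after the consumer has successfully joined the group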
/**
 * Invoked when a group member has successfully joined a group.
 * @param generation The generation that was joined
 * @param memberId The identifier for the local member in the group
 * @param protocol The protocol selected by the coordinator
 * @param memberAssignment The assignment propagated from the group leader
 */
protected abstract void onJoinComplete(int generation,
                                       String memberId,
                                       String protocol,
                                       ByteBuffer memberAssignment);
The concrete implementation class is ConsumerCoordinator. Within its implementation, the call to focus on is subscriptions.assignFromSubscribed(assignment.partitions());
@Override
protected void onJoinComplete(int generation,
                              String memberId,
                              String assignmentStrategy,
                              ByteBuffer assignmentBuffer) {
    // only the leader is responsible for monitoring for metadata changes (i.e. partition changes)
    if (!isLeader)
        assignmentSnapshot = null;
    PartitionAssignor assignor = lookupAssignor(assignmentStrategy);
    if (assignor == null)
        throw new IllegalStateException("Coordinator selected invalid assignment protocol: " + assignmentStrategy);
    Assignment assignment = ConsumerProtocol.deserializeAssignment(assignmentBuffer);
    // set the flag to refresh last committed offsets
    subscriptions.needRefreshCommits();
    // update partition assignment (this is the key line)
    subscriptions.assignFromSubscribed(assignment.partitions());
    .......
assignFromSubscribed is the method that ultimately maintains the partition state
public void assignFromSubscribed(Collection<TopicPartition> assignments) {
    if (!this.partitionsAutoAssigned())
        throw new IllegalArgumentException("Attempt to dynamically assign partitions while manual assignment in use");
    Map<TopicPartition, TopicPartitionState> assignedPartitionStates = partitionToStateMap(assignments);
    fireOnAssignment(assignedPartitionStates.keySet());
    ......
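As you can see, a new TopicPartitionState is created here for every assigned partition, so the previously stored paused-or-not state of each partition is lost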
private static Map<TopicPartition, TopicPartitionState> partitionToStateMap(Collection<TopicPartition> assignments) {
    Map<TopicPartition, TopicPartitionState> map = new HashMap<>(assignments.size());
    for (TopicPartition tp : assignments)
        map.put(tp, new TopicPartitionState());
    return map;
}
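
Since the paused flag lives only in state that is rebuilt on every rebalance, one possible mitigation is for the application to keep its own record of which partitions it wants paused and to re-apply the pause after each assignment. The sketch below shows only that bookkeeping. PauseIntentTracker and its methods are hypothetical names, and partitions are plain strings such as "test-0" so the code stays free of Kafka dependencies. In a real consumer, the result of toRepause would presumably be turned into TopicPartition objects and passed to KafkaConsumer pause from ConsumerRebalanceListener.onPartitionsAssigned.

```java
import java.util.Collection;
import java.util.Set;
import java.util.TreeSet;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical helper: remembers which partitions the application wants paused,
// independently of the consumer's own per-assignment state.
class PauseIntentTracker {
    // Thread-safe set, because pause requests may come from another thread
    // than the one that polls the consumer.
    private final Set<String> wantPaused = ConcurrentHashMap.newKeySet();

    // Record that the application paused this partition.
    void markPaused(String partition) {
        wantPaused.add(partition);
    }

    // Record that the application resumed this partition.
    void markResumed(String partition) {
        wantPaused.remove(partition);
    }

    // Given the partitions handed out by a rebalance, return the ones
    // that should be paused again because the application never resumed them.
    Set<String> toRepause(Collection<String> newlyAssigned) {
        Set<String> result = new TreeSet<>();
        for (String partition : newlyAssigned) {
            if (wantPaused.contains(partition)) {
                result.add(partition);
            }
        }
        return result;
    }
}
```

Keeping the intent outside the consumer means it survives the rebalance no matter how the client rebuilds its internal state; partitions that moved to another process simply do not appear in newlyAssigned and are skipped.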