[pulsar-broker] Stop to dispatch when skip message temporally since Key_Shared consumer stuck on delivery (#7553)

### Motivation

In some cases, message ordering was broken for Key_Shared consumers. Here is how to reproduce it (I think this is one of the cases that reproduce the issue).

1. Connect Consumer1 to the Key_Shared subscription `sub` and do not start receiving yet
   - receiverQueueSize: 500
2. Connect a Producer and publish 500 messages with key `(i % 10)`
3. Connect Consumer2 to the same subscription and start receiving
   - receiverQueueSize: 1
   - since #7106, Consumer2 can't receive (expected)
4. The Producer publishes 500 more messages with the same key generation algorithm
5. After that, Consumer1 starts receiving
6. Check Consumer2 message ordering
   - sometimes message ordering was broken within the same key

Consumer1:

```
Connected: Tue Jul 14 09:36:39 JST 2020
[pulsar-client-io-1-1] WARN com.scurrilous.circe.checksum.Crc32cIntChecksum - Failed to load Circe JNI library. Falling back to Java based CRC32c provider
[pulsar-timer-4-1] INFO org.apache.pulsar.client.impl.ConsumerStatsRecorderImpl - [persistent://public/default/key-shared-test] [sub0] [820f0] Prefetched messages: 499 --- Consume throughput received: 0.02 msgs/s --- 0.00 Mbit/s --- Ack sent rate: 0.00 ack/s --- Failed messages: 0 --- batch messages: 0 ---Failed acks: 0
Received: my-message-0 PublishTime: 1594687006203 Date: Tue Jul 14 09:37:46 JST 2020
Received: my-message-1 PublishTime: 1594687006243 Date: Tue Jul 14 09:37:46 JST 2020
Received: my-message-2 PublishTime: 1594687006247 Date: Tue Jul 14 09:37:46 JST 2020
...
Received: my-message-498 PublishTime: 1594687008727 Date: Tue Jul 14 09:37:46 JST 2020
Received: my-message-499 PublishTime: 1594687008731 Date: Tue Jul 14 09:37:46 JST 2020
Received: my-message-500 PublishTime: 1594687038742 Date: Tue Jul 14 09:37:46 JST 2020
...
Received: my-message-990 PublishTime: 1594687040094 Date: Tue Jul 14 09:37:46 JST 2020
Received: my-message-994 PublishTime: 1594687040103 Date: Tue Jul 14 09:37:46 JST 2020
Received: my-message-995 PublishTime: 1594687040105 Date: Tue Jul 14 09:37:46 JST 2020
Received: my-message-997 PublishTime: 1594687040113 Date: Tue Jul 14 09:37:46 JST 2020
```

Consumer2:

```
Connected: Tue Jul 14 09:37:03 JST 2020
[pulsar-client-io-1-1] WARN com.scurrilous.circe.checksum.Crc32cIntChecksum - Failed to load Circe JNI library. Falling back to Java based CRC32c provider
Received: my-message-501 MessageId: 4:1501:-1 PublishTime: 1594687038753 Date: Tue Jul 14 09:37:46 JST 2020
Received: my-message-502 MessageId: 4:1502:-1 PublishTime: 1594687038755 Date: Tue Jul 14 09:37:46 JST 2020
Received: my-message-503 MessageId: 4:1503:-1 PublishTime: 1594687038759 Date: Tue Jul 14 09:37:46 JST 2020
Received: my-message-506 MessageId: 4:1506:-1 PublishTime: 1594687038785 Date: Tue Jul 14 09:37:46 JST 2020
Received: my-message-508 MessageId: 4:1508:-1 PublishTime: 1594687038812 Date: Tue Jul 14 09:37:46 JST 2020
Received: my-message-901 MessageId: 4:1901:-1 PublishTime: 1594687039871 Date: Tue Jul 14 09:37:46 JST 2020
Received: my-message-509 MessageId: 4:1509:-1 PublishTime: 1594687038815 Date: Tue Jul 14 09:37:46 JST 2020
ordering was broken, key: 1 oldNum: 901 newNum: 511
Received: my-message-511 MessageId: 4:1511:-1 PublishTime: 1594687038826 Date: Tue Jul 14 09:37:46 JST 2020
Received: my-message-512 MessageId: 4:1512:-1 PublishTime: 1594687038830 Date: Tue Jul 14 09:37:46 JST 2020
...
```

I think this issue is caused by #7105. Here is an example.

1. dispatch messages
2. Consumer2 was stuck and `totalMessagesSent=0`
   - Consumer2 availablePermits was 0
3. skip redeliver messages temporarily
   - Consumer2 availablePermits was back to 1
4. dispatch new messages
   - a new message was dispatched to Consumer2
5. back to redeliver messages
6. dispatch messages
   - ordering was broken

### Modifications

Stop dispatching when redeliver messages are skipped temporarily because a Key_Shared consumer is stuck on delivery.
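
The sequence described in the example can be illustrated with a toy model. This is only a minimal sketch under stated assumptions, not the broker's dispatcher code: the class `KeySharedOrderingSketch`, the methods `simulate` and `isOrdered`, the flag `stopDispatchWhileReplayStuck`, and the choice of queued message numbers are all hypothetical. Permits are modeled as a single slot that frees after every round, loosely mirroring receiverQueueSize: 1.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Toy model only: all names here are hypothetical and do not exist in Pulsar.
final class KeySharedOrderingSketch {

    /** Returns the message numbers delivered to the stuck consumer, in delivery order. */
    static List<Integer> simulate(boolean stopDispatchWhileReplayStuck) {
        // Older messages of one key waiting for redelivery, and a newer unread message.
        Deque<Integer> replay = new ArrayDeque<>(List.of(511, 521));
        Deque<Integer> fresh = new ArrayDeque<>(List.of(901));
        List<Integer> delivered = new ArrayList<>();

        int permits = 0;                // the consumer starts out stuck
        boolean skipReplayOnce = false; // "skip redeliver messages temporarily"

        for (int round = 0; round < 20 && !(replay.isEmpty() && fresh.isEmpty()); round++) {
            // Redelivery has priority unless it was skipped for this round.
            boolean readReplay = !replay.isEmpty() && !skipReplayOnce;
            skipReplayOnce = false;
            Deque<Integer> source = readReplay ? replay : fresh;

            if (permits > 0 && !source.isEmpty()) {
                delivered.add(source.poll());
                permits--;
            } else if (readReplay && !stopDispatchWhileReplayStuck) {
                // Nothing was sent from the replay queue: the old behavior
                // moves on to new messages in the next round.
                skipReplayOnce = true;
            }
            // Assumption: the consumer frees its single slot after each round.
            permits = 1;
        }
        return delivered;
    }

    /** True if the message numbers are in non-decreasing order. */
    static boolean isOrdered(List<Integer> ids) {
        for (int i = 1; i < ids.size(); i++) {
            if (ids.get(i - 1) > ids.get(i)) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        List<Integer> before = simulate(false);
        List<Integer> after = simulate(true);
        System.out.println(before + " ordered=" + isOrdered(before)); // [901, 511, 521] ordered=false
        System.out.println(after + " ordered=" + isOrdered(after));   // [511, 521, 901] ordered=true
    }
}
```

In this model, skipping the replay queue lets message 901 overtake 511 once the permit returns, while holding dispatch until redelivery makes progress keeps the key in order.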