[BUG] Kafka producer client cannot connect to KoP (Removing node xxxx:9092 (id: 2057312963 rack: null) from least loaded node selection since it is neither ready for sending or connecting) #618
Comments
KoP version: 2.7.2.4 + patch: https://github.com/streamnative/kop/pull/586/files
There are two deadlock cases:
However, the second deadlock is caused by the first one. For example, see kop/kafka-impl/src/main/java/io/streamnative/pulsar/handlers/kop/KafkaTopicConsumerManager.java, lines 132 to 142 in fa65715
Once kop/kafka-impl/src/main/java/io/streamnative/pulsar/handlers/kop/KafkaTopicConsumerManager.java Line 244 in fa65715
For the first deadlock, there are two cases that
kop/kafka-impl/src/main/java/io/streamnative/pulsar/handlers/kop/KafkaTopicManager.java Lines 91 to 103 in fa65715
Let's use thread N to represent the threads that are stuck at
Threads that are stuck acquiring the read-write lock of
There's one deadlock fix in apache/pulsar#9787. Since KoP 2.7.x.y depends on Pulsar 2.7.x, KoP 2.7.x.y doesn't contain the fix. For branch-2.7, we can use […]. For the deadlock of […]. Furthermore, we should avoid creating a […]
Fixes #618

### Motivation

See #618 (comment) for the deadlock analysis.

### Modifications

- Use `ConcurrentHashMap` instead of `ConcurrentLongHashMap`. Though this bug may already be fixed in apache/pulsar#9787, the `ConcurrentHashMap` from the Java standard library is more reliable. The possible performance improvement brought by `ConcurrentLongHashMap` still needs to be proven.
- Use an `AtomicBoolean` as `KafkaTopicConsumerManager`'s state instead of a read-write lock, so that the `close()` method no longer blocks while trying to acquire the write lock.
- Run a single cursor expire task instead of one task per channel. Since #404 changed `consumerTopicManagers` to a static field, there's no reason to run a task for each connection.
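
To illustrate the first two modifications, here is a minimal sketch of a manager that tracks its open/closed state with an `AtomicBoolean` and stores entries in a `ConcurrentHashMap`. It is not the actual KoP code: the class `ConsumerManagerSketch`, its methods, and the `String` cursor names are hypothetical stand-ins chosen for illustration.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical stand-in for KafkaTopicConsumerManager; all names are illustrative.
class ConsumerManagerSketch implements AutoCloseable {
    // false = open, true = closed. Takes the place of a read-write lock around the state.
    private final AtomicBoolean closed = new AtomicBoolean(false);

    // Standard-library concurrent map keyed by offset (assumed key type).
    private final Map<Long, CompletableFuture<String>> cursors = new ConcurrentHashMap<>();

    // Registers a cursor name for an offset; does nothing once the manager is closed.
    void add(long offset, String cursorName) {
        if (closed.get()) {
            return;
        }
        cursors.putIfAbsent(offset, CompletableFuture.completedFuture(cursorName));
        // Re-check so an entry added concurrently with close() is not left behind.
        if (closed.get()) {
            cursors.remove(offset);
        }
    }

    // Returns null when the manager is closed or no cursor exists; never waits on a lock.
    CompletableFuture<String> removeCursorFuture(long offset) {
        if (closed.get()) {
            return null;
        }
        return cursors.remove(offset);
    }

    @Override
    public void close() {
        // Only the first caller wins the compare-and-set; later callers return immediately.
        if (!closed.compareAndSet(false, true)) {
            return;
        }
        cursors.clear();
    }

    boolean isClosed() {
        return closed.get();
    }

    int size() {
        return cursors.size();
    }
}
```

With this shape, `close()` cannot get stuck behind readers the way a write-lock acquisition can. The trade-off is that readers may briefly observe the open state while a close is in progress, so each operation has to tolerate that window.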