STORM-2394 KafkaSpout: Has no leader of partitions for a short time #1986
See https://issues.apache.org/jira/browse/STORM-2394
In our case, the network was faulty for a short time, so some Kafka partitions temporarily had no leader.
The nextTuple method of KafkaSpout threw a "No leader found for partition 0" exception at the call to "_coordinator.refresh();". The exception comes from the function getLeaderFor in DynamicBrokersReader.java. As a result, the spout hangs.
The Kafka partitions recovered shortly afterwards, but the spout could not recover from the exception. This problem has occurred several times on our servers. For example:
Feb 25 06:31:19 CST 2017, KafkaSpout threw the exception.
Feb 25 06:31:21 CST 2017, Kafka partitions recovered.
To make the spout more robust, I think "_coordinator.refresh();" could be retried a few times, throwing the exception only after the last attempt fails. The spout is going to die anyway, so why not try one more time?
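
A rough sketch of the retry idea, to show the intended behaviour rather than the actual patch. The names RefreshRetry, Refreshable, refreshWithRetry, maxAttempts and backoffMs are made up for this illustration and are not part of storm-kafka; Refreshable only stands in for whatever object exposes refresh().

```java
public class RefreshRetry {

    /** Hypothetical stand-in for the coordinator; not the real storm-kafka interface. */
    public interface Refreshable {
        void refresh();
    }

    /**
     * Calls refresh() up to maxAttempts times, sleeping backoffMs between failed attempts.
     * Returns the number of attempts used. If the last attempt also fails, the
     * exception from that attempt is rethrown, so the caller still dies as before.
     */
    public static int refreshWithRetry(Refreshable coordinator, int maxAttempts, long backoffMs) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        for (int attempt = 1; ; attempt++) {
            try {
                coordinator.refresh();
                return attempt;
            } catch (RuntimeException e) {
                if (attempt >= maxAttempts) {
                    throw e; // out of attempts: surface the original failure
                }
                try {
                    Thread.sleep(backoffMs); // give the brokers time to elect a leader
                } catch (InterruptedException ie) {
                    Thread.currentThread().interrupt(); // preserve the interrupt flag
                    throw new RuntimeException("Interrupted while waiting to retry refresh", ie);
                }
            }
        }
    }
}
```

In nextTuple, the direct call to "_coordinator.refresh();" would then be wrapped in something like refreshWithRetry(...). With the timestamps above (failure at 06:31:19, recovery at 06:31:21), a few attempts spaced about a second apart would probably have been enough, but the right values depend on the cluster.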