KAFKA-8813: Refresh log config if it's updated before initialization#7305
KAFKA-8813: Refresh log config if it's updated before initialization#7305hachikuji merged 6 commits intoapache:trunkfrom
Conversation
|
cc: @hachikuji, @jsancio, @mumrah Maybe useful to review the change by hiding whitespace changes (put |
5d08e04 to
18358f8
Compare
|
Ran create topic, change config in a loop (ran 100 times) and made sure that configs update correctly. Here is the bash command I used to create and change topic: for idx in `seq 100`; do
bin/kafka-topics.sh --create --bootstrap-server localhost:9092 --replication-factor 3 --partitions 3 --topic test-topic-${idx}
bin/kafka-configs.sh --alter --entity-type topics --entity-name test-topic-${idx} --add-config 'min.insync.replicas=2' --zookeeper localhost:2181
doneHere is one to check the configuration: for idx in `seq 100`; do
bin/kafka-topics.sh --describe --bootstrap-server localhost:9092 --topic test-topic-${idx}
done 2> /dev/null | grep min.insyncHere is to clean things up at the end: for idx in `seq 100`; do
bin/kafka-topics.sh --delete --bootstrap-server localhost:9092 --topic test-topic-${idx}
done |
|
Tested again with 500 topic creation followed by update (instead of 100 in previous comment). Checked to make sure topic configuration is sane. |
hachikuji
left a comment
There was a problem hiding this comment.
Thanks for the patch. Left a couple initial comments.
73e6d55 to
194d6d8
Compare
hachikuji
left a comment
There was a problem hiding this comment.
Sorry for the delay. Left a few more comments.
19ccba2 to
4297f6f
Compare
A partition log in initialized in following steps: 1. Fetch log config from ZK 2. Call LogManager.getOrCreateLog which creates the Log object, then 3. Registers the Log object Step apache#3 enables Configuration update thread to deliver configuration updates to the log. But if any update arrives between step #1 and apache#3 then that update is missed. It breaks following use case: 1. Create a topic with default configuration, and immediately after that 2. Update the configuration of topic There is a race condition here and in random cases update made in seocond step will get dropped. This change fixes it by tracking updates arriving between step #1 and apache#3 Once a Partition is done initialzing log, it checks if it has missed any update. If yes, then the configuration is read from ZK again. Added unit tests to make sure a dirty configuration is refreshed. Tested on local cluster to make sure that topic configuration and updates are handled correctly.
This change takes care of issue where we were holding lock while talking to ZK. The change switches to use a concurrent hash map and retries config load operation if it has been changed since it got refreshed.
Removed the param and checked the config value directly instead.
4297f6f to
8a4a999
Compare
hachikuji
left a comment
There was a problem hiding this comment.
Just a couple more comments.
We took out locks from the config update code, which exposed a condition where log entry can stay in the map if it was removed while an update was already in progress. This change uses the `replace` method of the Map to make sure that we don't insert an entry if it is not already present. Ran unit tests and they pass after this change.
hachikuji
left a comment
There was a problem hiding this comment.
LGTM. Thanks for the patch.
|
retest this please |
…7305) A partition log in initialized in following steps: 1. Fetch log config from ZK 2. Call LogManager.getOrCreateLog which creates the Log object, then 3. Registers the Log object Step #3 enables Configuration update thread to deliver configuration updates to the log. But if any update arrives between step #1 and #3 then that update is missed. It breaks following use case: 1. Create a topic with default configuration, and immediately after that 2. Update the configuration of topic There is a race condition here and in random cases update made in second step will get dropped. This change fixes it by tracking updates arriving between step #1 and #3 Once a Partition is done initializing log, it checks if it has missed any update. If yes, then the configuration is read from ZK again. Added unit tests to make sure a dirty configuration is refreshed. Tested on local cluster to make sure that topic configuration and updates are handled correctly. Reviewers: Jason Gustafson <jason@confluent.io>

A partition log in initialized in following steps:
Step #3 enables Configuration update thread to deliver configuration
updates to the log. But if any update arrives between step #1 and #3
then that update is missed. It breaks following use case:
There is a race condition here and in random cases update made in
seocond step will get dropped.
This change fixes it by tracking updates arriving between step #1 and #3
Once a Partition is done initialzing log, it checks if it has missed any
update. If yes, then the configuration is read from ZK again.
Added unit tests to make sure a dirty configuration is refreshed. Tested
on local cluster to make sure that topic configuration and updates are
handled correctly.
More detailed description of your change,
if necessary. The PR title and PR message become
the squashed commit message, so use a separate
comment to ping reviewers.
Summary of testing strategy (including rationale)
for the feature or bug fix. Unit and/or integration
tests are expected for any behaviour change and
system tests should be considered for larger changes.
Committer Checklist (excluded from commit message)