Skip to content

KAFKA-8813: Refresh log config if it's updated before initialization#7305

Merged
hachikuji merged 6 commits intoapache:trunkfrom
soondenana:KAFKA-8813
Oct 15, 2019
Merged

KAFKA-8813: Refresh log config if it's updated before initialization#7305
hachikuji merged 6 commits intoapache:trunkfrom
soondenana:KAFKA-8813

Conversation

@soondenana
Copy link
Contributor

A partition log in initialized in following steps:

  1. Fetch log config from ZK
  2. Call LogManager.getOrCreateLog which creates the Log object, then
  3. Registers the Log object

Step #3 enables Configuration update thread to deliver configuration
updates to the log. But if any update arrives between step #1 and #3
then that update is missed. It breaks following use case:

  1. Create a topic with default configuration, and immediately after that
  2. Update the configuration of topic

There is a race condition here and in random cases update made in
seocond step will get dropped.

This change fixes it by tracking updates arriving between step #1 and #3
Once a Partition is done initialzing log, it checks if it has missed any
update. If yes, then the configuration is read from ZK again.

Added unit tests to make sure a dirty configuration is refreshed. Tested
on local cluster to make sure that topic configuration and updates are
handled correctly.

More detailed description of your change,
if necessary. The PR title and PR message become
the squashed commit message, so use a separate
comment to ping reviewers.

Summary of testing strategy (including rationale)
for the feature or bug fix. Unit and/or integration
tests are expected for any behaviour change and
system tests should be considered for larger changes.

Committer Checklist (excluded from commit message)

  • Verify design and implementation
  • Verify test coverage and CI build status
  • Verify documentation (including upgrade notes)

@soondenana
Copy link
Contributor Author

soondenana commented Sep 6, 2019

cc: @hachikuji, @jsancio, @mumrah

Maybe useful to review the change by hiding whitespace changes (put ?w=1 to the end of url or use the UI)
HideWhitespace

@soondenana soondenana force-pushed the KAFKA-8813 branch 2 times, most recently from 5d08e04 to 18358f8 Compare September 6, 2019 01:20
@soondenana soondenana changed the title KAFKA-8813: Refresh log config if it's updated before initializaiton KAFKA-8813: Refresh log config if it's updated before initialization Sep 9, 2019
@soondenana
Copy link
Contributor Author

Ran create topic, change config in a loop (ran 100 times) and made sure that configs update correctly. Here is the bash command I used to create and change topic:

for idx in `seq 100`; do 
    bin/kafka-topics.sh --create --bootstrap-server localhost:9092 --replication-factor 3 --partitions 3 --topic test-topic-${idx}
    bin/kafka-configs.sh --alter --entity-type topics --entity-name test-topic-${idx} --add-config 'min.insync.replicas=2' --zookeeper localhost:2181
done

Here is one to check the configuration:

for idx in `seq 100`; do 
    bin/kafka-topics.sh --describe --bootstrap-server localhost:9092 --topic test-topic-${idx}
done 2> /dev/null | grep min.insync

Here is to clean things up at the end:

for idx in `seq 100`; do 
    bin/kafka-topics.sh --delete --bootstrap-server localhost:9092  --topic test-topic-${idx}
done

@soondenana
Copy link
Contributor Author

Tested again with 500 topic creation followed by update (instead of 100 in previous comment). Checked to make sure topic configuration is sane.

Copy link
Contributor

@hachikuji hachikuji left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the patch. Left a couple initial comments.

@soondenana soondenana force-pushed the KAFKA-8813 branch 2 times, most recently from 73e6d55 to 194d6d8 Compare September 16, 2019 19:56
Copy link
Contributor

@hachikuji hachikuji left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for the delay. Left a few more comments.

@soondenana soondenana force-pushed the KAFKA-8813 branch 2 times, most recently from 19ccba2 to 4297f6f Compare October 9, 2019 18:05
A partition log in initialized in following steps:

1. Fetch log config from ZK
2. Call LogManager.getOrCreateLog which creates the Log object, then
3. Registers the Log object

Step apache#3 enables Configuration update thread to deliver configuration
updates to the log. But if any update arrives between step #1 and apache#3
then that update is missed. It breaks following use case:

1. Create a topic with default configuration, and immediately after that
2. Update the configuration of topic

There is a race condition here and in random cases update made in
seocond step will get dropped.

This change fixes it by tracking updates arriving between step #1 and apache#3
Once a Partition is done initialzing log, it checks if it has missed any
update. If yes, then the configuration is read from ZK again.

Added unit tests to make sure a dirty configuration is refreshed. Tested
on local cluster to make sure that topic configuration and updates are
handled correctly.
This change takes care of issue where we were holding lock while talking
to ZK. The change switches to use a concurrent hash map and retries
config load operation if it has been changed since it got refreshed.
Removed the param and checked the config value directly instead.
Copy link
Contributor

@hachikuji hachikuji left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a couple more comments.

We took out locks from the config update code, which exposed a condition
where log entry can stay in the map if it was removed while an update
was already in progress. This change uses the `replace` method of the
Map to make sure that we don't insert an entry if it is not already
present.

Ran unit tests and they pass after this change.
Copy link
Contributor

@hachikuji hachikuji left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM. Thanks for the patch.

@soondenana
Copy link
Contributor Author

retest this please

@hachikuji hachikuji merged commit 176c934 into apache:trunk Oct 15, 2019
hachikuji pushed a commit that referenced this pull request Oct 15, 2019
…7305)

A partition log in initialized in following steps:

1. Fetch log config from ZK
2. Call LogManager.getOrCreateLog which creates the Log object, then
3. Registers the Log object

Step #3 enables Configuration update thread to deliver configuration
updates to the log. But if any update arrives between step #1 and #3
then that update is missed. It breaks following use case:

1. Create a topic with default configuration, and immediately after that
2. Update the configuration of topic

There is a race condition here and in random cases update made in
second step will get dropped.

This change fixes it by tracking updates arriving between step #1 and #3
Once a Partition is done initializing log, it checks if it has missed any
update. If yes, then the configuration is read from ZK again.

Added unit tests to make sure a dirty configuration is refreshed. Tested
on local cluster to make sure that topic configuration and updates are
handled correctly.

Reviewers: Jason Gustafson <jason@confluent.io>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants