KAFKA-14122: Fix flaky test DynamicBrokerReconfigurationTest#testKeyStoreAlter #12452

Merged
mimaison merged 1 commit into apache:trunk from divijvaidya:alter-keystore on Aug 2, 2022

@divijvaidya
Member

Background

At the beginning of the test, we create a producer (P1) and a consumer (C1), the latter with enable.auto.commit=true, auto.commit.interval.ms=5000 and groupId="group1". P1 and C1 continuously write and read messages throughout the test, and at the end we assert that all messages were received with no duplicates.

During the test, we create more producers and consumers as we dynamically change the configuration, and we assert that produce and consume operations still work. When creating a new consumer (C2), we create it in the same consumer group as C1, i.e. groupId="group1".

Problem

When C2 is created, it triggers a rebalance within the consumer group "group1", which already contains C1, so C1's consumption is disrupted. Because C1 uses enable.auto.commit=true, its offsets are committed only periodically, so after the rebalance it may re-read messages that it had already consumed but not yet committed. Those duplicates cause the test to fail.
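To make the duplicate window concrete, here is a minimal, self-contained simulation. It is a simplified model and not the Kafka client: the class and method names are hypothetical, the commit interval is counted in records rather than milliseconds, and the real consumer has more moving parts (for example, it may also try to commit when partitions are revoked). It only shows why resuming from the last committed offset can redeliver records.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: models a single consumer on a single partition.
public class AutoCommitRebalanceSketch {

    /**
     * Reads offsets [0, logEnd). Offsets are committed only after every
     * commitEvery records (a stand-in for auto.commit.interval.ms).
     * When the position reaches rebalanceAt, a rebalance is simulated once:
     * the in-memory position is dropped and reading resumes from the last
     * committed offset.
     */
    static List<Long> consume(long logEnd, long commitEvery, long rebalanceAt) {
        List<Long> received = new ArrayList<>();
        long committed = 0;   // last committed offset
        long position = 0;    // in-memory fetch position
        boolean rebalanced = false;

        while (position < logEnd) {
            if (!rebalanced && position == rebalanceAt) {
                position = committed; // resume from the committed offset
                rebalanced = true;
                continue;
            }
            received.add(position);
            position++;
            if (position % commitEvery == 0) {
                committed = position; // periodic auto-commit
            }
        }
        return received;
    }

    /** Number of records that were delivered more than once. */
    static long duplicateCount(List<Long> received) {
        return received.size() - received.stream().distinct().count();
    }

    public static void main(String[] args) {
        List<Long> received = consume(10, 4, 6);
        System.out.println("received   = " + received);
        System.out.println("duplicates = " + duplicateCount(received));
    }
}
```

With a commit after every 4 records and a rebalance at offset 6, offsets 4 and 5 are consumed twice, which is the kind of duplicate the test's final assertion rejects.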

Solution

Create C2 in a separate consumer group, "group2", so that introducing it does not disturb the operation of C1. This fixes the flakiness of the test.
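As a rough illustration of the idea (not the actual test code), the sketch below builds consumer settings that differ only in `group.id`. The helper `consumerProps` and the class name are hypothetical; the property keys are the standard Kafka consumer configuration names. A real consumer would also need `bootstrap.servers` and deserializers, which are omitted here.

```java
import java.util.Properties;

// Hypothetical sketch: only plain Properties, no Kafka client dependency.
public class SeparateGroupSketch {

    /** Builds the auto-commit settings described above for the given group. */
    static Properties consumerProps(String groupId) {
        Properties props = new Properties();
        props.setProperty("group.id", groupId);
        props.setProperty("enable.auto.commit", "true");
        props.setProperty("auto.commit.interval.ms", "5000");
        return props;
    }

    public static void main(String[] args) {
        // C1 is the long-running consumer whose output is checked for duplicates.
        Properties c1 = consumerProps("group1");
        // C2 joins a different group, so creating it does not rebalance group1.
        Properties c2 = consumerProps("group2");

        System.out.println("C1 group: " + c1.getProperty("group.id"));
        System.out.println("C2 group: " + c2.getProperty("group.id"));
    }
}
```

Because group membership is scoped by `group.id`, a consumer that joins "group2" takes part in that group's rebalances only, leaving C1's assignment in "group1" untouched.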

@divijvaidya
Member Author

The tests are failing due to:

[2022-07-28T10:29:22.348Z] Execution failed for task ':streams:streams-scala:compileScala'.
[2022-07-28T10:29:22.348Z] > Timeout waiting to lock zinc-1.6.1_2.12.15_8 compiler cache (/home/jenkins/.gradle/caches/7.5/zinc-1.6.1_2.12.15_8). It is currently in use by another Gradle instance.
[2022-07-28T10:29:22.348Z]   Owner PID: 11362
[2022-07-28T10:29:22.348Z]   Our PID: 11365
[2022-07-28T10:29:22.348Z]   Owner Operation:
[2022-07-28T10:29:22.348Z]   Our operation:
[2022-07-28T10:29:22.348Z]   Lock file: /home/jenkins/.gradle/caches/7.5/zinc-1.6.1_2.12.15_8/zinc-1.6.1_2.12.15_8.lock

This failure is unrelated to the change; it comes from the build infrastructure.

@divijvaidya
Member Author

@showuon, since you are the flaky test expert here :) could you please take a look at this one? Thanks!

@divijvaidya
Member Author

@mimaison perhaps you would like to review this one?

@divijvaidya
Member Author

@guozhangwang please take a look at this flaky test fix if you get a chance.

@showuon
Member

showuon commented Aug 2, 2022

Sorry, I've been quite busy recently. I'll take a look this week. Thanks for helping fix it!

Member

@mimaison mimaison left a comment

Thanks for the PR, this test has been really flaky recently! LGTM

@mimaison mimaison merged commit 78038bc into apache:trunk Aug 2, 2022
@showuon
Member

showuon commented Aug 3, 2022

Thanks @divijvaidya for fixing it! Thanks @mimaison for the review!

@divijvaidya divijvaidya deleted the alter-keystore branch August 3, 2022 09:20
ijuma added a commit to confluentinc/kafka that referenced this pull request Aug 5, 2022
…(5 August 2022)

Version related conflicts:
* Jenkinsfile
* gradle.properties
* streams/quickstart/java/pom.xml
* streams/quickstart/java/src/main/resources/archetype-resources/pom.xml
* streams/quickstart/pom.xml
* tests/kafkatest/__init__.py
* tests/kafkatest/version.py

* commit 'add7cd85baa61cd0e1430': (66 commits)
  KAFKA-14136 Generate ConfigRecord for brokers even if the value is unchanged (apache#12483)
  HOTFIX / KAFKA-14130: Reduce RackAwarenesssTest to unit Test (apache#12476)
  MINOR: Remove ARM/PowerPC builds from Jenkinsfile (apache#12380)
  KAFKA-14111 Fix sensitive dynamic broker configs in KRaft (apache#12455)
  KAFKA-13877: Fix flakiness in RackAwarenessIntegrationTest (apache#12468)
  KAFKA-14129: KRaft must check manual assignments for createTopics are contiguous (apache#12467)
  KAFKA-13546: Do not fail connector validation if default topic creation group is explicitly specified (apache#11615)
  KAFKA-14122: Fix flaky test DynamicBrokerReconfigurationTest#testKeyStoreAlter (apache#12452)
  MINOR; Use right enum value for broker registration change (apache#12236)
  MINOR; Synchronize access to snapshots' TreeMap (apache#12464)
  MINOR; Bump trunk to 3.4.0-SNAPSHOT (apache#12463)
  MINOR: Stop logging 404s at ERROR level in Connect
  KAFKA-14095: Improve handling of sync offset failures in MirrorMaker (apache#12432)
  Minor: enable index for emit final sliding window (apache#12461)
  MINOR: convert some more junit tests to support KRaft (apache#12456)
  KAFKA-14108: Ensure both JUnit 4 and JUnit 5 tests run (apache#12441)
  MINOR: Remove code of removed metric (apache#12453)
  MINOR: Update comment on verifyTaskGenerationAndOwnership method in DistributedHerder
  KAFKA-14012: Add warning to closeQuietly documentation about method references of null objects (apache#12321)
  MINOR: Fix static mock usage in ThreadMetricsTest (apache#12454)
  ...
mimaison pushed a commit to mimaison/kafka that referenced this pull request Aug 11, 2022
…toreAlter (apache#12452)


Reviewers: Mickael Maison <mickael.maison@gmail.com>
mimaison pushed a commit that referenced this pull request Aug 25, 2022
…toreAlter (#12452)


Reviewers: Mickael Maison <mickael.maison@gmail.com>
jsancio pushed a commit that referenced this pull request Sep 1, 2022
…toreAlter (#12452)


Reviewers: Mickael Maison <mickael.maison@gmail.com>