KAFKA-20074: Fix flaky PlaintextAdminIntegrationTest#testDescribeStreamsGroupsNotReady #21317
…amsGroupsNotReady.
@lucasbru @lianetm Hi! When you have bandwidth, could you please take a look at this PR? I’d really appreciate your feedback, especially on whether this approach matches the intent of the test and the Streams group state machine. Thanks for taking the time! 🙇‍♂️
Pull request overview
This PR fixes a flaky test by making the behavior deterministic. The test testDescribeStreamsGroupsNotReady was intermittently failing because it expected a Streams group to reach GroupState.NOT_READY, but depending on timing, the changelog topic could be successfully created, causing the group to transition to other states.
Changes:
- Added a `replicationFactor` parameter to the `createStreamsGroup` method in `IntegrationTestHarness.scala`
- Updated `testDescribeStreamsGroupsNotReady` to pass an impossible replication factor (9999) to ensure changelog topic creation fails
- The changelog topic creation failure ensures the group reliably stays in `NOT_READY` state
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| core/src/test/scala/integration/kafka/api/IntegrationTestHarness.scala | Added optional replicationFactor parameter to createStreamsGroup method and wired it into TopicInfo for changelog topics |
| core/src/test/scala/integration/kafka/api/PlaintextAdminIntegrationTest.scala | Updated test to pass an impossible replication factor (9999) to deterministically trigger NOT_READY state |
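The wiring described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the harness code itself: `TopicInfoSketch`, `boxReplicationFactor`, and `changelogTopicInfo` are hypothetical names, and the real `createStreamsGroup` signature may differ. It assumes the new parameter is an `Optional[Short]` that defaults to empty, so existing callers are unaffected.

```scala
import java.util.Optional

object ReplicationFactorWiringSketch {
  // Hypothetical stand-in for the changelog topic metadata; not Kafka's real TopicInfo.
  final case class TopicInfoSketch(
    numPartitions: Optional[Integer],
    replicationFactor: Optional[java.lang.Short]
  )

  // Converts an Optional of Scala Short into an Optional of boxed java.lang.Short,
  // which is what a Java-side API taking Optional<Short> would expect.
  def boxReplicationFactor(replicationFactor: Optional[Short]): Optional[java.lang.Short] =
    replicationFactor.map[java.lang.Short]((s: Short) => java.lang.Short.valueOf(s))

  // Assumed shape of the optional parameter: empty by default, so callers that
  // do not care about the replication factor need no changes.
  def changelogTopicInfo(replicationFactor: Optional[Short] = Optional.empty()): TopicInfoSketch =
    TopicInfoSketch(Optional.empty(), boxReplicationFactor(replicationFactor))
}
```

A caller that wants the default passes nothing, while the flaky test's scenario would correspond to `changelogTopicInfo(Optional.of(9999.toShort))`.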
    val boxed: Optional[java.lang.Short] =
      replicationFactor.map[java.lang.Short](s => java.lang.Short.valueOf(s))
Copilot
AI
Jan 16, 2026
There is trailing whitespace on line 257 after the closing parenthesis. This should be removed to maintain code cleanliness and adhere to standard formatting practices.
      changelogTopics = Set(testTopicName + "-changelog"),
    - streamsGroupId = streamsGroupId
    + streamsGroupId = streamsGroupId,
    + replicationFactor = Optional.of(unavailableReplicationFactorInThisCluster),
Copilot
AI
Jan 16, 2026
There is a trailing comma after the last parameter in the function call. While this is syntactically valid in Scala, it's inconsistent with the other usages of createStreamsGroup in the same file (e.g., lines 4421-4425, 4516-4519) where no trailing comma is used. Consider removing it for consistency.
    - replicationFactor = Optional.of(unavailableReplicationFactorInThisCluster),
    + replicationFactor = Optional.of(unavailableReplicationFactorInThisCluster)
Ah, this makes sense to me. So my earlier fix was just avoiding the NPE, but we'd still time out. Check the Copilot comments, otherwise LGTM.
@lucasbru
lianetm left a comment
Nice! LGTM too, thanks!
I'm so sorry 🙇‍♂️.
Description
This test was flaky because it assumes the Streams group will reach `GroupState.NOT_READY`, but depending on timing and environment the Streams changelog topic could be created successfully. When that happened, the group progressed to `ASSIGNING`/`RECONCILING`/`STABLE`, the test failed to observe `NOT_READY` within the timeout, and it sometimes produced "Reconciliation failed" logs during consumer shutdown due to an unfinished `onTasksAssigned`-related event.
This change makes the behavior deterministic by allowing `createStreamsGroup()` to inject a `replication factor` into `StreamsRebalanceData.TopicInfo`. In `testDescribeStreamsGroupsNotReady`, we pass an intentionally impossible replication factor for the current cluster (e.g., 9999), ensuring the changelog topic creation attempt always fails. As a result, the internal topic remains missing and the Streams group reliably stays in `NOT_READY`, eliminating the timing-dependent state transition that caused the flakiness.
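The reasoning above can be illustrated with a toy model. This is a deliberately simplified sketch, not the group coordinator's actual logic: the state names, `canCreateTopic`, and `stateAfterInternalTopicAttempt` are hypothetical. It assumes only that topic creation is rejected when the requested replication factor exceeds the number of available brokers, and that a group whose internal topic is missing stays in `NOT_READY`.

```scala
object NotReadySketch {
  // Hypothetical, simplified view of the group states discussed in this PR.
  sealed trait GroupStateSketch
  case object NotReady extends GroupStateSketch
  // Stands in for ASSIGNING / RECONCILING / STABLE, i.e. any state past NOT_READY.
  case object Progressing extends GroupStateSketch

  // Assumption: creating a topic needs at least as many live brokers
  // as the requested replication factor.
  def canCreateTopic(replicationFactor: Int, liveBrokers: Int): Boolean =
    replicationFactor >= 1 && replicationFactor <= liveBrokers

  // If the changelog topic cannot be created it stays missing, and the group
  // remains NOT_READY; otherwise the group may move on, depending on timing.
  def stateAfterInternalTopicAttempt(replicationFactor: Int, liveBrokers: Int): GroupStateSketch =
    if (canCreateTopic(replicationFactor, liveBrokers)) Progressing else NotReady
}
```

In this model a replication factor of 9999 can never be satisfied by a small test cluster, so the outcome no longer depends on how quickly the topic creation happens to complete.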
Detail Flaky Case and Non Flaky Case
Result
https://develocity.apache.org/scans/tests?search.rootProjectNames=kafka&search.timeZoneId=Asia%2FTaipei&tests.container=kafka.api.PlaintextAdminIntegrationTest&tests.sortField=FLAKY&tests.test=testDescribeStreamsGroupsNotReady()
Related PR
Reviewers: Lucas Brutschy <lbrutschy@confluent.io>, Lianet Magrans <lmagrans@confluent.io>