-
Notifications
You must be signed in to change notification settings - Fork 14.9k
KAFKA-20045: Fix flaky test testDescribeStreamsGroupsNotReady #21267
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Pull request overview
This PR fixes a flaky test by adding a null check before accessing properties of the group object. The test testDescribeStreamsGroupsNotReady was failing intermittently because it attempted to call methods on a potentially null firstGroup object when the streams group had not yet appeared in the results.
Key Changes
- Added null check (
firstGroup != null) before accessingfirstGroup.groupState()to prevent NullPointerException when the streams group has not yet been returned bylistGroups()
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
| val firstGroup = client.listGroups().all().get().stream() | ||
| .filter(g => g.groupId() == streamsGroupId).findFirst().orElse(null) | ||
| firstGroup.groupState().orElse(null) == GroupState.NOT_READY && firstGroup.groupId() == streamsGroupId | ||
| firstGroup != null && firstGroup.groupState().orElse(null) == GroupState.NOT_READY && firstGroup.groupId() == streamsGroupId |
Copilot
AI
Jan 7, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The similar test method testDescribeStreamsGroups (line 4432) has the same potential NullPointerException issue where it checks firstGroup.groupState() without first verifying firstGroup is not null. For consistency and to prevent the same flaky test behavior, it should also be updated to check firstGroup != null before accessing its methods.
| firstGroup != null && firstGroup.groupState().orElse(null) == GroupState.NOT_READY && firstGroup.groupId() == streamsGroupId | |
| if (firstGroup == null) { | |
| false | |
| } else { | |
| firstGroup.groupState().orElse(null) == GroupState.NOT_READY && firstGroup.groupId() == streamsGroupId | |
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yeap, would it make sense to apply the fix consistently to the other tests too? (testDescribeStreamsGroups, testDescribeStreamsGroupsForStatelessTopology and testListStreamsGroupOffsets it seems, even though they haven't been flaky so far)
| val firstGroup = client.listGroups().all().get().stream() | ||
| .filter(g => g.groupId() == streamsGroupId).findFirst().orElse(null) | ||
| firstGroup.groupState().orElse(null) == GroupState.NOT_READY && firstGroup.groupId() == streamsGroupId | ||
| firstGroup != null && firstGroup.groupState().orElse(null) == GroupState.NOT_READY && firstGroup.groupId() == streamsGroupId |
Copilot
AI
Jan 7, 2026
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The condition check is redundant. Once you verify firstGroup != null, checking firstGroup.groupId() == streamsGroupId is unnecessary because the filter already ensures that g.groupId() == streamsGroupId. Only groups matching this condition can be found by findFirst(). Consider simplifying the condition to just check firstGroup != null && firstGroup.groupState().orElse(null) == GroupState.NOT_READY.
| firstGroup != null && firstGroup.groupState().orElse(null) == GroupState.NOT_READY && firstGroup.groupId() == streamsGroupId | |
| firstGroup != null && firstGroup.groupState().orElse(null) == GroupState.NOT_READY |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also makes sense to me (only for the tests that filter the stream)
|
|
||
| @Flaky("KAFKA-20045") | ||
| @Test | ||
| def testDescribeStreamsGroupsNotReady(): Unit = { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this test has the Flaky tag on trunk, let's remove it with this PR
Streams group may not be in the results yet, in that case, wait longer.
|
@lianetm Thanks for the comments. All addressed |
lianetm
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks! LGTM
…amsGroupsNotReady (#21317) ### Description This test was flaky because it assumes the Streams group will reach `GroupState.NOT_READY`, but depending on timing and environment the Streams changelog topic could be created successfully. When that happened, the group progressed to `ASSIGNING/RECONCILING/STABLE`, the test failed to observe `NOT_READY` within the timeout, and it sometimes produced Reconciliation failed logs during consumer shutdown due to an unfinished onTasksAssigned-related event. This change makes the behavior deterministic by allowing `createStreamsGroup()` to inject a `replication factor` into `StreamsRebalanceData.TopicInfo`. In `testDescribeStreamsGroupsNotReady`, we pass an intentionally impossible replication factor for the current cluster (e.g., 9999), ensuring the changelog topic creation attempt always fails. As a result, the internal topic remains missing and the Streams group reliably stays in `NOT_READY`, eliminating the timing-dependent state transition that caused the flakiness. ### Detail Flaky Case and Non Flaky Case <img width="909" height="925" alt="image" src="https://github.com/user-attachments/assets/003445aa-5410-4dec-850b-efd714e4ff0b" /> ### Result - Fixes https://develocity.apache.org/scans/tests?search.rootProjectNames=kafka&search.timeZoneId=Asia%2FTaipei&tests.container=kafka.api.PlaintextAdminIntegrationTest&tests.sortField=FLAKY&tests.test=testDescribeStreamsGroupsNotReady() ### Related PR - #21267 Reviewers: Lucas Brutschy <lbrutschy@confluent.io>, Lianet Magrans <lmagrans@confluent.io>
Streams group may not be in the results yet, in that case, wait longer.
I ran it 100 times locally to validate.
Reviewers: Lianet Magrans lmagrans@confluent.io