Skip to content

Conversation

@lucasbru
Copy link
Member

@lucasbru lucasbru commented Jan 7, 2026

Streams group may not be in the results yet, in that case, wait longer.

I ran it 100 times locally to validate.

Reviewers: Lianet Magrans lmagrans@confluent.io

@lucasbru lucasbru requested review from Copilot and lianetm January 7, 2026 18:24
@github-actions github-actions bot added core Kafka Broker tests Test fixes (including flaky tests) small Small PRs labels Jan 7, 2026
Copy link
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes a flaky test by adding a null check before accessing properties of the group object. The test testDescribeStreamsGroupsNotReady was failing intermittently because it attempted to call methods on a potentially null firstGroup object when the streams group had not yet appeared in the results.

Key Changes

  • Added null check (firstGroup != null) before accessing firstGroup.groupState() to prevent NullPointerException when the streams group has not yet been returned by listGroups()

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

val firstGroup = client.listGroups().all().get().stream()
.filter(g => g.groupId() == streamsGroupId).findFirst().orElse(null)
firstGroup.groupState().orElse(null) == GroupState.NOT_READY && firstGroup.groupId() == streamsGroupId
firstGroup != null && firstGroup.groupState().orElse(null) == GroupState.NOT_READY && firstGroup.groupId() == streamsGroupId
Copy link

Copilot AI Jan 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The similar test method testDescribeStreamsGroups (line 4432) has the same potential NullPointerException issue where it checks firstGroup.groupState() without first verifying firstGroup is not null. For consistency and to prevent the same flaky test behavior, it should also be updated to check firstGroup != null before accessing its methods.

Suggested change
firstGroup != null && firstGroup.groupState().orElse(null) == GroupState.NOT_READY && firstGroup.groupId() == streamsGroupId
if (firstGroup == null) {
false
} else {
firstGroup.groupState().orElse(null) == GroupState.NOT_READY && firstGroup.groupId() == streamsGroupId
}

Copilot uses AI. Check for mistakes.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yeap, would it make sense to apply the fix consistently to the other tests too? (testDescribeStreamsGroups, testDescribeStreamsGroupsForStatelessTopology and testListStreamsGroupOffsets it seems, even though they haven't been flaky so far)

val firstGroup = client.listGroups().all().get().stream()
.filter(g => g.groupId() == streamsGroupId).findFirst().orElse(null)
firstGroup.groupState().orElse(null) == GroupState.NOT_READY && firstGroup.groupId() == streamsGroupId
firstGroup != null && firstGroup.groupState().orElse(null) == GroupState.NOT_READY && firstGroup.groupId() == streamsGroupId
Copy link

Copilot AI Jan 7, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The condition check is redundant. Once you verify firstGroup != null, checking firstGroup.groupId() == streamsGroupId is unnecessary because the filter already ensures that g.groupId() == streamsGroupId. Only groups matching this condition can be found by findFirst(). Consider simplifying the condition to just check firstGroup != null && firstGroup.groupState().orElse(null) == GroupState.NOT_READY.

Suggested change
firstGroup != null && firstGroup.groupState().orElse(null) == GroupState.NOT_READY && firstGroup.groupId() == streamsGroupId
firstGroup != null && firstGroup.groupState().orElse(null) == GroupState.NOT_READY

Copilot uses AI. Check for mistakes.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also makes sense to me (only for the tests that filter the stream)


@Flaky("KAFKA-20045")
@Test
def testDescribeStreamsGroupsNotReady(): Unit = {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this test has the Flaky tag on trunk, let's remove it with this PR

Streams group may not be in the results yet, in that case,
wait longer.
@lucasbru
Copy link
Member Author

lucasbru commented Jan 7, 2026

@lianetm Thanks for the comments. All addressed

Copy link
Member

@lianetm lianetm left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks! LGTM

@lucasbru lucasbru merged commit 2585a9a into apache:trunk Jan 8, 2026
23 checks passed
lianetm pushed a commit that referenced this pull request Jan 18, 2026
…amsGroupsNotReady (#21317)

### Description
This test was flaky because it assumes the Streams group will reach
`GroupState.NOT_READY`, but depending on timing and environment the
Streams changelog topic could be created successfully. When that
happened, the group progressed to `ASSIGNING/RECONCILING/STABLE`, the
test failed to observe `NOT_READY` within the timeout, and it sometimes
produced Reconciliation failed logs during consumer shutdown due to an
unfinished onTasksAssigned-related event.

This change makes the behavior deterministic by allowing
`createStreamsGroup()` to inject a `replication factor` into
`StreamsRebalanceData.TopicInfo`. In
`testDescribeStreamsGroupsNotReady`, we pass an intentionally impossible
replication factor for the current cluster (e.g., 9999), ensuring the
changelog topic creation attempt always fails. As a result, the internal
topic remains missing and the Streams group reliably stays in
`NOT_READY`, eliminating the timing-dependent state transition that
caused the flakiness.

### Detail Flaky Case and Non Flaky Case
<img width="909" height="925" alt="image"

src="https://github.com/user-attachments/assets/003445aa-5410-4dec-850b-efd714e4ff0b"
/>

### Result
- Fixes

https://develocity.apache.org/scans/tests?search.rootProjectNames=kafka&search.timeZoneId=Asia%2FTaipei&tests.container=kafka.api.PlaintextAdminIntegrationTest&tests.sortField=FLAKY&tests.test=testDescribeStreamsGroupsNotReady()

### Related PR
- #21267

Reviewers: Lucas Brutschy <lbrutschy@confluent.io>, Lianet Magrans
 <lmagrans@confluent.io>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

core Kafka Broker small Small PRs tests Test fixes (including flaky tests)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants