Skip to content

Conversation

@kevin-wu24
Copy link
Contributor

@kevin-wu24 kevin-wu24 commented Aug 14, 2025

The broker observer should not read update voter set timer value when
polling to determine its backoff, since brokers cannot auto-join the
KRaft voter set. If auto-join or kraft.version=1 is not supported,
controller observers should not read this timer either when polling.

The updateVoterSetPeriodMs timer is not something that should be
considered when calculating the backoff returned by polling, since this
timer does not represent the same thing as the fetchTimeMs timer.

Reviewers: Chia-Ping Tsai chia7712@gmail.com, José Armando García
Sancio jsancio@apache.org, Alyssa Huang ahuang@confluent.io,
Kuan-Po Tseng brandboat@gmail.com

Copy link
Member

@chia7712 chia7712 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@kevin-wu24 thanks for this fix. I ran the patch locally, and the CPU usage has improved. Is it possible to add a unit test for it?

@kevin-wu24
Copy link
Contributor Author

kevin-wu24 commented Aug 14, 2025

@chia7712 thanks for pointing out the issue.

Is it possible to add a unit test for it?

I'm not super sure what a unit test would look like, since the backoff logic is not a correctness thing, but rather an efficiency/performance thing.

@chia7712
Copy link
Member

I'm not super sure what a unit test would look like, since the backoff logic is not a correctness thing, but rather an efficiency/performance thing.

It seems to me that the busy loop is a performance issue, as it could lead to high CPU usage. I'm fine with adding a unit test in a follow-up, since the local test looks good. Otherwise, as soon as the broker starts running, my computer's fan spins up, which is a bit alarming.

Copy link
Member

@jsancio jsancio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fix @kevin-wu24. Just a minor coding comment.

Comment on lines 3369 to 3376
if (shouldSendAddOrRemoveVoterRequest()) {
return Math.min(
backoffMs,
state.remainingUpdateVoterSetPeriodMs(currentTimeMs)
);
} else {
return backoffMs;
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you change the implementation so that shouldSendAddOrRemoveVoterRequest is only evaluated once?

You can also change the back off computation to something like:

    return Math.min(
        backoffMs,
        shouldSendAddOrRemoveVoterRequest ?
            state.remainingUpdateVoterSetPeriodMs(currentTimeMs) :
            Integer.MAX_VALUE
    );

@kevin-wu24
Copy link
Contributor Author

kevin-wu24 commented Aug 14, 2025

Hi @chia7712, @jsancio and I had a discussion offline about some of the "backoff" logic in pollFollowerAsObserver and pollFollowerAsVoter. It is partially related to the bug here, but mainly the problem is that the calculation of the return value for these methods is conceptually incorrect as it is now. Basically, the fetch timeout and update voter set timeout mean two different things conceptually, but the code treats them similarly.

  • When the fetch timeout expires, it means the voter should transition states. This means the fetch timeout is something we should consider when calculating the backoff.
  • The update voter set timeout just means that every X amount of time, the replica should try to send an Add/Remove/UpdateVoterRPC. The value of this timer is not something we should consider when calculating the backoff. If we successfully send one of these RPCs, we should wait for sendResult.timeToWaitMs(). If we don't successfully send one of these RPCs because there is a pending request, we should also wait for sendResult.timeToWaitMs(). The value of this timer does not impact how long we should back off for, only the in-flight request's lifetime.

Copy link
Member

@brandboat brandboat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Before this patch:
image

After:
image

Profiling around 40 seconds, and the performance improvement is significant, thanks for the fix!

Copy link
Member

@brandboat brandboat left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

BTW, now FollowerState#remainingUpdateVoterSetPeriodMs is an unused method, could we remove it?

Copy link
Member

@jsancio jsancio left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the fix. LGTM.

state.remainingFetchTimeMs(currentTimeMs),
state.remainingUpdateVoterSetPeriodMs(currentTimeMs)
)
state.remainingFetchTimeMs(currentTimeMs)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, the backoffMs accounts for the time to wait before processing the result of any updateVoteRequest

Copy link
Contributor

@ahuang98 ahuang98 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

thanks for the improvements!

@kevin-wu24
Copy link
Contributor Author

@chia7712 can you re-trigger the CI? The Java 24 failure doesn't look related.

@github-actions github-actions bot removed the triage PRs from the community label Aug 15, 2025
if (sendResult.requestSent()) {
state.resetUpdateVoterSetPeriod(currentTimeMs);
}
return sendResult.timeToWaitMs();
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if timeToWaitMs is larger than the fetch timeout? Could the observer miss a fetch request?

Copy link
Contributor Author

@kevin-wu24 kevin-wu24 Aug 15, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The observer does not need to consider the time left on the fetch timer when calculating the backoff, because an observer cannot transition to prospective/candidate state. It must transition to follower state first.

What happens if timeToWaitMs is larger than the fetch timeout?

If this is the case, the observer would have to wait timeToWaitMs anyways so its request manager doesn't have a pending request. Only then can it resume fetching/sending add/remove voter.

@jsancio jsancio changed the title KAFKA-19605: Fix the busy loop occurring in the broker observer KAFKA-19605; Fix the busy loop occurring in kraft client observers Aug 15, 2025
@jsancio jsancio merged commit 833e25f into apache:trunk Aug 15, 2025
41 of 43 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants