KAFKA-19605; Fix the busy loop occurring in kraft client observers #20354
chia7712
left a comment
@kevin-wu24 thanks for this fix. I ran the patch locally, and the CPU usage has improved. Is it possible to add a unit test for it?
@chia7712 thanks for pointing out the issue.
I'm not super sure what a unit test would look like, since the backoff logic is not a correctness thing, but rather an efficiency/performance thing.

It seems to me that the busy loop is a performance issue, as it could lead to high CPU usage. I'm fine with adding a unit test in a follow-up, since the local test looks good. Otherwise, as soon as the broker starts running, my computer's fan spins up, which is a bit alarming.
jsancio
left a comment
Thanks for the fix @kevin-wu24. Just a minor coding comment.
    if (shouldSendAddOrRemoveVoterRequest()) {
        return Math.min(
            backoffMs,
            state.remainingUpdateVoterSetPeriodMs(currentTimeMs)
        );
    } else {
        return backoffMs;
    }
Can you change the implementation so that shouldSendAddOrRemoveVoterRequest is only evaluated once?
You can also change the back off computation to something like:
    return Math.min(
        backoffMs,
        shouldSendAddOrRemoveVoterRequest ?
            state.remainingUpdateVoterSetPeriodMs(currentTimeMs) :
            Integer.MAX_VALUE
    );
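A minimal, self-contained sketch of the suggestion above, for anyone who wants to try the shape in isolation. It is not the KafkaRaftClient code: the class `VoterBackoffSketch` and the helper `pollBackoffMs` are hypothetical, the predicate is assumed to have been evaluated once by the caller and passed in as a boolean, and `Long.MAX_VALUE` is used instead of `Integer.MAX_VALUE` only because this sketch assumes `long` timer values.

```java
// Illustrative sketch only; names mirror the review snippet but this is not Kafka code.
final class VoterBackoffSketch {
    // Hypothetical helper: the caller evaluates the predicate once and passes the result in.
    static long pollBackoffMs(
        long backoffMs,
        boolean shouldSendAddOrRemoveVoterRequest,
        long remainingUpdateVoterSetPeriodMs
    ) {
        return Math.min(
            backoffMs,
            // When no add/remove voter request is due, the timer must not shorten the backoff,
            // so the second argument is the largest possible value.
            shouldSendAddOrRemoveVoterRequest ?
                remainingUpdateVoterSetPeriodMs :
                Long.MAX_VALUE
        );
    }

    public static void main(String[] args) {
        System.out.println(pollBackoffMs(100L, true, 40L));  // prints 40
        System.out.println(pollBackoffMs(100L, false, 40L)); // prints 100
    }
}
```

With the predicate false, the timer is ignored and the plain `backoffMs` is returned, which matches the `else` branch of the code under review.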
Hi @chia7712, @jsancio and I had a discussion offline about some of the "backoff" logic in
brandboat
left a comment
BTW, now FollowerState#remainingUpdateVoterSetPeriodMs is an unused method, could we remove it?
jsancio
left a comment
Thanks for the fix. LGTM.
    state.remainingFetchTimeMs(currentTimeMs),
    state.remainingUpdateVoterSetPeriodMs(currentTimeMs)
    )
    state.remainingFetchTimeMs(currentTimeMs)
I see, the backoffMs accounts for the time to wait before processing the result of any updateVoteRequest
ahuang98
left a comment
thanks for the improvements!
@chia7712 can you re-trigger the CI? The Java 24 failure doesn't look related.
    if (sendResult.requestSent()) {
        state.resetUpdateVoterSetPeriod(currentTimeMs);
    }
    return sendResult.timeToWaitMs();
What happens if timeToWaitMs is larger than the fetch timeout? Could the observer miss a fetch request?
The observer does not need to consider the time left on the fetch timer when calculating the backoff, because an observer cannot transition to prospective/candidate state. It must transition to follower state first.
What happens if timeToWaitMs is larger than the fetch timeout?
If this is the case, the observer would have to wait timeToWaitMs anyways so its request manager doesn't have a pending request. Only then can it resume fetching/sending add/remove voter.
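To make the flow discussed in this thread concrete, here is a rough sketch of the four-line snippet under review with stand-in types. Everything other than the accessor names `requestSent()`, `timeToWaitMs()` and `resetUpdateVoterSetPeriod(...)` is an assumption made for illustration: `SendResult` and `TimerState` are hypothetical classes, not the real Kafka types, and the fetch timer is deliberately left out because, per the reply above, the observer does not need it to compute the backoff.

```java
// Illustrative sketch only; SendResult and TimerState are hypothetical stand-ins.
final class ObserverPollSketch {
    // Models just the two accessors that appear in the reviewed diff.
    static final class SendResult {
        private final boolean requestSent;
        private final long timeToWaitMs;

        SendResult(boolean requestSent, long timeToWaitMs) {
            this.requestSent = requestSent;
            this.timeToWaitMs = timeToWaitMs;
        }

        boolean requestSent() { return requestSent; }
        long timeToWaitMs() { return timeToWaitMs; }
    }

    // Records when the update-voter-set period was last reset; -1 means never.
    static final class TimerState {
        long lastResetMs = -1L;

        void resetUpdateVoterSetPeriod(long currentTimeMs) {
            lastResetMs = currentTimeMs;
        }
    }

    // The timer is reset only when a request actually went out; the returned
    // wait comes from the send result alone.
    static long handleSendResult(SendResult sendResult, TimerState state, long currentTimeMs) {
        if (sendResult.requestSent()) {
            state.resetUpdateVoterSetPeriod(currentTimeMs);
        }
        return sendResult.timeToWaitMs();
    }
}
```

In this sketch the update-voter-set timer influences when a request is attempted but never feeds into the value returned to the poll loop.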


The broker observer should not read update voter set timer value when
polling to determine its backoff, since brokers cannot auto-join the
KRaft voter set. If auto-join or kraft.version=1 is not supported,
controller observers should not read this timer either when polling.
The updateVoterSetPeriodMs timer is not something that should be
considered when calculating the backoff returned by polling, since this
timer does not represent the same thing as the fetchTimeMs timer.
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, José Armando García
Sancio <jsancio@apache.org>, Alyssa Huang <ahuang@confluent.io>,
Kuan-Po Tseng <brandboat@gmail.com>
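
One plausible reading of why the timer mentioned in the commit message leads to a busy loop, sketched in isolation: if an observer never sends the request that resets the update-voter-set timer, the timer stays expired, its remaining time is 0, and any `Math.min` that includes it returns 0, so the poll loop never sleeps. The class and method names below are hypothetical, and the assumption that remaining time is clamped at 0 is mine, not something stated in the PR.

```java
// Illustrative sketch only; not the KafkaRaftClient implementation.
final class BusyLoopSketch {
    // Remaining time until deadlineMs, assumed to be clamped at zero once expired.
    static long remainingMs(long deadlineMs, long currentTimeMs) {
        return Math.max(0L, deadlineMs - currentTimeMs);
    }

    // Folding the update-voter-set timer into the backoff: an expired timer forces 0.
    static long backoffWithVoterSetTimer(long fetchDeadlineMs, long updateDeadlineMs, long nowMs) {
        return Math.min(
            remainingMs(fetchDeadlineMs, nowMs),
            remainingMs(updateDeadlineMs, nowMs)
        );
    }

    // Ignoring the update-voter-set timer: the backoff depends on the fetch timer only.
    static long backoffWithoutVoterSetTimer(long fetchDeadlineMs, long nowMs) {
        return remainingMs(fetchDeadlineMs, nowMs);
    }

    public static void main(String[] args) {
        // The update-voter-set timer expired at t=500 and is never reset.
        System.out.println(backoffWithVoterSetTimer(2000L, 500L, 1000L)); // prints 0
        System.out.println(backoffWithoutVoterSetTimer(2000L, 1000L));    // prints 1000
    }
}
```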