MINOR: Fix time comparison with appendLingerMs in CoordinatorRuntime#maybeFlushCurrentBatch #20739
…maybeFlushCurrentBatch
private void maybeFlushCurrentBatch(long currentTimeMs) {
    if (currentBatch != null) {
-       if (currentBatch.builder.isTransactional() || (currentBatch.appendTimeMs - currentTimeMs) >= appendLingerMs || !currentBatch.builder.hasRoomFor(0)) {
+       if (currentBatch.builder.isTransactional() || (currentTimeMs - currentBatch.appendTimeMs) >= appendLingerMs || !currentBatch.builder.hasRoomFor(0)) {
Personally, I would prefer to remove this condition, as a time-based task already exists. Removing it would also mean we are not changing the current behavior, since the condition was previously a no-op.
As you mentioned, the time-based scheduled task ensures that the current batch is flushed to the log once it has passed the append linger time.
Perhaps the check here is intended to detect as early as possible that the current batch has passed the append linger time, so that it can be flushed to the log without waiting for the scheduled task.
Could you add a unit test here? Since the existing tests don't advance the mock time, this bug went undetected.
Good catch! I think that we can keep the condition here. I agree with adding a unit test.
Thank you all for your feedback and suggestions. I will add a unit test to cover this scenario.
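For readers following the thread, here is a minimal sketch of why the original comparison behaved as a no-op and why tests that never advance the mock clock could not tell the two versions apart. The class and method names (`LingerCheckSketch`, `lingerElapsedBefore`, `lingerElapsedAfter`) are hypothetical and are not part of `CoordinatorRuntime`; the sketch isolates only the time comparison and assumes a positive linger value.

```java
// Illustrative sketch only: hypothetical names, not code from CoordinatorRuntime.
final class LingerCheckSketch {

    // Comparison as written before the fix: the operands are reversed, so the
    // difference is zero or negative whenever the clock has not gone backwards.
    static boolean lingerElapsedBefore(long appendTimeMs, long currentTimeMs, long appendLingerMs) {
        return (appendTimeMs - currentTimeMs) >= appendLingerMs;
    }

    // Comparison after the fix: elapsed time since the batch was created.
    static boolean lingerElapsedAfter(long appendTimeMs, long currentTimeMs, long appendLingerMs) {
        return (currentTimeMs - appendTimeMs) >= appendLingerMs;
    }

    public static void main(String[] args) {
        long appendTimeMs = 1_000L;
        long appendLingerMs = 10L; // assumed positive for this illustration

        // Clock not advanced: both versions agree, so such a test cannot detect the bug.
        System.out.println(lingerElapsedBefore(appendTimeMs, 1_000L, appendLingerMs)); // false
        System.out.println(lingerElapsedAfter(appendTimeMs, 1_000L, appendLingerMs));  // false

        // Clock advanced by 11 ms: only the fixed comparison reports that the linger time has passed.
        System.out.println(lingerElapsedBefore(appendTimeMs, 1_011L, appendLingerMs)); // false
        System.out.println(lingerElapsedAfter(appendTimeMs, 1_011L, appendLingerMs));  // true
    }
}
```

The two versions only diverge once the current time moves past the batch's append time, which is why the new unit test has to advance the clock before the second write.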
Hello @AndrewJSchofield and @dajac, if you have time, could you please take a look at this issue? Thank you very much, and I look forward to your suggestions.
This is much more in @dajac's area than mine.
chia7712 left a comment
@majialoong Thanks for this patch. Overall LGTM.
assertEquals(2, schedulerTimer.size());

// Advance past the linger time.
clockTimer.advanceClock(11);
Could you move this to line#4921? It makes more sense there, since the goal of advancing the clockTimer is to ensure flushCurrentBatch is executed during writing #2, right?
Yes, the purpose of advancing the clockTimer is to ensure that flushCurrentBatch is executed when writing #2.
However, I think that after advancing the clockTimer, we should check the number of tasks in the schedulerTimer to ensure that the linger task still exists (has not been executed or canceled) before writing #2.
This ensures that flushCurrentBatch is executed when writing #2, rather than being triggered by the linger timer task.
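The ordering argued for above can be illustrated with a toy model. Everything in this sketch is hypothetical (`LingerOrderingSketch`, `pendingLingerTasks`, `nowMs`); it does not use the real `CoordinatorRuntime` or the test's timers, and it assumes, as the names `clockTimer` and `schedulerTimer` in the test suggest, that advancing the clock does not by itself run scheduled tasks.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model, not the real CoordinatorRuntime or its test utilities.
final class LingerOrderingSketch {
    long nowMs = 0L;                          // manually advanced mock clock
    final long appendLingerMs = 10L;
    long appendTimeMs = -1L;                  // creation time of the current batch
    int pendingLingerTasks = 0;               // stands in for the scheduler's pending task count
    final List<String> currentBatch = new ArrayList<>();
    final List<List<String>> flushed = new ArrayList<>();

    void write(String record) {
        // The write path checks the linger time before appending.
        maybeFlushCurrentBatch();
        if (currentBatch.isEmpty()) {
            appendTimeMs = nowMs;
            pendingLingerTasks++;             // a linger task is scheduled for each new batch
        }
        currentBatch.add(record);
    }

    void maybeFlushCurrentBatch() {
        if (!currentBatch.isEmpty() && (nowMs - appendTimeMs) >= appendLingerMs) {
            flushed.add(new ArrayList<>(currentBatch));
            currentBatch.clear();
            pendingLingerTasks--;             // flushing cancels the batch's linger task
        }
    }

    public static void main(String[] args) {
        LingerOrderingSketch runtime = new LingerOrderingSketch();
        runtime.write("record1");

        // Advance past the linger time without running any scheduled task.
        runtime.nowMs += 11;

        // Check that the linger task is still pending, so a later flush
        // can only have come from the write path.
        System.out.println(runtime.pendingLingerTasks == 1 && runtime.flushed.isEmpty());

        // Write #2 triggers the flush of the first batch and starts a new one.
        runtime.write("record2");
        System.out.println(runtime.flushed);      // [[record1]]
        System.out.println(runtime.currentBatch); // [record2]
    }
}
```

Checking the pending task count between the clock advance and the second write is what distinguishes a flush driven by the write path from one driven by the linger timer.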
I will merge this tomorrow if @dajac and @AndrewJSchofield have no objections. I will also backport it to 4.1 and 4.0.
dajac left a comment
Thanks for the fix, @majialoong! I left a few nits. Otherwise, LGTM.
MockPartitionWriter writer = new MockPartitionWriter();

CoordinatorRuntime<MockCoordinatorShard, String> runtime =
    new CoordinatorRuntime.Builder<MockCoordinatorShard, String>()
nit: We use four spaces to indent code.
Thanks! I’ve addressed this.
"write#1", TP, Duration.ofMillis(20),
state -> new CoordinatorResult<>(List.of("record1"), "response1")

"write#2", TP, Duration.ofMillis(20),
state -> new CoordinatorResult<>(List.of("record2"), "response2")
…maybeFlushCurrentBatch (#20739) This PR fixed the time comparison logic in `CoordinatorRuntime#maybeFlushCurrentBatch` to ensure that the batch is flushed when the elapsed time since `appendTimeMs` exceeds the `appendLingerMs` parameter. This issue is also mentioned [here](https://github.com/apache/kafka/pull/20653/files#r2442452104). Reviewers: David Jacot <djacot@confluent.io>, Chia-Ping Tsai <chia7712@gmail.com>
@majialoong could you please file a PR for 4.0? I got the following error during the backport.
Hi @chia7712. These errors are caused by code differences between the 4.0 branch and trunk. I have submitted a PR (#20773) to backport this fix.
…maybeFlushCurrentBatch (#20773) This PR fixed the time comparison logic in CoordinatorRuntime#maybeFlushCurrentBatch to ensure that the batch is flushed when the elapsed time since appendTimeMs exceeds the appendLingerMs parameter. This issue is also mentioned [here](https://github.com/apache/kafka/pull/20653/files#r2442452104). The fix for this issue was originally in [this PR](#20739) in the trunk branch, which was backported to the 4.0 branch. Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
This PR fixed the time comparison logic in `CoordinatorRuntime#maybeFlushCurrentBatch` to ensure that the batch is flushed when the elapsed time since `appendTimeMs` exceeds the `appendLingerMs` parameter.
This issue is also mentioned here.
Reviewers: David Jacot <djacot@confluent.io>, Chia-Ping Tsai <chia7712@gmail.com>