-
Notifications
You must be signed in to change notification settings - Fork 1.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Support multiple record batches, closes #1022 #1023
Conversation
With added logging: diff --git a/consumer.go b/consumer.go
index 276bc94..bb850f3 100644
--- a/consumer.go
+++ b/consumer.go
@@ -3,6 +3,7 @@ package sarama
import (
"errors"
"fmt"
+ "log"
"sync"
"sync/atomic"
"time"
@@ -619,6 +620,8 @@ func (child *partitionConsumer) parseResponse(response *FetchResponse) ([]*Consu
}
}
+ log.Printf("Returning %d messages from %d batches", len(messages), len(block.Records.recordBatchSet.batches))
+
return messages, nil
} I see multiple batches on the consumer as expected (1MB buffer):
This producer is not great at batching, so numbers match here. You can see big number of batches after restart and them smaller batches as consumer is up to date with production rate. On another consumer with higher throughput and better batching we see this:
|
While this fix addresses the issue of multiple record batches being in one response, there could be a mix of record batches and legacy message sets (see issue #1021). Fixing that issue needs to rework how records are implemented, because they aren't just either legacy or new anymore, and the ordering between record batches and legacy messages needs to be preserved. |
That's a great point. I updated the PR to support mixed scenarios. Tested with 0.10.2.0 and 0.11.0.0 API with the following log:
I've tried both full (1KB covers everything in this log) and short (128B) reads, both work as expected. |
f32a1ad
to
b2db613
Compare
fetch_response.go
Outdated
b.RecordsSet = []*Records{} | ||
|
||
for { | ||
if recordsDecoder.remaining() == 0 { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think for recordsDecoder.remaining() > 0 {
would be more idiomatic
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That would require one more level of nesting, I think it's better to break early and keep it flat:
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not sure I follow; moving the condition into the for
loop that already exists wouldn't add any layers of nesting?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
My bad, I misread the first time. Code is changed now.
record_batch.go
Outdated
@@ -35,6 +35,47 @@ func (e recordsArray) decode(pd packetDecoder) error { | |||
return nil | |||
} | |||
|
|||
type RecordBatchSet struct { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
we don't seem to use this type anywhere
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, this is a leftover from the first attempt. Removed.
records.go
Outdated
magicOffset = 16 | ||
magicLength = 1 | ||
) | ||
|
||
// Records implements a union type containing either a RecordBatch or a legacy MessageSet. | ||
type Records struct { | ||
recordsType int |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The reason why I put this field is to make it easier to debug possible inconsistencies between what the Records should contain and what it actually contains. If we want to refactor this, we could do it in a separate PR because I don't think it's related to the problem at hand (unless I'm missing something).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fair enough. I built on top of my initial changes, that's why many things were missing. I added another commit to minimize the diff with master.
b2db613
to
c8284bb
Compare
@@ -31,7 +33,8 @@ type FetchResponseBlock struct { | |||
HighWaterMarkOffset int64 | |||
LastStableOffset int64 | |||
AbortedTransactions []*AbortedTransaction | |||
Records Records |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Technically it's a breaking change to remove this; could you leave it with a comment that it's deprecated, and just fill it in with the first set or something?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done, let me know if that's what you had in mind.
This looks pretty good to me, just a few comments. Thanks for all your work on this both of you! |
Thanks! |
cc @wladh