Seek method returning incorrect messages on compressed topic when using max_poll_records #1214
Comments
While debugging, I found that in the first iteration the fetch_offset here is the given start offset (100), so the record at offset 79 is skipped as expected. In the second iteration, however, the fetch_offset (part.offset) equals the current record.offset (80 in the above case), so that record is not skipped.
Can you post debug logs? Are you sure you aren't accidentally joining a consumer group and fetching the last committed offsets?
Got it -- thanks. The issue here is that compressed messages (at least for the v0 and v1 formats) are actually returned as single-message "batches" in the MessageSet format. The batch offset is equal to the offset of the last message in the compressed batch. When seeking to an offset that falls inside a compressed batch, we first register the outer batch message offset, which is the offset of the last message inside the batch. When we take the uncompressed messages from inside the batch, the individual messages have smaller offsets than the outer batch offset. The
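To make the offset relationship above concrete, here is a minimal toy model in plain Python. It is only a sketch: `Record`, `CompressedBatch`, and `records_from` are hypothetical names invented for illustration and are not kafka-python types, and the offsets 79 to 103 are made up to match the numbers mentioned in this thread.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Record:
    offset: int
    value: str


@dataclass
class CompressedBatch:
    """Toy stand-in for a v0/v1 compressed wrapper message (hypothetical type)."""
    records: List[Record]

    @property
    def offset(self) -> int:
        # The wrapper carries the offset of the last inner message.
        return self.records[-1].offset


def records_from(batch: CompressedBatch, seek_offset: int) -> List[Record]:
    """Unpack the batch, dropping inner records that precede seek_offset."""
    return [r for r in batch.records if r.offset >= seek_offset]


# A batch whose inner messages span offsets 79..103 (illustrative values).
batch = CompressedBatch([Record(o, f"msg-{o}") for o in range(79, 104)])
wanted = records_from(batch, seek_offset=100)

print(batch.offset)                # 103
print([r.offset for r in wanted])  # [100, 101, 102, 103]
```

The point of the sketch is that the filter has to compare each inner record against the offset the caller sought to, not against the wrapper offset, and it has to keep doing so on every pass over the same batch.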
When using the seek method (`kafka.consumer.group.seek`) for a given partition and offset, we see inconsistent behavior in the messages returned by the subsequent poll call.
The issue is easily reproducible for the given topic (compacted).
Part of Workflow:
Observation:
If the iterator interface is used instead of the poll interface, the issue no longer occurs. My guess is that somewhere during polling the fetched offsets are not updated or the fetched messages are not skipped. It looks like the iterator method does not use the fetched_records API, which would explain why it works fine.
At times it does return the correct messages (especially when the given offset is close to the high watermark).
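One way to check for the symptom described here is a small helper that seeks, polls a few times, and reports any returned offsets that are lower than the seek target. This is a sketch, not the original reproduction script: `offsets_below_target` is a hypothetical helper, and it assumes a consumer object that behaves like `KafkaConsumer` (a `seek(partition, offset)` method, and a `poll(timeout_ms=..., max_records=...)` method returning a dict of partition to records with an `offset` attribute), with the partition already assigned.

```python
def offsets_below_target(consumer, partition, target_offset,
                         max_records=1, polls=3, timeout_ms=1000):
    """Seek to target_offset, poll a few times, and collect stale offsets.

    Returns the offsets of all returned records that are lower than
    target_offset. An empty list means every poll respected the seek.
    """
    consumer.seek(partition, target_offset)
    stale = []
    for _ in range(polls):
        polled = consumer.poll(timeout_ms=timeout_ms, max_records=max_records)
        for record in polled.get(partition, []):
            if record.offset < target_offset:
                stale.append(record.offset)
    return stale


# Usage sketch with kafka-python (broker address and topic name are
# placeholders, not taken from this report):
#
#   from kafka import KafkaConsumer, TopicPartition
#   tp = TopicPartition("my-topic", 0)
#   consumer = KafkaConsumer(bootstrap_servers="localhost:9092",
#                            enable_auto_commit=False)
#   consumer.assign([tp])
#   print(offsets_below_target(consumer, tp, 100))
```

Polling several times with a small `max_records` matters because, per the debugging note in this thread, the first iteration skips correctly and the stale record only shows up on a later one.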
Please let me know if any other details are required.