Parallel Consumer does not get records from poll after OFFSET_OUT_OF_RANGE in auto.offset.reset = earliest #352
Are the offsets in your consumer group topic being expired? The default retention for offsets used to be 24 hours. Once the offsets are gone, it should start from the beginning as if there were never any offsets committed. I'm not quite following what you're describing. Are you able to write this as an integration test? You can copy the latest version of the tests using …
Thanks @astubbs for your help, really appreciate it! I am working on the integration tests, and it might take time to reproduce this through tests. In the meantime, the offsets in my consumer group for the topic have expired, and I expect the same: that it will start from the earliest available offset. I can see the log below, but it did not fetch any records. However, when I start Spring Kafka with the same consumer group, I see the same logs as the parallel consumer, but it does fetch records from the new offset returned by Kafka.
Hi @astubbs, I have tested it further and it looks like the parallel consumer is getting records. The issue I see is that the current offset shown in Confluent Cloud > Consumer lag for its group does not move when the parallel consumer instance fails to commit the processed records because the partition was reassigned. What could be the reason the current offset in Confluent Cloud does not move, even though I can see records from that partition being processed?
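For debugging this kind of thing, the offsets the broker actually has committed for the group can be queried directly, independently of the dashboard, with the standard kafka-clients admin API. A minimal sketch; the bootstrap servers and group id are placeholders:

```java
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

public class ShowCommittedOffsets {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder

        try (Admin admin = Admin.create(props)) {
            // Fetch the offsets the broker currently has committed for the group
            Map<TopicPartition, OffsetAndMetadata> committed = admin
                    .listConsumerGroupOffsets("my-group") // placeholder group id
                    .partitionsToOffsetAndMetadata()
                    .get();
            // The metadata field is where PC stores its offset map, so this also
            // shows whether extra completion state was committed alongside the offset
            committed.forEach((tp, oam) -> System.out.println(tp + " -> " + oam));
        }
    }
}
```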
Hi, I suspect this is because PC doesn't know about "reset to earliest", so the underlying consumer re-downloads the records, but PC has already seen them and so ignores them. If the offset gets reset and the consumer starts from earliest again, do you want PC to also reprocess everything? Was PC stopped, then the offsets expired, then PC started again? Or did the offsets expire while PC was running?
Hi @astubbs, thanks for the response. Yes, I want at-least-once processing behaviour. If the last committed offset is reset and the system is down for 24 hours, then it should reprocess everything from the earliest available event (not yet deleted from the topic) instead of processing only the latest events. Although this should rarely happen in prod, I still don't want to lose events by using offset reset to latest. When I looked at my logs again, it looks like this: when there are some locally processed offsets still to be committed to the broker, but the commit fails due to a rebalance after scaling or some other reason, then the next time PC somehow knows they were processed, even though the offset commit failed, and doesn't reprocess them; but from then on it always commits the last uncommitted offset, which the consumer has already moved past. I have attached the last commit failure from partitions 0-4, and those are the offsets I see in Confluent Cloud as the current offset. (Unable to upload the Confluent Cloud screen print.)
Below are some additional logs. Please let me know if you want the committedMetadata hash.
At least once? Are you saying PC has been skipping records?
Hi @astubbs, sorry for the confusion, but I did not mean PC is skipping records. I meant that if I used offset reset to latest, I could lose events. The issue I am seeing is that once PC is unable to commit the offset, due to rebalancing or some other reason, the current offset does not move. By "does not move" I mean that I still see that uncommitted offset in the Confluent Cloud consumer dashboard. However, that record is not being reprocessed as a duplicate.
Hi @astubbs, just an update on this. This happens when the last committed offset is out of range. So checking for out-of-range offsets during startup and resetting them before starting PC seems to solve this problem.
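A rough sketch of that kind of startup check, using only the plain kafka-clients consumer API. This illustrates the workaround described above rather than anything PC does itself; the partition set is supplied by the caller, and all names are placeholders:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;
import org.apache.kafka.clients.consumer.Consumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;

public class OffsetRangeCheck {

    /**
     * Detect committed offsets that have fallen below the earliest offset the
     * broker still retains, and re-commit them at the earliest available
     * position, so that PC starts from a valid offset.
     */
    static void resetOutOfRangeOffsets(Consumer<String, String> consumer,
                                       Set<TopicPartition> partitions) {
        consumer.assign(partitions); // manual assignment, so we can commit explicitly
        Map<TopicPartition, OffsetAndMetadata> committed = consumer.committed(partitions);
        Map<TopicPartition, Long> earliest = consumer.beginningOffsets(partitions);

        Map<TopicPartition, OffsetAndMetadata> repaired = new HashMap<>();
        for (TopicPartition tp : partitions) {
            OffsetAndMetadata oam = committed.get(tp);
            if (oam != null && oam.offset() < earliest.get(tp)) {
                // The committed offset points at data that no longer exists
                repaired.put(tp, new OffsetAndMetadata(earliest.get(tp)));
            }
        }
        if (!repaired.isEmpty()) {
            consumer.commitSync(repaired);
        }
        consumer.unsubscribe(); // release the manual assignment before PC takes over
    }
}
```

Re-committing a fresh `OffsetAndMetadata` also replaces any stale PC offset-map metadata stored alongside the old commit, which may be what makes the restart behave.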
I’m afraid this might all have been confused by issue #365 (Received duplicated records when rebalance occurs), which is fixed in release 0.5.2.1 (latest is 0.5.2.2). What is the retention period of the input topic?
That statement doesn’t compile: …
That was the regression bug #365 :/ Sorry about that.
Just to check: you know you can increase the offset expiry time to whatever you want? It used to default to 24 hours, now it's 7 days, but a lot of people make it effectively infinite and only delete offsets on request (in production). And if you want this behaviour, as with a normal consumer setup, just set the reset policy to earliest, as you have.
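For concreteness: the broker-side retention knob is `offsets.retention.minutes` (default 10080 minutes, i.e. 7 days, since Kafka 2.0; previously 1440, i.e. 24 hours), and the consumer-side policy is `auto.offset.reset`. A minimal sketch of a consumer configured that way, with placeholder bootstrap servers and group id; PC additionally requires auto-commit to be off:

```java
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class ConsumerSetup {
    static KafkaConsumer<String, String> newConsumer() {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "my-group");                // placeholder
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        // With no valid committed offset (e.g. after offset expiry), start from
        // the earliest record the broker still retains
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
        // PC manages offset commits itself, so the underlying consumer must not auto-commit
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");
        return new KafkaConsumer<>(props);
    }
}
```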
Ah yes, probably because an earlier offset commit’s metadata included the status of the completed records (in other words, they were committed earlier in a metadata commit) - see https://github.com/confluentinc/parallel-consumer#offset_map for more information.
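A toy illustration of the offset-map idea, not PC's actual encoding: the commit carries a base offset plus a record of which offsets beyond the base are already complete, so re-fetched records can be recognised and dropped. The class and field names here are invented for illustration:

```java
import java.util.BitSet;

/**
 * Toy model of an "offset map" commit: the committed base offset plus a
 * bitmap of which offsets beyond the base have already been processed.
 * (PC's real format is a compressed encoding stored in the commit metadata.)
 */
class ToyOffsetMap {
    final long baseOffset;       // offset the group would resume from
    final BitSet completedAhead; // bit i set => offset (baseOffset + i) already done

    ToyOffsetMap(long baseOffset, BitSet completedAhead) {
        this.baseOffset = baseOffset;
        this.completedAhead = completedAhead;
    }

    /** True if a re-fetched record should be skipped rather than reprocessed. */
    boolean alreadyProcessed(long offset) {
        if (offset < baseOffset) return true; // below the committed base: done
        long delta = offset - baseOffset;
        return delta <= Integer.MAX_VALUE && completedAhead.get((int) delta);
    }
}
```

On this model the failure mode discussed in the thread is visible: if stale completion state survives an offset reset, re-fetched records test as already processed and are skipped instead of reprocessed.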
Yes, that’s unrelated, as above.
Yeah, it's probably the issue as above, but as mentioned, offsets may have been previously marked as completed in the metadata from older commits.
Why do you set this to one?
Do you have this in a repo I can see?
This code seems odd: how is …? Why do you think there would be a problem? Generally, have a play with the latest version and let me know if you still see any issues.
[maintainer note]:
May have been a symptom of #365
Discussed in #351
Originally posted by jikeshdpatel July 14, 2022
Hi @astubbs, this is a great library and improves the performance drastically. However, I am seeing some strange issue. I feel like it is because of the `auto_offset_reset = earliest`. … `auto_offset_reset = earliest` and it started consuming from beginning. … (`auto_offset_reset = earliest`) for partition 3 and it started consuming (`auto_offset_reset = earliest`) and it now consumes from partitions 5 and 3. … `auto_offset_reset = latest` and now it starts consuming from all the partitions. The processing order was based on `KEY`. Can you please advise on how I should further debug it?
Thank you for your help!
Below is the config: …
Below are the logs: … For the `OFFSET_OUT_OF_RANGE` partition, it does reset the offset but does not get any records. However, there are records at the new offset value.
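For reproduction context, here is a minimal PC setup of the kind described, sketched from the project README; exact method names can vary between versions, and the topic, group, and concurrency values are placeholders:

```java
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;
import io.confluent.parallelconsumer.ParallelConsumerOptions;
import io.confluent.parallelconsumer.ParallelConsumerOptions.ProcessingOrder;
import io.confluent.parallelconsumer.ParallelStreamProcessor;

public class PcKeyOrderingExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "my-group");                // placeholder
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest"); // the policy from this issue
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");   // required by PC
        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);

        ParallelConsumerOptions<String, String> options = ParallelConsumerOptions.<String, String>builder()
                .ordering(ProcessingOrder.KEY) // processing order based on KEY, as in the issue
                .consumer(consumer)
                .maxConcurrency(16)            // placeholder
                .build();

        ParallelStreamProcessor<String, String> pc =
                ParallelStreamProcessor.createEosStreamProcessor(options);
        pc.subscribe(Collections.singletonList("input-topic")); // placeholder topic
        pc.poll(ctx -> System.out.println("processing " + ctx.getSingleConsumerRecord()));
    }
}
```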