Consumer offset "stuck" on certain partitions #1072
Comments
Can you check the size of the message at that offset?
Thank you for the very quick answer. The messages are Avro encoded. I already tried the following settings for my consumer:
I think there may still be a problem with log compaction and offsets. Here are the offsets of partition 1 in the order they appear: I played around with the offsets a little, and the next offset that would work is "484197". I'm not sure if this is helpful, but there isn't really a message at this offset. The next message KafkaJS actually received (as mentioned above) is at offset 484207.
I tested two different clients (C# and Java) and both were able to consume past this offset (484158) with the same configuration. As I mentioned above, KafkaJS receives zero messages from Kafka when it tries to fetch from this offset (484158). I'm not sure, but maybe the other clients also receive zero messages; if they do, perhaps they check the highest offset of the partition and, if it is higher than the current fetch offset, simply skip past the offset that returns nothing?
This feels incredibly familiar, like we worked on this bug before, but maybe I'm just having déjà vu. Related issues from other clients:
EDIT: I knew I had worked on this! #577. I guess there's some other case that's not covered by that.
@Nevon Do you need any further details to find the cause of this bug?
I'm not sure I will have the time to look into this myself, but a way to reproduce the issue would be great. The relevant topic configuration would be a start, but ideally a script that creates a topic and produces messages in whatever way is needed to trigger the bug would be 💯
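Not a verified reproduction, but a sketch of the kind of script being asked for, assuming the trigger is log compaction removing long runs of offsets. The topic name, broker address, and compaction settings below are illustrative and not taken from any of the affected setups:

```js
const { Kafka } = require('kafkajs')

const kafka = new Kafka({ clientId: 'repro', brokers: ['localhost:9092'] })

const run = async () => {
  const admin = kafka.admin()
  await admin.connect()
  // Compacted topic with aggressive settings so the log cleaner runs quickly.
  await admin.createTopics({
    topics: [
      {
        topic: 'compaction-repro',
        numPartitions: 3,
        configEntries: [
          { name: 'cleanup.policy', value: 'compact' },
          { name: 'segment.ms', value: '10000' },
          { name: 'min.cleanable.dirty.ratio', value: '0.01' },
        ],
      },
    ],
  })
  await admin.disconnect()

  const producer = kafka.producer()
  await producer.connect()
  // Small key space + many messages => compaction deletes long runs of offsets.
  for (let batch = 0; batch < 100; batch++) {
    const messages = Array.from({ length: 1000 }, (_, i) => ({
      key: `key-${i % 10}`,
      value: `value-${batch}-${i}`,
    }))
    await producer.send({ topic: 'compaction-repro', messages })
  }
  await producer.disconnect()
}

run().catch(console.error)
```

If the bug is what it looks like, a consumer started with `fromBeginning: true` against such a topic would eventually hit an offset range that returns an empty fetch and stop advancing.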
I have a similar problem. The last offset successfully consumed is 221235053 and the next offset is 221306740, a difference of ~70k between the two messages. The consumer is stuck: it does not consume further and constantly tries to fetch offset 221235053, getting no messages. I have to define a ridiculously high value to work around it. I think there should be a check for whether the batch is empty but not the last one, either by using the offsetApi or by checking what the fetchApi returned.
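For what it's worth, a rough sketch of the check being proposed here. This is a hypothetical helper, not actual KafkaJS internals; `fetchedOffset`, `messages`, and `highWatermark` stand in for values already available in a fetch response:

```js
// Hypothetical helper: decide where to fetch next when a batch comes back
// empty. If the broker's high watermark is ahead of the offset we asked for,
// the empty range was presumably compacted away and we should advance instead
// of re-fetching the same offset forever.
function nextFetchOffset(fetchedOffset, messages, highWatermark) {
  const isEmpty = messages.length === 0
  const isNotLast = Number(highWatermark) > Number(fetchedOffset)

  if (isEmpty && isNotLast) {
    // Skip forward; a real implementation would use the last offset of the
    // (empty) record batch reported by the broker rather than a blind +1.
    return String(Number(fetchedOffset) + 1)
  }
  return String(fetchedOffset)
}
```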
Hi, we have what seems to be a similar issue. One of our partitions is stuck at the same offset for our three consumer groups. It is also a compacted topic. How could we help resolve the issue? Do you know any workaround other than moving the offset manually? Thanks!
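One workaround short of resetting the group offsets out-of-band might be to detect the stall in application code and jump past it with `consumer.seek`. A minimal sketch; the topic, partition, and offset are taken from the numbers reported earlier in this thread and would be whatever your stuck partition reports:

```js
// Assuming `consumer` is a KafkaJS consumer and consumer.run() has already
// been called (seek only takes effect on a running consumer).
consumer.seek({
  topic: 'MyTopic',
  partition: 1,
  offset: '484207', // the next offset known to still hold a message
})
```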
Hi, we too have had the same issue twice this week. Each time, one or two partitions of a 3-partition topic were stuck at the same offset for all our consumer groups (this topic is compacted too). Do you know if anyone has made progress on this issue? Is there a way we can help solve it? Thanks in advance.
Like I mentioned a year ago, a way to consistently reproduce the issue is the best way to resolve it. Ideally a fork with a failing test, but even just a script that creates a topic and produces to it with whatever parameters are required to trigger the bug, and then a consumer that gets stuck, would be helpful.
Is there any consensus that this IS related to compaction? We're seeing something similar using 2.1.0, but this is not on a compacted topic; it does, however, also have 3 partitions.
Can you try reducing max.poll.records or increasing max.poll.interval.ms?
I will look into it, thanks.
A quick question @dhruvrathore93: if either of these were the problem, wouldn't we be seeing a rebalance of the consumer group by the broker?
One more follow-up: does KafkaJS support setting max.poll.records?
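For reference, and with the caveat that I may be missing something: `max.poll.records` and `max.poll.interval.ms` are Java-client settings, and KafkaJS doesn't appear to expose them under those names. The closest knobs seem to be the fetch-size and timeout options on the consumer; the values below are illustrative (they happen to match the documented defaults):

```js
const { Kafka } = require('kafkajs')

const kafka = new Kafka({ clientId: 'my-app', brokers: ['localhost:9092'] })

// Options governing how much data a single fetch can return and how long the
// group coordinator waits before considering a member gone.
const consumer = kafka.consumer({
  groupId: 'my-group',
  minBytes: 1,
  maxBytes: 10485760,            // 10 MB per fetch across partitions
  maxBytesPerPartition: 1048576, // 1 MB per partition per fetch
  maxWaitTimeInMs: 5000,
  sessionTimeout: 30000,
  rebalanceTimeout: 60000,
})
```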
Any updates on this?
Observed behavior
I consume messages from a topic that has 24 partitions. I started consuming from the beginning and at first everything was fine, but after some time the consumer stopped consuming messages from certain partitions.
The issue itself is very similar to issue #562, but I'm using the current version of KafkaJS (v1.15.0), so I'm at a loss as to what the problem could be. As far as I'm aware, the topic also uses log compaction.
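As a rough sketch of the kind of consumer described above (client id, group id, and broker address are placeholders; the single-topic `subscribe` form matches the 1.x API):

```js
const { Kafka } = require('kafkajs')

const kafka = new Kafka({ clientId: 'my-app', brokers: ['localhost:9092'] })
const consumer = kafka.consumer({ groupId: 'my-group' })

const run = async () => {
  await consumer.connect()
  await consumer.subscribe({ topic: 'MyTopic', fromBeginning: true })
  await consumer.run({
    eachMessage: async ({ topic, partition, message }) => {
      console.log(`${topic}[${partition}] @ ${message.offset}`)
    },
  })
}

run().catch(console.error)
```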
I wrote a simple partition assigner that I programmed to only consume from the partitions that were "stuck". After that I added some `console.log` messages to the KafkaJS code (consumerGroup.js) to debug the problem further. I got to the point where I always received zero messages in the response from `broker.fetch`. This was the response:
The offset that was used to fetch the next messages was like this:
{ MyTopic: { '1': '484158' } }
There are clearly still messages to consume, but it always fetches zero because offset 484158 is always used. I changed the offset manually via the admin interface to a higher, valid offset, and after that the consumer worked again.
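For anyone needing the same escape hatch: one way to do the manual offset change described here is through KafkaJS's own admin client. A sketch, assuming the group id is a placeholder and noting that the consumer group must have no running members while the reset runs:

```js
const { Kafka } = require('kafkajs')

const kafka = new Kafka({ clientId: 'offset-fix', brokers: ['localhost:9092'] })

const resetStuckOffset = async () => {
  const admin = kafka.admin()
  await admin.connect()
  // Move the group's committed offset for the stuck partition past the dead range.
  await admin.setOffsets({
    groupId: 'my-group', // placeholder
    topic: 'MyTopic',
    partitions: [{ partition: 1, offset: '484207' }], // next offset that actually has a message
  })
  await admin.disconnect()
}

resetStuckOffset().catch(console.error)
```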
Expected behavior
I would expect to receive all messages until the latest offset.
Environment:
Additional context
If further logs are needed I can provide them. I couldn't see any useful debug messages for this problem.