Short metadata refresh time triggers a refetch of already fetched messages despite no cluster changes #4249
Comments
Hi @mensfeld, thanks for the detailed report. Could you tell me if it happens even if you set the sleep to a value different from the metadata refresh, like 1s higher or lower?
Actually, if I set the sleep to a much higher value than the refresh, things work. It looks like the issue is with the first metadata refresh when messages have started to be consumed.
I did one more thing: I created a separate process to produce messages and used the consumer independently from the producer. If I start producing and consuming messages (in separate processes) BEFORE the initial metadata refresh (after 5 seconds), messages get reprocessed. If I wait and only start producing (in a separate process) AFTER the first refresh, it works as expected.
Will I get a great report badge? ;)
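A minimal sketch of such a standalone producer process, assuming the plain rdkafka-ruby API (the broker address, topic name, and message cadence here are illustrative placeholders):

```ruby
require "rdkafka"

# Standalone producer process: it only produces, so consuming (and the
# consumer's metadata refreshes) happens in a completely separate process.
producer = Rdkafka::Config.new(
  "bootstrap.servers" => "127.0.0.1:9092"
).producer

10.times do |i|
  # Wait for the delivery report so every message is confirmed as delivered
  producer.produce(topic: "example-topic", payload: "payload #{i}").wait
  sleep(1)
end

producer.close
```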
I had seen the fetch response close to the [...]. That initial [...] is independent though, as the offset validation can be called at other moments.
Probably yes, but I'm not an expert :( so I won't help much. If a patch is available, I can however run my rather heavy integration suite to test it :)
Hi @mensfeld, does version v2.1.1-RC1 solve this issue? I've reproduced the issue with a test, but wanted to confirm.
@emasab I will be able to confirm this in around 3-7 days.
@emasab Quick tests show it works as expected. I am not able to test it deeply, though, because I am traveling. However, the things I could test quickly show the issue is no longer there. Great work.
Original issue description
Hello,
I'm one of the maintainers of the rdkafka-ruby bindings and the author of the Karafka framework (https://github.com/karafka/karafka). I wanted to upgrade librdkafka from 2.0.2 to 2.1.0 in the Ruby bindings. However, I noticed one issue that prevented me from doing this.
For some scenarios, I use a really short topic.metadata.refresh.interval.ms of 5_000 ms. This is used mainly in dev to quickly update consumer and producer metadata states upon cluster changes. I noticed that, despite no cluster changes and only a few messages being produced, such a short interval causes duplicates when polling almost every time a metadata refresh occurs. The flow of the code is as follows:
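A minimal sketch of that flow, using the plain rdkafka-ruby API rather than Karafka (the broker address, topic, and group id are illustrative placeholders; the 5-second sleep mirrors the refresh interval):

```ruby
require "rdkafka"

# Consumer configured with the short metadata refresh interval
# that triggers the refetching
consumer = Rdkafka::Config.new(
  "bootstrap.servers" => "127.0.0.1:9092",
  "group.id" => "example-group",
  "auto.offset.reset" => "earliest",
  "topic.metadata.refresh.interval.ms" => 5_000
).consumer
consumer.subscribe("example-topic")

# Producer with a much higher metadata refresh interval, as in the report
producer = Rdkafka::Config.new(
  "bootstrap.servers" => "127.0.0.1:9092",
  "topic.metadata.refresh.interval.ms" => 100_000
).producer

# Dispatch a message and wait for the delivery callback,
# mimicking what Karafka.producer does
producer.produce(topic: "example-topic", payload: "message").wait

# Poll and handle messages; under 2.1.0 the same offsets show up
# again after each 5 s metadata refresh despite no cluster changes
consumer.each do |message|
  puts "Consumed message at offset #{message.offset}"
  sleep(5)
end
```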
This code works as expected with librdkafka 2.0.2, and none of my integration tests fail. The consumer's #each just runs a poll and yields the message. Note: Karafka.producer is just a wrapped librdkafka producer that dispatches a message and waits for the delivery callback. The producer metadata refresh is set much higher (100 000 ms).
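For context, the consumer's #each boils down to a loop like the following (a simplified sketch; the 250 ms poll timeout is an assumed value):

```ruby
# Simplified sketch: poll the underlying librdkafka consumer in a loop
# and yield every message that arrives; nothing else happens in between.
def each
  loop do
    message = poll(250) # poll timeout in ms (assumed value)
    yield(message) if message
  end
end
```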
Checklist
Please provide the following information:
- librdkafka version (release number or git tag): 2.1.0
- Apache Kafka version: 2.8.1 and 3.4.0 from Bitnami
- Operating system: Linux 5.4.0-146-generic #163-Ubuntu SMP Fri Mar 17 18:26:02 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
- Provide logs (with debug=.. as necessary) from librdkafka

Logs
Bitnami Kafka logs and librdkafka debug=all logs + some basic Karafka logs: the full issue log is here, as it did not fit into the comment: https://gist.github.com/mensfeld/9782da56cffc5fd7293000522e4b6744
Exactly the same code under 2.0.2 works as expected and can run for several minutes without any issues. I can reproduce it every single time.
I suspect that it may be related to the metadata cache eviction flow: v2.0.2...v2.1.0#diff-17c2af7f93fd5ac3f7afc7993bcfaa03bf3cb614e522a8dae82e2b077bcfd3beR163