[Help Wanted] Crash on rd_kafka_broker_destroy_final #3608
Comments
By the way, in my program, the […]
The same error occurs under Kafka v2.5.
Please provide a reproducible test program, thanks.
Hi @edenhill, sorry, it is hard to reproduce... it just happens occasionally.
There is not enough information to go on. Please provide: […]
Thanks a lot! I will check my program to see if more info can be provided.
Hi @edenhill, I traced the code and found that when calling […], if execution runs into this branch […], then the […]. I am not sure if the […], because when calling […] the application crashed at […]. Is this a possible reason for the problem?
Hi @edenhill, same here as well; it happened when restarting the program because the process had not consumed data for a period of time.
I'm trying to solve this problem; can you tell me how to reproduce it?
I'm sorry, I have no idea how to reproduce it. It happens occasionally.
If thread_1 (rdkafka_broker_thread_main) is executing rd_kafka_metadata_refresh_known_topics (which selects one rkb to fetch topics from), and thread_2 (the rdkafka_broker_thread_main that created that rkb) then exits, thread_1 ends up destroying the rkb, so the thread check is problematic.
@edenhill I am also troubled by this problem, can you provide some help?
I'm guessing this only happens on client termination.
I've seen it happen once as well. Cannot reproduce it though, and all I got is: […]
It happens on 2.2.0 via rdkafka-ruby in the context of a short-lived consumer without a consumer group assigned. I use it to read data in one shot.
I need to reproduce it with a test, but I think the problem here comes from the assert that checks that the thread calling destroy-final for that broker is the same broker thread. In the previous stack trace, destroy is called from […]
@emasab I was not able to reproduce it, but I wonder if running […]
@mensfeld probably not, because the metadata refresh can happen independently of the subscription, with a producer too. As for the fix, it could be either removing the assert completely, or setting rkb_thread to NULL when the thread exits and not failing the assert in that case.
Thanks. Two more questions as a follow-up:
I would love to at least partially mitigate this prior to having a fix (or even a repro). I will try to reproduce it as well.
@emasab can I mitigate it (at least partially) by checking the metadata age for the brokers in the statistics and "waiting out" the refresh if the close is too near to it? If I know the refresh frequency (which is user-controlled) and the age (if that is what the statistics publish), then, edge cases aside (cluster changes that would trigger a refresh), the periodic refresh could be "waited out", right?
@atul-raghuwanshi you can use the statistics to figure out the metadata refresh time (putting aside cluster changes, of course) and make sure you do not close around it.
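For reference, the timing involved here is governed by standard librdkafka configuration properties; a sketch of the relevant settings (the values are illustrative, and note the original report had statistics disabled):

```ini
# Periodic topic metadata refresh interval (librdkafka default: 300000 ms).
topic.metadata.refresh.interval.ms=300000
# Emit statistics so the application can observe client state;
# the original report had statistics.interval.ms=0 (disabled).
statistics.interval.ms=5000
```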
I have the same problem. Has anyone solved it?
We have a customer who has also run into this problem a few times when using a newer version of our software, which uses librdkafka 1.9.2. Unfortunately we haven't been able to reproduce the issue ourselves in our tests. However, the customer had previously been using an older version of our software that used librdkafka 1.3.0 and hadn't run into it, so we gave them a newer version of our software with that older library and they haven't been able to reproduce it. So it seems the issue was introduced somewhere between versions 1.3.0 and 1.9.2.
@Long-Wu-code @JSoet we used a forced metadata fetch before closing the consumer to bypass the issue, and that seems to work for our case. Since then we have not encountered the issue.
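That workaround can be sketched against librdkafka's public C API. This is a sketch only: error handling is elided, and rk is assumed to be an existing, already-configured consumer handle. Forcing a metadata request just before close makes it unlikely that a periodic refresh is in flight concurrently with termination.

```c
const struct rd_kafka_metadata *md;
rd_kafka_resp_err_t err;

/* Force a full metadata fetch and wait for it (5 s timeout). */
err = rd_kafka_metadata(rk, 1 /* all_topics */, NULL, &md, 5000);
if (!err)
        rd_kafka_metadata_destroy(md);

/* Close immediately afterwards, while no refresh is in flight. */
rd_kafka_consumer_close(rk);
rd_kafka_destroy(rk);
```

Note this narrows the race window rather than eliminating it; a cluster change could still trigger an unscheduled refresh between the fetch and the close.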
@atul-raghuwanshi […]
We have a customer who has also run into this problem a few times when using a newer version of our software, which uses librdkafka 2.4; earlier we were using librdkafka 0.11, which did not have this issue. Unfortunately, we are not able to reproduce the issue yet. Is there a fix for this issue yet?
@JSoet were you able to apply the workaround mentioned by @atul-raghuwanshi? Is it working?
@lazern […]
@atul-raghuwanshi Just want to reiterate the question @JSoet asked about the workaround: "[…] metadata fetch and then are able to immediately close? You then don't need to do anything with the statistics and calculating exactly when to close."
Description
My program crashes with the following stack: […]
How to reproduce
I don't know how to reproduce it; my program is a little complicated.
Everything goes well in most cases; this error only happens occasionally.
I just want to know under what circumstances the above problem may occur, so that I can continue troubleshooting my program.
Checklist
Please provide the following information:
- librdkafka version: 1.8.0
- Apache Kafka version: 0.11
- librdkafka client configuration: enable.partition.eof=false, enable.auto.offset.store=false, statistics.interval.ms=0, auto.offset.reset=error, api.version.request=true, api.version.fallback.ms=0
- Operating system: CentOS 7 (x86_64)
- Logs (with debug=.. as necessary) from librdkafka: no logs, just a crash...