ThreadSanitizer: data race + hang in rd_kafka_destroy (or rd_kafka_destroy_flags) #4811
Open
Comments
Additionally, I have the following log of refcounts:
However, I am not familiar enough with librdkafka internals to tell which refcounts were not decremented.
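For readers following the refcount remark above, here is a minimal, generic sketch in C of how one reference that is never released can turn a destroy call into a hang. It is an illustrative model only: `model_handle_t`, `model_keep`, `model_release` and `model_destroy_wait` are hypothetical names, and nothing here is taken from librdkafka's actual implementation. The wait uses a timeout so that the "hang" shows up as a return value instead of blocking forever.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <time.h>

/* Hypothetical model of a refcounted handle; NOT a librdkafka type. */
typedef struct {
    pthread_mutex_t lock;
    pthread_cond_t  cond;
    int             refcnt;
} model_handle_t;

static void model_init(model_handle_t *h) {
    pthread_mutex_init(&h->lock, NULL);
    pthread_cond_init(&h->cond, NULL);
    h->refcnt = 0;
}

/* Take one reference. */
static void model_keep(model_handle_t *h) {
    pthread_mutex_lock(&h->lock);
    h->refcnt++;
    pthread_mutex_unlock(&h->lock);
}

/* Drop one reference and wake the destroyer when the count hits zero. */
static void model_release(model_handle_t *h) {
    pthread_mutex_lock(&h->lock);
    assert(h->refcnt > 0);
    if (--h->refcnt == 0)
        pthread_cond_broadcast(&h->cond);
    pthread_mutex_unlock(&h->lock);
}

/* Wait until all references are released.
 * Returns 0 when the count drained, -1 when the timeout expired
 * (without the timeout this case would block forever). */
static int model_destroy_wait(model_handle_t *h, int timeout_ms) {
    struct timespec deadline;
    int rc = 0;
    int drained;

    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec  += timeout_ms / 1000;
    deadline.tv_nsec += (long)(timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec++;
        deadline.tv_nsec -= 1000000000L;
    }

    pthread_mutex_lock(&h->lock);
    while (h->refcnt > 0 && rc == 0)
        rc = pthread_cond_timedwait(&h->cond, &h->lock, &deadline);
    drained = (h->refcnt == 0);
    pthread_mutex_unlock(&h->lock);
    return drained ? 0 : -1;
}

/* Worker thread that releases the reference it was handed. */
static void *model_worker(void *arg) {
    model_release((model_handle_t *)arg);
    return NULL;
}
```

In this model, a reference handed to `model_worker` is released and `model_destroy_wait` returns 0. A reference that nobody releases makes it return -1 after the timeout, which is the situation a refcount log would help to narrow down.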
Read the FAQ first: https://github.com/confluentinc/librdkafka/wiki/FAQ
Do NOT create issues for questions, use the discussion forum: https://github.com/confluentinc/librdkafka/discussions
Description
I was investigating a hang on destroy that reproduces specifically on x86 (I cannot reproduce it on macOS/Linux on aarch64/M2).
How to reproduce
But on x86, it reproduces reliably with the following scenario:
It reproduces more readily with rebalance enabled and assign called from a different thread, i.e.:
The only thing I have found that reproduces reliably on x86 is the following race (which I do not see on arm):
It reproduces more readily with v2.3.0 and less so with v2.5.0. With v2.3.0 I can reproduce it without external rebalance, and with v2.5.0 it reproduces much more readily with custom rebalance.
The scenario reproduces quite well with the Swift wrapper.
There is code in Swift that can catch this race:
As the only difference between the platforms is this race caught by TSan, I suspect it might cause problems later in the client destroy method.
Remark: I tried both rd_kafka_destroy and rd_kafka_destroy_flags with RD_KAFKA_DESTROY_F_NO_CONSUMER_CLOSE, but the behaviour is the same.
Checklist
IMPORTANT: We will close issues where the checklist has not been completed.
Please provide the following information:
Provide logs (with debug=.. as necessary) from librdkafka
The last logs before the hang are:
Although the rebalance assign(NULL) is called afterwards, librdkafka ignores it, and this seems to be the reason for the hang.