rdkafka stopping connection retries on RdKafka::ERR__ALL_BROKERS_DOWN #373
librdkafka should keep trying to connect to all the brokers it knows about, forever. What it does do is suppress log messages for failed connection attempts when the error is the same as on the previous attempt (e.g., "Connection refused"). Maybe that is what you are seeing, or rather not seeing: a lack of log messages? As soon as brokers start coming back up, it should reconnect to them within a couple of seconds.
If a single broker goes down, it reconnects to it as I expect.
Code to produce logs:
The librdkafka broker threads should not exit unless the rd_kafka_t handle is marked for destruction (through rd_kafka_destroy() and rk->rk_terminate). You could set a breakpoint at the last line of rd_kafka_broker_thread_main() to help figure out why it is exiting.
I tried to get a backtrace with gdb:
Ah, it dies due to a segmentation fault. Try to figure out how; if gdb doesn't help, try running it under valgrind.
It looks like this segfault only shows up when debugging with record.
Check rkb->rkb_rk->rk_terminate.
It calls broker_connect and then sleeps once...
That's very weird: as you can see in rd_kafka_broker_thread_main(), the only way it returns is if rk_terminate is set to non-zero. Try running it under valgrind to see whether there is some memory corruption going on.
Thanks for helping me track this down.
Glad you found it!
rdkafka tries to connect to all brokers in the list. If they all fail, I get an event_cb (ERR__ALL_BROKERS_DOWN), but no further connection attempts are made.
I would expect rdkafka to report the error but keep trying to connect to the brokers.
If you don't think this should be the default behaviour, I would suggest a config value for retrying after all brokers are down.