Important: how do I distinguish between fatal errors and ignorable errors? #64
Most of the error reports from rdkafka are informational; there isn't much the application can or should do in case of these errors. But the case with all brokers being down is a good example of an error that the application really should know about, so I'll add two new error/resp codes to signal this:
Do you see a use for a third one?:
Supplying a severity 'level' to the error_cb is a good idea. You've highlighted a number of issues with the current API which I will fix for the next SONAME bump. In the meantime I will try to document some workarounds for the application. |
I'd only want to know real errors (it's an error callback not an On Saturday, January 25, 2014, Magnus Edenhill notifications@github.com
|
Fair enough, there should probably be an event_cb for the informational stuff. |
Either that or a generic callback where the subscriber indicates the level On Saturday, January 25, 2014, Magnus Edenhill notifications@github.com
|
You can now monitor your error_cb for err==RD_KAFKA_RESP_ERR__ALL_BROKERS_DOWN. Please verify this on your end. |
let me check this with you. Currently any error callback I shut down the On Sun, Jan 26, 2014 at 8:08 AM, Magnus Edenhill
|
Depends on your application's needs of course, but I would say that most errors reported by librdkafka have the potential of being transient/temporary. That is, a RESOLVE failure could be because of the DNS not being available. Currently the only error codes signaled through the error_cb are:
I guess you could treat all but .._TRANSPORT as fatal. The other error codes are signaled through the delivery report callback (dr_cb) and are message, topic or partition specific. |
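A minimal sketch of the "treat all but .._TRANSPORT as fatal" rule of thumb, written as a plain classifier an error_cb could call. The `SKETCH_ERR_*` names and `classify_error()` are hypothetical and local to this example; a real application would include `<librdkafka/rdkafka.h>` and compare against the `RD_KAFKA_RESP_ERR__*` constants instead. The values -196 and -195 are quoted in this thread; the other numeric values are my understanding of rdkafka.h and should be checked against your header.

```c
/* Local stand-ins for the error codes discussed in this thread (assumption:
 * values mirror rdkafka.h; use RD_KAFKA_RESP_ERR__* in real code). */
enum sketch_err {
    SKETCH_ERR_BAD_MSG          = -199,
    SKETCH_ERR_FAIL             = -196,
    SKETCH_ERR_TRANSPORT        = -195,
    SKETCH_ERR_RESOLVE          = -193,
    SKETCH_ERR_ALL_BROKERS_DOWN = -187
};

enum severity { SEV_TRANSIENT, SEV_FATAL };

/* Application policy, not library behaviour: a transport error on one
 * broker is considered recoverable, everything else is treated as fatal. */
static enum severity classify_error(int err) {
    switch (err) {
    case SKETCH_ERR_TRANSPORT:
        return SEV_TRANSIENT;
    default:
        return SEV_FATAL;
    }
}
```

Whether RESOLVE or FAIL should also count as transient is up to the application; the switch is the only place that policy would need to change.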
Thanks - will use this as a guide. Had a somewhat unrelated question. On Sun, Jan 26, 2014 at 8:22 AM, Magnus Edenhill
|
The initial list of brokers you specify to librdkafka (either through config "metadata.broker.list" or rd_kafka_brokers_add()) are called the bootstrap brokers: rdkafka will connect to each one and retrieve metadata information containing all brokers and topics in the cluster. The brokers learnt through metadata are added to the list of brokers and rdkafka will connect to them as well. In your example that means the final list of brokers in rdkafka would be: host1, host2, badhost4, host3. The bootstrap broker connections are never used for producing or consuming messages, only for metadata, since they can't be reliably mapped to a specific broker instance. |
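As a toy illustration of how the final list comes about, the sketch below appends metadata-learnt hosts to a bootstrap list. It is not librdkafka's internal bookkeeping: `broker_list` and its helpers are hypothetical, matching brokers by host name is a simplification, and the inputs (bootstrap host1, host2, badhost4; metadata reporting host1, host2, host3) are an assumption about the example being discussed.

```c
#include <string.h>

#define MAX_BROKERS 16

/* Hypothetical broker list used only for this illustration. */
struct broker_list {
    const char *hosts[MAX_BROKERS];
    int count;
};

/* Append a host unless it is already known. Returns 1 if added, else 0. */
static int broker_list_add(struct broker_list *bl, const char *host) {
    for (int i = 0; i < bl->count; i++)
        if (strcmp(bl->hosts[i], host) == 0)
            return 0;
    if (bl->count == MAX_BROKERS)
        return 0;
    bl->hosts[bl->count++] = host;
    return 1;
}

/* Add every host from an array; existing entries are never removed. */
static void broker_list_add_all(struct broker_list *bl,
                                const char **hosts, int n) {
    for (int i = 0; i < n; i++)
        broker_list_add(bl, hosts[i]);
}
```

Adding the assumed bootstrap hosts first and the metadata hosts second yields host1, host2, badhost4, host3: the bad bootstrap entry stays in the list and the newly learnt broker is appended.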
Got it. thanks. On Sun, Jan 26, 2014 at 9:09 AM, Magnus Edenhill
|
Magnus- on a single broker not being reachable (I intentionally put in wrong port for one broker). I'm getting a -196 (ERR_FAIL). I thought that would come as -195 (TRANSPORT)? |
That should now be fixed. |
Also note that I added ..__BAD_MSG to the list of error_cb() error codes in the comment above. |
It's interesting- I temporarily changed it to pass only one broker host On Monday, January 27, 2014, Magnus Edenhill notifications@github.com
|
Good point, the order is now reversed. |
I'm not sure this is working after the last update. I pass it a list of hosts (all valid hosts but kafka not running on them). I get a single callback saying that one of the hosts is unreachable, and no error callbacks thereafter. I expected:
However, if I pass in a single host, and that host is not reachable, I get the 2 callbacks as above. |
Slight update on the above. I do get all the callbacks, but only AFTER I attempt to produce something. As you know, I'd like it to fail immediately, which it seems to do if only a single broker is provided. Also - when you run rdkafka_example as a consumer - if you pick a partition that has no messages (so it hangs) and then Ctrl-C it - it tells you that all brokers are down... |
rdkafka_example in producer mode is a bad example in this regard because it runs in a single thread that is blocked by fgets(stdin), and thus won't poll error_cb's (et al.) until enter is pressed or the program is aborted. With rdkafka_performance in consume mode I think it works correctly:
|
I didn't use example to test, was using my code. On Tuesday, January 28, 2014, Magnus Edenhill notifications@github.com
|
And the key is that it should fail without having to produce a message. The On Tuesday, January 28, 2014, Dan Hoffman hoffmandan@gmail.com wrote:
|
I can't reproduce this with rdkafka_example, telling it to not send any messages (-c 0) and to idle (-I):
Are you sure that you call rd_kafka_poll() regularly even when not producing messages? |
define 'regularly' in this context? How many times do I need to call it after setting the broker list? I would assume only once. (Given that this is a callback mechanism, it's weird that I have to call it at all...) |
So, an application that has registered at least one callback (error_cb, dr_cb, ..) must call But typically it is polled where it makes sense, after a (bunch of) produce() calls, or simply from a main dispatcher loop. It really depends on what suits the application best. Also, the application must not make assumptions about when a specific type of error callback will arrive after an error, e.g.:
This won't work if the host resolving of the brokers takes longer than 5s, or if the connection is slow. Instead design along these lines:
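A minimal sketch of one such design, under stated assumptions: the error callback only records state, and a main dispatcher loop polls on every iteration and reacts whenever the flag appears, with no fixed wait. Everything here is hypothetical and local to the example: `app_state`, `on_error()`, `dispatch_loop()`, and `simulated_poll()`, which stands in for a call to rd_kafka_poll() so the sketch runs without a broker. The -187 value is my understanding of RD_KAFKA_RESP_ERR__ALL_BROKERS_DOWN.

```c
#include <stdbool.h>
#include <stdio.h>

/* Local stand-in (assumption); real code uses the rdkafka.h constant. */
enum { SKETCH_ERR_ALL_BROKERS_DOWN = -187 };

/* Hypothetical state shared by the callback and the main loop. */
struct app_state {
    bool all_brokers_down;
    int calls_before_error; /* simulation only: quiet polls before the error */
};

/* Shaped like an error callback: log, record, return quickly. */
static void on_error(int err, const char *reason, void *opaque) {
    struct app_state *st = opaque;
    fprintf(stderr, "error %d: %s\n", err, reason);
    if (err == SKETCH_ERR_ALL_BROKERS_DOWN)
        st->all_brokers_down = true;
}

typedef int (*poll_fn)(int timeout_ms, void *opaque);

/* Simulated poll: fires the error callback once the quiet polls run out. */
static int simulated_poll(int timeout_ms, void *opaque) {
    struct app_state *st = opaque;
    (void)timeout_ms;
    if (st->calls_before_error-- == 0) {
        on_error(SKETCH_ERR_ALL_BROKERS_DOWN, "all brokers down", st);
        return 1;
    }
    return 0;
}

/* Poll every iteration; stop as soon as the fatal flag is seen.
 * Returns the number of iterations that ran. */
static int dispatch_loop(poll_fn do_poll, struct app_state *st, int max_iter) {
    int i;
    for (i = 0; i < max_iter && !st->all_brokers_down; i++) {
        do_poll(100, st);
        /* ... produce/consume work would go here ... */
    }
    return i;
}
```

The loop never assumes how long the error takes to show up; it just keeps polling and checks the flag each time around.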
|
Why wouldn't the library simply call the callback only on the threads that On Wednesday, January 29, 2014, Magnus Edenhill notifications@github.com
|
It is if you call rd_kafka_poll() from that thread :). |
Ok... To me though if I always have to make a call to fire a callback On Wednesday, January 29, 2014, Magnus Edenhill notifications@github.com
|
I've 'lost' (damn autocorrect) On Wednesday, January 29, 2014, Dan Hoffman hoffmandan@gmail.com wrote:
|
It's impossible for rdkafka to spontaneously call a callback in an application thread, since rdkafka does not run any code in those threads unless the application calls an rdkafka function. Internally there's a queue of ops (error, dr, ..) that the rdkafka threads enqueue when things happen.
But I like the simple and isolated callback way. |
Any way for me to get the queue count or otherwise get callbacks for all On Wednesday, January 29, 2014, Magnus Edenhill notifications@github.com
|
rd_kafka_poll() will serve all ops on the queue in one go. |
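To make the queue-and-poll model concrete, here is a toy single-threaded sketch: ops are enqueued as things happen, and nothing reaches the callback until the application calls the poll function, which then serves everything queued in one go. This is not librdkafka's implementation (the real queue is filled by internal threads and needs locking); `op_queue`, `opq_enqueue()` and `opq_poll()` are hypothetical names for this illustration.

```c
#include <stddef.h>

#define OPQ_CAPACITY 16

typedef void (*op_cb)(int err, void *opaque);

/* Toy ring buffer of pending ops; each op carries just an error code. */
struct op_queue {
    int errs[OPQ_CAPACITY];
    size_t head, count;
    op_cb cb;       /* application callback */
    void *opaque;   /* passed through to the callback */
};

static void opq_init(struct op_queue *q, op_cb cb, void *opaque) {
    q->head = 0;
    q->count = 0;
    q->cb = cb;
    q->opaque = opaque;
}

/* What a library thread would do when something happens: queue it only. */
static int opq_enqueue(struct op_queue *q, int err) {
    if (q->count == OPQ_CAPACITY)
        return -1; /* queue full */
    q->errs[(q->head + q->count) % OPQ_CAPACITY] = err;
    q->count++;
    return 0;
}

/* What the application calls: runs the callback for every queued op,
 * on the caller's thread, and returns how many ops were served. */
static int opq_poll(struct op_queue *q) {
    int served = 0;
    while (q->count > 0) {
        int err = q->errs[q->head];
        q->head = (q->head + 1) % OPQ_CAPACITY;
        q->count--;
        q->cb(err, q->opaque);
        served++;
    }
    return served;
}

/* Example callback: counts how many errors were delivered. */
static void count_errors(int err, void *opaque) {
    (void)err;
    (*(int *)opaque)++;
}
```

In this model, an application that enqueues two errors sees no callbacks until it polls, then gets both at once, which matches the "only after I attempt to produce something" symptom described earlier when polling happens rarely.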
Ok. Then in my case it must be as you stated- that my poll call is On Wednesday, January 29, 2014, Magnus Edenhill notifications@github.com
|
Yeah, makes sense. Keep polling! :) |
I guess this issue has been resolved, right? |
I have a 3-node cluster. If I bring one of the nodes down, I get an error callback that one of the nodes can't be reached. I consider this non-fatal since the other nodes are up and viable. Can you either not call back unless all nodes are down, or provide a means of knowing the severity?