
Client silently stops receiving after NodeNotReadyError #1572

Closed
ghost opened this issue Aug 13, 2018 · 2 comments

Comments

ghost commented Aug 13, 2018

Below is simplified code that illustrates the problem:

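def consume_forever(consumer):
    # Assumes a kafka-python KafkaConsumer, which is iterable.
    try:
        for message in consumer:
            yield message
    finally:
        # Runs when the loop ends, and also when the generator is closed
        # after the caller breaks out early.
        consumer.close()

def consume_some(consumer, condition):
    for message in consume_forever(consumer):
        if condition(message):
            break
    else:
        # Only reached if the consumer's iterator ended on its own.
        raise RuntimeError('Should not happen')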

The "Should not happen" error is, in fact, raised. In the logs, I can see this:

base.py                    317 ERROR    Error sending OffsetCommitRequest_v2 to node 4 [NodeNotReadyError: 4]

Expected Result

I would expect that, if the consumer decides to stop consuming, it raises a relevant error that allows the user code to adjust its behavior accordingly, or, even better, handles the condition internally using configurable policies, such as a maximum number of retries or retrying until a deadline. It doesn't make sense to stop the consumer unless the user code explicitly requests it or an error condition occurs.
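
A "retry N times or until a deadline" policy of the kind described above could look roughly like the following. This is a minimal sketch, not a kafka-python API: call_with_retries, RetriesExhausted and all of their parameters are hypothetical names. The retriable argument is the exception type (or tuple of types) to retry on.

```python
import time


class RetriesExhausted(Exception):
    """Hypothetical error raised when the retry policy gives up."""


def call_with_retries(fn, retriable, max_attempts=5, deadline_s=None, backoff_s=0.1):
    """Call fn(), retrying on `retriable` exceptions.

    Gives up after max_attempts calls, or earlier if another backoff would
    push the total elapsed time past deadline_s (when deadline_s is set).
    All names here are illustrative, not kafka-python parameters.
    """
    start = time.monotonic()
    attempts = 0
    last_exc = None
    while attempts < max_attempts:
        attempts += 1
        try:
            return fn()
        except retriable as exc:
            last_exc = exc  # remember the cause for the final error
        out_of_attempts = attempts >= max_attempts
        past_deadline = (
            deadline_s is not None
            and time.monotonic() - start + backoff_s > deadline_s
        )
        if out_of_attempts or past_deadline:
            break
        time.sleep(backoff_s)
    # Surface the failure explicitly instead of stopping silently.
    raise RetriesExhausted(f"gave up after {attempts} attempt(s)") from last_exc
```

With kafka-python, one might pass kafka.errors.NodeNotReadyError as retriable; whether a particular operation is safe to retry is a separate question this sketch does not answer.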

ghost commented Aug 13, 2018

OK, never mind. It turns out someone had set consumer_timeout_ms, and its meaning is different from what I expected: I assumed the timeout applied to network operations, which would then either be retried or raise exceptions.
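
For reference, kafka-python documents consumer_timeout_ms as the number of milliseconds to block during message iteration before raising StopIteration, that is, before ending the iterator; by default it blocks forever. The sketch below models that behavior with the standard library only. It is an illustration under assumptions: iterate_with_idle_timeout is a hypothetical helper, and a queue.Queue stands in for the broker.

```python
import queue


def iterate_with_idle_timeout(q, timeout_ms):
    """Yield items from q until none arrives within timeout_ms.

    Models the documented consumer_timeout_ms behavior: when the idle
    timeout expires, iteration simply ends; no exception reaches the caller.
    """
    timeout_s = timeout_ms / 1000.0
    while True:
        try:
            item = q.get(timeout=timeout_s)
        except queue.Empty:
            return  # ends the generator, so the caller's for loop exits normally
        yield item
```

A for loop over this generator exits normally once the queue stays empty for timeout_ms, which is how a for/else like the one in the original report ends up in its else branch.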

So, now that the original request no longer reflects reality, can I ask you to remove this feature?

The motivation is similar to what I wrote earlier: there is no plausible reason for a client to stop receiving messages on its own. Any situation in which that happens is an error. Don't misunderstand me: users of the client may choose to stop receiving messages (simply by breaking out of the loop), but the client itself should not make such decisions on their behalf.
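
The rule argued for here, that only the caller decides when to stop, can be enforced from user code. This is a minimal sketch that assumes only that the consumer is iterable; strict_messages and ConsumerStopped are hypothetical names, not part of kafka-python. If the underlying iterator ends on its own (for example because consumer_timeout_ms expired), the wrapper raises instead of ending silently, while breaking out of the loop remains an ordinary, error-free way to stop.

```python
class ConsumerStopped(RuntimeError):
    """Hypothetical error: the consumer's iterator ended without the caller asking."""


def strict_messages(consumer):
    """Yield messages from any iterable consumer; raise if iteration ends on its own.

    If the caller breaks out of its loop, the generator is closed at the
    `yield`, so the raise below is never reached and no error occurs.
    """
    for message in consumer:
        yield message
    raise ConsumerStopped("consumer stopped yielding messages without being asked to")
```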

Think about it the same way you would think about file operations: from the filesystem's or the kernel's perspective, there is no plausible reason for a read() operation to succeed when the file was not found. Users of read(), however, may choose to ignore this error. It would be a really bad idea if read() could be configured to sometimes silently ignore errors. History actually knows an instance of this happening: it was once possible to mount NFS shares with "soft" as the mount mode, in which case a share could be unmounted while I/O was still in progress. Eventually, the Linux NFS client developers realized how bad this idea was and removed the feature altogether, without replacement.

ghost commented Aug 14, 2018

Reading further into the documentation, I can see that this is barely a drop in the bucket of ideas that... are probably not yours, but the Kafka developers'. All these timeouts make this system fertile ground for software consultancy agencies, and perhaps only for them.

Never mind my previous request to remove this particular timeout setting. It wouldn't change the overall picture even if you did it. But you probably won't do it anyway, because why bother putting a small patch on an otherwise totally broken system?

This issue was closed.