Application doesn't finish gracefully after client.disconnect #5
Comments
The only thing I can think of that could be doing this is… The reason I ask is that we have a test case running a pretty similar piece of code, https://github.com/Blizzard/node-rdkafka/blob/master/e2e/producer.spec.js#L113, which is passing CI.
The 'disconnect' was reported on the console, so the callback is called, and it's called pretty fast.
I think the reason why it's passing is because you are explicitly calling… For us it's pretty important, since we run tests in…
Has…
Oh, sorry, wrong indentation on my previous comment.
> The 'disconnect' was reported on the console, so the callback is called, and it's called pretty fast.
Looks like this happens regardless of whether a message is produced or not. In fact, just instantiating the class makes it do this. The only RdKafka objects created at that phase are the configs and their requisite callbacks. I can try to play around and see if storing Persistent v8 handles of objects, rather than rdkafka configs, until…
Was able to identify it as the…
New branch with a fix in place that should make it shut down gracefully: https://github.com/Blizzard/node-rdkafka/tree/graceful-shutdown. Currently running some tests on it, but feel free to try it and let me know if that fixed the problem for you.
@webmakersteve It definitely makes things different, but this test made it coredump:

```js
"use strict";
var kafka = require('./lib');

var consumer = new kafka.KafkaConsumer({
  'metadata.broker.list': 'localhost:9092',
  'group.id': 'test'
});

consumer.connect(undefined, function() {
  console.log('connected');
  consumer.subscribe([ 'test_dc.resource_change' ]);
  consumer.disconnect();
});
```

But after I updated the branch and rebuilt it once again, I can't reproduce it any more. Here's a dump (the relevant part of it) anyway, just FYI, in case it rings some bells:
Can you try one more time? I noticed that right after I said it and updated the branch. If you re-pulled afterwards, it would make sense that it didn't happen again.
@webmakersteve Checked again, no coredump this time:

```js
"use strict";
var kafka = require('./lib');

var consumer = new kafka.KafkaConsumer({
  'metadata.broker.list': 'localhost:9092',
  'group.id': 'test'
}, {
  'auto.offset.reset': 'smallest'
});

consumer.connect(undefined, function() {
  console.log('connected');
  consumer.subscribe([ 'test_dc.resource_change' ]);
  consumer.consume(function (err, msg) {
    console.log(err, msg);
    console.log('Calling disconnect');
    consumer.disconnect(function(err, info) {
      console.log('Disconnected', err, info);
    });
  });
});
```

The important thing is that I have some messages in the topic. So, for some reason, after I consume something, the disconnect never happens and the callback is never called. I've tried waiting for about 5 minutes on this one; I know that `RdKafka::Handle::close` might be super slow, but not this slow. GDB shows some interesting stuff: we have 8 threads; as always, thread 1 is the event loop, threads 2-4 are the libuv thread pool, and threads 6 and 7 are in…
In this case, does it log https://github.com/Blizzard/node-rdkafka/blob/master/src/message.cc#L103? The message is…
@webmakersteve Nope, in this case it doesn't log that.
So, I guess that's the root cause of this problem... And there's not much we can do, I guess: performance is way more critical than graceful shutdown. Lemme play with it a bit.
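As a stopgap while the root cause is open, one option is to race the disconnect callback against a deadline so a stuck close cannot keep the process alive forever. This is a minimal, library-agnostic sketch; the helper name `disconnectWithTimeout` is made up, not part of node-rdkafka:

```javascript
// Hypothetical stopgap helper (not part of node-rdkafka): race a
// disconnect-style callback against a deadline, so a hung close
// cannot block shutdown indefinitely.
function disconnectWithTimeout(disconnect, timeoutMs, done) {
  var finished = false;
  var timer = setTimeout(function () {
    if (!finished) {
      finished = true;
      done(new Error('disconnect timed out'));
    }
  }, timeoutMs);
  disconnect(function (err) {
    if (!finished) {
      finished = true;
      clearTimeout(timer);
      done(err || null);
    }
  });
}

// Simulated hang: this "disconnect" never invokes its callback,
// so the deadline fires instead.
disconnectWithTimeout(function (cb) { /* never calls cb */ }, 50, function (err) {
  console.log(err ? 'timed out' : 'clean');
});
```

In real usage you would pass `consumer.disconnect.bind(consumer)` and then call `process.exit` from `done` if the error indicates a timeout.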
You can try playing with…
Interestingly enough, even deallocating the buffer forcibly does not cause it:

```js
var Kafka = require('../librdkafka');

var consumer = new Kafka.KafkaConsumer({
  'client.id': 'kafka-test',
  'metadata.broker.list': 'localhost:9092',
  'group.id': 'test'
}, {});

consumer.connect(function(err) {
  console.log(err);
  consumer.assign([
    {
      topic: 'test',
      partition: 0,
      offset: 3
    }
  ]);

  function disconnect() {
    consumer.disconnect(function() {
      console.log('disconnected');
    });
  }

  var msg = consumer.consume(function(err, msg) {
    console.log(msg);
  });

  setInterval(() => global.gc(), 1000);
  setTimeout(() => consumer.disconnect(function() {}), 2200);
});
```

I added some logging in the Buffer::Free callback to make sure it was getting called, and even when I could verify that the… Perhaps this is a library problem?
I've created confluentinc/librdkafka#775 to see what Magnus thinks about this problem.
Just poking! I'm really hoping to deploy something into production in the next few months that will be blocked by this bug. Any way I can help?
I looked into implementing Magnus' solution to the problem, but as far as I can tell I can't change where the data pointer points after the fact by detecting disconnections, because the instantiated…

I looked into potentially having a solution where you could opt into memory copying instead, if disconnections were more important than performance, but it looks like the rebalance callback may actually be one of the things stopping the graceful shutdown of the wrapper. I think using the default…
Hm, for my purposes I don't need any rebalance callback, as I'm using…
There are two issues that stop… I haven't found a solution I really like that works with how… Would be happy to take contributions for this if you have available time.
This issue should be fixed in #42, which has to rely on the…

Unfortunately, I decided philosophically that performance is not more important than the bindings being just that: bindings. A performance degradation is inherently necessary when making bindings for another language, in order to subscribe to that language's way of doing things. There is no way for me to adjust memory in use by node after it has been returned to the v8 thread, and buffer data is slab allocated, which makes it read-only (essentially) once I have instantiated the buffer.

Would love additional thoughts on this issue, though. I am trying to bump up the performance in other ways to compensate for the new…
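For illustration, the copy-based approach described above can be sketched in plain Node, with no librdkafka involved; `rawPayload` is a stand-in for message memory owned by the native library, and the names here are assumptions for the sketch only:

```javascript
// Stand-in for message memory owned by librdkafka; in the real binding
// this would be the payload pointer returned by consume().
var rawPayload = Buffer.from('message bytes from librdkafka');

// Copy the payload into memory v8 owns: Buffer.from(buffer) allocates
// a new Buffer and copies the contents (a memcpy in native code).
var owned = Buffer.from(rawPayload);

// Simulate librdkafka freeing/reusing its memory on disconnect.
rawPayload.fill(0);

// The copy is unaffected, so a disconnect can safely release the
// library's memory without invalidating Buffers already handed to JS.
console.log(owned.toString()); // "message bytes from librdkafka"
```

This is the performance trade-off the maintainer describes: one extra copy per message, in exchange for the native side being free to tear down its memory at any time.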
Been doing some testing. I can reproduce this issue in version 0.3.3. Using master, with #42 merged, my process finishes properly after calling… So! Looking good to me!
Fixed in release 0.6.0.
I expect that after a producer/consumer is disconnected, it should destroy all RdKafka objects and let the application exit; however, this doesn't happen.
Code example:
Versions: kafka …, OS X El Capitan
Expected behaviour: the application sends one message to the topic and shuts down.
Actual behaviour: the application sends one message to the topic but continues running.
A bit of investigation:
`gdb` shows that we have 8 threads running after `disconnect` was called. Threads 1-5 are normal, so whatever prevents shutdown is in threads 6-8; I suspect it's thread 6. Here's the backtrace:
Any ideas where the source of the problem could be?