
FailedToRebalanceConsumerError: Exception: NODE_EXISTS[-110] #449

Closed
m2studio opened this issue Aug 18, 2016 · 7 comments


m2studio commented Aug 18, 2016

I have a process that consumes messages from Kafka 24/7, and I run it under forever to ensure it never stops and stays 100% alive.

But after letting the process run for a few hours, I found it had stopped consuming messages (the process is still running, but somehow no messages are being consumed). Then I found these errors in my log file:
FailedToRebalanceConsumerError: Exception: NODE_EXISTS[-110]
FailedToRegisterConsumerError: Consumer not registered prerender-live-group_186636ca-3ccb-4660-9931-6f4160e71fdb
FailedToRegisterConsumerError: Path wasn't created

my code (if you want more detail to investigate, please let me know):

this.client = new kafka.Client(this.zookeeperServer);
this.consumer = new kafka.HighLevelConsumer(this.client, [{ topic: this.topicName }], this.options);
this.consumer.on('message', this.onMessage.bind(this));
this.consumer.on('error', this.onError.bind(this));

I have been trying to find the root cause for the last two weeks, but I'm still not able to solve it, even on the latest version (0.5.5).

Please help me; I need the process to survive all day and all night.

my configuration:

"options": {
  "autoCommit": true,
  "autoCommitMsgCount": 1,
  "autoCommitIntervalMs": 300,
  "fromOffset": false,
  "groupId": "new-oad-group",
  "fromBeginning": false,
  "fetchMaxWaitMs": 100,
  "fetchMaxBytes": 1048576,
  "maxTickMessages": 100,
  "encoding": "utf8"
}

Machine info
OS ==> 2.6.32-573.26.1.el6.centos.plus.x86_64
CPU(s) 8
MemTotal: 15300380 kB

Last test

  • Run the process on 4 machines (each machine uses the same code and the same configuration)
  • The process has 2 consumers to consume from 2 topics
    -- first topic has 80 partitions
    -- second topic has 6 partitions
@hyperlink (Collaborator)

Looks like it failed to register the consumer with zookeeper.

Run your process with the environment variable DEBUG=kafka-node:* set; it will output a lot of debug information that may help.

@gogorush

I had the same issue; the log says the problem is FailedToRebalanceConsumerError: Exception: NODE_EXISTS[-110].

@m2studio How did you start your program with forever? In my case, I started 20 node instances with a shell script, and random errors occurred among those 20 instances. But starting the instances with a short delay between each one solved the problem.

I read the debug logs with kafka-node:* turned on. It seems Kafka rebalances partitions whenever a new consumer joins the group, so if we start many instances within a short window of time, the rebalance can fail with this error.
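The delayed-start workaround described above can be sketched as a small shell wrapper. This is only a sketch: the instance count, the delay, and the `node consumer.js` command are placeholders, not taken from the original script.

```shell
#!/bin/sh
# Sketch of the staggered-start workaround: launch N consumer instances
# with a delay between each, so they don't all join the consumer group
# (and trigger overlapping rebalances) at the same instant.
staggered_start() {
  n=$1; delay=$2; shift 2
  i=1
  while [ "$i" -le "$n" ]; do
    "$@" &            # launch one instance in the background
    sleep "$delay"    # give the group time to finish rebalancing
    i=$((i + 1))
  done
  wait                # keep the script alive while the instances run
}

# Placeholder usage: 20 instances, 3 seconds apart.
# staggered_start 20 3 node consumer.js
```

The delay value is a guess; anything long enough for one rebalance round to settle before the next instance registers with ZooKeeper should do.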

The following are some debug logs:
1 Wed, 26 Oct 2016 11:18:13 GMT kafka-node:zookeeper Node: /consumers/kafka-node-group/ids/kafka-node-group_c3ae7253-b964-4d35-b5c5-d2eb90276d30 was created.
2 Wed, 26 Oct 2016 11:18:13 GMT kafka-node:HighLevelConsumer rebalance() kafka-node-group_c3ae7253-b964-4d35-b5c5-d2eb90276d30 is rebalancing: false ready: false
3 Wed, 26 Oct 2016 11:18:13 GMT kafka-node:HighLevelConsumer Deregistered listeners kafka-node-group_c3ae7253-b964-4d35-b5c5-d2eb90276d30
4 Wed, 26 Oct 2016 11:18:13 GMT kafka-node:HighLevelConsumer Registered listeners kafka-node-group_c3ae7253-b964-4d35-b5c5-d2eb90276d30
5 Wed, 26 Oct 2016 11:18:13 GMT kafka-node:HighLevelConsumer HighLevelConsumer kafka-node-group_c3ae7253-b964-4d35-b5c5-d2eb90276d30 is attempting to rebalance
6 Wed, 26 Oct 2016 11:18:13 GMT kafka-node:HighLevelConsumer HighLevelConsumer kafka-node-group_c3ae7253-b964-4d35-b5c5-d2eb90276d30 stopping data read during rebalance
7 Wed, 26 Oct 2016 11:18:13 GMT kafka-node:HighLevelConsumer HighLevelConsumer kafka-node-group_c3ae7253-b964-4d35-b5c5-d2eb90276d30 assembling data for rebalance
8 Wed, 26 Oct 2016 11:18:13 GMT kafka-node:zookeeper Children are: ["kafka-node-group_c3ae7253-b964-4d35-b5c5-d2eb90276d30","kafka-node-group_9e9d3ed2-5066-48ec-888c-f1373672ceb3","kafka-node-group_f887c141-3221-47ca-9b3d-f5dfc8736a06","kafka-node-group_92146a20-8662-42>
9 Wed, 26 Oct 2016 11:18:13 GMT kafka-node:HighLevelConsumer HighLevelConsumer kafka-node-group_c3ae7253-b964-4d35-b5c5-d2eb90276d30 releasing current partitions during rebalance
10 Wed, 26 Oct 2016 11:18:13 GMT kafka-node:HighLevelConsumer HighLevelConsumer kafka-node-group_c3ae7253-b964-4d35-b5c5-d2eb90276d30 determining the partitions to own during rebalance
11 Wed, 26 Oct 2016 11:18:13 GMT kafka-node:HighLevelConsumer consumerPerTopicMap.consumerTopicMap {"kafka-node-group_c3ae7253-b964-4d35-b5c5-d2eb90276d30":["lbs.location"],"kafka-node-group_9e9d3ed2-5066-48ec-888c-f1373672ceb3":["lbs.location"],"kafka-node-group_f887c14>
12 Wed, 26 Oct 2016 11:18:13 GMT kafka-node:HighLevelConsumer newTopicPayloads [{"topic":"lbs.location","partition":"50","offset":0,"maxBytes":5242880,"metadata":"m"},{"topic":"lbs.location","partition":"51","offset":0,"maxBytes":5242880,"metadata":"m"},{"topic":"lbs.loc>
13 Wed, 26 Oct 2016 11:18:13 GMT kafka-node:HighLevelConsumer HighLevelConsumer kafka-node-group_c3ae7253-b964-4d35-b5c5-d2eb90276d30 gaining ownership of partitions during rebalance
14 Wed, 26 Oct 2016 11:18:13 GMT kafka-node:HighLevelConsumer HighLevelConsumer kafka-node-group_c3ae7253-b964-4d35-b5c5-d2eb90276d30 rebalance attempt failed
15 Wed, 26 Oct 2016 11:18:13 GMT kafka-node:HighLevelConsumer HighLevelConsumer kafka-node-group_c3ae7253-b964-4d35-b5c5-d2eb90276d30 is attempting to rebalance
16 Wed, 26 Oct 2016 11:18:13 GMT kafka-node:HighLevelConsumer HighLevelConsumer kafka-node-group_c3ae7253-b964-4d35-b5c5-d2eb90276d30 stopping data read during rebalance
17 Wed, 26 Oct 2016 11:18:13 GMT kafka-node:HighLevelConsumer HighLevelConsumer kafka-node-group_c3ae7253-b964-4d35-b5c5-d2eb90276d30 assembling data for rebalance
18 Wed, 26 Oct 2016 11:18:13 GMT kafka-node:zookeeper Children are: ["kafka-node-group_c3ae7253-b964-4d35-b5c5-d2eb90276d30","kafka-node-group_9e9d3ed2-5066-48ec-888c-f1373672ceb3","kafka-node-group_f887c141-3221-47ca-9b3d-f5dfc8736a06","kafka-node-group_92146a20-8662-42>
19 Wed, 26 Oct 2016 11:18:13 GMT kafka-node:HighLevelConsumer HighLevelConsumer kafka-node-group_c3ae7253-b964-4d35-b5c5-d2eb90276d30 releasing current partitions during rebalance
20 Wed, 26 Oct 2016 11:18:13 GMT kafka-node:HighLevelConsumer HighLevelConsumer kafka-node-group_c3ae7253-b964-4d35-b5c5-d2eb90276d30 determining the partitions to own during rebalance
21 Wed, 26 Oct 2016 11:18:13 GMT kafka-node:HighLevelConsumer consumerPerTopicMap.consumerTopicMap {"kafka-node-group_c3ae7253-b964-4d35-b5c5-d2eb90276d30":["lbs.location"],"kafka-node-group_9e9d3ed2-5066-48ec-888c-f1373672ceb3":["lbs.location"],"kafka-node-group_f887c14>
22 Wed, 26 Oct 2016 11:18:13 GMT kafka-node:HighLevelConsumer newTopicPayloads [{"topic":"lbs.location","partition":"50","offset":0,"maxBytes":5242880,"metadata":"m"},{"topic":"lbs.location","partition":"51","offset":0,"maxBytes":5242880,"metadata":"m"},{"topic":"lbs.loc>
23 Wed, 26 Oct 2016 11:18:13 GMT kafka-node:HighLevelConsumer HighLevelConsumer kafka-node-group_c3ae7253-b964-4d35-b5c5-d2eb90276d30 gaining ownership of partitions during rebalance
24 Wed, 26 Oct 2016 11:18:13 GMT kafka-node:HighLevelConsumer HighLevelConsumer kafka-node-group_c3ae7253-b964-4d35-b5c5-d2eb90276d30 rebalance attempt failed
25 Wed, 26 Oct 2016 11:18:14 GMT kafka-node:HighLevelConsumer HighLevelConsumer kafka-node-group_c3ae7253-b964-4d35-b5c5-d2eb90276d30 is attempting to rebalance
26 Wed, 26 Oct 2016 11:18:14 GMT kafka-node:HighLevelConsumer HighLevelConsumer kafka-node-group_c3ae7253-b964-4d35-b5c5-d2eb90276d30 stopping data read during rebalance
27 Wed, 26 Oct 2016 11:18:14 GMT kafka-node:HighLevelConsumer HighLevelConsumer kafka-node-group_c3ae7253-b964-4d35-b5c5-d2eb90276d30 assembling data for rebalance
28 Wed, 26 Oct 2016 11:18:14 GMT kafka-node:zookeeper Children are: ["kafka-node-group_c3ae7253-b964-4d35-b5c5-d2eb90276d30","kafka-node-group_9e9d3ed2-5066-48ec-888c-f1373672ceb3","kafka-node-group_f887c141-3221-47ca-9b3d-f5dfc8736a06","kafka-node-group_92146a20-8662-42>
29 Wed, 26 Oct 2016 11:18:14 GMT kafka-node:HighLevelConsumer HighLevelConsumer kafka-node-group_c3ae7253-b964-4d35-b5c5-d2eb90276d30 releasing current partitions during rebalance
30 Wed, 26 Oct 2016 11:18:14 GMT kafka-node:HighLevelConsumer HighLevelConsumer kafka-node-group_c3ae7253-b964-4d35-b5c5-d2eb90276d30 determining the partitions to own during rebalance
31 Wed, 26 Oct 2016 11:18:14 GMT kafka-node:HighLevelConsumer consumerPerTopicMap.consumerTopicMap {"kafka-node-group_c3ae7253-b964-4d35-b5c5-d2eb90276d30":["lbs.location"],"kafka-node-group_9e9d3ed2-5066-48ec-888c-f1373672ceb3":["lbs.location"],"kafka-node-group_f887c14>
32 Wed, 26 Oct 2016 11:18:14 GMT kafka-node:HighLevelConsumer newTopicPayloads [{"topic":"lbs.location","partition":"50","offset":0,"maxBytes":5242880,"metadata":"m"},{"topic":"lbs.location","partition":"51","offset":0,"maxBytes":5242880,"metadata":"m"},{"topic":"lbs.loc>
33 Wed, 26 Oct 2016 11:18:14 GMT kafka-node:HighLevelConsumer HighLevelConsumer kafka-node-group_c3ae7253-b964-4d35-b5c5-d2eb90276d30 gaining ownership of partitions during rebalance
34 Wed, 26 Oct 2016 11:18:14 GMT kafka-node:zookeeper Gained ownership of /consumers/kafka-node-group/owners/lbs.location/50 by kafka-node-group_c3ae7253-b964-4d35-b5c5-d2eb90276d30.
35 Wed, 26 Oct 2016 11:18:14 GMT kafka-node:zookeeper Gained ownership of /consumers/kafka-node-group/owners/lbs.location/51 by kafka-node-group_c3ae7253-b964-4d35-b5c5-d2eb90276d30.
36 Wed, 26 Oct 2016 11:18:14 GMT kafka-node:zookeeper Gained ownership of /consumers/kafka-node-group/owners/lbs.location/52 by kafka-node-group_c3ae7253-b964-4d35-b5c5-d2eb90276d30.
37 Wed, 26 Oct 2016 11:18:14 GMT kafka-node:zookeeper Gained ownership of /consumers/kafka-node-group/owners/lbs.location/53 by kafka-node-group_c3ae7253-b964-4d35-b5c5-d2eb90276d30.
38 Wed, 26 Oct 2016 11:18:14 GMT kafka-node:zookeeper Gained ownership of /consumers/kafka-node-group/owners/lbs.location/54 by kafka-node-group_c3ae7253-b964-4d35-b5c5-d2eb90276d30.
39 Wed, 26 Oct 2016 11:18:14 GMT kafka-node:zookeeper Gained ownership of /consumers/kafka-node-group/owners/lbs.location/55 by kafka-node-group_c3ae7253-b964-4d35-b5c5-d2eb90276d30.
40 Wed, 26 Oct 2016 11:18:14 GMT kafka-node:zookeeper Gained ownership of /consumers/kafka-node-group/owners/lbs.location/56 by kafka-node-group_c3ae7253-b964-4d35-b5c5-d2eb90276d30.
41 Wed, 26 Oct 2016 11:18:14 GMT kafka-node:zookeeper Gained ownership of /consumers/kafka-node-group/owners/lbs.location/57 by kafka-node-group_c3ae7253-b964-4d35-b5c5-d2eb90276d30.
42 Wed, 26 Oct 2016 11:18:14 GMT kafka-node:zookeeper Gained ownership of /consumers/kafka-node-group/owners/lbs.location/58 by kafka-node-group_c3ae7253-b964-4d35-b5c5-d2eb90276d30.
43 Wed, 26 Oct 2016 11:18:14 GMT kafka-node:zookeeper Gained ownership of /consumers/kafka-node-group/owners/lbs.location/59 by kafka-node-group_c3ae7253-b964-4d35-b5c5-d2eb90276d30.
44 Wed, 26 Oct 2016 11:18:14 GMT kafka-node:zookeeper Gained ownership of /consumers/kafka-node-group/owners/lbs.location/60 by kafka-node-group_c3ae7253-b964-4d35-b5c5-d2eb90276d30.
45 Wed, 26 Oct 2016 11:18:14 GMT kafka-node:zookeeper Gained ownership of /consumers/kafka-node-group/owners/lbs.location/61 by kafka-node-group_c3ae7253-b964-4d35-b5c5-d2eb90276d30.
46 Wed, 26 Oct 2016 11:18:15 GMT kafka-node:zookeeper Gained ownership of /consumers/kafka-node-group/owners/lbs.location/62 by kafka-node-group_c3ae7253-b964-4d35-b5c5-d2eb90276d30.
47 Wed, 26 Oct 2016 11:18:15 GMT kafka-node:zookeeper Gained ownership of /consumers/kafka-node-group/owners/lbs.location/63 by kafka-node-group_c3ae7253-b964-4d35-b5c5-d2eb90276d30.
48 Wed, 26 Oct 2016 11:18:15 GMT kafka-node:zookeeper Gained ownership of /consumers/kafka-node-group/owners/lbs.location/64 by kafka-node-group_c3ae7253-b964-4d35-b5c5-d2eb90276d30.

Hope this could help you.

@hyperlink (Collaborator)

A stack trace of the NODE_EXISTS[-110] error may help isolate the issue.

Also, have you tried using the new ConsumerGroup? It's very similar to the HighLevelConsumer and should not suffer from these NODE_EXISTS issues.
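For reference, migrating would look roughly like this. The sketch below maps the legacy HighLevelConsumer options from this issue onto options for the newer ConsumerGroup API (kafka-node against Kafka 0.9+); the `localhost:9092` broker address, the `my-topic` name, and the `toConsumerGroupOptions` helper are illustrative assumptions, not from the issue.

```javascript
// Translate the legacy HighLevelConsumer options into ConsumerGroup options.
// ConsumerGroup connects to the brokers directly (kafkaHost) instead of
// going through ZooKeeper, which is what avoids the NODE_EXISTS races.
function toConsumerGroupOptions(legacy, kafkaHost) {
  return {
    kafkaHost,                                   // e.g. 'broker1:9092,broker2:9092'
    groupId: legacy.groupId,
    autoCommit: legacy.autoCommit,
    autoCommitIntervalMs: legacy.autoCommitIntervalMs,
    fetchMaxWaitMs: legacy.fetchMaxWaitMs,
    fetchMaxBytes: legacy.fetchMaxBytes,
    fromOffset: legacy.fromBeginning ? 'earliest' : 'latest',
    encoding: legacy.encoding
  };
}

const options = toConsumerGroupOptions({
  autoCommit: true,
  autoCommitIntervalMs: 300,
  groupId: 'new-oad-group',
  fromBeginning: false,
  fetchMaxWaitMs: 100,
  fetchMaxBytes: 1048576,
  encoding: 'utf8'
}, 'localhost:9092');

// With kafka-node installed and a 0.9+ broker running, the consumer itself
// would then be wired up like this:
// const { ConsumerGroup } = require('kafka-node');
// const consumer = new ConsumerGroup(options, ['my-topic']);
// consumer.on('message', m => console.log(m));
// consumer.on('error', err => console.error(err));
```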

@gogorush

@hyperlink My Kafka version is 0.8.2, so I can't use ConsumerGroup. I'll take a deeper look at this issue later.

@s-rodriguez

Hi @m2studio @gogorush, have you found any solution regarding this issue?

We are currently facing the same problem in a prod environment, and we are struggling to find a way to solve it.
We have a topic with 15 partitions, and there are 30+ processes that can be selected as consumers. However, from time to time, partitions are left without owners.

Any hint about where to look would be appreciated.

@gogorush

This is caused by the rebalance issue that occurs when more consumers are added to the group. I upgraded my Kafka to 0.9+ and used the new ConsumerGroup API to avoid it.

I looked into this issue earlier. Confluent (the company behind Kafka) posted an article about their new consumer in 0.9, which said, and I quote: 'This new consumer also adds a set of protocols for managing fault-tolerant groups of consumer processes. Previously this functionality was implemented with a thick Java client (that interacted heavily with Zookeeper). The complexity of this logic made it hard to build fully featured consumers in other languages.' So I guess JavaScript is one of those languages. For many other reasons as well, I strongly recommend that you upgrade your Kafka to 0.9+ too.

Check out more at this link.

Hope this could help you @s-rodriguez .

@s-rodriguez

Wow, thanks a lot @gogorush! I'll definitely check out Kafka 0.9+ with the new ConsumerGroup.
