
influxdb cluster: tcp.Mux: handler not registered: 71 #5846

Closed
keksior opened this issue Feb 26, 2016 · 15 comments
@keksior

keksior commented Feb 26, 2016

I have a problem setting up an InfluxDB cluster. I followed the manual, and on node0 I get this error in the logs:

[tcp] 2016/02/26 09:55:00 tcp.Mux: handler not registered: 71

And on the joining node1:

[metaclient] 2016/02/26 09:57:59 failure getting snapshot from influx0.gem.lan:8088: Get http://influx0.gem.lan:8088?index=0: read tcp 10.33.44.158:8088: connection reset by peer

node0 (main) is listening on:
tcp 0 0 10.33.44.158:8083 0.0.0.0:* LISTEN 10231/influxd
tcp 0 0 10.33.44.158:8088 0.0.0.0:* LISTEN 10231/influxd
tcp 0 0 10.33.44.158:8091 0.0.0.0:* LISTEN 10231/influxd
tcp6 0 0 :::8086 :::* LISTEN 10231/influxd

I tried deleting everything from /var/lib/influxdb/ on both servers; it didn't work.

My node1 /etc/default/influxdb:

INFLUXD_OPTS="-join 10.33.44.158:8088"

Please help me solve this.

@e-dard
Contributor

e-dard commented Feb 26, 2016

Hi @keksior,

Can you provide more information, including steps to reproduce? How many nodes? How have you configured them? What version of Influx?

@keksior
Author

keksior commented Feb 26, 2016

Ubuntu 14.04.3, InfluxDB 0.10.1-1. Two nodes: node0 and node1. I configured them like this:

node0:
/etc/influxdb/influxdb.conf
http://pastebin.com/fLmyDx5y

node1:
/etc/influxdb/influxdb.conf
http://pastebin.com/4SgW3hqz

On node1 I also added this to /etc/default/influxdb:
INFLUXD_OPTS="-join 10.33.44.158:8088"

@e-dard
Contributor

e-dard commented Feb 26, 2016

@keksior you need to join via the meta node's HTTP service. Port 8088 is used for internal node communication. Nodes use the HTTP service to discover these ports and hosts.

So, when starting node1 (assuming you already started node0), change the env var to: INFLUXD_OPTS="-join 10.33.44.158:8091".
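For anyone hitting the same error, a quick way to tell the two ports apart (a sketch using the addresses from this thread; responses may vary by version). Port 8091 answers plain HTTP, while 8088 is a binary RPC mux that reads the first byte of each connection as a handler ID, so an HTTP request against it is exactly what produces the "tcp.Mux: handler not registered: 71" line (71 is the ASCII code of the "G" in "GET"):

# meta HTTP service: serves the snapshot a joining node asks for
curl 'http://10.33.44.158:8091/?index=0'

# internal RPC mux: the connection is reset, and node0 logs
# "tcp.Mux: handler not registered: 71"
curl 'http://10.33.44.158:8088/?index=0'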

@keksior
Author

keksior commented Feb 26, 2016

Now it added node1 to meta_nodes, but not to data_nodes:

data_nodes
id http_addr tcp_addr
1 "localhost:8086" "10.33.44.158:8088"
meta_nodes
id http_addr tcp_addr
1 "10.33.44.158:8091" "10.33.44.158:8088"
2 "10.33.44.167:8091" "10.33.44.167:8088"

@e-dard
Contributor

e-dard commented Feb 26, 2016

Can you provide the logs for both nodes please?

@keksior
Author

keksior commented Feb 26, 2016

@keksior
Author

keksior commented Mar 2, 2016

Can someone help me solve this?

@keksior
Author

keksior commented Mar 2, 2016

I found that setting ":8086" in the config makes the data node show up as "localhost" in "show servers". When I changed it to the IP address and cleared /var/lib/influxdb/meta/* on node0, everything worked. Thanks for the help.
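For reference, a sketch of the change that fixed it (the option name is from the 0.10-era influxdb.conf; double-check against your own file). Binding the HTTP service to ":8086" made the data node advertise itself as "localhost:8086", which the other node cannot resolve back to node0:

# /etc/influxdb/influxdb.conf on node0 (sketch)
[http]
  # was: bind-address = ":8086"  -> advertised as "localhost:8086"
  bind-address = "10.33.44.158:8086"

# then stop influxd, clear the stale metadata, and restart
rm -rf /var/lib/influxdb/meta/*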

@keksior keksior closed this as completed Mar 2, 2016
@keksior keksior reopened this Mar 2, 2016
@keksior
Author

keksior commented Mar 2, 2016

I still have a problem. I've got the whole cluster running:

data_nodes
id http_addr tcp_addr
1 "10.33.44.158:8086" "10.33.44.158:8088"
3 "10.33.44.167:8086" "10.33.44.167:8088"
meta_nodes
id http_addr tcp_addr
1 "10.33.44.158:8091" "10.33.44.158:8088"
2 "10.33.44.167:8091" "10.33.44.167:8088"

retention:
name duration replicaN default
default "0" 2 true

But the data is not being sent to the secondary server. How can I debug this?
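One place to look, as a sketch (the paths and search terms below are assumptions based on the default 0.10 layout; adjust to your install). With replicaN=2 both nodes should own every shard, and writes that cannot be applied on the second owner are queued in hinted handoff on the node that accepted them:

# on node0: a large or growing hh directory means writes destined for the
# peer are queuing locally instead of landing on node1
du -h --max-depth=1 /var/lib/influxdb/hh

# check the log around a known write for handoff or cluster write errors
grep -iE 'hinted|handoff|write failed' /var/log/influxdb/influxd.log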

@e-dard
Contributor

e-dard commented Mar 2, 2016

Hi @keksior,

Are you saying you're writing to 10.33.44.158:8086 and the data is not propagating to 10.33.44.167:8086? How are you verifying data is not being written to the second node?

@keksior
Author

keksior commented Mar 3, 2016

I checked disk usage:

node0:
root@influx0:~# du -h --max-depth=1 /var/lib/influxdb/data
7.1G /var/lib/influxdb/data/telegraf
1.4G /var/lib/influxdb/data/_internal
8.5G /var/lib/influxdb/data

node1:
root@influx1:~# du -h --max-depth=1 /var/lib/influxdb/data
408K /var/lib/influxdb/data/_internal
1.4G /var/lib/influxdb/data/telegraf
1.4G /var/lib/influxdb/data

@e-dard
Contributor

e-dard commented Mar 3, 2016

Can you reproduce this from startup with a fresh cluster?

If not, can you provide the influx logs for a period where you know writes have come in and not been propagated to the other node?


@keksior
Author

keksior commented Mar 3, 2016

I have data coming into my Influx every few seconds. The logs for influx0:
[http] 2016/03/03 09:54:08 10.21.1.6 - telegraf [03/Mar/2016:09:54:08 +0100] POST /write?consistency=&db=telegraf&precision=s&rp= HTTP/1.1 204 0 - InfluxDBClient 7da0659c-e11d-11e5-8f99-000000000000 15.159911ms
[http] 2016/03/03 09:54:09 10.21.0.207 - telegraf [03/Mar/2016:09:54:09 +0100] POST /write?consistency=&db=telegraf&precision=s&rp= HTTP/1.1 204 0 - InfluxDBClient 7e45acef-e11d-11e5-8f9a-000000000000 36.631127ms
[http] 2016/03/03 09:54:10 10.21.0.110 - telegraf [03/Mar/2016:09:54:10 +0100] POST /write?consistency=&db=telegraf&precision=s&rp= HTTP/1.1 204 0 - InfluxDBClient 7ed5115d-e11d-11e5-8f9b-000000000000 38.597214ms
[http] 2016/03/03 09:54:10 10.21.1.73 - telegraf [03/Mar/2016:09:54:10 +0100] POST /write?consistency=&db=telegraf&precision=s&rp= HTTP/1.1 204 0 - InfluxDBClient 7edcca3a-e11d-11e5-8f9c-000000000000 8.666437ms
[http] 2016/03/03 09:54:10 10.21.0.164 - telegraf [03/Mar/2016:09:54:10 +0100] POST /write?consistency=&db=telegraf&precision=s&rp= HTTP/1.1 204 0 - InfluxDBClient 7ef6d669-e11d-11e5-8f9d-000000000000 24.970879ms
[http] 2016/03/03 09:54:10 10.21.0.179 - telegraf [03/Mar/2016:09:54:10 +0100] POST /write?consistency=&db=telegraf&precision=s&rp= HTTP/1.1 204 0 - InfluxDBClient 7f126f4e-e11d-11e5-8f9e-000000000000 42.305598ms
[http] 2016/03/03 09:54:11 10.21.0.192 - telegraf [03/Mar/2016:09:54:11 +0100] POST /write?consistency=&db=telegraf&precision=s&rp= HTTP/1.1 204 0 - InfluxDBClient 7fb33837-e11d-11e5-8f9f-000000000000 35.105411ms
[http] 2016/03/03 09:54:11 10.21.0.168 - telegraf [03/Mar/2016:09:54:11 +0100] POST /write?consistency=&db=telegraf&precision=s&rp= HTTP/1.1 204 0 - InfluxDBClient 7fc7f7e2-e11d-11e5-8fa0-000000000000 24.820672ms
[http] 2016/03/03 09:54:11 10.33.44.171 - telegraf [03/Mar/2016:09:54:11 +0100] POST /write?consistency=&db=telegraf&precision=s&rp= HTTP/1.1 204 0 - InfluxDBClient 7fd6cbe2-e11d-11e5-8fa1-000000000000 14.126165ms
[http] 2016/03/03 09:54:12 10.21.1.106 - telegraf [03/Mar/2016:09:54:11 +0100] POST /write?consistency=&db=telegraf&precision=s&rp= HTTP/1.1 204 0 - InfluxDBClient 7fd84234-e11d-11e5-8fa2-000000000000 16.669366ms

And the last logs from node1:
[tsm1] 2016/03/03 09:52:57 beginning level 3 compaction of group 0, 4 TSM files
[tsm1] 2016/03/03 09:52:57 compacting level 3 group (0) /var/lib/influxdb/data/telegraf/default/3/000000372-000000003.tsm (#0)
[tsm1] 2016/03/03 09:52:57 compacting level 3 group (0) /var/lib/influxdb/data/telegraf/default/3/000000376-000000003.tsm (#1)
[tsm1] 2016/03/03 09:52:57 compacting level 3 group (0) /var/lib/influxdb/data/telegraf/default/3/000000380-000000003.tsm (#2)
[tsm1] 2016/03/03 09:52:57 compacting level 3 group (0) /var/lib/influxdb/data/telegraf/default/3/000000384-000000003.tsm (#3)
[tsm1] 2016/03/03 09:53:28 compacted level 3 group (0) into /var/lib/influxdb/data/telegraf/default/3/000000384-000000004.tsm.tmp (#0)
[tsm1] 2016/03/03 09:53:28 compacted level 3 group 0 of 4 files into 1 files in 30.971793057s
[tsm1] 2016/03/03 09:53:29 beginning full compaction of group 0, 2 TSM files
[tsm1] 2016/03/03 09:53:29 compacting full group (0) /var/lib/influxdb/data/telegraf/default/3/000000368-000000005.tsm (#0)
[tsm1] 2016/03/03 09:53:29 compacting full group (0) /var/lib/influxdb/data/telegraf/default/3/000000384-000000004.tsm (#1)

My node1 log file has nothing but tsm1 entries.

@keksior
Author

keksior commented Mar 7, 2016

I cleared /var/lib/influxdb on both nodes and started from scratch:

node0:
[meta] 2016/03/07 10:40:45 10.33.44.167 - - [07/Mar/2016:10:40:45 +0100] POST /execute HTTP/1.1 200 28 - Go 1.1 package http aaabde20-e448-11e5-80c2-000000000000 66.541714ms
[meta] 2016/03/07 10:40:45 10.33.44.167 - - [07/Mar/2016:10:40:35 +0100] GET /?index=13 HTTP/1.1 200 228 - Go 1.1 package http a4b5c12f-e448-11e5-8083-000000000000 10.072341825s
[meta] 2016/03/07 10:40:45 10.33.44.158 - - [07/Mar/2016:10:40:35 +0100] GET /?index=13 HTTP/1.1 200 228 - Go 1.1 package http a4b5689e-e448-11e5-8082-000000000000 10.080732241s
[meta] 2016/03/07 10:40:45 10.33.44.167 - - [07/Mar/2016:10:40:45 +0100] GET /?index=14 HTTP/1.1 200 228 - Go 1.1 package http aab6e010-e448-11e5-80c3-000000000000 23.442601ms
[meta] 2016/03/07 10:40:45 10.33.44.167 - - [07/Mar/2016:10:40:45 +0100] POST /execute HTTP/1.1 200 28 - Go 1.1 package http aab6e45c-e448-11e5-80c4-000000000000 23.332742ms
[meta] 2016/03/07 10:40:45 10.33.44.158 - - [07/Mar/2016:10:40:45 +0100] GET /?index=14 HTTP/1.1 200 228 - Go 1.1 package http aab7b23a-e448-11e5-80c5-000000000000 18.252287ms
[cluster] 2016/03/07 10:40:45 accept remote connection from 10.33.44.167:59460

node1:
[snapshot] 2016/03/07 10:40:35 Starting snapshot service
[copier] 2016/03/07 10:40:35 Starting copier service
[admin] 2016/03/07 10:40:35 Starting admin service
[admin] 2016/03/07 10:40:35 Listening on HTTP: [::]:8083
[continuous_querier] 2016/03/07 10:40:35 Starting continuous query service
[httpd] 2016/03/07 10:40:35 Starting HTTP service
[httpd] 2016/03/07 10:40:35 Authentication enabled: false
[httpd] 2016/03/07 10:40:35 Listening on HTTP: 10.33.44.167:8086
[retention] 2016/03/07 10:40:35 Starting retention policy enforcement service with check interval of 30m0s
[run] 2016/03/07 10:40:35 Listening for signals
2016/03/07 10:40:45 updated node metaservers with: [10.33.44.158:8091 10.33.44.167:8091]

The problem still exists. Any ideas?

@jsternberg
Contributor

This is from an old version and we no longer support clustering in the open source version, but in case you're still trying to figure this out: you're probably using the wrong port somewhere in your configuration file. This error message happens when you make an HTTP (8086) request against the RPC port (8088). I can't remember which configuration option it was exactly, but that would be a good place to start.

I'm going to close this as it's an issue we're not going to fix (the clustering code is dramatically different from what 0.10 had), but I hope this helps a bit with figuring out your configuration issue. I'm sorry we were not able to respond to this issue in a more timely manner.
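For anyone landing here from a search, a compact recap of the port roles discussed in this thread, as a sketch of a 0.10-era influxdb.conf (the option names are approximate and should be verified against the config file your version shipped with):

# /etc/influxdb/influxdb.conf (sketch)
[meta]
  bind-address = ":8088"        # internal RPC mux; "-join" here yields "handler not registered: 71"
  http-bind-address = ":8091"   # meta HTTP service; this is what "-join" should target

[http]
  bind-address = "10.33.44.158:8086"  # client API; use a reachable address, not localhost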
