
influxdb cluster: tcp.Mux: handler not registered: 71 #5846

Closed
keksior opened this issue Feb 26, 2016 · 15 comments
@keksior

keksior commented Feb 26, 2016

I have a problem setting up an InfluxDB cluster. I followed the manual, and on node0 I get this error in the logs:

[tcp] 2016/02/26 09:55:00 tcp.Mux: handler not registered: 71

And on the joining node1:

[metaclient] 2016/02/26 09:57:59 failure getting snapshot from influx0.gem.lan:8088: Get http://influx0.gem.lan:8088?index=0: read tcp 10.33.44.158:8088: connection reset by peer

node0 (main) is listening on:
tcp 0 0 10.33.44.158:8083 0.0.0.0:* LISTEN 10231/influxd
tcp 0 0 10.33.44.158:8088 0.0.0.0:* LISTEN 10231/influxd
tcp 0 0 10.33.44.158:8091 0.0.0.0:* LISTEN 10231/influxd
tcp6 0 0 :::8086 :::* LISTEN 10231/influxd

I tried deleting everything from /var/lib/influxdb/ on both servers; it didn't work.

My node1 /etc/default/influxdb:

INFLUXD_OPTS="-join 10.33.44.158:8088"

Please help me solve this.

@e-dard
Contributor

e-dard commented Feb 26, 2016

Hi @keksior,

Can you provide more information, including steps to reproduce? How many nodes? How have you configured them? What version of Influx?

@keksior
Author

keksior commented Feb 26, 2016

Ubuntu 14.04.3, InfluxDB 0.10.1-1. Two nodes: node0 and node1. I configured them like this:

node0:
/etc/influxdb/influxdb.conf
http://pastebin.com/fLmyDx5y

node1:
/etc/influxdb/influxdb.conf
http://pastebin.com/4SgW3hqz

On node1 I also added this to /etc/default/influxdb:
INFLUXD_OPTS="-join 10.33.44.158:8088"

@e-dard
Contributor

e-dard commented Feb 26, 2016

@keksior you need to join via the meta node's HTTP service. Port 8088 is used for internal node communication. Nodes use the HTTP service to discover these ports and hosts.

So, when starting node1 (assuming you already started node0), change the env var to: INFLUXD_OPTS="-join 10.33.44.158:8091".
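For anyone hitting the same error, a quick way to tell the two ports apart (a sketch using the addresses from this thread; responses may vary by version). Port 8091 answers plain HTTP, while 8088 is a binary RPC mux that reads the first byte of each connection as a handler ID, so an HTTP request against it is exactly what produces the "tcp.Mux: handler not registered: 71" line (71 is the ASCII code of the "G" in "GET"):

# meta HTTP service: serves the snapshot a joining node asks for
curl 'http://10.33.44.158:8091/?index=0'

# internal RPC mux: the connection is reset, and node0 logs
# "tcp.Mux: handler not registered: 71"
curl 'http://10.33.44.158:8088/?index=0'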

@keksior
Author

keksior commented Feb 26, 2016

Now it added node1 to meta_nodes, but not to data_nodes:

data_nodes
id http_addr tcp_addr
1 "localhost:8086" "10.33.44.158:8088"
meta_nodes
id http_addr tcp_addr
1 "10.33.44.158:8091" "10.33.44.158:8088"
2 "10.33.44.167:8091" "10.33.44.167:8088"

@e-dard
Contributor

e-dard commented Feb 26, 2016

Can you provide the logs for both nodes please?

@keksior
Author

keksior commented Feb 26, 2016

@keksior
Author

keksior commented Mar 2, 2016

Can someone help me solve this?

@keksior
Author

keksior commented Mar 2, 2016

I found that setting ":8086" in the config makes the data node show up as "localhost" in "show servers". When I changed it to the IP address and cleared /var/lib/influxdb/meta/* on node0, everything worked. Thanks for the help.
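For reference, a sketch of the change that fixed it (the option name is from the 0.10-era influxdb.conf; double-check against your own file). Binding the HTTP service to ":8086" made the data node advertise itself as "localhost:8086", which the other node cannot resolve back to node0:

# /etc/influxdb/influxdb.conf on node0 (sketch)
[http]
  # was: bind-address = ":8086"  -> advertised as "localhost:8086"
  bind-address = "10.33.44.158:8086"

# then stop influxd, clear the stale metadata, and restart
rm -rf /var/lib/influxdb/meta/*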

@keksior keksior closed this as completed Mar 2, 2016
@keksior keksior reopened this Mar 2, 2016
@keksior
Author

keksior commented Mar 2, 2016

I still have a problem. I've got the whole cluster running:

data_nodes
id http_addr tcp_addr
1 "10.33.44.158:8086" "10.33.44.158:8088"
3 "10.33.44.167:8086" "10.33.44.167:8088"
meta_nodes
id http_addr tcp_addr
1 "10.33.44.158:8091" "10.33.44.158:8088"
2 "10.33.44.167:8091" "10.33.44.167:8088"

retention:
name duration replicaN default
default "0" 2 true

But the data is not being sent to the secondary server. How can I debug this?
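One place to look, as a sketch (the paths and search terms below are assumptions based on the default 0.10 layout; adjust to your install). With replicaN=2 both nodes should own every shard, and writes that cannot be applied on the second owner are queued in hinted handoff on the node that accepted them:

# on node0: a large or growing hh directory means writes destined for the
# peer are queuing locally instead of landing on node1
du -h --max-depth=1 /var/lib/influxdb/hh

# check the log around a known write for handoff or cluster write errors
grep -iE 'hinted|handoff|write failed' /var/log/influxdb/influxd.log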

@e-dard
Contributor

e-dard commented Mar 2, 2016

Hi @keksior,

Are you saying you're writing to 10.33.44.158:8086 and the data is not propagating to 10.33.44.167:8086? How are you verifying data is not being written to the second node?

@keksior
Author

keksior commented Mar 3, 2016

I checked disk usage:

node0:
root@influx0:~# du -h --max-depth=1 /var/lib/influxdb/data
7.1G /var/lib/influxdb/data/telegraf
1.4G /var/lib/influxdb/data/_internal
8.5G /var/lib/influxdb/data

node1:
root@influx1:~# du -h --max-depth=1 /var/lib/influxdb/data
408K /var/lib/influxdb/data/_internal
1.4G /var/lib/influxdb/data/telegraf
1.4G /var/lib/influxdb/data

@e-dard
Contributor

e-dard commented Mar 3, 2016

Can you reproduce this from startup with a fresh cluster?

If not, can you provide the influx logs for a period where you know writes have come in and not been propagated to the other node?


@keksior
Author

keksior commented Mar 3, 2016

I have data coming into my Influx every few seconds. The logs for influx0:
[http] 2016/03/03 09:54:08 10.21.1.6 - telegraf [03/Mar/2016:09:54:08 +0100] POST /write?consistency=&db=telegraf&precision=s&rp= HTTP/1.1 204 0 - InfluxDBClient 7da0659c-e11d-11e5-8f99-000000000000 15.159911ms
[http] 2016/03/03 09:54:09 10.21.0.207 - telegraf [03/Mar/2016:09:54:09 +0100] POST /write?consistency=&db=telegraf&precision=s&rp= HTTP/1.1 204 0 - InfluxDBClient 7e45acef-e11d-11e5-8f9a-000000000000 36.631127ms
[http] 2016/03/03 09:54:10 10.21.0.110 - telegraf [03/Mar/2016:09:54:10 +0100] POST /write?consistency=&db=telegraf&precision=s&rp= HTTP/1.1 204 0 - InfluxDBClient 7ed5115d-e11d-11e5-8f9b-000000000000 38.597214ms
[http] 2016/03/03 09:54:10 10.21.1.73 - telegraf [03/Mar/2016:09:54:10 +0100] POST /write?consistency=&db=telegraf&precision=s&rp= HTTP/1.1 204 0 - InfluxDBClient 7edcca3a-e11d-11e5-8f9c-000000000000 8.666437ms
[http] 2016/03/03 09:54:10 10.21.0.164 - telegraf [03/Mar/2016:09:54:10 +0100] POST /write?consistency=&db=telegraf&precision=s&rp= HTTP/1.1 204 0 - InfluxDBClient 7ef6d669-e11d-11e5-8f9d-000000000000 24.970879ms
[http] 2016/03/03 09:54:10 10.21.0.179 - telegraf [03/Mar/2016:09:54:10 +0100] POST /write?consistency=&db=telegraf&precision=s&rp= HTTP/1.1 204 0 - InfluxDBClient 7f126f4e-e11d-11e5-8f9e-000000000000 42.305598ms
[http] 2016/03/03 09:54:11 10.21.0.192 - telegraf [03/Mar/2016:09:54:11 +0100] POST /write?consistency=&db=telegraf&precision=s&rp= HTTP/1.1 204 0 - InfluxDBClient 7fb33837-e11d-11e5-8f9f-000000000000 35.105411ms
[http] 2016/03/03 09:54:11 10.21.0.168 - telegraf [03/Mar/2016:09:54:11 +0100] POST /write?consistency=&db=telegraf&precision=s&rp= HTTP/1.1 204 0 - InfluxDBClient 7fc7f7e2-e11d-11e5-8fa0-000000000000 24.820672ms
[http] 2016/03/03 09:54:11 10.33.44.171 - telegraf [03/Mar/2016:09:54:11 +0100] POST /write?consistency=&db=telegraf&precision=s&rp= HTTP/1.1 204 0 - InfluxDBClient 7fd6cbe2-e11d-11e5-8fa1-000000000000 14.126165ms
[http] 2016/03/03 09:54:12 10.21.1.106 - telegraf [03/Mar/2016:09:54:11 +0100] POST /write?consistency=&db=telegraf&precision=s&rp= HTTP/1.1 204 0 - InfluxDBClient 7fd84234-e11d-11e5-8fa2-000000000000 16.669366ms

And the last logs from node1:
[tsm1] 2016/03/03 09:52:57 beginning level 3 compaction of group 0, 4 TSM files
[tsm1] 2016/03/03 09:52:57 compacting level 3 group (0) /var/lib/influxdb/data/telegraf/default/3/000000372-000000003.tsm (#0)
[tsm1] 2016/03/03 09:52:57 compacting level 3 group (0) /var/lib/influxdb/data/telegraf/default/3/000000376-000000003.tsm (#1)
[tsm1] 2016/03/03 09:52:57 compacting level 3 group (0) /var/lib/influxdb/data/telegraf/default/3/000000380-000000003.tsm (#2)
[tsm1] 2016/03/03 09:52:57 compacting level 3 group (0) /var/lib/influxdb/data/telegraf/default/3/000000384-000000003.tsm (#3)
[tsm1] 2016/03/03 09:53:28 compacted level 3 group (0) into /var/lib/influxdb/data/telegraf/default/3/000000384-000000004.tsm.tmp (#0)
[tsm1] 2016/03/03 09:53:28 compacted level 3 group 0 of 4 files into 1 files in 30.971793057s
[tsm1] 2016/03/03 09:53:29 beginning full compaction of group 0, 2 TSM files
[tsm1] 2016/03/03 09:53:29 compacting full group (0) /var/lib/influxdb/data/telegraf/default/3/000000368-000000005.tsm (#0)
[tsm1] 2016/03/03 09:53:29 compacting full group (0) /var/lib/influxdb/data/telegraf/default/3/000000384-000000004.tsm (#1)

My node1 log file has nothing but tsm1 entries.

@keksior
Author

keksior commented Mar 7, 2016

I cleared /var/lib/influxdb on both nodes and started from scratch:

node0:
[meta] 2016/03/07 10:40:45 10.33.44.167 - - [07/Mar/2016:10:40:45 +0100] POST /execute HTTP/1.1 200 28 - Go 1.1 package http aaabde20-e448-11e5-80c2-000000000000 66.541714ms
[meta] 2016/03/07 10:40:45 10.33.44.167 - - [07/Mar/2016:10:40:35 +0100] GET /?index=13 HTTP/1.1 200 228 - Go 1.1 package http a4b5c12f-e448-11e5-8083-000000000000 10.072341825s
[meta] 2016/03/07 10:40:45 10.33.44.158 - - [07/Mar/2016:10:40:35 +0100] GET /?index=13 HTTP/1.1 200 228 - Go 1.1 package http a4b5689e-e448-11e5-8082-000000000000 10.080732241s
[meta] 2016/03/07 10:40:45 10.33.44.167 - - [07/Mar/2016:10:40:45 +0100] GET /?index=14 HTTP/1.1 200 228 - Go 1.1 package http aab6e010-e448-11e5-80c3-000000000000 23.442601ms
[meta] 2016/03/07 10:40:45 10.33.44.167 - - [07/Mar/2016:10:40:45 +0100] POST /execute HTTP/1.1 200 28 - Go 1.1 package http aab6e45c-e448-11e5-80c4-000000000000 23.332742ms
[meta] 2016/03/07 10:40:45 10.33.44.158 - - [07/Mar/2016:10:40:45 +0100] GET /?index=14 HTTP/1.1 200 228 - Go 1.1 package http aab7b23a-e448-11e5-80c5-000000000000 18.252287ms
[cluster] 2016/03/07 10:40:45 accept remote connection from 10.33.44.167:59460

node1:
[snapshot] 2016/03/07 10:40:35 Starting snapshot service
[copier] 2016/03/07 10:40:35 Starting copier service
[admin] 2016/03/07 10:40:35 Starting admin service
[admin] 2016/03/07 10:40:35 Listening on HTTP: [::]:8083
[continuous_querier] 2016/03/07 10:40:35 Starting continuous query service
[httpd] 2016/03/07 10:40:35 Starting HTTP service
[httpd] 2016/03/07 10:40:35 Authentication enabled: false
[httpd] 2016/03/07 10:40:35 Listening on HTTP: 10.33.44.167:8086
[retention] 2016/03/07 10:40:35 Starting retention policy enforcement service with check interval of 30m0s
[run] 2016/03/07 10:40:35 Listening for signals
2016/03/07 10:40:45 updated node metaservers with: [10.33.44.158:8091 10.33.44.167:8091]

The problem still exists. Any ideas?

@jsternberg
Contributor

This is from an old version and we no longer support clustering in the open source version, but in case you're still trying to figure this out: you're probably using the wrong port somewhere in your configuration file. This error message happens when you make an HTTP (8086) request against the RPC port (8088). I can't remember which configuration option it was exactly, but that would be a good place to start.

I'm going to close this as it's an issue we're not going to fix (the clustering code is dramatically different from what 0.10 had), but I hope this helps a bit with figuring out your configuration issue. I'm sorry we were not able to respond to this issue in a more timely manner.
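For anyone landing here from a search, a compact recap of the port roles discussed in this thread, as a sketch of a 0.10-era influxdb.conf (the option names are approximate and should be verified against the config file your version shipped with):

# /etc/influxdb/influxdb.conf (sketch)
[meta]
  bind-address = ":8088"        # internal RPC mux; "-join" here yields "handler not registered: 71"
  http-bind-address = ":8091"   # meta HTTP service; this is what "-join" should target

[http]
  bind-address = "10.33.44.158:8086"  # client API; use a reachable address, not localhost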
