
Cluster panics when duplicate node url is encountered #2279

Closed
cboggs opened this issue Apr 14, 2015 · 3 comments

cboggs commented Apr 14, 2015

Steps to reproduce:

  1. Spin up a 3-node cluster with Node1 as leader and Node2/Node3 joining Node1 on initial startup
  2. Shut down Node1
  3. On Node1, rm -rf /var/opt/influxdb/*
  4. Edit Node1's config to join Node2 and Node3 on next startup
  5. Start Node1
  6. Observe a cluster-wide panic
  7. Restart InfluxDB on Node2 and Node3 - these two nodes seem to start correctly as long as Node1 stays offline

Here's what I see in the logs on Node2 and Node3:

[raft] 2015/04/14 15:30:04 apply: add node: duplicate node url
panic: apply: add node: duplicate node url

goroutine 8 [running]:
log.(*Logger).Panicf(0xc20800a730, 0x9ed850, 0x13, 0xc2083f3d58, 0x1, 0x1)
    /root/.gvm/gos/go1.4.2/src/log/log.go:200 +0xd1
github.com/influxdb/influxdb/raft.(*Log).mustApplyAddPeer(0xc20803a820, 0xc208392c90)
    /root/.gvm/pkgsets/go1.4.2/global/src/github.com/influxdb/influxdb/raft/log.go:1518 +0x2ce
github.com/influxdb/influxdb/raft.(*Log).applyNextUnappliedEntry(0xc20803a820, 0xc20800c6c0, 0x0, 0x0)
    /root/.gvm/pkgsets/go1.4.2/global/src/github.com/influxdb/influxdb/raft/log.go:1436 +0x752
github.com/influxdb/influxdb/raft.(*Log).applier(0xc20803a820, 0xc20800c6c0)
    /root/.gvm/pkgsets/go1.4.2/global/src/github.com/influxdb/influxdb/raft/log.go:1382 +0x161
created by github.com/influxdb/influxdb/raft.func·002
    /root/.gvm/pkgsets/go1.4.2/global/src/github.com/influxdb/influxdb/raft/log.go:389 +0x764
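
For anyone reading the trace: the applier goroutine escalates the duplicate-URL error to log.(*Logger).Panicf, so every member that replays the offending addPeer entry crashes, not just the re-joining node. Below is a minimal, self-contained Go sketch of that failure mode; the Peer and Log types and the body of mustApplyAddPeer are assumptions for illustration, not the actual influxdb/raft source.

package main

// A minimal sketch of the failure mode above. Only the Panicf escalation
// mirrors the trace; everything else is a hypothetical stand-in.

import (
	"log"
	"os"
)

type Peer struct {
	URL string // advertised node URL; must be unique in the cluster config
}

type Log struct {
	Logger *log.Logger
	peers  []Peer
}

// mustApplyAddPeer mimics raft/log.go:1518 in the trace: a duplicate URL
// is escalated to Logger.Panicf, which panics and takes the process down
// instead of rejecting the join.
func (l *Log) mustApplyAddPeer(p Peer) {
	for _, existing := range l.peers {
		if existing.URL == p.URL {
			// Logs "[raft] ... apply: add node: duplicate node url",
			// then panics with the same message.
			l.Logger.Panicf("apply: add node: %s", "duplicate node url")
		}
	}
	l.peers = append(l.peers, p)
}

func main() {
	l := &Log{Logger: log.New(os.Stderr, "[raft] ", log.LstdFlags)}
	l.mustApplyAddPeer(Peer{URL: "http://node1:8090"}) // first join succeeds
	// Node1 wiped its data dir and re-joins under the same URL:
	l.mustApplyAddPeer(Peer{URL: "http://node1:8090"}) // panics cluster-wide
}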

cboggs commented Apr 14, 2015

Deleting the 'failed' node via the API succeeds, and the dead server no longer shows up in "show servers" queries, but trying to start that node again causes the panic all the same.

However, deleting the server via the API before breaking it (by deleting the data dirs) actually allows the node to rejoin the cluster. Awesome!

Just need to iron out the "panic when a node dies before it is explicitly removed from the cluster" behavior, I think.
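
If it helps, here is a sketch of one direction that would iron that out (hypothetical, not a proposed patch): have the apply path return an error for a duplicate URL, so the join can be rejected at the API boundary while the existing members keep running. The types reuse the hypothetical shapes from the sketch above.

package main

import (
	"errors"
	"fmt"
)

type Peer struct{ URL string }

type Log struct{ peers []Peer }

// ErrDuplicateNodeURL can be surfaced to the joining node as an API error
// instead of crashing every member that applies the entry.
var ErrDuplicateNodeURL = errors.New("duplicate node url")

// applyAddPeer rejects a duplicate URL with an error; only the node
// attempting to re-join sees the failure.
func (l *Log) applyAddPeer(p Peer) error {
	for _, existing := range l.peers {
		if existing.URL == p.URL {
			return fmt.Errorf("apply: add node: %v", ErrDuplicateNodeURL)
		}
	}
	l.peers = append(l.peers, p)
	return nil
}

func main() {
	l := &Log{}
	fmt.Println(l.applyAddPeer(Peer{URL: "http://node1:8090"})) // <nil>
	fmt.Println(l.applyAddPeer(Peer{URL: "http://node1:8090"})) // apply: add node: duplicate node url
}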


jwilder commented Apr 14, 2015

Related: #1471, #1472

@beckettsean added this to the 0.9.0 milestone Apr 14, 2015
@jwilder self-assigned this May 1, 2015
@toddboom modified the milestones: 0.9.0, 0.9.1 May 8, 2015
@toddboom modified the milestones: 0.9.1, 0.9.2 Jun 5, 2015

otoolep commented Jun 9, 2015

No longer applicable in the new design.

@otoolep closed this as completed Jun 9, 2015