Master service fails when orchestrator is down #203

Closed
juexun opened this issue Jan 15, 2019 · 3 comments · Fixed by #219

juexun commented Jan 15, 2019

The rebooted node failed to rejoin the cluster with the messages below. Addresses involved:
New IP of the rebooted node: 10.233.5.244
Old IP before the reboot: 10.233.5.243
Other nodes (not rebooted): 10.233.0.160 and 10.233.17.75

  • Rebooted node side
2019/01/15 04:52:56 [INFO] raft: Node at 10.233.5.244:10008 [Follower] entering Follower state (Leader: "")
2019/01/15 04:52:58 [WARN] raft: Heartbeat timeout from "" reached, starting election
2019/01/15 04:52:58 [INFO] raft: Node at 10.233.5.244:10008 [Candidate] entering Candidate state
2019/01/15 04:52:58 [WARN] raft: Remote peer 10.233.0.160:10008 does not have local node 10.233.5.244:10008 as a peer
2019/01/15 04:52:58 [WARN] raft: Remote peer 10.233.17.75:10008 does not have local node 10.233.5.244:10008 as a peer
  • Existing nodes side
2019/01/15 04:55:55 [DEBUG] raft: Failed to contact 10.233.5.243:10008 in 3m59.07956023s
2019/01/15 04:55:56 [DEBUG] raft: Failed to contact 10.233.5.243:10008 in 3m59.528575654s
2019/01/15 04:55:56 [WARN] raft: Rejecting vote request from 10.233.5.244:10008 since we have a leader: 10.233.17.75:10008

The rebooted node failed to join the existing cluster because:

  • the IP of the rebooted node had changed
  • the other nodes still keep the old IP of the rebooted node in their peer lists
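
To make the failure mode concrete, here is a minimal toy model of the situation in the logs. It is a sketch under stated assumptions, not the raft library's or orchestrator's actual code: the `Member` class, `build_cluster`, `try_rejoin` and `stale_peers` are hypothetical names, and the only behaviour modelled is that each member holds a static peer list keyed by `ip:port`, refuses votes while it knows a leader, and does not recognise addresses outside its list.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set, Tuple

# Addresses taken from the logs above; 10008 is the raft port shown there.
OLD_ADDR = "10.233.5.243:10008"   # rebooted node, before the reboot
NEW_ADDR = "10.233.5.244:10008"   # rebooted node, after the reboot
NODE_A = "10.233.0.160:10008"
NODE_B = "10.233.17.75:10008"     # current leader according to the logs


@dataclass
class Member:
    """Hypothetical stand-in for one raft member with a static peer list."""
    address: str
    peers: Set[str] = field(default_factory=set)
    leader: Optional[str] = None

    def on_vote_request(self, candidate: str) -> Tuple[bool, str]:
        # Assumption: a member that already follows a leader refuses votes.
        if self.leader is not None:
            return False, f"have a leader: {self.leader}"
        # Assumption: peers are identified purely by their address.
        if candidate not in self.peers:
            return False, f"{candidate} is not a known peer"
        return True, "vote granted"


def build_cluster() -> List[Member]:
    """The two nodes that were not rebooted; both still list OLD_ADDR."""
    everyone = {OLD_ADDR, NODE_A, NODE_B}
    return [Member(addr, everyone - {addr}, leader=NODE_B)
            for addr in (NODE_A, NODE_B)]


def try_rejoin(members: List[Member], candidate: str) -> List[Tuple[bool, str]]:
    """Ask every surviving member for a vote on behalf of the candidate."""
    return [m.on_vote_request(candidate) for m in members]


def stale_peers(members: List[Member], live: Set[str]) -> Set[str]:
    """Addresses some member still references that no live node owns."""
    known: Set[str] = set()
    for m in members:
        known |= m.peers | {m.address}
    return known - live


if __name__ == "__main__":
    cluster = build_cluster()
    for granted, reason in try_rejoin(cluster, NEW_ADDR):
        print(granted, reason)
    print(stale_peers(cluster, {NODE_A, NODE_B, NEW_ADDR}))
```

In this model the node coming back as 10.233.5.244 is never granted a vote, while the surviving members keep trying to reach 10.233.5.243, which matches the two symptoms in the logs. A real fix would need the peer list to follow the node's identity rather than its IP.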

AMecea commented Jan 15, 2019

Hi @juexun, this issue is a duplicate of #107. As you observed, the cause is that orchestrator does not refresh the peer IPs. Until #107 is fixed, I recommend running orchestrator with only one node.

juexun commented Jan 16, 2019

@AMecea, thanks.

juexun commented Jan 16, 2019

If this issue cannot be fixed, orchestrator becomes the single point of failure (SPOF) of the cluster: the MySQL master service is unreachable if orchestrator dies.

calind added this to the 0.2.4 milestone on Jan 21, 2019
calind added the bug label on Jan 21, 2019
calind changed the title from "Orchestrator: nodes failed to join cluster after it had been rebooted." to "Master service fails when orchestrator is down" on Jan 28, 2019
chapsuk pushed a commit to chapsuk/mysql-operator that referenced this issue Oct 16, 2023