Deadlock during dynamic cluster bringup #60

Closed
evadne opened this issue Jan 3, 2018 · 2 comments

evadne commented Jan 3, 2018

  • Erlang/OTP 20
  • Elixir 1.5.1
  • Swarm 3.1.0

I may have observed a deadlock with simultaneous bringup of 3 nodes. In this case all 3 trackers are stuck in :syncing or :cluster_wait states, and they do not recover.

16:48:35.537 [info] [swarm on WORKER-pJhs9M1CeXdb90RyNesmoXGWcOnn8x6H@10.3.2.133] [tracker:init] started
16:48:35.978 [info] [swarm on WORKER-pJhs9M1CeXdb90RyNesmoXGWcOnn8x6H@10.3.2.133] [tracker:ensure_swarm_started_on_remote_node] nodeup WORKER-hfka62iSx0FiU71200WwdytbsGM2BpXB@10.3.2.185
16:48:36.186 [info] [swarm on WORKER-pJhs9M1CeXdb90RyNesmoXGWcOnn8x6H@10.3.2.133] [tracker:ensure_swarm_started_on_remote_node] nodeup WORKER-UdtCnuHD6j2mFcO50jcU7Bz7bQMNJNkR@10.3.1.184
16:48:40.538 [info] [swarm on WORKER-pJhs9M1CeXdb90RyNesmoXGWcOnn8x6H@10.3.2.133] [tracker:cluster_wait] joining cluster..
16:48:40.538 [info] [swarm on WORKER-pJhs9M1CeXdb90RyNesmoXGWcOnn8x6H@10.3.2.133] [tracker:cluster_wait] found connected nodes: [:"WORKER-UdtCnuHD6j2mFcO50jcU7Bz7bQMNJNkR@10.3.1.184", :"WORKER-hfka62iSx0FiU71200WwdytbsGM2BpXB@10.3.2.185"]
16:48:40.538 [info] [swarm on WORKER-pJhs9M1CeXdb90RyNesmoXGWcOnn8x6H@10.3.2.133] [tracker:cluster_wait] selected sync node: WORKER-UdtCnuHD6j2mFcO50jcU7Bz7bQMNJNkR@10.3.1.184
16:48:40.636 [info] [swarm on WORKER-pJhs9M1CeXdb90RyNesmoXGWcOnn8x6H@10.3.2.133] [tracker:syncing] pending sync request from WORKER-hfka62iSx0FiU71200WwdytbsGM2BpXB@10.3.2.185
16:51:14.556 [info] [swarm on WORKER-pJhs9M1CeXdb90RyNesmoXGWcOnn8x6H@10.3.2.133] [tracker:syncing] pending sync request from WEB-Fp9bgGanVj0Pqx6Mi5pAxRGVDd9m5naD@10.3.2.163
16:48:35.633 [info] [swarm on WORKER-hfka62iSx0FiU71200WwdytbsGM2BpXB@10.3.2.185] [tracker:init] started
16:48:35.978 [info] [swarm on WORKER-hfka62iSx0FiU71200WwdytbsGM2BpXB@10.3.2.185] [tracker:ensure_swarm_started_on_remote_node] nodeup WORKER-pJhs9M1CeXdb90RyNesmoXGWcOnn8x6H@10.3.2.133
16:48:36.192 [info] [swarm on WORKER-hfka62iSx0FiU71200WwdytbsGM2BpXB@10.3.2.185] [tracker:ensure_swarm_started_on_remote_node] nodeup WORKER-UdtCnuHD6j2mFcO50jcU7Bz7bQMNJNkR@10.3.1.184
16:48:40.634 [info] [swarm on WORKER-hfka62iSx0FiU71200WwdytbsGM2BpXB@10.3.2.185] [tracker:cluster_wait] joining cluster..
16:48:40.635 [info] [swarm on WORKER-hfka62iSx0FiU71200WwdytbsGM2BpXB@10.3.2.185] [tracker:cluster_wait] found connected nodes: [:"WORKER-UdtCnuHD6j2mFcO50jcU7Bz7bQMNJNkR@10.3.1.184", :"WORKER-pJhs9M1CeXdb90RyNesmoXGWcOnn8x6H@10.3.2.133"]
16:48:40.635 [info] [swarm on WORKER-hfka62iSx0FiU71200WwdytbsGM2BpXB@10.3.2.185] [tracker:cluster_wait] selected sync node: WORKER-pJhs9M1CeXdb90RyNesmoXGWcOnn8x6H@10.3.2.133
16:48:40.904 [info] [swarm on WORKER-hfka62iSx0FiU71200WwdytbsGM2BpXB@10.3.2.185] [tracker:syncing] pending sync request from WORKER-UdtCnuHD6j2mFcO50jcU7Bz7bQMNJNkR@10.3.1.184
16:49:38.217 [info] [swarm on WORKER-hfka62iSx0FiU71200WwdytbsGM2BpXB@10.3.2.185] [tracker:syncing] pending sync request from WEB-H02yNEFvmwFL7YR0gcfHB56vxnNTAM5J@10.3.1.161
16:48:35.899 [info] [swarm on WORKER-UdtCnuHD6j2mFcO50jcU7Bz7bQMNJNkR@10.3.1.184] [tracker:init] started
16:48:36.182 [info] [swarm on WORKER-UdtCnuHD6j2mFcO50jcU7Bz7bQMNJNkR@10.3.1.184] [tracker:ensure_swarm_started_on_remote_node] nodeup WORKER-pJhs9M1CeXdb90RyNesmoXGWcOnn8x6H@10.3.2.133
16:48:36.189 [info] [swarm on WORKER-UdtCnuHD6j2mFcO50jcU7Bz7bQMNJNkR@10.3.1.184] [tracker:ensure_swarm_started_on_remote_node] nodeup WORKER-hfka62iSx0FiU71200WwdytbsGM2BpXB@10.3.2.185
16:48:40.535 [info] [swarm on WORKER-UdtCnuHD6j2mFcO50jcU7Bz7bQMNJNkR@10.3.1.184] [tracker:cluster_wait] pending sync request from WORKER-pJhs9M1CeXdb90RyNesmoXGWcOnn8x6H@10.3.2.133
16:48:40.900 [info] [swarm on WORKER-UdtCnuHD6j2mFcO50jcU7Bz7bQMNJNkR@10.3.1.184] [tracker:cluster_wait] joining cluster..
16:48:40.901 [info] [swarm on WORKER-UdtCnuHD6j2mFcO50jcU7Bz7bQMNJNkR@10.3.1.184] [tracker:cluster_wait] found connected nodes: [:"WORKER-hfka62iSx0FiU71200WwdytbsGM2BpXB@10.3.2.185", :"WORKER-pJhs9M1CeXdb90RyNesmoXGWcOnn8x6H@10.3.2.133"]
16:48:40.901 [info] [swarm on WORKER-UdtCnuHD6j2mFcO50jcU7Bz7bQMNJNkR@10.3.1.184] [tracker:cluster_wait] selected sync node: WORKER-hfka62iSx0FiU71200WwdytbsGM2BpXB@10.3.2.185
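
The "selected sync node" lines above can be read as a graph: .133 selected .184, .184 selected .185, and .185 selected .133. That looks like a circular wait, which may be consistent with the trackers never leaving :syncing / :cluster_wait. Below is a minimal Python sketch for checking such logs for a cycle. It is only a log-analysis helper and not part of Swarm; the function names `sync_targets` and `find_cycle` are hypothetical, and the regex assumes the log format shown in this issue.

```python
import re

# Assumed log shape (taken from the lines in this issue):
# "... [swarm on NODE] [tracker:STATE] selected sync node: TARGET"
SELECTED = re.compile(
    r"\[swarm on ([^\]\s]+)\] \[tracker:\w+\] selected sync node: (\S+)"
)


def sync_targets(log_lines):
    """Map each node to the sync node it reported selecting."""
    targets = {}
    for line in log_lines:
        match = SELECTED.search(line)
        if match:
            node, target = match.groups()
            targets[node] = target
    return targets


def find_cycle(targets):
    """Return the nodes forming a circular wait, or None if there is none."""
    for start in targets:
        path = []
        seen = set()
        node = start
        # Follow "node -> selected sync node" until we run out or revisit a node.
        while node in targets and node not in seen:
            seen.add(node)
            path.append(node)
            node = targets[node]
        if node in seen:
            # Keep only the part of the path that is actually on the cycle.
            return path[path.index(node):]
    return None
```

Passing the three "selected sync node" lines from the logs above through `sync_targets` and then `find_cycle` should report all three WORKER nodes as one cycle. Whether that cycle is the actual cause of the hang is an assumption that would need to be confirmed against the tracker code.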

bitwalker (Owner) commented

This should be addressed in the latest release, and there are some additional fixes/improvements in master.

@bitwalker bitwalker added the bug label Feb 7, 2018

dergraf commented Mar 1, 2018

@evadne did the newest version solve your issue? I am actually observing the same behaviour with 3.3.1.
