Deadlocks on bootstrapping in distributed settings #109

Open
xinhaoyuan opened this issue Oct 6, 2018 · 6 comments

Comments

@xinhaoyuan

Hi,

I'm testing the concurrency of swarm using my in-house tool (will be released soon!). I've found several deadlocks when bootstrapping swarm in distributed settings. Here is a sketch of one way a deadlock could happen:

  1. Initially, node1 and node2 are both in the cluster_wait state.
  2. node1 gets cluster_join, selects node2 to sync with, and enters the syncing state.
  3. node2 puts node1's sync request into pending_sync_req.
  4. node2 gets cluster_join, selects node1 to sync with (node1's sync request is still pending!), and enters the syncing state.
  5. node1 gets the sync from node2, discovers a tie, and decides to wait for a sync_reply from node2.
  6. Because node1's sync to node2 is still pending, it never gets handled, and so no sync_reply is ever sent to node1.
  7. Both nodes are now waiting for each other, and the syncing process is stuck.
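
The steps above can be replayed in a small toy model. This is a rough sketch in Python, not Swarm's actual tracker code: the `Node` class, the `cluster_join`, `receive_sync`, `has_wait_cycle`, and `replay` helpers, and the `waiting_on` field are all hypothetical names made up for illustration. It also assumes that deferred requests in `pending_sync_req` are only replayed once a node's own sync has finished, which is what the scenario implies.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    """Toy stand-in for one tracker process (hypothetical, not Swarm's API)."""
    name: str
    state: str = "cluster_wait"
    waiting_on: Optional[str] = None  # peer whose reply this node needs
    pending_sync_req: List[str] = field(default_factory=list)


def cluster_join(node: Node, peer: Node) -> None:
    """Leave cluster_wait, pick `peer` to sync with, and wait for its answer."""
    node.state = "syncing"
    node.waiting_on = peer.name


def receive_sync(node: Node, sender: str) -> None:
    """Handle a sync request sent by `sender`."""
    if node.state == "cluster_wait":
        # Not ready yet: defer the request. Assumption: deferred requests
        # are only replayed after this node's own sync completes.
        node.pending_sync_req.append(sender)
    elif node.state == "syncing" and node.waiting_on == sender:
        # Tie: both nodes picked each other. This node keeps waiting for
        # the peer's sync_reply, so nothing changes here.
        pass


def has_wait_cycle(nodes: Dict[str, Node]) -> bool:
    """Return True if following waiting_on links ever revisits a node."""
    for start in nodes:
        seen = set()
        current: Optional[str] = start
        while current is not None and current not in seen:
            seen.add(current)
            current = nodes[current].waiting_on
        if current is not None:
            return True
    return False


def replay() -> Dict[str, Node]:
    """Replay steps 1 to 5 of the scenario in order."""
    node1, node2 = Node("node1"), Node("node2")  # step 1
    cluster_join(node1, node2)                   # step 2
    receive_sync(node2, "node1")                 # step 3: deferred
    cluster_join(node2, node1)                   # step 4
    receive_sync(node1, "node2")                 # step 5: tie, keep waiting
    return {"node1": node1, "node2": node2}


nodes = replay()
print("wait cycle:", has_wait_cycle(nodes))
print("node2 pending:", nodes["node2"].pending_sync_req)
```

In this model both nodes end up in the syncing state waiting on each other, while node1's request stays parked in node2's `pending_sync_req`. Whether the real tracker behaves the same way depends on when it replays deferred requests, which this sketch only assumes.
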
@xinhaoyuan
Author

I see a deadlock in my trivial test with three nodes [1], while it runs fine with two nodes.

Is there anything wrong with my test code?

[1] https://github.com/xinhaoyuan/morpheus-app-test/tree/master/swarm_test_simple

@farhadi

farhadi commented Dec 6, 2018

I also have the same issue with 3 nodes.
I use a docker-compose file to start 3 containers at the same time, and they all get stuck syncing/waiting.

@arjan
Collaborator

arjan commented Dec 14, 2018

I think this issue is fixed by #118. Please reopen if you still get it on master.

@arjan arjan closed this as completed Dec 14, 2018
@xinhaoyuan
Author

The same deadlock still appears in my test case (with commit 1430212).

@farhadi

farhadi commented Dec 18, 2018

I'm still having the same issue with the latest master branch.

@beardedeagle
Collaborator

reopening for visibility

@beardedeagle beardedeagle reopened this Dec 20, 2018
4 participants