Bootstrap connection hangs #31

Closed
aleks-f opened this issue Sep 30, 2021 · 1 comment

aleks-f commented Sep 30, 2021

Windows:
I can connect many peers to the bootstrap with no problem (I tried 10 and all worked). However, if I connect and disconnect the same peer (so the same ports), I can only do it twice. The third time it does not connect properly; it somehow gets locked when the Kademlia thread is started. After “killing” the peer (which hangs), no other connection is possible: an exception is thrown if you try to connect ANY peer on any port. Restarting the bootstrap fixes the problem, and the behavior is consistent. One peer is enough: connect to the bootstrap, close the peer, start it again; it works the second time and hangs the third time.

With Poco Kademlia it always happens: you can connect and disconnect the “same peer” (same ports) to the bootstrap twice, and the third time it fails (the peer practically hangs when it tries to open the session). Restarting the bootstrap resets the counter, but again you can only do it twice. However, there is no problem connecting different peers (different ports) one after another; it is the disconnect/reconnect on the same ports that creates the problem.

Linux:
Bootstrap connections don't seem to work at all on Ubuntu for one peer on a local network, and work up to a few times on Windows (2 times before failing, as Adrian mentioned). We can see that packets are being exchanged, but they are not received properly by the peer. This has to do with the peer discovery process after the initial connection with the bootstrap. We can work on passing these messages more efficiently (reducing the total number of packets sent for discovery/bootstrapping), but I wonder if this is an issue with the Proactor's polling mechanism.

For the hanging bug, just to recap: it also happens with the “example” programs, so it has nothing to do with our own software. Depending on the system you run, you may be able to connect 2-3 times, but you may also fail the first time. I suppose that running in a VM increases the chance of failing even the first time; this may have to do with the system running slower, not with the OS itself. In tests (with logging enabled for Engine and Session) I see that the number of initialization messages (hundreds) almost doubles after each disconnect/reconnect, which may explain why it happens after a few connections even on “fast” systems (Windows runs on bare metal, not in a VM like we usually run Linux). On Linux (including WSL) it fails the first time. Probably a timing hazard when sending/receiving the initialization messages.
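
While the cause is being tracked down, a bounded wait would at least turn the hang into a reportable failure. The following is only a minimal sketch using the standard library. `SessionReadyFlag`, `markReady` and `waitFor` are hypothetical names, not part of Poco or the kademlia library, and it assumes the session-opening code can signal completion from another thread.

```cpp
#include <chrono>
#include <condition_variable>
#include <mutex>

// Hypothetical helper (not an existing Poco/kademlia class): lets the
// caller wait for "session opened" with a deadline instead of forever.
class SessionReadyFlag
{
public:
    // Called by the thread that finishes the session handshake.
    void markReady()
    {
        {
            std::lock_guard<std::mutex> lock(_mutex);
            _ready = true;
        }
        _cv.notify_all();
    }

    // Returns true if the session became ready within `timeout`,
    // false if the deadline passed first.
    bool waitFor(std::chrono::milliseconds timeout)
    {
        std::unique_lock<std::mutex> lock(_mutex);
        return _cv.wait_for(lock, timeout, [this] { return _ready; });
    }

private:
    std::mutex _mutex;
    std::condition_variable _cv;
    bool _ready = false;
};
```

A peer could call `waitFor` after starting the session and log a failure when it returns `false`. This does not fix the underlying problem; it only makes the third reconnect fail visibly instead of locking up.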

@aleks-f aleks-f added the bug Something isn't working label Sep 30, 2021
@aleks-f aleks-f self-assigned this Sep 30, 2021

aleks-f commented Oct 19, 2021

Possibly related to the routing table maintenance (see DavidKeller/kademlia#10).
May also be a timeout issue (ms or us?).
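
To illustrate the ms/us suspicion: the same raw number is a 1000x shorter wait when read as microseconds instead of milliseconds. The sketch below uses only `std::chrono`. `hasTimedOut` and `toMicroseconds` are hypothetical helpers for illustration, not functions from Poco or the kademlia library, and the value 500 is an arbitrary example, not a timeout taken from the code.

```cpp
#include <chrono>

// Hypothetical helper: the timeout is a typed duration, so the unit is
// part of the type and a raw integer cannot be passed by mistake.
inline bool hasTimedOut(std::chrono::steady_clock::time_point start,
                        std::chrono::steady_clock::time_point now,
                        std::chrono::milliseconds timeout)
{
    return (now - start) >= timeout;
}

// Hypothetical helper: normalizes any duration to a microsecond count.
template <typename Rep, typename Period>
constexpr long long toMicroseconds(std::chrono::duration<Rep, Period> d)
{
    return std::chrono::duration_cast<std::chrono::microseconds>(d).count();
}

// The same raw number (500, arbitrary) in the two units under suspicion.
static_assert(toMicroseconds(std::chrono::milliseconds(500)) == 500000,
              "500 ms is 500000 us");
static_assert(toMicroseconds(std::chrono::microseconds(500)) == 500,
              "500 us is 1000 times shorter than 500 ms");
```

If a timeout somewhere is stored as a plain integer, checking which unit each consumer expects would confirm or rule out this theory.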

aleks-f added a commit that referenced this issue Nov 23, 2021