This repository has been archived by the owner on Jun 25, 2021. It is now read-only.

Nodes appear to be running as Clients. #757

Closed
ghost opened this issue Oct 27, 2015 · 12 comments

Comments

@ghost

ghost commented Oct 27, 2015

When running multiple instances of key_value_store as nodes, we see all of them connect to the first node but not to each other. A client likewise connects only to the first node, and a value it stores is held only on the first node.

@ghost ghost self-assigned this Oct 27, 2015
@ghost ghost added bug critical labels Oct 27, 2015
@ghost
Author

ghost commented Oct 27, 2015

Update:

We were using SendContent to request a network name for a client; this has been updated to use ClientSendContent. In RoutingCore, lookup_connections matches on state, which for the first node erroneously returned None when a Relay connection was expected and was present. Currently, responses to messages received by the first node are not being returned, specifically for request_network_name. Continuing to debug.
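To illustrate the kind of failure described above, here is a minimal sketch, not the actual RoutingCore code. All types and names in it (`State`, `ConnectionName`, `Core`, `lookup_state_only`, `lookup`) are hypothetical, and connections are keyed by remote port purely for brevity. It shows how a lookup that branches on state alone can return None even though a relay entry is present, and one possible way to avoid that by consulting the relay map first.

```rust
use std::collections::HashMap;

/// Hypothetical node states; the names loosely follow the log output in this thread.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum State {
    Disconnected,
    Bootstrapped,
    Relocated,
}

/// Hypothetical label describing what a connection is used for.
#[derive(Clone, Debug, PartialEq)]
pub enum ConnectionName {
    Bootstrap(String),
    Relay(String),
}

/// Hypothetical stand-in for the connection bookkeeping in RoutingCore.
pub struct Core {
    pub state: State,
    pub bootstrap_map: HashMap<u16, String>, // keyed by remote port for brevity
    pub relay_map: HashMap<u16, String>,
}

impl Core {
    /// Failure mode: the result depends on state alone, so a relay entry
    /// that is present is never found in some states.
    pub fn lookup_state_only(&self, port: u16) -> Option<ConnectionName> {
        match self.state {
            State::Bootstrapped => self
                .bootstrap_map
                .get(&port)
                .cloned()
                .map(ConnectionName::Bootstrap),
            // The relay map is not consulted here, even if it has an entry.
            State::Disconnected | State::Relocated => None,
        }
    }

    /// One possible alternative: check the relay map regardless of state,
    /// then fall back to the state-dependent lookup.
    pub fn lookup(&self, port: u16) -> Option<ConnectionName> {
        if let Some(name) = self.relay_map.get(&port) {
            return Some(ConnectionName::Relay(name.clone()));
        }
        self.lookup_state_only(port)
    }
}
```

With a relay entry registered and the state set to `Relocated`, `lookup_state_only` returns None while `lookup` returns the relay connection.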

@ghost
Author

ghost commented Oct 27, 2015

Included re-bootstrapping if disconnected when relocating.

@benjaminbollen

point A.
https://github.com/maidsafe/routing/pull/758/files#diff-1ee67c5acfa53a1ba6d9a5bb56540ba2R240

We can't add these "unknown bootstrap" connections to the "unknown connections" without fully reviewing the normal diagrams from RFC-0011, because the unknown_connections there come from on_accept, not from on_connect.

@benjaminbollen

point B.

As @brian-js points out, crust actually connects on multiple/all endpoints provided in the list of endpoints when calling connect. This is different from the expected behavior. Additionally, the connection management assumes there are no duplicate connections for a given unique copy of ConnectRequest/ConnectResponse. To ensure this, routing will duplicate the ConnectRequest, each copy carrying a single unique endpoint, rather than sending a vector of endpoints in one unique ConnectRequest.

@brian-js will look into this more
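The idea in point B, one request per endpoint instead of one request carrying a vector of endpoints, could look roughly like the sketch below. This is an illustration under assumptions, not the routing implementation: `ConnectRequest` here is a hypothetical, trimmed-down struct, and `requests_per_endpoint` is a made-up helper name.

```rust
use std::net::SocketAddr;

/// Hypothetical, trimmed-down request; the real ConnectRequest carries more fields.
#[derive(Clone, Debug, PartialEq)]
pub struct ConnectRequest {
    pub requester: String,
    pub endpoint: SocketAddr,
}

/// Builds one request per distinct endpoint, preserving the input order,
/// so that each request maps to at most one connection attempt.
pub fn requests_per_endpoint(requester: &str, endpoints: &[SocketAddr]) -> Vec<ConnectRequest> {
    let mut seen: Vec<SocketAddr> = Vec::new();
    let mut requests = Vec::new();
    for endpoint in endpoints {
        if seen.contains(endpoint) {
            continue; // skip repeated endpoints
        }
        seen.push(*endpoint);
        requests.push(ConnectRequest {
            requester: requester.to_string(),
            endpoint: *endpoint,
        });
    }
    requests
}
```

Given the two listening endpoints from the logs in this thread, with one of them listed twice, this yields exactly two requests.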

@benjaminbollen

code : 8f24e1a
crust : crate 0.4.2
With only one node running (it starts listening on 5483), it receives a connection from port 55474 on the local loopback. Where is this coming from?

$ RUST_LOG=routing=debug ./target/release/examples/key_value_store --node
INFO:routing::routing_node: RoutingNode Client(e61f88..ed1119) listens on [Tcp(V4(127.0.0.1:5483)), Tcp(V4(192.168.0.3:5483))]
DEBUG:routing::routing: Started routing run().
DEBUG:routing::routing_node: RoutingNode started running and started bootstrap
DEBUG:routing::routing_node: handle_on_connect: Disconnected adding unknown bootstrap Connection(Tcp V4(127.0.0.1:55474) -> V4(127.0.0.1:5483))
DEBUG:routing::routing_core: add_unknown_bootstrap_connection: added Connection(Tcp V4(127.0.0.1:55474) -> V4(127.0.0.1:5483))
DEBUG:routing::routing_core: add_unknown_bootstrap_connection: set state Bootstrapped
DEBUG:routing::routing_node: Saying hello I am Client(e61f88..ed1119) on Connection(Tcp V4(127.0.0.1:55474) -> V4(127.0.0.1:5483)), confirming None
DEBUG:routing::routing_node: handle_on_accept: Bootstrapped so not accepting Connection(Tcp V4(127.0.0.1:5483) -> V4(127.0.0.1:55474)). Dropping
DEBUG:routing::routing_node: Hello, it is Client(e61f88..ed1119) on Connection(Tcp V4(127.0.0.1:5483) -> V4(127.0.0.1:55474))
ERROR:routing::routing_core: Failed to add client Client(e61f88..ed1119) as relay connection on Connection(Tcp V4(127.0.0.1:5483) -> V4(127.0.0.1:55474)). Dropping.
DEBUG:routing::routing_node: Lost connection on Connection(Tcp V4(127.0.0.1:55474) -> V4(127.0.0.1:5483))

@benjaminbollen

code: 8f24e1a
crust : crate 0.4.2
Alternatively, the on_accept can be processed first, in which case the following occurs (but similarly, we are establishing connections to ourselves on the local loopback from crust bootstrapping):

INFO:routing::routing_node: RoutingNode Client(fa1db7..616bb8) listens on [Tcp(V4(127.0.0.1:5483)), Tcp(V4(192.168.0.3:5483))]
DEBUG:routing::routing: Started routing run().
DEBUG:routing::routing_node: RoutingNode started running and started bootstrap
DEBUG:routing::routing_node: handle_on_accept: Disconnected so self-assigning name e8ae93..f5ce2a
DEBUG:routing::routing_core: Assigning name e8ae93..f5ce2a while disconnected.
DEBUG:routing::routing_node: handle_on_accept: Relocated adding unknown connection Connection(Tcp V4(127.0.0.1:5483) -> V4(127.0.0.1:55610)) on accept.
DEBUG:routing::routing_node: handle_on_accept: Relocated matching Connection(Tcp V4(127.0.0.1:55610) -> V4(127.0.0.1:5483)) against expected connections

@ghost
Author

ghost commented Oct 30, 2015

Looking at a resolution for an issue with the first two nodes:
After relocating, the Client sends a ConnectRequest as a ManagedNode. On receipt, the Bootstrap node fails to send back a ConnectResponse, since it has no connections other than a Client in its relay map.

@ghost
Author

ghost commented Nov 2, 2015

The primary connection of the unknown connection was getting dropped when matching expected connections to unknown connections. We can now establish a connection between the first two nodes over both the native and routing networks, as specified in the connection management RFC. However, issues remain when a third or further node connects. Stopping a running instance of the example shows that connections are present between the nodes, but they never get consolidated when more than two instances are running. A client is able to connect to the first two nodes and store/retrieve data from them.

@benjaminbollen
Copy link

One bug discovered and easily patchable: ::crust::Event::BootstrapFinished can arrive before the ::crust::Event::OnConnect from the bootstrap. As a result (seemingly machine-dependent), the handler for BootstrapFinished queried the routing State while it could still be Disconnected, and so falsely triggered a RoutingCore::Reset.
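A small model of this ordering problem, for illustration only. The types below (`State`, `Event`, `Action`, `Node`) are hypothetical stand-ins, not the crust or routing API, and the "deferred" variant is just one conceivable way to tolerate the reordering, not necessarily the patch that was applied. It assumes some later check, such as a grace-period timer, decides whether a reset is really needed.

```rust
/// Hypothetical node states.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum State {
    Disconnected,
    Bootstrapped,
}

/// Hypothetical mirror of the two events discussed above.
#[derive(Clone, Copy, Debug, PartialEq)]
pub enum Event {
    OnConnect,
    BootstrapFinished,
}

/// What the node decides to do in response.
#[derive(Debug, PartialEq)]
pub enum Action {
    Nothing,
    Reset,
}

pub struct Node {
    pub state: State,
    bootstrap_finished: bool,
}

impl Node {
    pub fn new() -> Node {
        Node { state: State::Disconnected, bootstrap_finished: false }
    }

    /// Naive handling: resets as soon as BootstrapFinished is seen while
    /// Disconnected, even if OnConnect is merely late.
    pub fn handle_naive(&mut self, event: Event) -> Action {
        match event {
            Event::OnConnect => {
                self.state = State::Bootstrapped;
                Action::Nothing
            }
            Event::BootstrapFinished => {
                if self.state == State::Disconnected {
                    Action::Reset
                } else {
                    Action::Nothing
                }
            }
        }
    }

    /// Deferred handling: only records what happened; no reset decision yet.
    pub fn handle_deferred(&mut self, event: Event) -> Action {
        match event {
            Event::OnConnect => self.state = State::Bootstrapped,
            Event::BootstrapFinished => self.bootstrap_finished = true,
        }
        Action::Nothing
    }

    /// Called later (assumed: after a grace period) to decide on a reset.
    pub fn on_grace_period_expired(&self) -> Action {
        if self.bootstrap_finished && self.state == State::Disconnected {
            Action::Reset
        } else {
            Action::Nothing
        }
    }
}
```

Feeding BootstrapFinished followed by OnConnect into `handle_naive` produces a Reset on the first event, whereas the deferred variant only resets if no OnConnect has arrived by the time `on_grace_period_expired` is called.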

@Fraser999
Contributor

Update:

Spandan and I have started looking at this issue. Brian has made a PR which appears to partially fix this when running several instances of key_value_store on the same machine. However, even then, several nodes fail to properly populate their routing tables.

We've also tried running this via the droplet_deployer to create a network of 3 nodes. This apparently showed no nodes connecting to each other.

@ghost ghost closed this as completed in 0da0288 Nov 12, 2015
@Viv-Rajkumar
Contributor

Reopening issue as the merged PR does not address all issues as detailed by @Fraser999

@Viv-Rajkumar Viv-Rajkumar reopened this Nov 12, 2015
@ghost
Author

ghost commented Nov 12, 2015

The issue I mentioned yesterday, that Fraser highlights above, was recorded in #746 when it was last brought up. This issue, 757, was more specifically dealing with nodes failing to connect to any other node after the connection management refactor.

@dirvine dirvine closed this as completed Jan 9, 2016

4 participants