Ensure that exceptions during Discovery are correctly handled #348
This was referenced Feb 25, 2022
This issue was too vague and has subsequently been split into two separate issues:
Comments have also been added to the descriptions of #297 and #304 with respect to how those issues relate to this one. Closing this issue now.
CMCDragonkai added the r&d:polykey:core activity 3 Peer to Peer Federated Hierarchy label on Jul 24, 2022
Specification
When stopping the `Discovery` domain, you need to wait for the current task T1 to finish (i.e. one iteration of the discovery queue, where we discover a node/identity and its linked nodes/identities). This is because we currently cannot abort asynchronous side-effectful tasks; that ability is scheduled in #297.

The task itself involves establishing a node connection to the remote agent N2. However, an edge case that we have not fully considered is one where N2 has shut down and is no longer running. In such a situation, the connection timeout that is passed from `NodeConnectionManager` to `NodeConnection` to `GRPCClientAgent` to `GRPCClient` determines how long to wait for connection readiness (and thus how long until we can catch an error and exit the discovery process). This timeout is set to 20s for `NodeConnectionManager`, and that value is propagated to all connection timeouts.

In instances of this behaviour, you'll see retried attempts to connect through the proxy. Then `ErrorGRPCClientTimeout` should be thrown, which is then rethrown as `ErrorNodeConnectionTimeout`. You should get this exception on `withConnF`, which is used by `requestChainData` in `NodeManager`, which is called by `Discovery`.

We need to ensure that this is indeed the sequence of events in practice, and we need to ensure that errors are correctly caught and logged.
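The expected error path described above can be sketched roughly as follows. This is only an illustration under stated assumptions: every class and function here is a hypothetical stand-in written for this example, and only the names `ErrorGRPCClientTimeout`, `ErrorNodeConnectionTimeout`, `withConnF` and `requestChainData` are taken from the issue. The real signatures in the codebase may well differ.

```typescript
// Hypothetical stand-ins only; this is not the project's real implementation.
class SketchError extends Error {
  constructor(message: string, public readonly original?: unknown) {
    super(message);
    // Keeps `instanceof` working even when compiled to older targets.
    Object.setPrototypeOf(this, new.target.prototype);
    this.name = new.target.name;
  }
}
class ErrorGRPCClientTimeout extends SketchError {}
class ErrorNodeConnectionTimeout extends SketchError {}

type Connection = { nodeId: string };

// Lowest layer: the remote agent N2 is down, so readiness is never reached
// and the client gives up after `timeoutMs`.
function connectToDeadAgent(nodeId: string, timeoutMs: number): Promise<Connection> {
  return new Promise<Connection>((_resolve, reject) => {
    setTimeout(
      () => reject(new ErrorGRPCClientTimeout(`no response from ${nodeId}`)),
      timeoutMs,
    );
  });
}

// Stand-in for withConnF: acquires a connection, runs `f`, and rethrows the
// low-level timeout as the node-level error.
async function withConnF<T>(
  nodeId: string,
  f: (conn: Connection) => Promise<T>,
  timeoutMs: number,
): Promise<T> {
  const conn = await connectToDeadAgent(nodeId, timeoutMs).catch((e: unknown): never => {
    if (e instanceof ErrorGRPCClientTimeout) {
      throw new ErrorNodeConnectionTimeout(`timed out connecting to ${nodeId}`, e);
    }
    throw e;
  });
  return f(conn);
}

// Stand-in for NodeManager.requestChainData.
function requestChainData(nodeId: string, timeoutMs: number): Promise<string[]> {
  return withConnF(nodeId, async (): Promise<string[]> => [], timeoutMs);
}

// Stand-in for one Discovery iteration: catch the expected error, log it,
// and report the vertex as skipped instead of crashing the queue.
async function discoverVertex(
  nodeId: string,
  timeoutMs: number,
  log: (msg: string) => void,
): Promise<'discovered' | 'skipped'> {
  try {
    await requestChainData(nodeId, timeoutMs);
    return 'discovered';
  } catch (e) {
    if (e instanceof ErrorNodeConnectionTimeout) {
      log(`Discovery skipped ${nodeId}: ${e.message}`);
      return 'skipped';
    }
    throw e; // Unexpected errors still propagate.
  }
}
```

A test for the real code could follow the same shape: point `Discovery` at a node that has been shut down, then assert that the timeout error is logged once and that the discovery loop carries on.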
Additional context
Tasks
- For `Discovery`, the default timeout shouldn't be 20s; that's too long. The `withConnF` method should be able to override the default timeout set in `NodeConnectionManager`, for example by providing a value as a parameter.
- [Abort] T1 when we stop the discovery instead of waiting for it to finish. In this case, if T1 finishes even after stopping, ensure that T1 is removed from the DB, so you don't redo the work.