-
Notifications
You must be signed in to change notification settings - Fork 4
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Reduce the timeout for establishing a Node Connection within the Discovery domain (by adding timer override to NodeConnectionManager
)
#353
Comments
NodeConnectionManager
)
Was this done? The |
The discovery domain doesn't seem to have been touched in #378 so I don't think so |
This is a more general problem since the discovery system doesn't set timeouts for connection to nodes and identity providers. Identity providers uses To solve this in a general sense we need to make use of the Places that require integration with
In the GRPC client stubs this is an example of the call signature. Take notes of the public nodesClosestLocalNodesGet(request: polykey_v1_nodes_nodes_pb.Node, metadata: grpc.Metadata, options: Partial<grpc.CallOptions>, callback: (error: grpc.ServiceError | null, response: polykey_v1_nodes_nodes_pb.NodeTable) => void): grpc.ClientUnaryCall; The server should also have a default deadline set for if the client doesn't pass a deadline to the server. This is to prevent a DDOS attack or the case of slow clients holding up the server. According to the GRPC docs the client can cancel at any time. but the server can just close the stream or end with a response. This will likely be a server deadline exceeded exception that the server throws. The longest deadline here will be the server's deadline. If the client deadline is reached then the client throws it's deadline error and cancels the connection. It is possible that both sides throws the exception when the deadline is exceeded. |
1-2 seconds is too strict, especially for contacting identities. It's also not about what the deadline is for establishing connection but deadline for each individual call as well. It's not that the discovery is serialised, We need to make this concurrent so we can have each timeout be long running but do multiple at a time. So
With reference to #328 the decision for which nodes and identities to visit should be proximity to our own gestalt, priority and trust. Where trust is the ACL property. |
Let's make #445 stage the changes of integration task manager into discovery. But to actually to solve this issue, we will need to incorporate deadlines to the RPC communication, and also timers and |
The main thing to understand here is to realise that Discovery is doing something similar to NodeGraph, only that the GestaltGraph is the database. |
This issue is a pretty old one and the way we solve this has changed by the usage of The core changes for this is handled by #477. but that is a generalised update to the discovery domain. This will be complete once #477 is complete and we set the shorter timeouts for connections during discovery. |
|
Specification
The default timeout for node connections is currently set inside the
NodeConnectionManager
for 20s. This can only be set when creating a new Polykey agent by setting thenodeConnectionManagerConfig
when callingPolykeyAgent.createPolykeyAgent()
:This value is subsequently used for all node connections, regardless of what the node connection is used for or where it originates from. This is a problem for the Discovery domain, where a default timeout of 20s is too long. The Discovery domain calls
NodeManager.requestChainData()
, which callsNodeConnectionManager.withConnF()
, which subsequently creates the node connection that we use. We need to be able to change the timeout for creating this node connection from inside the Discovery domain to something smaller, for example 1-2s. This does not need to be a modifiable option that is available to the user, since it should not affect the behaviour of the Discovery domain, so the timeout can just be a constant.In order for the Discovery domain to be able to set its own timeout for Node Connections, we need to propagate the timeout through a few different places in the code. Specifically:
Discovery
classNodeManager.requestChainData()
needs to have an extra parameter for this timeout (alternatively, since thisrequestChainData()
method is not used anywhere else outside ofDiscovery
we could move it into theDiscovery
class and methods called by it could just directly use theDiscovery
's timeout property)NodeConnectionManager.withConnF()
(called byrequestChainData()
) needs an additionaltimeout
parameterNodeConnectionManager.acquireConnection()
(called bywithConnF()
) needs an additionaltimeout
parameterNodeConnectionManager.createConnection()
(called byacquireConnection()
) needs an additionaltimeout
parameterNodeConnectionManager.establishNodeConnection()
(called bycreateConnection()
) needs an additionaltimeout
parameter which defaults toNodeConnectionManager.connConnectTime
if not set explicitly - all instances of usingNodeConnectionManager.connConnectTime
inside this method need to switch over to using this parameter insteadAdditional context
Tasks
timeout
property for theDiscovery
class (around 1000-2000ms)The text was updated successfully, but these errors were encountered: