Skip to content
This repository has been archived by the owner on Sep 14, 2023. It is now read-only.

feat: add connection retry to proxyProvider #492

Closed
wants to merge 5 commits into from
Closed

Conversation

kratico
Copy link
Contributor

@kratico kratico commented Jan 3, 2023

Add proxy connection retry.
The retry logic uses https://deno.land/std@0.170.0/async/mod.ts?s=retry

How it works?

  • apply the retry logic to create an opened WebSocket
  • if the WebSocket gets closed, attached listeners will be sent an error
  • the next attempt to get an opened WebSocket will apply the retry logic again

There are 2 topics about retries

  • connection retries
  • call/subscription retry

This PR attempts to solve the 1st bullet by improving the provider.
To solve the call/subscription retry, some knowledge about the inflight message intents (and ids) is needed.
That means that call/subscription retry need to be solved at the client level or above.

Next step

  • implement connection retry for smoldot

Comment on lines +10 to +15
const ids = U.throwIfError(
await C.entryRead(C.polkadot)("Paras", "Parachains", [])
.access("value")
.as<number[]>()
.run(),
)
Copy link
Contributor Author

@kratico kratico Jan 4, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I re-wrote this because there is an issue with the client reference counting.
After

 const ids = C.entryRead(C.polkadot)("Paras", "Parachains", [])
   .access("value")
   .as<number[]>()

The client reference count reaches 1 and it's discarded in

capi/effects/rpc.ts

Lines 93 to 102 in 7edf6a2

async function discardCheck<CloseErrorData>(
client: rpc.Client<any, any, any, CloseErrorData>,
counter: Z.RcCounter,
) {
counter.i--
if (!counter.i) {
return await client.discard()
}
return
}

The above invokes the proxyProvider.release that removes the WebSocket listener.
As a result the next call C.Z.each(...) successfully creates a new WebSocket but there is not listener for the incoming messages.
Note: The listener is placed by the rpcClient effect.

Comment on lines 9 to 10
const activeWs = new Map<string, WebSocket>()
const connectingWs = new Map<string, Promise<WebSocket>>()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These 2 maps are used for

  • caching an already opened WebSocket
  • deduplicating WebSocket creation while there is one that is connecting (not OPEN)

const listenersContainer = new ListenersContainer<string, Event, Event>()
const activeWs = new Map<string, WebSocket>()
const connectingWs = new Map<string, Promise<WebSocket>>()
const CUSTOM_WS_CLOSE_CODE = 4000
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Status codes in the range 4000-4999 are reserved for private use (see https://www.rfc-editor.org/rfc/rfc6455.html#section-7.4.1)

This code is used to signal a graceful client WebSocket close.

@@ -71,3 +71,54 @@ export function nextIdFactory() {
let i = 0
return () => i++
}

export class ListenersContainer<
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The intent of this class is to replace the ProviderConnection class and use it in the smoldot provider too.
The ProviderConnection class has the inner and cleanUp props, but they are not used by the class methods, so it seems that they don't belong to this class.
After removing these props from ProviderConnection, this class has the responsibility of containing listeners for a single WebSocket connection.
The ListenersContainer is an improvement on the above that can contain listeners for many WebSocket connections.

const activeWs = new Map<string, WebSocket>()
const connectingWs = new Map<string, Promise<WebSocket>>()
const CUSTOM_WS_CLOSE_CODE = 4000

export const proxyProvider: Provider<string, Event, Event, Event> = (url, listener) => {
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I believe that we need to pass retryOptions here.
Perhaps, creating with a proxyProviderFactory

const proxyProviderFactory = (retryOptions) => (url, listener) => {...};
const proxyProvider = proxyProviderFactory(defaultRetryOptions);

Thoughts?

@kratico kratico marked this pull request as ready for review January 4, 2023 19:06
@kratico kratico requested a review from harrysolovay as a code owner January 4, 2023 19:06
Co-authored-by: Harry Solovay <harrysolovay@users.noreply.github.com>
Copy link
Contributor

@harrysolovay harrysolovay left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

With these changes, provider send calls are more likely to successfully go through, bc unexpected provider open failures are caught and their opens are retried. This is great. However, is this the right first step towards a comprehensive recovery story? Does it position us to address other parts of the recovery story?

What else might trigger a retry? Can that retry be triggered by the client handler?

As a thought experiment, what refactoring would be required to not boil in the retry logic to the provider itself.

const p = retryProvider(baseProvider, retryOptions)
const c = new Client(p, discoveryValue)

Note: it is unclear to me whether retry logic should or should not be the default... but this is an important consideration given our thinking that we'll gradually add on recovery-related safeguards in the handler of Client. With this in mind, should things such as openedWs be accessible from Client? What other provider context might we want to access from elsewhere? And when?

@@ -106,6 +137,6 @@ function closeWs(socket: WebSocket): Promise<undefined | ProviderCloseError<Even
controller.abort()
resolve(new ProviderCloseError(e))
}, controller)
socket.close()
socket.close(CUSTOM_WS_CLOSE_CODE, "Client normal closure")
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where will devs see the "Client normal closure" message?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

controller.abort()
resolve(e)
}, controller)
function openedWs({ url, activeWs, connectingWs, listener, retryOptions }: OpenedWsProps) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should we explicitly type the return type for legibility?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll add it to improve readability

export const proxyProviderFactory = (
{ retryOptions }: ProxyProviderFactoryProps = {},
): Provider<string, Event, Event, Event> => {
const listenersContainer = new ListenersContainer<string, Event, Event>()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this be within the module scope? If not, multiple providers could house websockets of identical discovery values (correct?). On the flip side, the current approach means we don't need to track the number of "users" of a globally accessible ws instance (to decide whether or not it can actually be closed).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It depends on how the factory is used.
Calling the factory once and sharing the created provider works similar to not having a factory and using a module variable to cache stuff.
With the factory approach there is an improvement for concurrent unit tests and fixes some unit tests caching issues.

HandlerErrorData,
> {
#listeners = new Map<
DiscoveryValue,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm curious how this will be generalized for smoldot (perhaps I recall incorrectly: the discovery value must first be retrieved based on the chainspec, right?)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

DiscoveryValue is not a good name.
This is just a key type for Map so it doesn't always need to be the provider discoveryValue

For smoldot, it's a bit challenging to pick a key for parachains.
To connect to a parachain, we need the potential relay chain spec.
So, a better key could be the concatenation of the relay+parachain specs.
Currently the key is an object which can easily have a different object ref for the same values

{
  chainSpec: {
    relay: string
    para?: string
  }
}

@kratico
Copy link
Contributor Author

kratico commented Jan 9, 2023

With these changes, provider send calls are more likely to successfully go through, bc unexpected provider open failures are caught and their opens are retried. This is great. However, is this the right first step towards a comprehensive recovery story? Does it position us to address other parts of the recovery story?

What else might trigger a retry? Can that retry be triggered by the client handler?

As a thought experiment, what refactoring would be required to not boil in the retry logic to the provider itself.

@harrysolovay I believe that we could retry on Client.call if the message is retry-friendly and the exception is not message specific. For example, ProviderSendError could be triggered when the WS is not ready which is not message specific or when we try to send something that it cannot be JSON.stringifyed which is message specific.
So the exceptions/errors need to be refactored.
Also, on retry, the provider caches should be reseted to not return an already closed WS which is already covered in this PR.

The new signature for call and subscription could be

call: ClientCall<SendErrorData, HandlerErrorData> = (id, method, params, retryOnError)


  subscriptionFactory = <
    Params extends unknown[] = any[],
    NotificationData = any,
  >(): SubscriptionFactory<Params, NotificationData, SendErrorData, HandlerErrorData> =>
  (
    subscribeMethod,
    unsubscribeMethod,
    params,
    createListener,
    retryOnError,
  )

@harrysolovay
Copy link
Contributor

For context: this is on hold until @kratico returns from the academy :)

@tjjfvi tjjfvi marked this pull request as draft February 20, 2023 18:59
@tjjfvi tjjfvi mentioned this pull request Feb 20, 2023
@tjjfvi tjjfvi closed this Feb 20, 2023
@tjjfvi tjjfvi deleted the feat/reconnect-v2 branch March 9, 2023 15:04
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants