Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add sudo_network_unstable_watch #91

Merged
merged 4 commits into from
Jan 23, 2024
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 3 additions & 0 deletions src/SUMMARY.md
Original file line number Diff line number Diff line change
Expand Up @@ -36,6 +36,9 @@
- [sudo_unstable_p2pDiscover](api/sudo_unstable_p2pDiscover.md)
- [sudo_unstable_pendingTransactions](api/sudo_unstable_pendingTransactions.md)
- [sudo_unstable_version](api/sudo_unstable_version.md)
- [sudo_network]()
- [sudo_network_unstable_unwatch](api/sudo_network_unstable_unwatch.md)
- [sudo_network_unstable_watch](api/sudo_network_unstable_watch.md)
- [sudo_sessionKeys]()
- [sudo_sessionKeys_unstable_generate](api/sudo_sessionKeys_unstable_generate.md)
- [transaction](api/transaction.md)
Expand Down
9 changes: 9 additions & 0 deletions src/api/sudo_network_unstable_unwatch.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,9 @@
# sudo_network_unstable_unwatch

**Parameters**:

- `subscription`: An opaque string that was returned by `sudo_network_unstable_watch`.

**Return value**: *null*

JSON-RPC client implementations must be aware that, due to the asynchronous nature of JSON-RPC client <-> server communication, they might still receive notifications concerning this subscription, for example because these notifications were already in the process of being sent back by the JSON-RPC server.
129 changes: 129 additions & 0 deletions src/api/sudo_network_unstable_watch.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,129 @@
# sudo_network_unstable_watch

**Parameters**: *none*

**Return value**: String containing an opaque value representing the subscription.

This functions lets the JSON-RPC client track the state of the peer-to-peer networking of the blockchain node associated to the JSON-RPC server.

The subscription can later be stopped by calling `sudo_network_unstable_unwatch`.

When this function is called, a `connectionState` event is generated for each connection that already exists, and a `substreamState` event is generated for each substream that already exists. In other words, the JSON-RPC server must send its current networking state to the JSON-RPC client.
In addition, the JSON-RPC server is encouraged to notify the JSON-RPC client of connections and substreams that have recently been closed.

The JSON-RPC server must accept at least one `sudo_network_unstable_watch` subscriptions per JSON-RPC client. Trying to open more might lead to a JSON-RPC error when calling `sudo_network_unstable_watch`. In other words, as long as a JSON-RPC client only starts one `sudo_network_unstable_watch` subscriptions, it is guaranteed that this return value will never happen.

## Notifications format

This function will later generate one or more notifications in the following format:

```json
{
"jsonrpc": "2.0",
"method": "sudo_networkState_event",
"params": {
"subscription": "...",
"result": ...
}
}
```

Where `subscription` is the value returned by this function, and `result` can be one of:

### connectionState

```json
{
"event": "connectionState",
"connectionId": "...",
"targetPeerId": "...",
"targetMultiaddr": "...",
"status": "connecting" | "open" | "closed",
"direction": "in" | "out",
"when": ...
}
```

A `connectionState` event is generated when a new connection attempt is started, when a connection has finished its handshaking phase, or when a connection is terminated.

`connectionId` is an opaque string representing this specific connection.

`status` indicates the state of the connection: `connecting` if the connection hasn't finished its handshake phase (including the libp2p-specific handshakes), `open` if the connection is fully established and can open substreams, or `closed` if the connection is now dead.

Each `connectionId` must follow one of the follow `status` transitions: `connecting` then `open` then `closed`, or `connecting` then `closed` (if an error happend during the handshake). The JSON-RPC server is not allowed to omit events such as the `connecting` event.

Once a `connectionState` event with `status` equal to `closed` is generated, the `connectionId` is unallocated. Any further usage of the same `connectionId` designates a different connection instead.

If `status` is `closed`, the connection must not have any associated substream still alive. A `substreamEvent` of `status` equal to `closed` must have been generated earlier for each substream that corresponds to this connection.

If `status` is `open` or `closed`, the `targetPeerId` is a string containing the string representation of the PeerId of the remote side of the connection. If `status` is `connecting` and `direction` is `in`, the `targetPeerId` must be omitted. If `status` is `connecting`, the `targetPeerId` contains the string representation of the PeerId that the remote is expected to have, which might end up being different from the actual PeerId.

`targetMultiaddr` is a string containing the string representation of the multiaddress of the remote side of the connection. The value in the `targetMultiaddr` field must always be the same for all the events related to a specific connection. In other words, a the multiaddress of the remote never changes during the lifetime of the connection.

`direction` indicates whether the connection was initiated locally (`out`) or by the remote (`in`). The value in the `direction` field must always be the same for all the events related to a specific connection. In other words, a connection never changes direction during its lifetime.

`when` is an integer containing the UNIX timestamp in milliseconds, in other words the number of milliseconds since the UNIX epoch ignoring leap seconds.

### substreamState

```json
{
"event": "substreamState",
"connectionId": "...",
"substreamId": "...",
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Dumb question to make sure I got this right: we want to report all open-streams (ie /ipfs/ping, /ipfs/indentify, /[hash]/kad) together with notification protocols, not just notification-protocols (/[hash]/tx/1 etc)?

For notification-protocols, we could easily generate a substreamID when we establish a connection just before the handshake is done. As for getting all substreams, I believe we'd need to interface substrate somehow with yamux implementation directly, and we'd most probably end-up either:

And since we could change the muxer in the future, for example changing the transport to quic, wouldn't the substreamID feel like an implementation detail? (Since we'd need to find a workaround to expose the substreamID from the transport level in that case) 🤔

Let me know if I got this right, or if we could achieve the same behavior easier 🙏

// cc @tomaka @altonen @jsdw

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that you just need a wrapper (a "middleware") of the StreamMuxer trait, no need to touch Yamux itself. Every type of connection implements StreamMuxer and is only ever accessed through this trait by the rest of the stack.

However what I didn't realize is that getting the protocol name is kind of tricky, because it is done by sending messages within the substream.
Substreams themselves don't inherently have protocol names, it's just that in libp2p every single substream starts with a handshake that negotiates its protocol.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Substreams themselves don't inherently have protocol names, it's just that in libp2p every single substream starts with a handshake that negotiates its protocol.

Which also means that this RFC needs to clarify what to do with substreams whose protocol hasn't been negotiated yet.

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How are these meant to be implemented for libp2p protocols in the first place? Substrate may have no knowledge that a substream for Kademlia or PING has been opened so there would have to some tracking implemented in libp2p that would expose this information to Substrate.

I don't see the problem with protocol names though. Once the protocol has been negotiated, the name is known and can be reported. If there was a separate "opening" state, then it would be more challenging.

Copy link
Contributor Author

@tomaka tomaka Sep 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

How are these meant to be implemented for libp2p protocols in the first place? Substrate may have no knowledge that a substream for Kademlia or PING has been opened so there would have to some tracking implemented in libp2p that would expose this information to Substrate.

Substrate entirely wraps around all the libp2p protocols: https://github.com/paritytech/polkadot-sdk/blob/a56fd32e0affbe9cb61abb5ee3ec0a0bd15f5bf0/substrate/client/network/src/behaviour.rs#L45-L47

I think that you can do it by:

In your NetworkBehaviour wrapper, you can modify the ToSwarm type to include events about substreams being opened/closed, and then intercept these events in the main network events handler.

I haven't touched these traits in years now, so I might be missing something, but I think it should work.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think that you can do it
Adding a wrapper around the ConnectionHandler trait that intercepts on_connection_event.
Adding a wrapper around the NetworkBehaviour trait so that you can wrap around the ConnectionHandler

From looking at the code, my findings also point in this direction. Thanks!

I started to implement a BehaviorWrapper a few days ago, which looked like this:

pub struct BehaviorWrapper<B: BlockT> {
  behavior: Behavior<B>,
}

impl NetworkBehavior for ... { }

This was the first step to expose the ConnectionID since the SwarmEvents from our libp2p version does not expose it at the moment.

I gave up on going that path because of two reasons:

However, I cannot think of a better way than what Pierre suggested:

pub struct BehaviorWrapper<B: BlockT> {
  behavior: Behavior<B>,
}

/// Must propagate the connection ID for proper identification of
/// connection IDs + streamIDs
pub struct HandlerWrapper {
  connectionID: ...,
  ..
}

impl NetworkBehavior for BehaviorWrapper {
  type ConnectionHandler = HandlerWrapper;

fn handle_established_inbound_connection(connectionId, ..) {
 HandlerWrapper::new(connectionId, is_inbound=true, ...)
}

fn handle_established_outbound_connection(connectionId, ..) {
 HandlerWrapper::new(connectionId, is_inbound=false, ...)
}

...
}


impl ConnectionHandler for HandlerWrapper {
// Might have missed something here 
// types = tbd / maybe other wrappers here

fn on_connection_event(..) {
   // Generate a stream ID
   stream_id = AtomicIndex::increment();
   
   // Propagate up to the swarm
   MetricsEventWrapper::CustomProtocolOpen { connectionId, streamId, protocolName, direction, .. }
}

This feels indeed a bit involved and I'd resume implementation once the substrate-p2p refactor is complete:

Since I expect most of this work will generate conflicts with those PRs

Copy link
Collaborator

@jsdw jsdw Sep 21, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The prefix of the function in this PR is sudo_network, so it's only the sudo_network that aren't implemented.

Ah ok, I see that now; the other groups all have a singe word like chainHead_ or chainSpec_ or archive_, but we have sudo_, sudo_sessionKeys_ and now sudo_network_ as three more groups, which I was confused by since it's a bit inconsistent in my mind.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

it's a bit inconsistent in my mind.

Yeah it is, because ultimately the sudo functions should be renamed.

Copy link

@altonen altonen Sep 22, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@lexnv @jsdw

If the PoC/prototype for this code turns out to add non-trivial complexity to sc-network, I hope you reconsider adding this feature.

The libp2p upgrades are really painful to deal with and I would hope to reduce the surface area of libp2p in sc-network to leave us more time to work on things other than dealing with libp2p and its upgrades. That looks to be at odds with the proposed wrapper implementation and my concern is that if this wrapper adds more code that has to be maintained during each libp2p upgrade, it just leaves us with less time to work on actually improving the Polkadot networking code.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I didn't answer that, but I didn't intend this to be implemented in Substrate quickly.

The reason for this function comes from a very pragmatic problem: most of the time, when smoldot isn't working, it's because of a connectivity issue.

When you're running a Substrate node of some kind, you're either a developer or a node operator. You can read logs, run Prometheus, open GitHub issues, etc.

When you're running smoldot, you're an end user, and you shouldn't be reading logs. The only reasonable way to help end users figure out why smoldot isn't working is put some kind of UI on top of it. This JSON-RPC function is what would power this UI.

"status": "open" | "closed",
"protocolName": "...",
"direction": "in" | "out",
"when": ...
}
```

A `substreamState` event is generated when a new connection attempt is started, when a connection has finished its handshaking phase, or when a connection is terminated.

`connectionId` is an opaque string representing this specific connection. It must always correspond to a connection whose latest `status` is `open`.

`substreamId` is an opaque string representing this specific substream within the connection. Each substream is identified by the `connectionId` + `substreamId` tuple rather than just the `substreamId` alone. The JSON-RPC server is allowed to use the same value of `substreamId` for two different substreams belonging to two different connections.

`status` indicates the state of the substream: `open` if the substream is "alive", or `closed` if the substream is dead. A substream is considered "alive" if the JSON-RPC server allocates resources for this substream, even if the remote isn't aware of said substream.

Each `substreamState` event where `status` equal to `closed` must follow a previous `substreamState` even for that same substream where `status` was `open`. In other words, the JSON-RPC server is not allowed to omit event the `open` event.

Once a `substreamState` event with `status` equal to `closed` is generated, the `substreamId` is unallocated. Any further usage of the same `substreamId` in the context of that `connectionId` designates a different substream instead.

`protocolName` is a string indicating the multistream-select protocol name that was negotiated.

`direction` indicates whether the substream was initiated locally (`out`) or by the remote (`in`). Note that this is not the same thing as the `direction` of the corresponding connection. The value in the `direction` field must always be the same for all the events related to a specific substream. In other words, a substream never changes direction during its lifetime.

`when` is an integer containing the UNIX timestamp in milliseconds, in other words the number of milliseconds since the UNIX epoch ignoring leap seconds.

### missedEvents

```json
{
"event": "missedEvents"
}
```

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it also be beneficial to have an optional even reported for cases of miss-behaving / reputation banning?

Copy link
Contributor Author

@tomaka tomaka Sep 13, 2023

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The networking in general has a shit-ton of concepts. Beyond reputations, you could also think of reporting ping times, handshake and request sizes, k-buckets states, grandpa neighbor packets, block announces, etc.

This PR focuses on just open connections and substreams because 1) these concepts are easy to understand 2) we know for sure that Polkadot will never not have connections and substreams.

The `missedEvents` event is generated in order to inform the JSON-RPC client that it has not been informed of the existence of all connections or substreams due to it being too slow to pull JSON-RPC notifications from the JSON-RPC server.

See the `Guarantee of delivery` section for more details.

## Guarantee of delivery

JSON-RPC server implementations must be aware of the fact that JSON-RPC clients might pull notifications from the JSON-RPC server at a slower rate than networking events are generated. If this function is implemented naively, a slow or malicious JSON-RPC client can cause the JSON-RPC server to allocate ever-increasing buffers, which could in turn lead to a DoS attack on the JSON-RPC server.

JSON-RPC server implementations are also allowed to completely omit events about connections and substreams whose existence is unknown to the JSON-RPC client. For example, when a connection gets closed, the JSON-RPC server is allowed to not notify the JSON-RPC client if the client wasn't yet notified of the fact that the new-closed connection existed. When that happens, the JSON-RPC server must send a `missedEvents` event to the JSON-RPC client in the nearby future.

JSON-RPC clients must be aware that they aren't guaranteed to see the list of all connections and all substreams that the peer-to-peer endpoint of the node associated to the JSON-RPC server performs. The JSON-RPC client is only guaranteed to be aware of what is currently happening.

Assuming that the number of total active `sudo_network_unstable_watch` subscriptions on any given JSON-RPC server is bounded, and that the number of total active connections and substreams is bounded, the size of the buffer of notifications to send back to JSON-RPC clients is also bounded.

## About timestamps

The `connectionState` and `substreamState` events contain a `when` field indicating when the event happened.

The JSON-RPC server isn't required to order events by the value in their `when` field. The JSON-RPC is only required to order events so that they don't lose their logical meaning. For example, when two different connections open, the JSON-RPC server can send the two `connectionState` events in any order. When a connection opens then closes, the JSON-RPC server must send a `connectionState` with a `status` equal to `open` before the `connectionState` with a `status` equal to `closed`.

## Possible errors

- A JSON-RPC error with error code `-32100` can be generated if the JSON-RPC client has already opened a `sudo_network_unstable_watch` subscription.