
Serialize the peerset state on disk and reload it at startup #565

Open
tomaka opened this issue Jan 28, 2020 · 11 comments
Labels
D1-medium Can be fixed by a coder with good Rust knowledge but little knowledge of the codebase. I5-enhancement An additional feature request.

Comments

tomaka commented Jan 28, 2020

We used to store a nodes.json file on disk containing the known state of the network. This feature was removed when we introduced the peerset manager, but it was always intended to be restored.

It is fundamentally better not to rely too heavily on bootnodes, and instead to try to restore connections to previously working nodes.

This issue is about restoring this feature.

tomaka commented Feb 17, 2020

In the past, we used a network-specific config_path to store the JSON file.
Maybe we could put that in the database instead, in order to reduce the number of files that Substrate/Polkadot maintains? I'm not sure whether that's a good idea.

stale bot commented Jul 7, 2021

Hey, is anyone still working on this? Due to the inactivity this issue has been automatically marked as stale. It will be closed if no further activity occurs. Thank you for your contributions.

stale bot added the A5-stale label Jul 7, 2021

tomaka commented Jul 8, 2021

Issue still relevant and important.


RGafiyatullin commented Jul 25, 2022

I had a look into the sc-network and sc-peerset crates.

I would like to proceed with the following solution.

Add another CLI argument, --persist-peers.
If set, the node shall periodically save information about the currently connected (and recently connected) peers, and upon startup will prefer the persisted peers over the provided --bootnodes <NODES>.
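
As a minimal sketch, this is roughly what such a flag could look like with clap's derive API; the flag name and struct here are assumptions for illustration, not part of the existing sc-cli options, and the real wiring into Substrate's network configuration would differ.

```rust
use clap::Parser;

/// Hypothetical CLI fragment; Substrate's real CLI structs differ.
#[derive(Parser, Debug)]
struct NetworkParams {
    /// Persist known peers to disk and prefer them over --bootnodes on restart.
    /// (Assumed flag, not an existing sc-cli option.)
    #[arg(long)]
    persist_peers: bool,
}

fn main() {
    let params = NetworkParams::parse();
    if params.persist_peers {
        // In a real node this would enable the periodic peer-persistence task
        // described above.
        println!("peer persistence enabled");
    }
}
```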

In order to implement such behaviour, modifications to sc-peerset::Peerset/PeersetHandle and sc-network::DiscoveryBehaviour would be required. There will be two pieces of persistent data:

  • the PSM data;
  • the Peers' addresses.

Keep track of the PeerIds of the connected nodes.

The Peerset (sc-peerset) is responsible for initiating the connection to another peer: it allocates the available slots among the known peers, with a preference for the reserved nodes.

The state of the Peerset can be saved periodically and restored upon the node's startup.

The only thing to do in sc-peerset is to add the ability to dump the current PSM state.

The persistence routine is to be done within sc-network (probably sc-network::DiscoveryBehaviour); a sketch of what the persisted snapshot could look like is below.
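
As a rough illustration, here is a sketch of a serializable snapshot that a hypothetical Peerset::dump_state() could produce and that sc-network could write to disk periodically. The type name, fields, and serde/JSON approach are assumptions for this sketch, not the existing sc-peerset API; it simply mirrors the old nodes.json idea.

```rust
use std::{fs, io, path::Path};
use serde::{Deserialize, Serialize};

/// Hypothetical serializable snapshot of the peerset manager's state.
/// The field layout is an assumption for this sketch, not the real PSM internals.
#[derive(Serialize, Deserialize)]
struct PeersetSnapshot {
    /// Peers currently occupying slots, identified by their base58 peer IDs.
    connected: Vec<String>,
    /// Known-but-not-connected peers together with their tracked reputation.
    known: Vec<(String, i32)>,
}

impl PeersetSnapshot {
    /// Write the snapshot as JSON, mirroring the old nodes.json approach.
    fn save(&self, path: &Path) -> io::Result<()> {
        let json = serde_json::to_string_pretty(self)
            .map_err(|e| io::Error::new(io::ErrorKind::Other, e))?;
        fs::write(path, json)
    }

    /// Reload a previously saved snapshot at startup, if one exists and parses.
    fn load(path: &Path) -> io::Result<Option<Self>> {
        match fs::read_to_string(path) {
            Ok(json) => Ok(serde_json::from_str(&json).ok()),
            Err(e) if e.kind() == io::ErrorKind::NotFound => Ok(None),
            Err(e) => Err(e),
        }
    }
}

fn main() -> io::Result<()> {
    let snapshot = PeersetSnapshot {
        connected: vec!["12D3KooW...".to_string()],
        known: vec![("12D3KooX...".to_string(), 0)],
    };
    let path = Path::new("peerset_snapshot.json");
    snapshot.save(path)?;
    let restored = PeersetSnapshot::load(path)?;
    println!("restored {} connected peers", restored.map_or(0, |s| s.connected.len()));
    Ok(())
}
```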

Keep a "last-resort" cache of Multiaddrs.

The Peerset, however, operates in terms of PeerIds, which is not sufficient to rejoin the previously connected peers; the Multiaddrs of the connected peers also need to be persisted.
Those can be saved periodically, in an LRU fashion, by the DiscoveryBehaviour.

The "last-resort" part: normally the peers are resolved as they are now, but if <DiscoveryBehaviour as NetworkBehaviour>::addresses_of_peer(...) fails to resolve a peer ID into an address, this cache is used, hence "last-resort".

Something I am not sure how to implement

If the node has only inbound connections, it does not know any addresses worth saving.

Of course, if that node restarts, chances are that the previously connected peers will "drag" it back into the network using the previously known endpoint. But this isn't terribly robust: if the node changes address between restarts, or goes offline long enough to be "forgotten", it won't be able to rejoin the network.

Probably the solution to this problem is to eagerly look for connectable addresses so as to keep the Multiaddrs cache populated at all times.


bkchr commented Jul 25, 2022

Add another CLI argument, --persist-peers.
If set, the node shall periodically save information about the currently connected (and recently connected) peers, and upon startup will prefer the persisted peers over the provided --bootnodes <NODES>.

Why do we need a CLI option for this?

If the node has only inbound connections, it does not know any addresses worth saving.

How would you end up with only inbound connections? If you only have those, you already have some problems; for example, you would be running into an eclipse attack. You can just ignore this case.


tomaka commented Jul 25, 2022

The debug module contains the identity system, which asks the peers we are connected to what their addresses are.


koute commented Jul 26, 2022

I second what @bkchr said; it would make more sense to me to just have this behavior be the default instead of requiring the user to pass a CLI argument to enable it. (On the other hand, adding an option to disable it might make some sense, e.g. for debugging purposes.)


bkchr commented Aug 25, 2022

To keep track of what @RGafiyatullin and I discussed in DMs:

I think that, for now, we should persist all peers, with some kind of Peer { peer_id, addr, last_seen }, where last_seen is just the Unix timestamp at which we were last connected to this node. We could use this to prune all nodes that we weren't connected to for more than a week or so. Then I would just load the list on startup, take roughly half of the nodes with the most recent last_seen plus a random sampling of the other nodes, and pass these in as boot nodes to the sync peer set; a sketch of this selection is below.
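
A sketch of the prune-and-select logic described above. The Peer struct and the one-week cutoff follow the suggestion; pick_startup_peers and the use of the rand crate (0.8-style API) for the random sampling are assumptions about how the selection might be written, not existing Substrate code.

```rust
use rand::seq::SliceRandom;
use std::time::{Duration, SystemTime, UNIX_EPOCH};

/// Persisted record per peer, as suggested above.
#[derive(Clone)]
struct Peer {
    peer_id: String,
    addr: String,
    /// Unix timestamp of the last time we were connected to this peer.
    last_seen: u64,
}

/// Drop peers we have not been connected to for more than a week,
/// then pick half of the remainder by recency plus a random sample of the rest.
fn pick_startup_peers(mut peers: Vec<Peer>, want: usize) -> Vec<Peer> {
    let now = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .unwrap_or_default()
        .as_secs();
    let week = Duration::from_secs(7 * 24 * 60 * 60).as_secs();
    peers.retain(|p| now.saturating_sub(p.last_seen) <= week);

    // Most recently seen first.
    peers.sort_by(|a, b| b.last_seen.cmp(&a.last_seen));

    let recent_count = (want / 2).min(peers.len());
    let mut selected: Vec<Peer> = peers[..recent_count].to_vec();

    // Random sample of the remaining peers to fill up the rest of the slots.
    let mut rest: Vec<Peer> = peers[recent_count..].to_vec();
    rest.shuffle(&mut rand::thread_rng());
    selected.extend(rest.into_iter().take(want - recent_count));

    // These could then be fed to the sync peer set much like --bootnodes.
    selected
}

fn main() {
    let now = SystemTime::now()
        .duration_since(UNIX_EPOCH)
        .unwrap_or_default()
        .as_secs();
    let peers = vec![
        Peer { peer_id: "a".into(), addr: "/ip4/10.0.0.1/tcp/30333".into(), last_seen: now },
        Peer { peer_id: "b".into(), addr: "/ip4/10.0.0.2/tcp/30333".into(), last_seen: now - 3600 },
        Peer { peer_id: "c".into(), addr: "/ip4/10.0.0.3/tcp/30333".into(), last_seen: now - 30 * 24 * 3600 },
    ];
    // Peer "c" is pruned (older than a week); the rest compete for the slots.
    for p in pick_startup_peers(peers, 2) {
        println!("bootstrapping from {} at {}", p.peer_id, p.addr);
    }
}
```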

What I don't know, and what might also work: can we not just announce all available peers to the peer set and let it start connecting to some of them? I mean, when we currently connect to a boot node it shares known nodes with us (I don't know exactly how it works) and then we start connecting to these nodes as well. Could we not hook in there directly? So instead of getting these peers from a different node, we would insert the known peers ourselves.

One other random question: should we maybe not persist inbound peers? Otherwise this could lead to some eclipse attack, as after the next restart these nodes would be outbound peers from our POV.

Would be nice to get some feedback from you @tomaka.

nazar-pc commented

FWIW, we have implemented simple networking persistence in Subspace and decided that the first failure time is the best thing to store, so that we can eventually kick non-responsive peers.
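
A small sketch of that idea: record the first failure time per persisted peer and drop peers that have been failing for too long. The struct, field names, and the 24-hour cutoff are illustrative assumptions, not what Subspace actually implements.

```rust
use std::time::{Duration, SystemTime};

/// Persisted peer record that also remembers when dialing first started failing.
struct KnownPeer {
    addr: String,
    /// Set on the first failed dial attempt, cleared on any successful connection.
    first_failure: Option<SystemTime>,
}

impl KnownPeer {
    /// A peer is evicted once it has been failing continuously for longer than `max_failure_age`.
    fn should_evict(&self, now: SystemTime, max_failure_age: Duration) -> bool {
        match self.first_failure {
            Some(since) => now
                .duration_since(since)
                .map(|age| age > max_failure_age)
                .unwrap_or(false),
            None => false,
        }
    }
}

fn main() {
    let peer = KnownPeer {
        addr: "/ip4/10.0.0.4/tcp/30333".to_string(),
        first_failure: Some(SystemTime::now() - Duration::from_secs(2 * 24 * 60 * 60)),
    };
    // With a 24-hour cutoff this peer would be dropped from the persisted set.
    assert!(peer.should_evict(SystemTime::now(), Duration::from_secs(24 * 60 * 60)));
    println!("evict {}: yes", peer.addr);
}
```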


tomaka commented Aug 25, 2022

@bkchr It's actually insanely complicated. The whole story of how we store peers and addresses, for how long (if you store them forever, you've got a memory leak), which addresses we try, when we stop trying addresses, etc. is a big hack, because in 3 years of networking I've never figured out an algorithm to do this properly.


bkchr commented Aug 25, 2022

@bkchr It's actually insanely complicated. The whole story of how we store peers and addresses, for how long (if you store them forever, you've got a memory leak), which addresses we try, when we stop trying addresses, etc. is a big hack, because in 3 years of networking I've never figured out an algorithm to do this properly.

Yeah, but for now I'm more interested in getting an idea of the best place to hook this in. The exact pruning strategy etc. could be figured out over time.

altonen transferred this issue from paritytech/substrate Aug 24, 2023
the-right-joyce added the I5-enhancement and D1-medium labels and removed the J0-enhancement label Aug 25, 2023
claravanstaden pushed a commit to Snowfork/polkadot-sdk that referenced this issue Dec 8, 2023
helin6 pushed a commit to boolnetwork/polkadot-sdk that referenced this issue Feb 5, 2024
bkchr pushed a commit that referenced this issue Apr 10, 2024