peerstore: next steps for datastore-backed peerstore #1702

Open
raulk opened this issue Sep 14, 2018 · 4 comments
Labels: kind/enhancement, status/in-progress

raulk commented Sep 14, 2018

Following libp2p/go-libp2p-peerstore#34, we're in a pretty good state vs. baseline. Performance benchcmp here: libp2p/go-libp2p-peerstore#31 (comment)

Summing up my thoughts here on how to move forward.

  • We should find a way to reduce the cost of Peers(). IPFS doesn't call it from critical paths (only CLI commands), but as libp2p powers more projects, we shouldn't make assumptions about how users will call it.

    • With the current db layout, PeersWithAddrs() traverses the entire KV store building a set of unique peers.
    • We obviously want to avoid doing that. Options:
      • Jamming lists of multiaddrs in peer-addresses entries, to reduce total key count.
        • Not feasible because TTL management needs to be fine-grained (up to the multiaddr).
      • Controlling the iterator seek behaviour. The go-datastore iterator doesn't allow us to seek arbitrarily, but badger does. Instead of ingesting all keys and comparing them, we want to perform a "fold" operation that takes advantage of the fact that badger is an ordered LSM tree: when we encounter a new peer, we skip over all of its keys until ValidForPrefix is no longer true, which means we've hit the next peer, and we use that as the new prefix to seek against (a rough badger sketch follows after this list).
        • Unfortunately, this only works for ordered key-value stores.
      • Store an index of peers, e.g. in a /peeridx keyspace. Then we can just iterate through a small keyspace instead of the entire database.
        • This is my favourite, but given that the TTL expires keys at the DB level without notifying us, keeping those indices aligned with the keys that actually exist for each peer is non-trivial. I have several ideas here (a sketch of the index bookkeeping follows after this list).
  • Regarding option 3 above, find a way to store the earliest and tardiest expirations.

    • The tardiest coincides with when a peer is gone from the DB (if an explicit delete doesn't occur earlier, e.g. setTTL to 0), i.e. it can help us manage the peer index.
    • The earliest coincides with when the cache entry for a peer should be invalidated. In this way, we don't need to calculate this timestamp every time we load a peer from DB to the cache.
    • We could use merge operators for this, but they are too specific to certain implementations, and also eventually consistent. I guess we need strict consistency.
    • Tracking these time bounds would be easy to implement if it weren't for explicit deletes:
      • On every insert or update you can update the bounds if the newly added entry changes them.
      • Deletes, however, become non-deterministic in cost. If a deleted entry defines one of the bounds (i.e. you are deleting the earliest entry, or the tardiest entry to expire), you need to iterate through all of the peer's keys to recalculate those bounds.
      • I'd argue it's a worthwhile tradeoff, though, as it saves time on gets, which are probably way more frequent than deletes.
  • Use the ds.Key functionality to create key hierarchies, e.g. /peer:<id>/addr:<hash or index> => multiaddr.

  • Migrate the KeyBook and PeerMetadata substores under appropriate hierarchical namespaces, as per the above.

  • Find a way to remove the multiaddr hash from the peer's key, maybe replacing it with some kind of sequence number.

  • Spin off the pstoreds package into a top-level module: go-libp2p-peerstore-ds.

  • EDIT: Consider moving PeerInfo to somewhere more common.
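To make the "fold" idea concrete, here's a minimal sketch against badger directly. It assumes a hypothetical key layout of /peers/addrs/<peer ID>/<addr hash>; the layout, prefix, and function names are illustrative only, not the current pstoreds schema.

```go
// Sketch only: enumerate unique peers by exploiting badger's ordered key space.
// Assumes a hypothetical layout of /peers/addrs/<peer ID>/<addr hash>, and that
// no key component starts with 0xff (true for e.g. base32-encoded components);
// a real implementation would compute a proper prefix successor instead.
package pstoresketch

import (
	"bytes"

	"github.com/dgraph-io/badger"
)

var addrBookPrefix = []byte("/peers/addrs/")

// uniquePeers returns one entry per peer without visiting every address key:
// after recording a peer, it seeks straight past that peer's sub-keys.
func uniquePeers(db *badger.DB) ([][]byte, error) {
	var peers [][]byte
	err := db.View(func(txn *badger.Txn) error {
		opts := badger.DefaultIteratorOptions
		opts.PrefetchValues = false // we only need keys
		it := txn.NewIterator(opts)
		defer it.Close()

		for it.Seek(addrBookPrefix); it.ValidForPrefix(addrBookPrefix); {
			key := it.Item().KeyCopy(nil)

			// Extract the peer ID component: everything up to the next '/'.
			rest := key[len(addrBookPrefix):]
			end := bytes.IndexByte(rest, '/')
			if end < 0 {
				end = len(rest)
			}
			peerID := rest[:end]
			peers = append(peers, peerID)

			// Seek just past "/peers/addrs/<peer ID>/" so the iterator lands on
			// the first key of the next peer (or leaves the prefix entirely).
			next := make([]byte, 0, len(addrBookPrefix)+len(peerID)+2)
			next = append(next, addrBookPrefix...)
			next = append(next, peerID...)
			next = append(next, '/', 0xff)
			it.Seek(next)
		}
		return nil
	})
	return peers, err
}
```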
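And a rough sketch of the peer-index bookkeeping with the earliest/tardiest bounds. All type and function names are made up, persistence of the index itself is left out, and recordDelete assumes the caller can cheaply query the peer's remaining expiries when a bound is invalidated.

```go
// Sketch only: per-peer index entry carrying the earliest and tardiest expirations.
// Names are hypothetical; this ignores how the index itself is persisted.
package pstoresketch

import "time"

type peerIndexEntry struct {
	Earliest time.Time // when the cache entry for this peer should be invalidated
	Tardiest time.Time // when the last key for this peer expires, i.e. the peer is gone
}

type peerIndex map[string]*peerIndexEntry // peer ID -> bounds

// recordInsert updates the bounds when an address is added or its TTL extended.
func (idx peerIndex) recordInsert(peer string, expiry time.Time) {
	e, ok := idx[peer]
	if !ok {
		idx[peer] = &peerIndexEntry{Earliest: expiry, Tardiest: expiry}
		return
	}
	if expiry.Before(e.Earliest) {
		e.Earliest = expiry
	}
	if expiry.After(e.Tardiest) {
		e.Tardiest = expiry
	}
}

// recordDelete handles an explicit delete. If the deleted entry defined one of
// the bounds, we have to rescan that peer's remaining expiries to recompute them;
// "remaining" would come from a cheap, single-prefix datastore query.
func (idx peerIndex) recordDelete(peer string, expiry time.Time, remaining []time.Time) {
	e, ok := idx[peer]
	if !ok {
		return
	}
	if len(remaining) == 0 {
		delete(idx, peer) // last key gone: drop the peer from the index
		return
	}
	if !expiry.Equal(e.Earliest) && !expiry.Equal(e.Tardiest) {
		return // bounds unaffected, nothing to recompute
	}
	e.Earliest, e.Tardiest = remaining[0], remaining[0]
	for _, t := range remaining[1:] {
		if t.Before(e.Earliest) {
			e.Earliest = t
		}
		if t.After(e.Tardiest) {
			e.Tardiest = t
		}
	}
}
```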


raulk commented Sep 14, 2018

Tagged a few people just to make you aware of the future direction I'd like to pursue with the peerstore, following all the work that has been done already.

Stebalien (Member) commented:

> Not feasible because TTL management needs to be fine-grained (up to the multiaddr).

We can also manage this using a GC system. That is, store large records and periodically sweep.

Note: We also don't need to use TTLs if you can think of a better metric/system.
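For reference, a minimal sketch of that record-plus-sweep approach: one record per peer with per-address expiries, pruned by a periodic GC pass. Types, names, and the interval handling are hypothetical, and a real implementation would serialize the record into the datastore rather than keep it in a plain map.

```go
// Sketch only: one record per peer with per-address expiries, swept periodically,
// instead of one TTL'd datastore key per (peer, address). Names are hypothetical.
package pstoresketch

import (
	"sync"
	"time"
)

type addrEntry struct {
	Addr   string // serialized multiaddr
	Expiry time.Time
}

type addrRecord struct {
	mu    sync.Mutex
	Addrs []addrEntry
}

// sweep drops expired addresses and reports whether the record is now empty,
// in which case the caller can delete the peer's record (and any index entry).
func (r *addrRecord) sweep(now time.Time) (empty bool) {
	r.mu.Lock()
	defer r.mu.Unlock()
	live := r.Addrs[:0]
	for _, a := range r.Addrs {
		if a.Expiry.After(now) {
			live = append(live, a)
		}
	}
	r.Addrs = live
	return len(r.Addrs) == 0
}

// gcLoop runs the sweep on a fixed interval until stop is closed.
// Access to the records map itself would need its own synchronization in a
// real implementation; it is omitted here for brevity.
func gcLoop(records map[string]*addrRecord, interval time.Duration, stop <-chan struct{}) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-stop:
			return
		case now := <-ticker.C:
			for id, rec := range records {
				if rec.sweep(now) {
					delete(records, id)
				}
			}
		}
	}
}
```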

bigs added the kind/enhancement and status/in-progress labels on Sep 18, 2018

bigs commented Sep 19, 2018

I think this is a definite case for indexing, as you suggest. A note: we can also open multiple badger data stores instead of namespacing keys, if it suits; badger is designed so that multiple stores serve as its alternative to column families.
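For concreteness, the two options could look roughly like this with go-ds-badger and the go-datastore namespace shim. The paths and key prefixes are made up, and the API details are from memory, so treat this as a sketch rather than a drop-in for the current pstoreds wiring.

```go
// Sketch only: two ways to split the peerstore's keyspaces over badger.
package pstoresketch

import (
	ds "github.com/ipfs/go-datastore"
	"github.com/ipfs/go-datastore/namespace"
	badgerds "github.com/ipfs/go-ds-badger"
)

// openNamespaced opens a single badger store and carves out sub-keyspaces
// with key prefixes (the namespacing approach discussed above).
func openNamespaced(path string) (addrs ds.Datastore, keys ds.Datastore, err error) {
	base, err := badgerds.NewDatastore(path, &badgerds.DefaultOptions)
	if err != nil {
		return nil, nil, err
	}
	addrs = namespace.Wrap(base, ds.NewKey("/peers/addrs"))
	keys = namespace.Wrap(base, ds.NewKey("/peers/keys"))
	return addrs, keys, nil
}

// openSeparate opens one badger store per keyspace, badger's stand-in for
// column families, as suggested above.
func openSeparate(addrsPath, keysPath string) (addrs ds.Datastore, keys ds.Datastore, err error) {
	if addrs, err = badgerds.NewDatastore(addrsPath, &badgerds.DefaultOptions); err != nil {
		return nil, nil, err
	}
	if keys, err = badgerds.NewDatastore(keysPath, &badgerds.DefaultOptions); err != nil {
		return nil, nil, err
	}
	return addrs, keys, nil
}
```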

aarshkshah1992 (Contributor) commented:

@raulk What is the status of this as of today? I'd like to help out by picking up some of the chunks, if possible.

libp2p deleted a comment from bitcard on Mar 1, 2021
marten-seemann transferred this issue from libp2p/go-libp2p-peerstore on Aug 19, 2022
marten-seemann changed the title from "Next steps for datastore-backed peerstore" to "peerstore: next steps for datastore-backed peerstore" on Aug 19, 2022