peerstore: next steps for datastore-backed peerstore #1702

Open
raulk opened this issue Sep 14, 2018 · 4 comments
Labels: kind/enhancement, status/in-progress

raulk commented Sep 14, 2018

Following libp2p/go-libp2p-peerstore#34, we're in a pretty good state vs. baseline. Performance benchcmp here: libp2p/go-libp2p-peerstore#31 (comment)

Summing up my thoughts here on how to move forward.

  • We should find a way to reduce the cost of Peers(). IPFS doesn't call it from critical paths (only CLI commands), but as libp2p powers more projects, we shouldn't make assumptions about how users will call it.

    • With the current db layout, PeersWithAddrs() traverses the entire KV store building a set of unique peers.
    • We obviously want to avoid doing that. Options:
      • Jamming lists of multiaddrs in peer-addresses entries, to reduce total key count.
        • Not feasible because TTL management needs to be fine-grained (up to the multiaddr).
      • Controlling the iterator seek behaviour. The go-datastore iterator doesn't allow us to seek arbitrarily, but badger does. Instead of ingesting all keys and comparing them, we want to perform a "fold" operation that takes advantage of the fact that badger is an ordered LSM tree: when we encounter a new peer, we skip over all of its keys until ValidForPrefix is no longer true, which means we've hit the next peer, and we use that as the new prefix to seek against (a rough badger sketch follows after this list).
        • Unfortunately, this only works for ordered key-value stores.
      • Store an index of peers, e.g. in a /peeridx keyspace. Then we can just iterate through a small keyspace instead of the entire database.
        • This is my favourite, but given that the TTL expires keys at the DB level without notifying us, keeping those indices aligned with the keys that actually exist for each peer is non-trivial. I have several ideas here (a sketch of the index bookkeeping follows after this list).
  • Regarding option 3 above, find a way to store the earliest and tardiest expirations.

    • The tardiest coincides with when a peer is gone from the DB (if an explicit delete doesn't occur earlier, e.g. setTTL to 0), i.e. it can help us manage the peer index.
    • The earliest coincides with when the cache entry for a peer should be invalidated. In this way, we don't need to calculate this timestamp every time we load a peer from DB to the cache.
    • We could use merge operators for this, but they are too specific to certain implementations, and also eventually consistent. I guess we need strict consistency.
    • Tracking these time bounds would be easy to implement if it weren't for explicit deletes:
      • On every insert or update you can update the bounds if the newly added entry changes them.
      • Deletes, however, become non-deterministic in cost. If a deleted entry defines one of the bounds (i.e. you are deleting the earliest entry, or the tardiest entry to expire), you need to iterate through all of the peer's keys to recalculate those bounds.
      • I'd argue it's a worthwhile tradeoff, though, as it saves time on gets, which are probably way more frequent than deletes.
  • Use the ds.Key functionality to create key hierarchies, e.g. /peer:<id>/addr:<hash or index> => multiaddr.

  • Migrate the KeyBook and PeerMetadata substores under appropriate hierarchical namespaces, as per the above.

  • Find a way to remove the multiaddr hash from the peer's key, maybe replacing it with some kind of sequence number.

  • Spin off the pstoreds package into a top-level module: go-libp2p-peerstore-ds.

  • EDIT: Consider moving PeerInfo to somewhere more common.
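To make the "fold" idea concrete, here's a minimal sketch against badger directly. It assumes a hypothetical key layout of /peers/addrs/<peer ID>/<addr hash>; the layout, prefix, and function names are illustrative only, not the current pstoreds schema.

```go
// Sketch only: enumerate unique peers by exploiting badger's ordered key space.
// Assumes a hypothetical layout of /peers/addrs/<peer ID>/<addr hash>, and that
// no key component starts with 0xff (true for e.g. base32-encoded components);
// a real implementation would compute a proper prefix successor instead.
package pstoresketch

import (
	"bytes"

	"github.com/dgraph-io/badger"
)

var addrBookPrefix = []byte("/peers/addrs/")

// uniquePeers returns one entry per peer without visiting every address key:
// after recording a peer, it seeks straight past that peer's sub-keys.
func uniquePeers(db *badger.DB) ([][]byte, error) {
	var peers [][]byte
	err := db.View(func(txn *badger.Txn) error {
		opts := badger.DefaultIteratorOptions
		opts.PrefetchValues = false // we only need keys
		it := txn.NewIterator(opts)
		defer it.Close()

		for it.Seek(addrBookPrefix); it.ValidForPrefix(addrBookPrefix); {
			key := it.Item().KeyCopy(nil)

			// Extract the peer ID component: everything up to the next '/'.
			rest := key[len(addrBookPrefix):]
			end := bytes.IndexByte(rest, '/')
			if end < 0 {
				end = len(rest)
			}
			peerID := rest[:end]
			peers = append(peers, peerID)

			// Seek just past "/peers/addrs/<peer ID>/" so the iterator lands on
			// the first key of the next peer (or leaves the prefix entirely).
			next := make([]byte, 0, len(addrBookPrefix)+len(peerID)+2)
			next = append(next, addrBookPrefix...)
			next = append(next, peerID...)
			next = append(next, '/', 0xff)
			it.Seek(next)
		}
		return nil
	})
	return peers, err
}
```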
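And a rough sketch of the peer-index bookkeeping with the earliest/tardiest bounds. All type and function names are made up, persistence of the index itself is left out, and recordDelete assumes the caller can cheaply query the peer's remaining expiries when a bound is invalidated.

```go
// Sketch only: per-peer index entry carrying the earliest and tardiest expirations.
// Names are hypothetical; this ignores how the index itself is persisted.
package pstoresketch

import "time"

type peerIndexEntry struct {
	Earliest time.Time // when the cache entry for this peer should be invalidated
	Tardiest time.Time // when the last key for this peer expires, i.e. the peer is gone
}

type peerIndex map[string]*peerIndexEntry // peer ID -> bounds

// recordInsert updates the bounds when an address is added or its TTL extended.
func (idx peerIndex) recordInsert(peer string, expiry time.Time) {
	e, ok := idx[peer]
	if !ok {
		idx[peer] = &peerIndexEntry{Earliest: expiry, Tardiest: expiry}
		return
	}
	if expiry.Before(e.Earliest) {
		e.Earliest = expiry
	}
	if expiry.After(e.Tardiest) {
		e.Tardiest = expiry
	}
}

// recordDelete handles an explicit delete. If the deleted entry defined one of
// the bounds, we have to rescan that peer's remaining expiries to recompute them;
// "remaining" would come from a cheap, single-prefix datastore query.
func (idx peerIndex) recordDelete(peer string, expiry time.Time, remaining []time.Time) {
	e, ok := idx[peer]
	if !ok {
		return
	}
	if len(remaining) == 0 {
		delete(idx, peer) // last key gone: drop the peer from the index
		return
	}
	if !expiry.Equal(e.Earliest) && !expiry.Equal(e.Tardiest) {
		return // bounds unaffected, nothing to recompute
	}
	e.Earliest, e.Tardiest = remaining[0], remaining[0]
	for _, t := range remaining[1:] {
		if t.Before(e.Earliest) {
			e.Earliest = t
		}
		if t.After(e.Tardiest) {
			e.Tardiest = t
		}
	}
}
```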


raulk commented Sep 14, 2018

Tagged a few people just to make you aware of the future direction I'd like to pursue with the peerstore, following all the work that has been done already.

Stebalien (Member) commented:

> Not feasible because TTL management needs to be fine-grained (up to the multiaddr).

We can also manage this using a GC system. That is, store large records and periodically sweep.

Note: We also don't need to use TTLs if you can think of a better metric/system.
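For reference, a minimal sketch of that record-plus-sweep approach: one record per peer with per-address expiries, pruned by a periodic GC pass. Types, names, and the interval handling are hypothetical, and a real implementation would serialize the record into the datastore rather than keep it in a plain map.

```go
// Sketch only: one record per peer with per-address expiries, swept periodically,
// instead of one TTL'd datastore key per (peer, address). Names are hypothetical.
package pstoresketch

import (
	"sync"
	"time"
)

type addrEntry struct {
	Addr   string // serialized multiaddr
	Expiry time.Time
}

type addrRecord struct {
	mu    sync.Mutex
	Addrs []addrEntry
}

// sweep drops expired addresses and reports whether the record is now empty,
// in which case the caller can delete the peer's record (and any index entry).
func (r *addrRecord) sweep(now time.Time) (empty bool) {
	r.mu.Lock()
	defer r.mu.Unlock()
	live := r.Addrs[:0]
	for _, a := range r.Addrs {
		if a.Expiry.After(now) {
			live = append(live, a)
		}
	}
	r.Addrs = live
	return len(r.Addrs) == 0
}

// gcLoop runs the sweep on a fixed interval until stop is closed.
// Access to the records map itself would need its own synchronization in a
// real implementation; it is omitted here for brevity.
func gcLoop(records map[string]*addrRecord, interval time.Duration, stop <-chan struct{}) {
	ticker := time.NewTicker(interval)
	defer ticker.Stop()
	for {
		select {
		case <-stop:
			return
		case now := <-ticker.C:
			for id, rec := range records {
				if rec.sweep(now) {
					delete(records, id)
				}
			}
		}
	}
}
```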

bigs added the kind/enhancement and status/in-progress labels on Sep 18, 2018

bigs commented Sep 19, 2018

I think this is a definite case for indexing, as you suggest. A note: we can also open multiple badger data stores instead of namespacing keys, if it suits; badger is designed so that multiple stores serve as its alternative to column families.
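For concreteness, the two options could look roughly like this with go-ds-badger and the go-datastore namespace shim. The paths and key prefixes are made up, and the API details are from memory, so treat this as a sketch rather than a drop-in for the current pstoreds wiring.

```go
// Sketch only: two ways to split the peerstore's keyspaces over badger.
package pstoresketch

import (
	ds "github.com/ipfs/go-datastore"
	"github.com/ipfs/go-datastore/namespace"
	badgerds "github.com/ipfs/go-ds-badger"
)

// openNamespaced opens a single badger store and carves out sub-keyspaces
// with key prefixes (the namespacing approach discussed above).
func openNamespaced(path string) (addrs ds.Datastore, keys ds.Datastore, err error) {
	base, err := badgerds.NewDatastore(path, &badgerds.DefaultOptions)
	if err != nil {
		return nil, nil, err
	}
	addrs = namespace.Wrap(base, ds.NewKey("/peers/addrs"))
	keys = namespace.Wrap(base, ds.NewKey("/peers/keys"))
	return addrs, keys, nil
}

// openSeparate opens one badger store per keyspace, badger's stand-in for
// column families, as suggested above.
func openSeparate(addrsPath, keysPath string) (addrs ds.Datastore, keys ds.Datastore, err error) {
	if addrs, err = badgerds.NewDatastore(addrsPath, &badgerds.DefaultOptions); err != nil {
		return nil, nil, err
	}
	if keys, err = badgerds.NewDatastore(keysPath, &badgerds.DefaultOptions); err != nil {
		return nil, nil, err
	}
	return addrs, keys, nil
}
```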

aarshkshah1992 (Contributor) commented:

@raulk What is the status of this as of today? I'd like to help out by picking up some of the chunks, if possible.

libp2p deleted a comment from bitcard on Mar 1, 2021
marten-seemann transferred this issue from libp2p/go-libp2p-peerstore on Aug 19, 2022
marten-seemann changed the title from "Next steps for datastore-backed peerstore" to "peerstore: next steps for datastore-backed peerstore" on Aug 19, 2022