Implement Kademlia record store #146

Closed
tomaka opened this issue Mar 15, 2018 · 4 comments
tomaka commented Mar 15, 2018

It is currently unimplemented.

tomaka commented Oct 18, 2018

Record storage is about storing records that other peers want to store on our machine, and then loading them back when requested. More concretely, it's about implementing the ADD_VALUE and GET_VALUE messages. Right now, when we receive such a message, we immediately close the Kademlia substream.

The challenge in doing so is mostly implementing the signature system that makes it possible to prove that the record wasn't altered.
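
A scheme-agnostic sketch of what such tamper-evidence could look like (all names and types are assumptions for illustration; the issue does not specify a record format or signature scheme):

```rust
/// Illustrative only: a record carrying the publisher's signature
/// over its contents, so any peer can detect tampering.
struct SignedRecord {
    key: Vec<u8>,
    value: Vec<u8>,
    publisher: Vec<u8>, // e.g. the publisher's public key or peer ID
    signature: Vec<u8>, // signature over (key || value) by the publisher
}

impl SignedRecord {
    /// Verify the record before storing or serving it. `verify` abstracts
    /// over the concrete signature scheme (ed25519, secp256k1, ...).
    fn is_authentic<F>(&self, verify: F) -> bool
    where
        F: Fn(&[u8], &[u8], &[u8]) -> bool, // (public key, message, signature)
    {
        let mut msg = self.key.clone();
        msg.extend_from_slice(&self.value);
        verify(&self.publisher, &msg, &self.signature)
    }
}
```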

tomaka commented May 20, 2019

Considering that the user may want to store records on disk, or cache records in a certain way, it is, in my opinion, preferable not to store records directly in the Kademlia struct, but to let users handle that entirely.
In other words, when a record is requested from the network, an event would be emitted (for example, KademliaOut::FetchRecord) and the user would have to answer this request by providing the data.
A similar event (e.g. KademliaOut::AddRecord) would of course be emitted when the network wants us to store a record.
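
A minimal sketch of that event-driven shape, using the two variant names from this comment (the payload fields and the handler are made up for illustration):

```rust
/// Sketch: events the Kademlia behaviour would emit instead of
/// handling record storage itself.
enum KademliaOut {
    /// The network requested the record under `key`; the user must
    /// answer this request by providing the data.
    FetchRecord { key: Vec<u8>, request_id: u64 },
    /// The network wants us to store a record; the user decides
    /// whether and where (memory, disk, database, ...).
    AddRecord { key: Vec<u8>, value: Vec<u8> },
}

fn handle(event: KademliaOut) {
    match event {
        KademliaOut::FetchRecord { key, request_id } => {
            // Look the record up wherever the application keeps it,
            // then reply to the network using `request_id`.
            let _ = (key, request_id);
        }
        KademliaOut::AddRecord { key, value } => {
            // Persist the record however the application sees fit.
            let _ = (key, value);
        }
    }
}
```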

romanb commented May 23, 2019

> Considering that the user may want to store records on disk, or cache records in a certain way, it is, in my opinion, preferable not to store records directly in the Kademlia struct, but to let users handle that entirely.
> In other words, when a record is requested from the network, an event would be emitted (for example, KademliaOut::FetchRecord) and the user would have to answer this request by providing the data.
> A similar event (e.g. KademliaOut::AddRecord) would of course be emitted when the network wants us to store a record.

I'm not so sure. Implementing record storage and retrieval in such a way that persistence of records has a negligible probability of failure under node failures and nodes joining and leaving seems non-trivial: it involves a combination of regularly scheduled operations like bucket refreshes, record re-replication and record re-publication at somewhat carefully chosen intervals, and all of that seems central to a Kademlia DHT implementation.

It does of course seem desirable to abstract over the concrete choice of storage (hashmap, files, dbms, ...) through a small interface, combined with configurable intervals for all the necessary regular record gossip. I do think that libp2p-kad should ship with an in-memory storage option that can be configured, or swapped out for something else, according to a defined interface (probably containing at least functions along the lines of get, put, iter, select and validate, whereby the latter two come from the libp2p spec section on entry validation, which I'm not sure we want or need to have?).
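
For concreteness, a sketch of such an interface with a toy in-memory implementation (the method names get, put and iter come from the comment above, select and validate are omitted, and all signatures are assumptions):

```rust
use std::borrow::Cow;
use std::collections::HashMap;

/// Illustrative record type; the real shape is up for discussion.
#[derive(Clone)]
struct Record {
    key: Vec<u8>,
    value: Vec<u8>,
}

/// The small storage interface suggested above.
trait RecordStore {
    fn get(&self, key: &[u8]) -> Option<Cow<'_, Record>>;
    fn put(&mut self, record: Record) -> Result<(), ()>;
    fn iter(&self) -> Box<dyn Iterator<Item = Cow<'_, Record>> + '_>;
}

/// The in-memory default that libp2p-kad could ship with.
struct MemoryStore {
    records: HashMap<Vec<u8>, Record>,
}

impl RecordStore for MemoryStore {
    fn get(&self, key: &[u8]) -> Option<Cow<'_, Record>> {
        self.records.get(key).map(Cow::Borrowed)
    }
    fn put(&mut self, record: Record) -> Result<(), ()> {
        self.records.insert(record.key.clone(), record);
        Ok(())
    }
    fn iter(&self) -> Box<dyn Iterator<Item = Cow<'_, Record>> + '_> {
        Box::new(self.records.values().map(Cow::Borrowed))
    }
}
```

A file- or database-backed store would implement the same trait, leaving the replication and re-publication scheduling inside libp2p-kad.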

romanb added a commit that referenced this issue Jul 17, 2019
* Somewhat complete the implementation of Kademlia records.

This commit relates to [libp2p-146] and [libp2p-1089].

  * All records expire (by default, configurable).
  * Provider records are also stored in the RecordStore, and the RecordStore
    API extended.
  * Background jobs for periodic (re-)replication and (re-)publication
    of records. Regular (value-)records are subject to re-replication and
    re-publication as per standard Kademlia. Provider records are only
    subject to re-publication.
  * For standard Kademlia value lookups (quorum = 1), the record is cached
    at the closest peer to the key that did not return the value, as per
    standard Kademlia.
  * Expiration times of regular (value-)records are computed to decrease
    exponentially with the number of nodes between the local node and the
    closest node known to the key (beyond the k closest), as per standard
    Kademlia (see the sketch following this commit).

The protobuf messages are extended with two fields: `ttl` and `publisher`
in order to implement the different semantics of re-replication (by any
of the k closest peers to the key, not affecting expiry) and re-publication
(by the original publisher, resetting the expiry). This is not done yet in
other libp2p Kademlia implementations, see e.g. [libp2p-go-323]. The new protobuf fields
have been given somewhat unique identifiers to prevent future collisions.

Similarly, periodic re-publication of provider records does not seem to
be done yet in other implementations, see e.g. [libp2p-js-98].

[libp2p-146]: #146
[libp2p-1089]: #1089
[libp2p-go-323]: libp2p/go-libp2p-kad-dht#323
[libp2p-js-98]: libp2p/js-libp2p-kad-dht#98

* Tweak kad-ipfs example.

* Add missing files.

* Ensure new delays are polled immediately.

To ensure task notification, since `NotReady` is returned right after.

* Fix ipfs-kad example and use wasm_timer.

* Small cleanup.

* Incorporate some feedback.

* Adjustments after rebase.

* Distinguish events further.

In order for a user to easily distinguish the result of e.g.
a `put_record` operation from the result of a later republication,
different event constructors are used. Furthermore, re-replication
and "caching" of records (at the closest peer to the key that did
not return a value during a successful lookup) do not yield events
for now, as they are less interesting.

* Speed up tests for CI.

* Small refinements and more documentation.

  * Guard a node against overriding records for which it considers
    itself to be the publisher.

  * Document the jobs module more extensively.

* More inline docs around removal of "unreachable" addresses.

* Remove wildcard re-exports.

* Use NonZeroUsize for the constants.

* Re-add method lost on merge.

* Add missing 'pub'.

* Further increase the timeout in the ipfs-kad example.

* Readd log dependency to libp2p-kad.

* Simplify RecordStore API slightly.

* Some more commentary.

* Change Addresses::remove to return Result<(),()>.

Change the semantics of `Addresses::remove` so that it is the error
case, rather than the success case, that is unambiguous, and use the
`Result` to make that explicit.

* Add some documentation to .
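
To make the expiry and re-publication bullets of this commit message concrete, here is a sketch of the two rules (halving the TTL per node beyond the k closest is an assumed decay rate; function and parameter names are made up):

```rust
use std::time::{Duration, Instant};

/// The further a node is from the k closest to the key, the faster the
/// record expires: here the TTL is halved per extra node (assumption).
fn record_expiry(base_ttl: Duration, nodes_beyond_k_closest: u32) -> Instant {
    let secs = base_ttl.as_secs().checked_shr(nodes_beyond_k_closest).unwrap_or(0);
    Instant::now() + Duration::from_secs(secs)
}

/// Re-publication (by the original publisher) resets the expiry;
/// re-replication (by any of the k closest peers) must not touch it.
fn on_incoming_record(expires: &mut Instant, base_ttl: Duration, from_publisher: bool) {
    if from_publisher {
        *expires = Instant::now() + base_ttl; // re-publication: reset expiry
    }
    // Re-replication: keep the existing expiry unchanged.
}
```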
romanb commented Jul 24, 2019

Closing this as a result of #1189. I think any specific missing pieces or bugs should be filed in new, separate issues.

romanb closed this as completed Jul 24, 2019