Map id to cid instead of multihash #336

rklaehn · 2022-10-13T12:35:25Z

dignifiedquire · 2022-10-13T12:39:20Z

this is really problematic, the dht only works on multihashes for provider records, so we must be able to find content based on only multihashes

dignifiedquire · 2022-10-13T12:40:58Z

one alternative I could see is to store links as a a mapping id + codec to links, such that we can store links per codec, but the content itself is still referenced by multihash

rklaehn · 2022-10-13T12:43:08Z

one alternative I could see is to store links as a a mapping id + codec to links, such that we can store links per codec, but the content itself is still referenced by multihash

OK, will try that then.

Edit: That is also not that easy. That means that links needs to be a mapping from (code, id) to [(code, id)] not just to [id], otherwise I won't be able to traverse the graph anymore. As somebody who has been working on the store side, I want the part that does the traversal to benefit from the u64 ids...

rklaehn · 2022-10-13T12:54:11Z

this is really problematic, the dht only works on multihashes for provider records, so we must be able to find content based on only multihashes

Doesn't bitswap work on the level of cids? Where exactly do you need to look up content by multihash currently?

dignifiedquire · 2022-10-13T12:57:26Z

bitswap works with both depending on the protocol version
when we provide records they are going to be by multihash, not by cid
when making queries to the dht we always convert to a multihash

rklaehn · 2022-10-13T13:12:06Z

bitswap works with both depending on the protocol version

Ah, I did not know that.

hey are going to be by multihash, not by cid

when making queries to the dht we always convert to a multihash

That just requires going from cid to multihash, not vice versa, so no big deal.

So as far as I can see there are two solutions:

change the graph table so that the key is (code, hash_id) and the value is [(code, hash_id)]
I would prefer to avoid that since it makes graph traversal less efficient
keep ids as 1:1 with cids but provide a way to look up blobs by hash nonetheless. Key for id could be <hash><code>, then just look for entries with a prefix of .

I guess it depends on what we expect to be the more frequent use case - lookup by cid or by hash.

rklaehn · 2022-10-13T13:18:19Z

So I guess everything is fine as long as we can implement these two efficiently, correct?

    #[tracing::instrument(skip(self))]
    pub async fn get_blob_by_hash(&self, hash: &Multihash) -> Result<Option<DBPinnableSlice<'_>>> {
      todo!()
    }

    #[tracing::instrument(skip(self))]
    pub async fn has_blob_for_hash(&self, hash: &Multihash) -> Result<bool> {
      todo!()
    }

dignifiedquire · 2022-10-13T18:30:33Z

@rklaehn yes

rklaehn · 2022-10-17T08:51:53Z

The gist of the above is that ids are still mapped to cids, but the key for looking up ids from cids is

<hash><code>.

So to look up ids by hash you can just do a prefix search by hash and get all codes for which we have the hash. Just pick one that has associated data. Typically, in 99.9% of all cases, there should be exactly one?

Should we go with this? The other alternative would be to have id correspond to hash, and store (id, code) in both graph key and graph content. But I would really like to avoid that since it will make all graph traversal code slower and more cumbersome, for something that is not needed 99.9% of the time.

I think this case definitely needs to be solved one way or another before v1, because it allows you to get the store in a weird state (it is confused about links) from the outside.

iroh-store/src/store.rs

dignifiedquire

I think this is a good way to go 👍

rklaehn · 2022-10-17T10:59:23Z

Addressed your PR comments.

These tests are pretty much just smoke tests. I think we need more tests, but that is unrelated to this particular issue. See n0-computer/beetle#151

Currently failing because of #335

That way you can have 2 cids with the same hash but different links. Downside is that you might store the same data twice in the very unlikely case where you have the same data as both raw and dag-cb. ¯\_(ツ)_/¯

This is one of the two ways of doing this: - downside: in the rare case where there are 2 cids with the same hash, the data gets stored twice - upside: storing the graph can be done using just u64 ids instead of (code, id) tuples

rklaehn force-pushed the rklaehn/id-cid-mapping branch from 534a274 to 97e2275 Compare October 13, 2022 13:12

rklaehn force-pushed the rklaehn/id-cid-mapping branch 3 times, most recently from 01d4a98 to 924ad8f Compare October 13, 2022 15:47

rklaehn marked this pull request as ready for review October 13, 2022 15:49

rklaehn force-pushed the rklaehn/id-cid-mapping branch from 924ad8f to 1aceae9 Compare October 13, 2022 15:52

rklaehn requested a review from dignifiedquire October 13, 2022 15:59

rklaehn force-pushed the rklaehn/id-cid-mapping branch from 1918e58 to 331bd33 Compare October 13, 2022 16:39

b5 assigned rklaehn Oct 13, 2022

rklaehn force-pushed the rklaehn/id-cid-mapping branch 4 times, most recently from 2258834 to 66e7972 Compare October 14, 2022 09:04

dignifiedquire reviewed Oct 17, 2022

View reviewed changes

iroh-store/src/store.rs Show resolved Hide resolved

dignifiedquire reviewed Oct 17, 2022

View reviewed changes

iroh-store/src/store.rs Show resolved Hide resolved

dignifiedquire reviewed Oct 17, 2022

View reviewed changes

iroh-store/src/store.rs Outdated Show resolved Hide resolved

dignifiedquire reviewed Oct 17, 2022

View reviewed changes

rklaehn force-pushed the rklaehn/id-cid-mapping branch from c57e8ea to 7be7418 Compare October 17, 2022 10:10

rklaehn requested a review from dignifiedquire October 17, 2022 10:59

rklaehn mentioned this pull request Oct 17, 2022

Remove record level versioning #354

Merged

dignifiedquire approved these changes Oct 17, 2022

View reviewed changes

rklaehn added 6 commits October 17, 2022 18:57

test: ensure that different cids with the same hash work

edbfa9d

Currently failing because of #335

fix: use code and hash (so basically cid) as id key

903557d

That way you can have 2 cids with the same hash but different links. Downside is that you might store the same data twice in the very unlikely case where you have the same data as both raw and dag-cb. ¯\_(ツ)_/¯

perf: avoid allocation when creating an id key

19df7f0

feat: Add way to get data by hash

269ac64

This is one of the two ways of doing this: - downside: in the rare case where there are 2 cids with the same hash, the data gets stored twice - upside: storing the graph can be done using just u64 ids instead of (code, id) tuples

docs: update comment to match new db layout

e98c83e

refactor: address PR review comments

df0600c

rklaehn force-pushed the rklaehn/id-cid-mapping branch from 39e29b6 to df0600c Compare October 17, 2022 15:57

rklaehn merged commit 092bdf7 into n0-computer:main Oct 17, 2022

rklaehn deleted the rklaehn/id-cid-mapping branch October 17, 2022 16:10

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Map id to cid instead of multihash #336

Map id to cid instead of multihash #336

rklaehn commented Oct 13, 2022

dignifiedquire commented Oct 13, 2022 •

edited

Loading

dignifiedquire commented Oct 13, 2022

rklaehn commented Oct 13, 2022 •

edited

Loading

rklaehn commented Oct 13, 2022

dignifiedquire commented Oct 13, 2022

rklaehn commented Oct 13, 2022 •

edited

Loading

rklaehn commented Oct 13, 2022

dignifiedquire commented Oct 13, 2022

rklaehn commented Oct 17, 2022

dignifiedquire left a comment

rklaehn commented Oct 17, 2022

Map id to cid instead of multihash #336

Map id to cid instead of multihash #336

Conversation

rklaehn commented Oct 13, 2022

dignifiedquire commented Oct 13, 2022 • edited Loading

dignifiedquire commented Oct 13, 2022

rklaehn commented Oct 13, 2022 • edited Loading

rklaehn commented Oct 13, 2022

dignifiedquire commented Oct 13, 2022

rklaehn commented Oct 13, 2022 • edited Loading

rklaehn commented Oct 13, 2022

dignifiedquire commented Oct 13, 2022

rklaehn commented Oct 17, 2022

dignifiedquire left a comment

Choose a reason for hiding this comment

rklaehn commented Oct 17, 2022

dignifiedquire commented Oct 13, 2022 •

edited

Loading

rklaehn commented Oct 13, 2022 •

edited

Loading

rklaehn commented Oct 13, 2022 •

edited

Loading