Map id to cid instead of multihash #336
This is really problematic: the DHT only works on multihashes for provider records, so we must be able to find content based on multihashes alone.
One alternative I could see is to store links as a mapping from id + codec to links, so that we can store links per codec while the content itself is still referenced by multihash.
OK, will try that then. Edit: That is also not that easy. It means that links need to be a mapping from (code, id) to [(code, id)], not just to [id], otherwise I won't be able to traverse the graph anymore. As somebody who has been working on the store side, I want the part that does the traversal to benefit from the u64 ids...
Doesn't bitswap work on the level of cids? Where exactly do you need to look up content by multihash currently?
Ah, I did not know that.
That just requires going from cid to multihash, not vice versa, so no big deal. So as far as I can see there are two solutions:
I guess it depends on what we expect to be the more frequent use case: lookup by cid or lookup by hash.
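Why the cid to multihash direction is cheap: a binary CID embeds its multihash, so extracting it is pure parsing, whereas a bare multihash carries no codec and cannot be turned back into a CID. The sketch below works on raw bytes and assumes the multiformats binary CID layout (CIDv0 is a bare 34-byte sha2-256 multihash; CIDv1 is `<varint version><varint codec><multihash>`). `read_uvarint` and `multihash_of_cid` are hypothetical helpers written for illustration, not the API of any particular crate.

```rust
/// Decode an unsigned LEB128 varint (the multiformats "unsigned-varint")
/// from the start of `bytes`; returns (value, bytes consumed).
fn read_uvarint(bytes: &[u8]) -> Option<(u64, usize)> {
    let mut value: u64 = 0;
    for (i, &b) in bytes.iter().enumerate().take(10) {
        value |= u64::from(b & 0x7f) << (7 * i);
        if b & 0x80 == 0 {
            return Some((value, i + 1));
        }
    }
    None // truncated, or longer than 10 bytes
}

/// Extract the multihash bytes from a binary CID.
/// CIDv0: the CID is itself a sha2-256 multihash (0x12, 0x20, 32-byte digest).
/// CIDv1: <varint version = 1><varint codec><multihash>.
fn multihash_of_cid(cid: &[u8]) -> Option<&[u8]> {
    if cid.len() == 34 && cid[0] == 0x12 && cid[1] == 0x20 {
        return Some(cid);
    }
    let (version, n_version) = read_uvarint(cid)?;
    if version != 1 {
        return None;
    }
    let (_codec, n_codec) = read_uvarint(&cid[n_version..])?;
    // The codec is dropped here; a bare multihash has no codec field,
    // so the reverse direction (multihash -> cid) cannot recover it.
    Some(&cid[n_version + n_codec..])
}

fn main() {
    // CIDv1, codec raw (0x55), wrapping a placeholder sha2-256 multihash.
    let mut cid = vec![0x01u8, 0x55, 0x12, 0x20];
    cid.extend_from_slice(&[0xab; 32]);
    let mh = multihash_of_cid(&cid).expect("valid CIDv1");
    assert_eq!(mh.len(), 34);
    println!("multihash: {} bytes, hash function code 0x{:x}", mh.len(), mh[0]);
}
```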
So I guess everything is fine as long as we can implement these two efficiently, correct?

```rust
/// Return the data for any cid whose multihash is `hash`, if we have it.
#[tracing::instrument(skip(self))]
pub async fn get_blob_by_hash(&self, hash: &Multihash) -> Result<Option<DBPinnableSlice<'_>>> {
    todo!()
}

/// Return true if data is stored for any cid whose multihash is `hash`.
#[tracing::instrument(skip(self))]
pub async fn has_blob_for_hash(&self, hash: &Multihash) -> Result<bool> {
    todo!()
}
```
@rklaehn yes
The gist of the above is that ids are still mapped to cids, but the key for looking up ids from cids is
So to look up ids by hash, you can just do a prefix search by hash, get all codes for which we have the hash, and pick one that has associated data. Typically, in 99.9% of all cases, I'd expect there to be exactly one.

Should we go with this? The other alternative would be to have the id correspond to the hash and store (id, code) in both the graph key and the graph content. But I would really like to avoid that, since it would make all graph traversal code slower and more cumbersome for something that is not needed 99.9% of the time.

I think this case definitely needs to be solved one way or another before v1, because it allows you to get the store into a weird state (it is confused about links) from the outside.
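A minimal sketch of the proposed lookup, which would also cover the `get_blob_by_hash` / `has_blob_for_hash` stubs. It assumes an in-memory `BTreeMap` as a stand-in for the store's ordered key-value column, and assumes the key is the multihash bytes followed by the codec as an 8-byte big-endian integer; the actual key layout is not given in the comment, and `Store`, `key`, and `ids_for_hash` are hypothetical names. Since a multihash is self-delimiting (code, length, digest), scanning forward from the hash while keys still start with it, and keeping only keys that are exactly `hash || 8-byte codec`, yields the `(codec, id)` pairs for that hash.

```rust
use std::collections::{BTreeMap, HashMap};

/// Hypothetical in-memory stand-in for the store: the ordered map plays the
/// role of a key-value column that supports prefix iteration.
#[derive(Default)]
struct Store {
    /// key = multihash bytes || codec as u64 big-endian  ->  internal id
    id_by_hash_and_code: BTreeMap<Vec<u8>, u64>,
    /// internal id -> blob data
    blobs: HashMap<u64, Vec<u8>>,
    next_id: u64,
}

/// Assumed key layout: hash first, so all codecs for one hash are adjacent.
fn key(hash: &[u8], code: u64) -> Vec<u8> {
    let mut k = hash.to_vec();
    k.extend_from_slice(&code.to_be_bytes());
    k
}

impl Store {
    /// Reuse the id for (hash, code) if it exists, otherwise allocate one; then store the data.
    fn put(&mut self, hash: &[u8], code: u64, data: &[u8]) -> u64 {
        let k = key(hash, code);
        let id = match self.id_by_hash_and_code.get(&k) {
            Some(&id) => id,
            None => {
                let id = self.next_id;
                self.next_id += 1;
                self.id_by_hash_and_code.insert(k, id);
                id
            }
        };
        self.blobs.insert(id, data.to_vec());
        id
    }

    /// All (codec, id) pairs stored under `hash`, found with a prefix scan.
    fn ids_for_hash<'a>(&'a self, hash: &'a [u8]) -> impl Iterator<Item = (u64, u64)> + 'a {
        self.id_by_hash_and_code
            .range(hash.to_vec()..)
            .take_while(move |(k, _)| k.starts_with(hash))
            .filter_map(move |(k, &id)| {
                // Only keys that are exactly `hash || 8-byte codec` belong to this hash.
                let rest = &k[hash.len()..];
                if rest.len() != 8 {
                    return None;
                }
                let mut code_bytes = [0u8; 8];
                code_bytes.copy_from_slice(rest);
                Some((u64::from_be_bytes(code_bytes), id))
            })
    }

    /// Sketch of `get_blob_by_hash`: any codec whose id has data will do.
    fn get_blob_by_hash(&self, hash: &[u8]) -> Option<&[u8]> {
        let blobs = &self.blobs;
        self.ids_for_hash(hash)
            .find_map(|(_code, id)| blobs.get(&id).map(|b| b.as_slice()))
    }

    /// Sketch of `has_blob_for_hash`.
    fn has_blob_for_hash(&self, hash: &[u8]) -> bool {
        self.get_blob_by_hash(hash).is_some()
    }
}

fn main() {
    const RAW: u64 = 0x55; // multicodec code for raw
    const DAG_CBOR: u64 = 0x71; // multicodec code for dag-cbor
    // Placeholder sha2-256 multihash: code 0x12, length 0x20, 32-byte digest.
    let mut hash = vec![0x12u8, 0x20];
    hash.extend_from_slice(&[7u8; 32]);

    let mut store = Store::default();
    let raw_id = store.put(&hash, RAW, b"hello");
    let cbor_id = store.put(&hash, DAG_CBOR, b"hello");
    let found: Vec<(u64, u64)> = store.ids_for_hash(&hash).collect();
    assert_eq!(found, vec![(RAW, raw_id), (DAG_CBOR, cbor_id)]);
    assert!(store.has_blob_for_hash(&hash));
}
```

Putting the hash before the codec in the key is what makes the prefix scan work; a codec-first key would scatter the entries for one hash across the keyspace.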
I think this is a good way to go 👍
Addressed your PR comments. These tests are pretty much just smoke tests; I think we need more tests, but that is unrelated to this particular issue. See n0-computer/beetle#151
Currently failing because of #335
That way you can have two cids with the same hash but different links. The downside is that you might store the same data twice in the very unlikely case where you have the same data as both raw and dag-cbor. ¯\_(ツ)_/¯
This is one of the two ways of doing this:
- downside: in the rare case where there are two cids with the same hash, the data gets stored twice
- upside: storing the graph can be done using just u64 ids instead of (code, id) tuples
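The trade-off in these commit notes can be sketched with a toy graph in which every `(multihash, codec)` pair gets its own `u64` id. Everything here is hypothetical: `Graph`, `put`, and `reachable` are illustrative names, the hashes are placeholder byte strings rather than real multihashes, and `0x55` / `0x71` are the multicodec codes for raw and dag-cbor. Traversal only ever handles plain `u64` ids, while the same bytes stored under two codecs occupy two blob entries.

```rust
use std::collections::{HashMap, HashSet, VecDeque};

/// Hypothetical toy store: one u64 id per (multihash, codec) pair.
#[derive(Default)]
struct Graph {
    ids: HashMap<(Vec<u8>, u64), u64>,
    blobs: HashMap<u64, Vec<u8>>,
    links: HashMap<u64, Vec<u64>>,
}

impl Graph {
    /// Look up or allocate the id for a (hash, codec) pair.
    fn id(&mut self, hash: &[u8], code: u64) -> u64 {
        let next = self.ids.len() as u64; // ids are never removed in this sketch
        *self.ids.entry((hash.to_vec(), code)).or_insert(next)
    }

    /// Store a cid's data and outgoing links, keyed by its id.
    /// The same bytes under two codecs get two ids, so the blob is stored twice.
    fn put(&mut self, hash: &[u8], code: u64, data: &[u8], children: &[u64]) -> u64 {
        let id = self.id(hash, code);
        self.blobs.insert(id, data.to_vec());
        self.links.insert(id, children.to_vec());
        id
    }

    /// Breadth-first traversal that only ever handles plain u64 ids.
    fn reachable(&self, root: u64) -> Vec<u64> {
        let mut seen = HashSet::new();
        let mut queue = VecDeque::from([root]);
        let mut order = Vec::new();
        while let Some(id) = queue.pop_front() {
            if !seen.insert(id) {
                continue; // already visited
            }
            order.push(id);
            if let Some(children) = self.links.get(&id) {
                queue.extend(children.iter().copied());
            }
        }
        order
    }
}

fn main() {
    const RAW: u64 = 0x55;
    const DAG_CBOR: u64 = 0x71;
    let mut g = Graph::default();
    let leaf = g.put(b"leaf-hash", RAW, b"leaf", &[]);
    // Same (placeholder) hash under two codecs: two ids, different links.
    let as_raw = g.put(b"root-hash", RAW, b"root", &[]);
    let as_cbor = g.put(b"root-hash", DAG_CBOR, b"root", &[leaf]);
    assert_ne!(as_raw, as_cbor);
    println!("from dag-cbor root: {:?}", g.reachable(as_cbor));
    println!("from raw root: {:?}", g.reachable(as_raw));
}
```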
Fixes n0-computer/beetle#147