Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use raw multihashes for providers #755

Closed
Stebalien opened this issue Oct 24, 2018 · 11 comments
Closed

Use raw multihashes for providers #755

Stebalien opened this issue Oct 24, 2018 · 11 comments
Assignees

Comments

@Stebalien
Copy link
Member

Currently, we're using CIDs. However, multiple CIDs can map to the same block (v1, v0, different codecs). Due to the base32 CID migration effort, we'll need to switch over the routing system to use raw multihashes.

Note: Given that raw multihashes are still valid V0 CIDs, old servers will continue to work.

Proposed changes:

  1. Change the routing interfaces to take multihashes.
  2. In handleAddProvider, just use strings/[]byte; don't parse anything (although we can assert that the length < 100 bytes (or something like that). We could also just use mutlihashes but then we'd need to change the multihash library to handle unknown multihash codes (which we may want to do anyways?).

@kevina is working on this from the go-ipfs side but he'll need someone on the libp2p side to implement these content-routing changes. Any takers?

@Stebalien Stebalien assigned ghost and kevina Oct 24, 2018
@Stebalien
Copy link
Member Author

(note: there may be other ways to do this but, at the end of the day, we need to be able to lookup a raw multihash of a block in the DHT).

@kevina
Copy link
Contributor

kevina commented Oct 29, 2018

Also see: https://github.com/ipfs/go-ipfs-blockstore/issues/8 and ipfs/kubo#5510 (comment) for context.

I made an attempt to push it though. I basically changed the interface in libp2p/go-libp2p-routing#33 and then semi-blindly pushed this through. Everything seams to work, but not really knowing how the network layers of ipfs work I am sure I missed things. Shareness tests are passing locally in go-ipfs.

Most of the significant changes are in libp2p/go-libp2p-kad-dht#203.

Other changes are in: libp2p/go-libp2p-pubsub-router#14, libp2p/go-libp2p-routing-helpers#13, ipfs/go-bitswap#18, ipfs/go-ipfs-routing#15

@kevina
Copy link
Contributor

kevina commented Oct 30, 2018

Original IRC Conversation (mostly for my own reference):

kevina: isn't your proposed change in the wire protocol?
kevina: That is changing provide and find providers to use multihash means we will start publishing multihashes and not Cid's
kevina: Or am I missing something?
stebalien: That's what my proposed change is. Basically, we'll use CIDs in bitswap to give the other side as much information as we have but we need to use multihashes in provide or as provide is really a block-level thing.
stebalien: TLDR: yes. kevina (IRC) ^^
kevina: stebalien: okay, then how do we handle older nodes? Do we need to bump the protocol version string?
kevina: (my knowledge of that part of the code is very limited)
stebalien: I don't think so. multihashes are valid CIDs.
stebalien: However, we'll probably need to change the routing interfaces.
stebalien: And new DHT nodes will need to reject provider records with CIDs (i.e., anything that's not just a multihash). Unfortunately, we can't just accept them and respond to CIDv1 and CIDv0 requests because of the way DHTs work (if a user puts a CIDv1 CID on our node, users looking for the CIDv0 version won't find us).
kevina: stebalien: sorry I missed your message, multihashes are valid CIDs but a bare multihash implies CidV0
kevina: I am really unfamiliar with that part of the code also, so I am having a hard time following...
stebalien: kevina (IRC): Basically, new clients will work with old servers because the multihashes will be valid CIDs. Old clients won't work with new servers when putting/getting CIDv1 CIDs but that's probably fine.
stebalien: you'll probably need to work with someone on the libp2p team here...

@kevina
Copy link
Contributor

kevina commented Nov 5, 2018

@Stebalien I spend some catching on how IPFS works at the network lawyer and I am not sure I like your solution. For simplicity let's assume we just use the raw binary string for the DHT and not try to decode into a CID or Multihash. Once the newer nodes switch to using Multihashes this will (1) prevent older nodes from discovering CIDv1 content on newer nodes as they will be looking for CIDv1 strings and newer nodes will index based on the multihash
(2) prevent newer nodes from discovering CIDv1 context on older nodes as newer nodes will always look for the multihash and thus won't find CIDv1 strings. The effect of this is that will will partly partition the network in two as for as CIDv1 context is concerned. Is this correct or am I missing something?

If we do as you propose and reject older nodes in the newer nodes DHT I don't see how it will help with (1) or (2) and it may make the situation worse.

I need to study this more, but your insight (or those from the libp2p folks) will be helpful.

@magik6k
Copy link
Contributor

magik6k commented Nov 6, 2018

I don't have any specific numbers, but my guess would be that vast majority of requests are for CIDv0 content (note that for stuff with raw leaves we still usually start from cidv0 root), so this shouldn't have much impact on most people. Also afaik bitswap will use some workaround for this, so connected nodes won't even notice this. For the remaining few impacted users it should be easy enough to say 'just update'

@Stebalien
Copy link
Member Author

So, I can see two solutions:

  1. Publish both records.
  2. Pick one.

Unfortunately, publishing these records is already really expensive and, luckily, we don't have many users of CIDv1 anyways. Most users of CIDv1 (and IPLD) are using javascript which doesn't currently interoperate with the DHT.

So, yeah, I'm not happy with any of these solutions but I feel that:

  1. Using raw multihashes.
  2. Not validating anything (or just validating the multihash structure but not the actual codec).

Is the most flexible approach.

@aschmahmann
Copy link
Collaborator

A few (likely stupid) questions:

However, multiple CIDs can map to the same block (v1, v0, different codecs).

@Stebalien is multiple codecs for the same data a common/serious issue?

Is the implication here that multihashes are effectively our "data identifiers", but CIDs are "content" = data + metadata identifiers?

I'm trying to understand 1) what CIDs are meant to be used for if not identifying content? 2) what's the plan for solving the multiple addresses per content issue as sha256 becomes obsolete, don't we run into these issues all over again?

@Stebalien
Copy link
Member Author

is multiple codecs for the same data a common/serious issue?

Not yet but this could be useful for, e.g., packing IPLD data up into an archive while still deduplicating the underlying blocks. That is, it allows one to build multiple "alternative" DAGs/views with a single set of leaves.

Is the implication here that multihashes are effectively our "data identifiers", but CIDs are "content" = data + metadata identifiers?

Yeah, this is a bit funky. Really, CIDs are "typed" identifiers while multihashes identify blobs of data. We could also just use the raw data multicodec but the current proposal of "just use multihashes" is the simplest solution, IMO.

@Stebalien Stebalien transferred this issue from libp2p/go-libp2p-routing Dec 6, 2019
@aschmahmann
Copy link
Collaborator

aschmahmann commented Dec 9, 2019

@Stebalien any chance we can do an easier version of this that doesn't require interface changes so we can ship this soon?

Proposal: Leave the interfaces the same, but send multihashes over the wire. Since CIDv0 is a multihash none of those records should change, CIDv1 records will change (so newer and older clients will have not be able to find each others' CIDv1 data).

Is the CIDv1 record incompatibility a problem? If so then the only options I can think of right now are for clients to do multiple queries for a while before we deprecate the publishing CIDv1s, or to rely on a dht version bump to force the deprecation.

@Stebalien
Copy link
Member Author

Stebalien commented Dec 10, 2019 via email

@jacobheun
Copy link
Contributor

Closing this as multihashes are now being used.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

5 participants