-
Notifications
You must be signed in to change notification settings - Fork 280
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Specify peer ID hash function in public key #111
Comments
Somewhat related https://github.com/libp2p/go-libp2p-crypto/issues/51. Very related: libp2p/go-libp2p#484. |
Current representation of PeerIDs is base58, which is case-sensitive and thus not Origin-safe. In the context of IPFS CIDs switching to base32 the time feels right to solve Origin problem for IPNS websites as well. So the ask from Web Browsers WG is:
|
To me having a peer ID that is anything more than the hash of a public key would feel very over-engineered. |
So, at the very least, we need to support multiple hash functions. That's what this is about. The idea behind using CIDs is that would bring this in-line with everything else. In addition to being able to upgrade the hash function, we'd:
Of course, the other side of this issue is that we'd be making libp2p depend on IPLD and I understand that the libp2p team may not be happy with that. So, let's try to not tie all of these together. |
Concrete non-CID proposal (kill two birds with one stone):
Note 1: 5 is not a network-breaking change as these won't cross the network anyways. My concern is that this does make switching to CID-backed peer IDs painful. Again, we don't need that to be a network-breaking change as we can continue using raw multihash CIDs on the network (for now) however, technically, we could run into issues if we ever bump the CID version number to something that's a valid multihash codec. |
Concrete CID proposal (kill three birds with a planet):
This doesn't add full CID-peer-ID support. Instead, it just gives us a way to convert between them and allows us to use CIDs in text (and gives us some room to upgrade later). |
Using CIDs has another benefit. As we move to accepting multiple keys, we will eventually need to think of a key as a file that might have multiple GBs and using the same identified that we use for content (therefore it being called Content Identified) will enable us to support that smoothly. More notes on ipfs/specs#58 |
Concrete CIDs all the way proposal (that's no moon):
The primary reason to go with this over the "kill three birds with a planet" scheme is that that scheme only works as long as a CID version number never conflicts with a multihash code (other than 0). Once one does, we won't be able to distinguish between a PIDv0 and a CIDvN. |
I don't think the text format of a peer ID should be relevant to the libp2p specs at all. This spec is about how things are transmitted over the network, and the peer ID is never transmitted in text format as far as I know. Transmitting peer IDs as CIDs over the network would be wasteful, as you're only ever going to require raw data anyway if you don't want to run into tons of crazy corner cases. If IPFS or IPLD or Filecoin want to format peer IDs as CIDs, they can do so and decode the ID into a multihash before using it in libp2p. However that would not be relevant to the libp2p specs. |
We also need to mention that connecting to a peer whose public key reports a different hash function that the one we have should be treated as a key mismatch, even if the hash itself is correct. Supporting multiple hash functions raises the question of the number of kbuckets in Kademlia, however I'm not an expert enough to know if this is actually a problem, or if it's attackable. |
I very much want to preserve public key inlining in the identity, as it is a key property for pusbub message signing to be efficient. |
Well, given that this is a system for humans, I'd have to strongly disagree. There would be no web without URLs/URIs. These IDs do show up in text all the time. Usually in the form
CIDs give us flexibility. With raw multihashes, we're stuck with the current key format and can't change it. With CIDs, we can introduce new formats unambiguously (because the CID itself states the key format).
KAD re-hashes everything anyways using sha2-256.
That would be a mismatch. We'd be mapping two names (peer IDs) onto two different peers. We never want to end up in a situation where we assign more than one peer ID to a single, logical, identity.
We can keep support, we just need to avoid creating keys with inlined peer IDs by default. |
They show up only if you use the IPFS, Filecoin or IPNS CLI. As far as I understand, libp2p is supposed to be purely a library. Things built on top of libp2p can decide whether to a multibase and convert that to a multihash before passing to libp2p.
The peer ID is the hash of an arbitrary vector of bytes. Right now this underlying vector of bytes is a protobuf struct, but could be changed to be anything. |
What do you mean "re-hashes"? |
We treat peer IDs as opaque keys and always hash them with sha256 (like we do with other keys). Basically, we make no assumptions about the peer ID format when figuring out where a peer ID fits in the DHT keyspace.
Libp2p is a library but it's also an interoperable network protocol. Peer IDs need interoperable, human-readable forms just like IP addresses, MAC addresses, etc.
You'd have to somehow communicate how the this "arbitrary byte vector" should be decoded and how it should be understood as a public key. We currently use the codec in CIDs to communicate this kind of information. |
But why do they need need a human-readable form, since they are only ever transmitted as bytes over the network? I'm not saying their human-readable form shouldn't be defined. I'm saying that the network protocol shouldn't be changed because of out-of-scope human-readable reasons. |
Does that mean that if a remote says that its public key should be hashed with sha512, then we hash it with sha256 anyway for DHT insertion purposes? But then, if we receive a Kademlia RPC query, do we answer with sha256 hashes, or do we answer with sha512 hashes? Or both (the protocol can't do both at the moment)? |
Hey, can't quite pull up all the context to this at the moment, but my intuition in the past has been to make PeerIDs a subset of CIDs (so that the meaning of "CID" would change to "Cryptographic IDentifier"), and to do much of the same CIDv0/CIDv1 dance we did in content-land. It'd avoid duplication of concepts/protocols, and it'd be a migration that we already have some experience in. It would also be super useful to do the same base32 thing that we're doing with content CIDs, so we can seamlessly use the peer CIDs in URLs. I sense the discussion above is already going into that direction, and I can't add anything new to that at the moment, so I'll leave it at that :)
They get put into multiaddrs all the time ( |
That's just moving the problem. Multiaddresses are also only ever transmitted as binary, and thus don't need to have their human-readable version defined as part of the network protocol. I'm extremely confused as to why it's a good idea to make machine A decide in which format a peer id should be displayed on machine B, and not just let machine B decide.
That's not related to the wire protocol, which is what this is all about. |
Some notes from skimming this:
What about having the peerID actually be the public key? I think that its unarguable how useful that is. How would you go about representing that distinction over the wire?
|
Does that mean that you store both the peerID and the sha256 of the public key?
I'm not sure I understand the question. We know whether we receive a peer ID or a public key based on the protocol. For example
I agree that everything should have the same human-readable representation, but I don't see how it is a good solution to let a remote node on the network choose what the representation of their key should be. |
I completely agree. That's why I listed three options:
Why bundle these questions together? Unfortunately, wire format changes tend to affect display and parsing.
That was never the intention here. The key's creator needs to be able to choose the hash function used in the peer ID so they don't end up with multiple peer IDs (not for display, for code) however, they don't get to, e.g., decide on the base encoding the user decides to use. |
I guess I misunderstood then. |
I would still love to get some clarifications on how Kademlia would work after allowing hashes other than SHA-256. |
Technically, it does. However, we don't actually do this anywhere (well, except for the CBOR format) because it's redundant (and I'm certainly not going to argue for it).
Kademlia uses sha-256 and only sha-256 for all keys. Given a peer ID Note: This does mean that |
that's a backwards incompatible change, we can't do that. |
You can have a new version of |
IMO, the original motivations for embedding the key in the peer ID has now mostly been obsoleted by simply embedding the key in the record (which I believe is the "more correct" solution). We can make the peer ID optional and eventually stop including it in favor of simply deriving it from the key. However, we should still support alternative hash functions for peer IDs:
|
A potential killer app for inlined keys is packet switching and packet protocols in general. |
What multicodec code would be used in a CID scheme? What advantages/disadvantages are there for making a peer ID code? |
@anacrolix Implying (?!) that you could actually replace |
@Zolmeister no. |
cc @andrewxhill, @momack2, @whyrusleeping (please read #111 (comment) for context) Update: It turns out Textile was using ed25519 keys, with inlining. Other than requiring a forced-upgrade of the textile network, the only solution I can see for now is:
And do this all before the next go-ipfs release, which is almost a month overdue... 😭. @andrewxhill does Textile use IPNS and, if so, how? Also, does textile hardcode any string-formatted ed25519 peer IDs anywhere that can't be automatically migrated? I ask because of consideration 3 above. That is, if we change the textual format of inlined peer IDs, will that be an issue? Note: We can add a special case for base58 encoded inlined peer IDs for some migration period but that's not a permanent solution. |
Thanks for the context... very helpful. The main reason I went this route was to avoid having to look up / save a fellow peer's public key as well as its ID within application logic. Also, they're much faster to generate than RSA keys (very noticeable difference on mobile). We have been using IPNS for our profiles. However, for various reasons, we've moved away from that approach. Our next release does not use IPNS. The only place we hardcode peer IDs is for our bootstrap peers. If the solution involves changing peer ID format, we'd have to come up with a migration for our groups, as well as apply the IPFS repo migration. Each peer would announce their new ID. Not the end of the world, but it would require some work... a migration period would be helpful. |
Note: the current proposal involves changing the textual format, not the network/binary format. |
Ah, makes sense. So, we’d just have to run a local migration. Easy enough. Are you thinking this would land sometime next week after your team retreat?
…On Jan 21, 2019, 3:31 PM -0800, Steven Allen , wrote:
Note: the current proposal involves changing the textual format, not the network/binary format.
—
You are receiving this because you commented.
Reply to this email directly, view it on GitHub, or mute the thread.
|
Thinking through this a bit, my proposal doesn't work without a full network (go-ipfs and Textile) upgrade. So, we'll need a migration period. New proposal: Same as above except:
Hopefully by the end of this week. We'll see what other snags we hit. |
What is "some time"? Two releases? 3 Months? How will his affect the connectivity with JS/Rust nodes? Can we have some interop tests written at https://github.com/ipfs/interop before doing the release that test connectivity with different key combinations, namely:
|
@jonnycrunch just adding myself to the thread. with IPID DID method spec i have been publishing the DID document as IPLD to IPNS using ED25519 key (supported), but the actual DID public keys that published are secp256k1. For DID authentication, I was hoping to be as simple as QUIC over libp2p, but this won't be supported with secp256k1. |
Once N% of nodes have upgraded, ideally. We can pull this information from the bootstrappers.
It will restore interoperability with ed25519 between go and js/rust. @jonnycrunch what do your IPNS IDs look like? |
See #138 for a summary. |
@Stebalien go-ipfs version 0.4.18 published to ed25519
|
@Stebalien or using secp256k1 with bip32 HD wallet and hashing using Keccak256 as in ethereum with output as hex: 0x122f0fDcfAaC745381454C32648436bdfe4cBdda |
Got it. |
What: 1. Supports decoding CIDs (of libp2p keys) as peer IDs (libp2p/specs#216) 2. Continues to encode peer IDs as base58 multihashes by default. 3. Adds functions for converting between peer IDs and CIDs. 4. Deprecates IDB58{Decode,Encode}, replacing them with {Decode,Encode}. Why: 1. We _need_ to support multibase somehow, so we can put peer IDs in domains. 2. This makes peer IDs fully self describing. That is, the CID itself indicates that it's a hash of a libp2p public key. 3. It's much easier to upgrade wire protocols than text. This change punts pids-as-cids on the wire down the road but that's something we can revisit if it ever becomes relevant. See libp2p/specs#111 for context. Deviations from that issue: * This _retains_ the current peer ID inlining of ed25519 keys. That turned out to be a nightmare, one I'd like to punt a bit more down the road. * Likewise, this _punts_ the question of embedding the multi-hash algorithm in the public key.
I'm going to briefly revive this issue in order to kill it. I still believe we should support specifying the hash function inside the public key. However, I don't believe we should do it as a mitigation for #138. My original motivation was largely to undo a breaking change I had made. Unfortunately, reverting a breaking change is also a breaking change. When I first proposed this solution, I was under the impression that only OpenBazaar was using Ed25519 keys. Since then, we've learned that:
Therefore, in the interest of causing the least breakage possible, I'm going to:
Resolution: Keep the current peer ID calculation logic. |
What: 1. Supports decoding CIDs (of libp2p keys) as peer IDs (libp2p/specs#216) 2. Continues to encode peer IDs as base58 multihashes by default. 3. Adds functions for converting between peer IDs and CIDs. 4. Deprecates IDB58{Decode,Encode}, replacing them with {Decode,Encode}. Why: 1. We _need_ to support multibase somehow, so we can put peer IDs in domains. 2. This makes peer IDs fully self describing. That is, the CID itself indicates that it's a hash of a libp2p public key. 3. It's much easier to upgrade wire protocols than text. This change punts pids-as-cids on the wire down the road but that's something we can revisit if it ever becomes relevant. See libp2p/specs#111 for context. Deviations from that issue: * This _retains_ the current peer ID inlining of ed25519 keys. That turned out to be a nightmare, one I'd like to punt a bit more down the road. * Likewise, this _punts_ the question of embedding the multi-hash algorithm in the public key.
In the beginning, peer IDs were base58 encoded sha256 multihashes. However, being able to embed a key into a peer ID was rather tempting so we introduced a IDFromEd25519PublicKey function. This function created a different peer ID that embedded the public key.
Unfortunately,
IDFromPublicKey
function, they'd get a different peer ID for the same key. This obviously isn't good as our security transports derive peer IDs from public keys instead of validating peer IDs (this makes it impossible to mis-validate).Therefore, I submitted libp2p/go-libp2p-peer#30 which:
Unfortunately, this last bit appears to be a problem... At the time, I was under the impression that nobody had actually gotten around to using ed25519 keys as inlining didn't really work (libp2p would still use the sha256 peer IDs). However, it turns out that OpenBazaar was using them anyways.
Second, in retrospect, this wasn't even the optimal solution. Really, we also need to support arbitrary hash functions but this only adds support for the identity hash function.
So, my proposal here is to:
Open question: Should we use this as an opportunity to switch to multibase-enabled peer IDs now? One significant issue with the current embedding method is that it makes it slightly harder to switch to multibase peer IDs (cc @lidel). I've previously proposed switching to CID-based peer IDs but that may be more work than it's worth (a motivation would be transfering keys over bitswap, easier to support large quantum-safe keys, format flexibility, etc.).
The text was updated successfully, but these errors were encountered: