Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove Identity hash from table, add helper function and examples of alternatives #201

Closed
wants to merge 7 commits into from

Conversation

mriise
Copy link
Contributor

@mriise mriise commented Apr 1, 2022

related to #196

closes #194

In order to allow for easier transition for downstream users, this adds a few ideas how to implement the identity hasher.

Undecided yet is about what to do with non-const sized identity hashers. Some users may be ok with allocating in order to have larger identity hashes. This is a fairly far edge-case which can be covered, but is debatable whether it should actually be actively supported by us.
(update): We won't be supporting alloc-dependent hashes in this crate for now unless significant desire exists to add them.

Another thing introduced is the bytes crate which can do fun zero-copy things. This part was mostly for me to become more familiar with the crate.

@mriise mriise requested review from Stebalien and vmx April 1, 2022 23:30
Copy link
Member

@vmx vmx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this nice set of examples.

Would you then suggest that the upgrade path for users of the identity feature is, that they should use their own codec table?

examples/identity_hasher.rs Outdated Show resolved Hide resolved
@mriise
Copy link
Contributor Author

mriise commented Apr 5, 2022

Would you then suggest that the upgrade path for users of the identity feature is, that they should use their own codec table?

Yes, having a default can only cover one or maybe two use cases when it would be better for lib users to decide how they want to handle it. Adding a helper function might be helpful, but again it depends on how they want to handle excess data and should be implemented by the user.

@mriise mriise marked this pull request as ready for review August 11, 2022 06:42
@mriise mriise requested a review from vmx August 11, 2022 06:42
@mriise mriise changed the title add alternative identity examples Remove Identity hash from table, add helper function and examples of alternatives Aug 11, 2022
Copy link
Member

@vmx vmx left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please update the modules top-level documentation. It still mentions the identity hash. There should be some information about the identity hash on how to use it if you need it.

examples/identity_hasher.rs Outdated Show resolved Hide resolved
@vmx
Copy link
Member

vmx commented Aug 11, 2022

@mxinden I'd like to get your approval before this one gets merged. As libp2p is one of the users of the identity hash.

@mxinden
Copy link
Member

mxinden commented Aug 12, 2022

Thanks for the ping @vmx. I am not quite sure I understand the consequences of this change for rust-libp2p.

Do I understand correctly that we would need to special-case the identity hash parsing from a [u8] here?

https://github.com/libp2p/rust-libp2p/blob/a4110a2b6939d5b460f11df6dd158308f517becf/core/src/peer_id.rs#L71-L75

@vmx
Copy link
Member

vmx commented Aug 12, 2022

Do I understand correctly that we would need to special-case the identity hash parsing from a [u8] here?

Yes that's correct. You'd also need to special case here:
https://github.com/libp2p/rust-libp2p/blob/a4110a2b6939d5b460f11df6dd158308f517becf/core/src/peer_id.rs#L60-L66

Alternatively you could implement your own Code enum, which only includes SHA2-256 and the identity hash (which is also what the new example from this PR does).

@mxinden
Copy link
Member

mxinden commented Aug 13, 2022

That sounds reasonable to me. Thanks @vmx.

@BigLep
Copy link

BigLep commented Oct 4, 2022

@mriise : are you able to incorporate comments here so this can be merged?

@BigLep
Copy link

BigLep commented Nov 22, 2022

@mriise : are you able to incorporate comments here, or should we close?

Copy link
Contributor

@thomaseizinger thomaseizinger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

+1 for removing footguns.

Comment on lines -87 to -91
// The following hashes are not cryptographically secure hashes and are not enabled by default
/// Identity hash (max. 64 bytes)
#[cfg(feature = "identity")]
#[mh(code = 0x00, hasher = crate::IdentityHasher::<64>)]
Identity,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of directly removing it, may I suggest the use of #[deprecated] to make this a non-breaking change?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's not possible, unfortunately. The footgun is that one might accidentally construct such a hasher, attempt to hash some data, then panic. This won't hit any "deprecated" warnings.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure I understand. If the user uses this code, adding #[deprecated] to it will flag it right?

We can point the user to anything in the deprecation note, including explaining the footgun.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be clear, I am suggesting to add #[deprecated] to the enum variant.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The foot-gun is Code::try_from(number).digest(...).

  • If the identity feature gets enabled (e.g., by some other crate in the dependency tree).
  • If number happens to be 0.

This will panic.

Copy link
Member

@Stebalien Stebalien Dec 20, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(assuming that the input buffer ... is too long)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, now I get you. Thanks for explaining!

We can still add code that will automatically trigger a deprecation warning once the identity feature gets enabled. I did similar stuff in rust-libp2p.

Can push a PoC in a bit.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See #260.

Comment on lines +96 to +97
/// Code reserved for Identity "hash"
pub const IDENTITY_CODE: u64 = 0x0;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't necessarily need to be public, can we make it crate-private?

Copy link
Contributor Author

@mriise mriise Dec 20, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

its still useful imo, e.g. Multihash::wrap(multihash::IDENTITY_CODE, &bytes) or doing a match for it.

Copy link
Contributor

@thomaseizinger thomaseizinger Dec 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that. I am just wary about extending the public API with new items that are not essential. If we want to discourage the use of the identity hasher, it seems reasonable to have users declare that constant by themselves if they actually want to use it.

Comment on lines +102 to +104
pub fn identity_hash<const S: usize>(data: impl AsRef<[u8]>) -> Result<MultihashGeneric<S>> {
MultihashGeneric::wrap(IDENTITY_CODE, data.as_ref())
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to offer this as Multihash::identity instead?

Comment on lines +61 to +72
pub const IDENTITY_CODE: u64 = 0x0;
// blind copy of the input array
pub fn identity_hash_arr<const S: usize>(input: &[u8; S]) -> MultihashGeneric<S> {
MultihashGeneric::wrap(IDENTITY_CODE, input).unwrap()
}

// input is truncated to S size
pub fn identity_hash<const S: usize>(input: &[u8]) -> MultihashGeneric<S> {
let mut hasher = IdentityTrunk::<S>::default();
hasher.update(input);
MultihashGeneric::wrap(IDENTITY_CODE, hasher.finalize()).unwrap()
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So which one is the recommended way of doing it? I am not sure it make sense to add an example where the user is again left with two choices.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem we have here is that these are both "valid" technically, the "what to do with too small of a buffer" is entirely based around the requirements of the system you are writing for. Most of the time I would go with identity_hash_arr since that forces users to fit data into a certain size before the library gets it.

These are examples anyway so just using the function defined in in lib.rs will be fine.

@@ -27,6 +27,7 @@ serde-codec = ["serde", "serde-big-array"]

blake2b = ["blake2b_simd"]
blake2s = ["blake2s_simd"]
# identity feature is depricated
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# identity feature is depricated
# identity feature is deprecated

//! - `sha1`: Enable SHA-1 hasher
//! - `sha2`: (default) Enable SHA-2 hashers
//! - `sha3`: (default) Enable SHA-3 hashers
//! - `strobe`: Enable Strobe hashers
//! - `identity`: A depricated feature for identity hashes.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
//! - `identity`: A depricated feature for identity hashes.
//! - `identity`: A deprecated feature for identity hashes.

Comment on lines -87 to -91
// The following hashes are not cryptographically secure hashes and are not enabled by default
/// Identity hash (max. 64 bytes)
#[cfg(feature = "identity")]
#[mh(code = 0x00, hasher = crate::IdentityHasher::<64>)]
Identity,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure I understand. If the user uses this code, adding #[deprecated] to it will flag it right?

We can point the user to anything in the deprecation note, including explaining the footgun.

Comment on lines +98 to +103
/// Will error if `data.len() > S`
///
/// See examples for a few other approaches if this doesn't fit your application
pub fn identity_hash<const S: usize>(data: impl AsRef<[u8]>) -> Result<MultihashGeneric<S>> {
MultihashGeneric::wrap(IDENTITY_CODE, data.as_ref())
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems inconsistent to me to provide only this here when we are showing two ways to do it in the example.

Why have this at all? Seems trivial to implement so I am not sure what we are gaining here. At the same time, we are adding an item to our public API which in the future might cause a breaking change.

Comment on lines -87 to -91
// The following hashes are not cryptographically secure hashes and are not enabled by default
/// Identity hash (max. 64 bytes)
#[cfg(feature = "identity")]
#[mh(code = 0x00, hasher = crate::IdentityHasher::<64>)]
Identity,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be clear, I am suggesting to add #[deprecated] to the enum variant.

Copy link
Contributor

@thomaseizinger thomaseizinger left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For the sake of making this a non-breaking change and working towards minimizing them in the future, can I ask that we:

  • Not remove the Identity variant, but add #[deprecated] to it explicitly.
  • Not add any new public API items. What we add here seems trivial to implement outside of the crate. It is my understanding that the IdentityHasher per se is not bad, users just need to be careful when using it. Having them write a bit more code seems like a code way of getting them to think what they are doing there.
  • Apply Issue deprecation warning on Code if identity feature is enabled #260 to warn users about the footgun in combination with the identity feature.

Thanks for considering :)

@vmx
Copy link
Member

vmx commented Dec 21, 2022

  • Not add any new public API items.

I think that API comes from my request, when I said there needs to be a proper replacement for people that currently use it. I wanted a clean upgrade path. Though it can also be a well defined function in the (now existing ;) changelog, it doesn't have to be a public API (especially if we move the code table to a separate crate as it currently is discussed at #259).

@thomaseizinger
Copy link
Contributor

  • Not add any new public API items.

I think that API comes from my request, when I said there needs to be a proper replacement for people that currently use it. I wanted a clean upgrade path. Though it can also be a well defined function in the (now existing ;) changelog, it doesn't have to be a public API (especially if we move the code table to a separate crate as it currently is discussed at #259).

I think a recommendation on how to deal with the identity hash is a good idea. I have no strong opinion whether it is an example, rustdoc comment, test, changelog entry or something else :)

@thomaseizinger
Copy link
Contributor

There is #289 now which is up-to-date with latest master.

@vmx
Copy link
Member

vmx commented May 31, 2023

As #289 was merged, I'll close this one.

@vmx vmx closed this May 31, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Remove Code::Identity support
6 participants