Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove Identity hash from table, add helper function and examples of alternatives #201

Closed
wants to merge 7 commits into from
Closed
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions Cargo.toml
Original file line number Diff line number Diff line change
Expand Up @@ -27,6 +27,7 @@ serde-codec = ["serde", "serde-big-array"]

blake2b = ["blake2b_simd"]
blake2s = ["blake2s_simd"]
# identity feature is depricated
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
# identity feature is depricated
# identity feature is deprecated

identity = []
sha1 = ["digest", "sha-1"]
sha2 = ["digest", "sha-2"]
Expand Down
83 changes: 83 additions & 0 deletions examples/identity_hasher.rs
Original file line number Diff line number Diff line change
@@ -0,0 +1,83 @@
use multihash::derive::Multihash;
use multihash::{Error, Hasher, MultihashDigest, MultihashGeneric, Sha2_256};

/// update appends to end of buffer but truncates/ignores bytes after len > S
#[derive(Debug)]
pub struct IdentityTrunk<const S: usize> {
cursor: usize,
arr: [u8; S],
}

impl<const S: usize> Default for IdentityTrunk<S> {
fn default() -> Self {
Self {
cursor: 0,
arr: [0u8; S],
}
}
}
impl<const S: usize> Hasher for IdentityTrunk<S> {
fn update(&mut self, input: &[u8]) {
let src_end = (self.cursor + input.len()).min(self.arr.len());
let input_end = input.len().min(self.arr.len() - self.cursor);

self.arr[self.cursor..src_end].copy_from_slice(&input[..input_end]);
self.cursor = src_end;
}
fn finalize(&mut self) -> &[u8] {
&self.arr[..self.cursor]
}
fn reset(&mut self) {
*self = Self {
cursor: 0,
arr: [0u8; S],
};
}
}

#[derive(Clone, Copy, Debug, Eq, Multihash, PartialEq)]
#[mh(alloc_size = 64)]
pub enum Code {
#[mh(code = 0x00, hasher = IdentityTrunk::<64>)]
IdentityTrunk,
#[mh(code = 0x12, hasher = Sha2_256)]
Sha2_256,
}

fn main() {
// overwrite and trunk with code

let src = b"hello world!";
let ident_trunk = Code::IdentityTrunk.digest(src);

assert_eq!(ident_trunk.digest(), src);

// input bigger than default table, but still const sized
let big_src = b"Lorem ipsum dolor sit amet, consectetur adipiscing elit. Phasellus vehicula tempor magna quis egestas. Etiam quis rhoncus neque.";

let truncated = Code::IdentityTrunk.digest(big_src);
assert_eq!(truncated.digest(), &big_src[..64]);
//
// Helper functions cannot be used with normal table if S != alloc_size, fancier const trait bounds in the future may allow for interop where S <= alloc_size
//

pub const IDENTITY_CODE: u64 = 0x0;
// blind copy of the input array
pub fn identity_hash_arr<const S: usize>(input: &[u8; S]) -> MultihashGeneric<S> {
MultihashGeneric::wrap(IDENTITY_CODE, input).unwrap()
}

// input is truncated to S size
pub fn identity_hash<const S: usize>(input: &[u8]) -> MultihashGeneric<S> {
let mut hasher = IdentityTrunk::<S>::default();
hasher.update(input);
MultihashGeneric::wrap(IDENTITY_CODE, hasher.finalize()).unwrap()
}
Comment on lines +64 to +75
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So which one is the recommended way of doing it? I am not sure it make sense to add an example where the user is again left with two choices.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem we have here is that these are both "valid" technically, the "what to do with too small of a buffer" is entirely based around the requirements of the system you are writing for. Most of the time I would go with identity_hash_arr since that forces users to fit data into a certain size before the library gets it.

These are examples anyway so just using the function defined in in lib.rs will be fine.


// makes use of the const sized input to infer the output size
let big_arr_mh = identity_hash_arr(big_src);
assert_eq!(big_arr_mh.digest(), big_src);
// size must be specified
let big_mh = identity_hash::<128>(big_src);
assert_eq!(big_mh.digest(), big_src);
}
13 changes: 11 additions & 2 deletions src/lib.rs
Original file line number Diff line number Diff line change
Expand Up @@ -11,12 +11,11 @@
//!
//! - `blake2b`: (default) Enable Blake2b hashers
//! - `blake2s`: (default) Enable Blake2s hashers
//! - `identity`: Enable the Identity hashers (using it is discouraged as it's not a hash function
//! in the sense that it produces a fixed sized output independent of the input size)
//! - `sha1`: Enable SHA-1 hasher
//! - `sha2`: (default) Enable SHA-2 hashers
//! - `sha3`: (default) Enable SHA-3 hashers
//! - `strobe`: Enable Strobe hashers
//! - `identity`: A depricated feature for identity hashes.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
//! - `identity`: A depricated feature for identity hashes.
//! - `identity`: A deprecated feature for identity hashes.

//!
//! In order to enable all cryptographically secure hashers, you can set the `secure-hashes`
//! feature flag (enabled by default).
Expand Down Expand Up @@ -92,3 +91,13 @@ pub use crate::hasher_impl::sha3::{Keccak224, Keccak256, Keccak384, Keccak512};
pub use crate::hasher_impl::sha3::{Sha3_224, Sha3_256, Sha3_384, Sha3_512};
#[cfg(feature = "strobe")]
pub use crate::hasher_impl::strobe::{Strobe256, Strobe512, StrobeHasher};

/// Code reserved for Identity "hash"
pub const IDENTITY_CODE: u64 = 0x0;
Comment on lines +95 to +96
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This doesn't necessarily need to be public, can we make it crate-private?

Copy link
Contributor Author

@mriise mriise Dec 20, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

its still useful imo, e.g. Multihash::wrap(multihash::IDENTITY_CODE, &bytes) or doing a match for it.

Copy link
Contributor

@thomaseizinger thomaseizinger Dec 21, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see that. I am just wary about extending the public API with new items that are not essential. If we want to discourage the use of the identity hasher, it seems reasonable to have users declare that constant by themselves if they actually want to use it.

/// Helper for generating Identity hashes (context https://github.com/multiformats/rust-multihash/pull/196).
/// Will error if `data.len() > S`
///
/// See examples for a few other approaches if this doesn't fit your application
pub fn identity_hash<const S: usize>(data: impl AsRef<[u8]>) -> Result<MultihashGeneric<S>> {
MultihashGeneric::wrap(IDENTITY_CODE, data.as_ref())
}
Comment on lines +101 to +103
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would it make sense to offer this as Multihash::identity instead?

Comment on lines +98 to +103
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems inconsistent to me to provide only this here when we are showing two ways to do it in the example.

Why have this at all? Seems trivial to implement so I am not sure what we are gaining here. At the same time, we are adding an item to our public API which in the future might cause a breaking change.

6 changes: 0 additions & 6 deletions src/multihash_impl.rs
Original file line number Diff line number Diff line change
Expand Up @@ -83,12 +83,6 @@ pub enum Code {
#[cfg(feature = "ripemd")]
#[mh(code = 0x1055, hasher = crate::Ripemd320)]
Ripemd320,

// The following hashes are not cryptographically secure hashes and are not enabled by default
/// Identity hash (max. 64 bytes)
#[cfg(feature = "identity")]
#[mh(code = 0x00, hasher = crate::IdentityHasher::<64>)]
Identity,
Comment on lines -87 to -91
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Instead of directly removing it, may I suggest the use of #[deprecated] to make this a non-breaking change?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's not possible, unfortunately. The footgun is that one might accidentally construct such a hasher, attempt to hash some data, then panic. This won't hit any "deprecated" warnings.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure I understand. If the user uses this code, adding #[deprecated] to it will flag it right?

We can point the user to anything in the deprecation note, including explaining the footgun.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To be clear, I am suggesting to add #[deprecated] to the enum variant.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The foot-gun is Code::try_from(number).digest(...).

  • If the identity feature gets enabled (e.g., by some other crate in the dependency tree).
  • If number happens to be 0.

This will panic.

Copy link
Member

@Stebalien Stebalien Dec 20, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(assuming that the input buffer ... is too long)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right, now I get you. Thanks for explaining!

We can still add code that will automatically trigger a deprecation warning once the identity feature gets enabled. I did similar stuff in rust-libp2p.

Can push a PoC in a bit.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

See #260.

}

#[cfg(test)]
Expand Down