Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Improve Hasher and HMAC APIs #141

Merged
merged 2 commits into from
Jan 28, 2015
Merged

Improve Hasher and HMAC APIs #141

merged 2 commits into from
Jan 28, 2015

Conversation

gkoz
Copy link
Contributor

@gkoz gkoz commented Jan 12, 2015

The reuse API I implemented for Hasher the last time proved to be very hard to use. Here's a better approach that relies on the borrow checker by using a mut reference. This time it's both Hasher and HMAC.

  • Add a "one stop" compute() method for the simple use case.
  • Split out update()/finalize() API into the Accumulator struct.
  • Rely on borrow checker to enforce openssl call order contract (init, update+, finalize).
  • Track the lifecycle explicitly as an optimization in HMAC case and to workaround Rust 1.0's unsafe destructor limitation (see Misleading error with Drop and lifetime parameters rust-lang/rust#17858 (comment)).
  • Unify Hasher and HMAC implementation details for consistency.
  • Implement Writer for Accumulator.
  • Add examples.
  • Deprecate standalone hash() in favor of Hasher::compute().

Note that Accumulator now has very similar methods update() and write(). It could make sense to drop update(), although write() requires checking the return value...

@gkoz
Copy link
Contributor Author

gkoz commented Jan 12, 2015

Hmm, also thinking of renaming update() to push() because the name doesn't seem to make much sense in its new context.

@sfackler
Copy link
Owner

Sorry for the delay on getting to this. Could this be adjusted to implement libstd's redesigned Hasher trait? http://doc.rust-lang.org/std/hash/trait.Hasher.html

@gkoz
Copy link
Contributor Author

gkoz commented Jan 20, 2015

I'll look into it.

@gkoz
Copy link
Contributor Author

gkoz commented Jan 20, 2015

The way Hash is implemented for &str and &[T] makes me question this idea: it mixes extra data into the input so the Hasher results would be incompatible with everyone else.

impl<S: Writer + Hasher> Hash<S> for str {
    #[inline]
    fn hash(&self, state: &mut S) {
        state.write(self.as_bytes());
        0xffu8.hash(state)
    }
}

impl<S: Writer + Hasher, T: Hash<S>> Hash<S> for [T] {
    #[inline]
    fn hash(&self, state: &mut S) {
        self.len().hash(state);
        for elt in self.iter() {
            elt.hash(state);
        }
    }
}

EDIT: Unless we can specialize the Hash impls to do the right thing maybe... But we don't seem to be allowed that.

src/main.rs:45:1: 48:2 error: the type `str` does not reference any types defined in this crate; only traits defined in the current crate can be implemented for arbitrary types [E0117]
src/main.rs:45 impl hash::Hash<MD> for str {
src/main.rs:46     fn hash(&self, state: &mut MD) {
src/main.rs:47     }
src/main.rs:48 }

@sfackler
Copy link
Owner

What do you mean by "incompatible with everyone else"? Hash functions typically act on byte streams, not strings. If you want to pass the underlying byte slice of a str to a Hasher, you can call write with it.

@gkoz
Copy link
Contributor Author

gkoz commented Jan 21, 2015

I'm missing the reason to implement it then.

I assumed that the purpose of std::hash API is to allow hashing various types directly instead of calling write (which the Hasher supports anyway). So it would be reasonable to do [we know strings are always utf8 so why not hash them too]

let md = std::hash::hash::<_, openssl::crypto::hash::Hasher>(str_or_byte_slice);

but the result would be incorrect. Not to mention the Hash for [T] impl calling hash on each element of the slice.
So if this trait is implemented we need to warn the users: "You probably want to hash some byte slices? Don't use the trait, call write!" It just doesn't feel like a good design.

@gkoz
Copy link
Contributor Author

gkoz commented Jan 21, 2015

Here's the Hasher almost implementing the Hasher trait. It won't build because the trait has finish(&self) instead of finish(&mut self).
My objection against implementing the trait is somewhat alleviated by the fact that std::hash::hash() only takes hashers that implement Default so it can't be used with this Hasher.
So the only problem is the finalize signature.
https://github.com/gkoz/rust-openssl/blob/f4bafa95f90d5bb377c5b65de4a75642bfd8466c/src/crypto/hash.rs

@sfackler
Copy link
Owner

We can get around the &mut self/&self restrictions since we're only passing raw pointers around. We can grab const pointers and then cast to a mut pointer.

@gkoz
Copy link
Contributor Author

gkoz commented Jan 21, 2015

There's nothing preventing users from invoking finish() many times so it seems to me the Hasher needs to have some state and mutate it on finish. I suspect this signature is an artifact of Siphasher getting away with not changing state on finish. Not so with openssl obviously.

@sfackler
Copy link
Owner

Hmm, yeah that's true. In that case, I think the most reasonable thing to do is to not implement Hasher, but just define the closest to matching API. Does that make sense?

The Hasher trait's still unstable. I'll file an issue on the Rust repo about possible adjustments to the trait definition.

Thanks!

@gkoz
Copy link
Contributor Author

gkoz commented Jan 21, 2015

Sounds like a plan!

@gkoz
Copy link
Contributor Author

gkoz commented Jan 21, 2015

It should be noted that the Hasher trait doesn't use Result so we'd have to panic on errors. However if hash::Writer is to be replaced with io::Writer which does return Result, Hasher might be adjusted similarly.

@sfackler
Copy link
Owner

I'd have to check the OpenSSL source (shudder), but my guess is that EVP_DigestUpdate &co only return errors for use errors, like calling update after finish. We should be able to statically prevent that from happening so it'd be fine to panic if an error is returned from OpenSSL.

@gkoz
Copy link
Contributor Author

gkoz commented Jan 22, 2015

Isn't it ridiculous though that eventually one would have to put

use std::io::Writer;
use std::hash::Hasher;
use openssl::crypto::hash::Hasher;

in order to call h.write() and h.finish() both actually implemented in the latter module?

Could this be a shortcoming in Rust? Maybe implementing a trait shouldn't unconditionally hide the methods that are essential to the struct. What if there was something like

trait T { pub fn smth(); };
struct S;
impl S {
    pub fn smth() { /* do smth */ }
}
impl T for S {
    #[take_from_struct]
    fn smth();
}

I bet there's a discussion or an RFC about it somewhere.

@sfackler
Copy link
Owner

@gankro and I have talked about the addition of "inherent" implementations of traits, where the methods on that trait would always be callable on the type without having to import it. I expect an RFC should be landing about it from someone shortly after 1.0 releases.

@gkoz
Copy link
Contributor Author

gkoz commented Jan 22, 2015

That's good to know :)

@gkoz
Copy link
Contributor Author

gkoz commented Jan 22, 2015

This should do.

@gkoz
Copy link
Contributor Author

gkoz commented Jan 22, 2015

Scratch that.
So the std::hash API is fundamentally incompatible with openssl or similar bindings.
The finish is no finish, it doesn't reset and supports interspersing write and finish. i.e. you can append more data and hash the whole again.
http://is.gd/EvwW40
And it's not mentioned explicitly, the only clue is the lack of mut in fn finish(&self).
In openssl we have no choice than to reset when finish is called.
EDIT: sorry for a somewhat rude tone...

@Gankra
Copy link

Gankra commented Jan 22, 2015

@alexcrichton may be interested in this information

@gkoz
Copy link
Contributor Author

gkoz commented Jan 22, 2015

Python works around this limitation by providing copy that probably translates to EVP_MD_CTX_copy_ex():

hash.copy()
Return a copy (“clone”) of the hash object. This can be used to efficiently compute the digests of strings that share a common initial substring.

Not so lucky with HMAC_* There's a HMAC_CTX_copy too though it's not documented.

@gkoz
Copy link
Contributor Author

gkoz commented Jan 22, 2015

So the incompatibility might not be a fundamental one after all... ducks

@gkoz
Copy link
Contributor Author

gkoz commented Jan 22, 2015

The latest version drops reset, makes it clear that finish implies reset and implements Clone.

@sfackler
Copy link
Owner

This seems good to me. Do you think this is ready to go?

@gkoz
Copy link
Contributor Author

gkoz commented Jan 23, 2015

For a more consistent experience finish now resets the hasher so the next finish will effectively return hash(&[]).
This should be it.
The next PR is going to propose renaming HashType to Type breaking these modules' API again.

@gkoz
Copy link
Contributor Author

gkoz commented Jan 23, 2015

I've just remembered a slight issue with using asserts to check the return values.
If an ffi function fails, related functions might do too and we're calling the finalizer from drop so a "panic while panicking" case is possible. This (or just a single assert for that matter) could cause failures to clean up sensitive data. Not sure how to deal with it best...

@gkoz
Copy link
Contributor Author

gkoz commented Jan 23, 2015

There, drop avoids asserts now.

@sfackler
Copy link
Owner

Cool. I think I'll merge this after the big io -> old_io change lands to avoid having to bump major versions twice in a row: rust-lang/rust#21543

@gkoz
Copy link
Contributor Author

gkoz commented Jan 24, 2015

Okay. I'm adding the s/HashType/Type/ commit to this PR for your consideration then.

@sfackler
Copy link
Owner

👍

- Implement Clone and std::io::Writer.
- Reduce the API to write() and finish(). Contrary to std::hash, finish() resets the hasher immediately.
- Add hmac::hmac() convenience fn.
- Replace hash::evpmd() with HashType methods.
- Add assertions as a crude check for failed calls into openssl.
- Add examples and some tests.

[breaking-change]
s/HashType/Type/ to follow the current Rust style. Import Type as HashType in modules where the name might be ambiguous.
[breaking change]
@gkoz
Copy link
Contributor Author

gkoz commented Jan 28, 2015

Rebased.

@sfackler
Copy link
Owner

Thanks!

sfackler added a commit that referenced this pull request Jan 28, 2015
Improve Hasher and HMAC APIs
@sfackler sfackler merged commit 8b47daa into sfackler:master Jan 28, 2015
@gkoz gkoz deleted the borrow_mut branch February 1, 2015 21:51
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants