HashMap, HashSet: impl Hash #48366
Conversation
```diff
@@ -757,6 +757,16 @@ impl<T, S> Eq for HashSet<T, S>
 {
 }

+#[unstable(feature = "hashmap_hash", issue = "0")]
```
impls are insta-stable
Hmm yes, true... maybe it should be `#[stable(feature = "hashmap_hash", since = "1.26.0")]`?
```diff
@@ -1370,6 +1370,37 @@ impl<K, V, S> Eq for HashMap<K, V, S>
 {
 }

+#[unstable(feature = "hashmap_hash", issue = "0")]
```
impls are insta-stable
```diff
@@ -1370,6 +1370,37 @@ impl<K, V, S> Eq for HashMap<K, V, S>
 {
 }

+#[unstable(feature = "hashmap_hash", issue = "0")]
+impl<K, V, S> Hash for HashMap<K, V, S>
+    where K: Eq + Hash,
```
Is the `Eq` bound necessary here?
Yes, it is necessary for `self.iter()` - you can't call that otherwise. You could use `self.table.iter()`, which would not require `Eq`. So `self.iter()` could be changed to not require `Eq`. But I think it is right to require `K: Eq`, since you shouldn't be able to do anything useful with a `HashMap<K, V, S>` unless `K: Eq`.
src/libstd/collections/hash/map.rs (Outdated)

```rust
// we might be able do so in the future.
hasher.write_u64(
    self.iter()
        .map(|(k, v)| {
```
No need to destructure and re-tuple below.
Good point, fixing that.
src/libstd/collections/hash/map.rs (Outdated)

```rust
{
    fn hash<H: Hasher>(&self, hasher: &mut H) {
        // We must preserve: x == y -> hash(x) == hash(y).
        // HashMaps have no order, so we must use a commutative operation
```
To be precise, the operation must be associative as well, because in theory you're iterating over arbitrary permutations of the elements. Fortunately, `wrapping_add` is both.
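The order-independence argument can be sketched outside the standard library. This is an illustrative helper, not the PR's actual code; `order_independent_hash` and the use of `DefaultHasher` are assumptions for the sake of the example:

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

// Hash each element with a fresh hasher, then fold the per-element hashes
// with wrapping_add. Because wrapping_add is commutative and associative,
// any permutation of the elements yields the same combined hash.
fn order_independent_hash<T: Hash>(items: &[T]) -> u64 {
    items
        .iter()
        .map(|item| {
            let mut h = DefaultHasher::new();
            item.hash(&mut h);
            h.finish()
        })
        .fold(0u64, u64::wrapping_add)
}

fn main() {
    let a = order_independent_hash(&[1, 2, 3]);
    let b = order_independent_hash(&[3, 1, 2]);
    println!("{}", a == b); // prints "true"
}
```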
I'll note that. =)
This hash function seems very weak. This probably can't be avoided, but are we sure we want to add a Hash impl to a standard library type that does not have any of the collision resistance and DoS protection properties Rust has generally tried (best-effort, of course) to provide?
src/libstd/collections/hash/map.rs (Outdated)

```rust
// (.wrapping_add) so that the order does not matter.
// HashMaps have no order, so we must combine the elements with an
// associative and commutative operation • so that the order does not
// matter. In other words, (u64, •, 0) must form a commutative monoid.
```
In theory, it would be sufficient to use a pointed commutative semigroup, as you're not making use of the identity at all here, but I think there's a limit to the usefulness of mentioning such technicalities 😄 This looks fine to me!
Haha =P The fold uses the identity element 0 though.
The fold just needs an initial (pointed) element (which doesn't need to be special in any way); you've picked the identity, but in theory you could have chosen any other `u64`.
Oh yes, that is true =P /nerdsniped
@rkruppe Hmm.. I agree it is weak.. But I can't think of a better way to do it.. It is always possible to add this impl on a newtype.. But I think it is nice to have the impl for ergonomics.. So for best effort I think we should provide the best we can.

@rkruppe I believe that can be overcome by implementing supplemental hashing for entries in the collections themselves? We should be doing this regardless, because there's always the risk a user might store something that implements a correct but vulnerable hash.

@udoprog By supplemental hashing you mean something like what Java's HashMap does? Mangling the key's reported hash further to mix it up a bit? That may improve the distribution but does not help with DoS protection, since each key still has a predictable hash.

Edit: Ah, ignore me; I misread the method.

I think DoS protection can be implemented in the collection by incorporating a random salt into the supplemental hash? My wider point is that I don't believe it's tenable to rely even on best-effort

I also want to emphasize that I'm wildly in favor of this change. This particular
The supplemental hash can't undo collisions that happened during the actual hash computation.
I don't understand how this is different from what's implemented right now? But regardless, re: "extracting the hash of an individual item" -- using

Furthermore, even if that could be avoided, hand wavy arguments based on "it should not be possible to [do specific thing an attacker might do]" are very unconvincing.

Ah, yeah. Now I get it.
That was the point of mentioning a future

```rust
// = () denotes the "I am adding no additional constraints" bound
pub trait Hash<trait Extra = ()> {
    fn hash<H>(&self, state: &mut H) where H: Hasher + Bounds;
}

impl<K: Eq + Hash, V: Hash, S: BuildHasher> Hash<Clone> for HashMap<K, V, S> {
    fn hash<H: Hasher + Clone>(&self, hasher: &mut H) {
        let r = self.iter()
            .map(|kv| {
                let mut h = hasher.clone();
                kv.hash(&mut h);
                h.finish()
            })
            .fold(0, u64::wrapping_add);
        hasher.write_u64(r);
    }
}
```

EDIT: This was a bad idea. Now that I've written this... It came to mind that we have access to

```rust
impl<K, V, S> Hash for HashMap<K, V, S>
    where K: Eq + Hash,
          V: Hash,
          S: BuildHasher
{
    fn hash<H: Hasher>(&self, hasher: &mut H) {
        // ...
        hasher.write_u64(
            self.iter()
                .map(|kv| {
                    let mut h = self.hash_builder.build_hasher(); // <--- look ma!
                    kv.hash(&mut h);
                    h.finish()
                })
                .fold(0, u64::wrapping_add)
        );
    }
}
```

at least then you won't always use

@rkruppe, @varkor: what do y'all think?
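For reference, the idea of deriving per-entry hashers from the map's own `BuildHasher` can be tried today outside libstd via the public `HashMap::hasher()` accessor. The free function `hash_map_hash` below is an illustrative name, not proposed API:

```rust
use std::collections::HashMap;
use std::hash::{BuildHasher, Hash, Hasher};

// Derive one fresh hasher per entry from the map's own BuildHasher, hash
// the (key, value) pair with it, then combine the per-entry hashes with a
// commutative, associative operation so iteration order does not matter.
fn hash_map_hash<K: Hash, V: Hash, S: BuildHasher>(map: &HashMap<K, V, S>) -> u64 {
    map.iter()
        .map(|kv| {
            let mut h = map.hasher().build_hasher();
            kv.hash(&mut h);
            h.finish()
        })
        .fold(0u64, u64::wrapping_add)
}

fn main() {
    let mut m = HashMap::new();
    m.insert(1, "one");
    m.insert(2, "two");
    // Hashing the same map twice gives a stable result for that instance.
    println!("{}", hash_map_hash(&m) == hash_map_hash(&m)); // prints "true"
}
```

Note that two otherwise-equal maps built with `HashMap::new()` carry different `RandomState`s, so this sketch would give them different hashes, which is exactly the `x == y -> hash(x) == hash(y)` problem raised below.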
@Centril I believe that doesn't work because different HashMaps will generally have different

@rkruppe Oh yes, that would break

Thanks for the PR @Centril! I think I personally share similar worries as those here in that this is a relatively weak hash function. You mention though that this can be improved in the future, and I was wondering if we could dig into that a bit? What are our future options for improving this hash function? Are there more complicated methods we know of that can work but don't want to implement just yet?
So I think two principal improvement methods come to my mind - and they both involve the ability to

We could do this in two ways, the first being:

```rust
pub trait Hasher: Clone { ... }
```

This is backwards incompatible, but potentially doable since I think it would be unusual for a

The second, backwards-compatible way is to provide parametric polymorphism of traits/types on bounds, as in the code example above. This requires a very large language change (which I'd argue for on other merits), so it is unlikely to happen in 2018. I haven't written an RFC for that yet, but I probably will.

This is all I can think of off the top of my head, but perhaps there are possible improvements to the algorithm itself?
Ok thanks for the info! It seems though that such features would be a very long ways off, which may mean that to decide on this we need to acknowledge that it's a weak hash algorithm. If we decide, however, that a weak hash function shouldn't be implemented here, then I think we may wish to hold off on this for now.

cc @rust-lang/libs, do y'all have thoughts on this? The main caveat here is that the hash algorithm for a hashmap/hashset is relatively weak (just add up all the hashes of all the elements). It seems that improving that may take some time.

@Centril out of curiosity, do you know how this is implemented in other languages?
Note: Not only is the method of combining the entry hashes of dubious strength, the entry hashes themselves are
@alexcrichton After some quick searching.. This is how Java does it (by summing with
Java seems to not provide any DoS protection. The only difference between Java's implementation and this PR is that Java's xors the key and the value.
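Java's scheme (summing `key.hashCode() ^ value.hashCode()` over all entries) could be transliterated to Rust roughly as follows. This is a sketch with an illustrative name, `java_style_map_hash`, and it uses `DefaultHasher` in place of Java's per-type `hashCode`:

```rust
use std::collections::hash_map::DefaultHasher;
use std::collections::HashMap;
use std::hash::{Hash, Hasher};

// Rough analogue of java.util.AbstractMap.hashCode(): the map hash is the
// (wrapping) sum over all entries of hash(key) XOR hash(value). Both the
// XOR and the sum are commutative, so iteration order cannot matter.
fn java_style_map_hash<K: Hash, V: Hash, S>(map: &HashMap<K, V, S>) -> u64 {
    map.iter()
        .map(|(k, v)| {
            let mut hk = DefaultHasher::new();
            k.hash(&mut hk);
            let mut hv = DefaultHasher::new();
            v.hash(&mut hv);
            hk.finish() ^ hv.finish()
        })
        .fold(0u64, u64::wrapping_add)
}

fn main() {
    let mut m1 = HashMap::new();
    m1.insert("a", 1);
    m1.insert("b", 2);
    let mut m2 = HashMap::new();
    m2.insert("b", 2);
    m2.insert("a", 1);
    // Equal maps hash equal regardless of insertion/iteration order.
    println!("{}", java_style_map_hash(&m1) == java_style_map_hash(&m2)); // prints "true"
}
```

Note the XOR of key and value hashes means an entry `(x, x)` always contributes 0, one of the weaknesses alluded to above.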
Ok interesting! Do languages like Ruby/JS/Python implement hashing for maps? Or maybe C++?
@alexcrichton No idea about dynamic languages. With respect to C++, I don't think
Python doesn't implement Hash for their HashMap (`dict`):

```python
>>> hash({1:2})
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: unhashable type: 'dict'
```

JavaScript only supports string keys for their

Ruby supports hashing a HashMap:

```ruby
irb(main):002:0> {1=>2}.hash
=> -2037178694041332840
irb(main):003:0> {1=>2,3=>4}.hash
=> -958810549599379095
irb(main):004:0> {3=>4,1=>2}.hash
=> -958810549599379095
```

Its implementation is mainly XOR-ing the hash of all entries together.
Wrt Python, note that it does not implement hashing for any of its mutable collections (e.g.,

We could do that =) It doesn't do anything against DDoS, but still nice?
Hashing key and value together as a 2-tuple (i.e. the current implementation
Yeah it's not a bad idea, and matches what other collection types do. Doesn't do anything against DoS attacks though, since collection length is really easy to predict or extract.
I think Java's choice here is at least in part due to not having a canonical tuple type or other canonical way to combine two hashes. Rust's hashing strategy leaves us many more options. But hash function design is an art and/or science in itself, so I'm wary of making any quick judgements.
Though perhaps it would be better to use XOR instead of
Where is @ticki when you need them =P
Ok so it sort of sounds like our options here are few and far between other than what's currently implemented. @Centril could you remind me of the motivation of this PR? Is it strong enough to push on landing a weak hash function? Or should we perhaps postpone to a later date?
@alexcrichton I didn't have a particular use case in mind / don't have a personal need for this; iirc this was sparked by an IRC conversation on #rust. But @udoprog said in a comment that:
So maybe they have something specific in mind?
Ah ok, I'm personally somewhat tempted to close this until a future date where we may either deem the weaker hash ok or have a better solution for the weak hash here. But @udoprog maybe you can expand a bit what you use this for?
Closing this for now and let's revisit if we have more motivations to do this / or a better hash function.
@alexcrichton Sorry for the late reply. I've been out travelling for the last week. Nothing apart from the obvious (storing maps in hashtables). I've never used it in a sensitive context, so I generally don't care about DoS protection. One public instance where I actively worked around it can be found here:
@alexcrichton don't know if I discussed this with you; but did you have a chance to look at @udoprog's use case?
Hey, I just want to say that BTreeSet and BTreeMap do implement

So in my case, I wanted to put some data, which contained

Replacing

Just saying, for future beginners like me, coming here.
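The workaround described here, switching the inner collection to `BTreeMap`, works because `BTreeMap` iterates in a defined (key) order and therefore implements `Hash` in std. A minimal sketch with illustrative values:

```rust
use std::collections::{BTreeMap, HashSet};

fn main() {
    // BTreeMap iterates in key order, so std implements Hash for it
    // (unlike HashMap), making it usable inside hashed collections.
    let mut inner = BTreeMap::new();
    inner.insert("x", 1);
    inner.insert("y", 2);

    // The map can now be stored as a HashSet element (or a HashMap key).
    let mut outer = HashSet::new();
    outer.insert(inner);
    println!("{}", outer.len()); // prints "1"
}
```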
Heya! Figured I'd chime in with a use-case (and a reason why it would be quite nice to have this functionality) — with that being said, I'm not sure if this should be an issue I create on the

Either way, the use-case is pretty simple! I've got some structs that use a

Would love to know if people are still open to implementing something like this — I think it would be nice to have that parity with other languages that have hashable HashMaps. It would save people a lot of surprise (and Googling) down the line, even if it's not something that comes up all of the time.

Let me know if there is anything I can do to help out or revive this PR!
This PR makes `HashMap`s and `HashSet`s implement `Hash` themselves.

Since neither of those have any defined order among elements, a commutative operation is used (as well as an independent `Hasher` for each one) to ensure that the order in which they are hashed does not matter.

cc rust-lang/rfcs#2190 for improving this in the future.
r? @alexcrichton