Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add Hashable typeclass #188

Open
fehrenbach opened this issue Oct 7, 2018 · 10 comments
Open

Add Hashable typeclass #188

fehrenbach opened this issue Oct 7, 2018 · 10 comments
Labels
status: blocked This issue or PR is blocked by something and cannot make progress.

Comments

@fehrenbach
Copy link

I would like to add a Hashable typeclass to the prelude and deriving support for it to the compiler. It is intended as an approximation of equality for use in hash-based algorithms and data structures, such as hash maps, not cryptographically secure digests or password hashes.

class Eq a <= Hashable a where
  hash :: a -> Int

There are two reasons for adding it to the prelude instead of having a standalone library:

  1. Dependencies: To be useful in general data structures, as many datatypes as possible should be hashable. Ideally, whenever a datatype is Eq, it should also have Ord and Hashable instances, if possible. Most packages depend on the prelude already.[citation needed] This should make it easier to convince authors to add Hashable instances, because they do not need to add another dependency, all that is required is one more deriving line below the lines for Eq and Ord.
  2. Performance: Usually, the main goal of hashing is to avoid expensive comparisons. One could write a generic hash function in terms of purescript-generics-rep. This would be fine for many applications, but for hash-based data structures etc., the hash function should really be as fast as possible.

I am more than happy to make an initial pull request (based on https://github.com/fehrenbach/purescript-unordered-collections/blob/master/src/Data/Hashable.purs) and work on deriving support.

What do people think?

@kritzcreek
Copy link
Member

I'd definitely like to have this 👍

@fehrenbach
Copy link
Author

Great.

I think the only major design decision is whether to newtype Hash = Hash Int or not.

@garyb
Copy link
Member

garyb commented Oct 11, 2018

I'd like the newtype personally, as the name at least gives a bit of a semantic clue to the value's intention.

@kritzcreek
Copy link
Member

Since this is primarily done for performance reasons I'd like to make sure the newtype doesn't incur any annoying penalties, obviously newtype means the underlying representation is an Int, but if we find ourselves mapping the newtype constructor a lot we'll end up with identity calls that the compiler currently doesn't optimize.

@fehrenbach
Copy link
Author

Also, equality comparisons on Int are special-cased by the compiler to emit ===. This is not currently done for (newtype) derived instances for newtypes over Int.

newtype Hash = Hash Int
derive newtype {- or not -} instance eqHash :: Eq Hash

approxEq :: Hash -> Hash -> Boolean
approxEq a b = a == b -- ... return Data_Eq.eq(eqHash)(a)(b);

-- can be avoided by being careful
approxEq' :: Hash -> Hash -> Boolean
approxEq' (Hash a) (Hash b) = a == b -- ... return a === b;

I guess just not providing the Eq instance would force the second version, but would be a bit annoying.
On the other hand, one more dictionary lookup and two applications is probably not the world, and writing performance critical code is annoying anyway.

I don't have a strong opinion either way.

@fehrenbach
Copy link
Author

We could even index the Hash newtype by the type it's a hash of:

newtype Hash a = Hash Int

class Eq a <= Hashable a where
  hash :: a -> Hash a

I think this almost makes sense because there is rarely a reason to do anything with hash values from different source types. On the other hand, I think I'd rather avoid providing an Eq Hash instance, at least at the moment, so it hardly makes a difference.

@garyb
Copy link
Member

garyb commented Oct 23, 2018

☝️ I like that (type-indexing the Hash).

I don't really know enough to give an opinion on whether the lack of inlining for the instances is a noticeable expense or not though - you're probably best placed to determine that @fehrenbach, since you've done all the work with hashed values in -unordered-collections 😄

@kritzcreek
Copy link
Member

kritzcreek commented Oct 23, 2018

While we're here, do we think it would be a good idea to derive Hashable in the compiler like we do for Eq and Ord? That way it's even less likely that people have disagreeing Eq and Hashable instances.

@fehrenbach
Copy link
Author

@garyb I'll go for the type-indexed newtype and the usual classes. It's probably not a big performance problem and if it turns out to be, unwrapping is an easy workaround or we could just inline Eq for Hash too.

@kritzcreek Yes, I think so. This is a big part of the motivation for having Hashable in the prelude. The original issue text was meant to say that, but there's only a hint left. I must have accidentally removed it while editing... I already have a branch that derives Hashable, but without the newtype. It will take some work to change it to wrap and unwrap as necessary.

@fehrenbach
Copy link
Author

@hdgarrood points out that adding something like hashWithSalt to the class might be better (and providing hash as a convenience function).
fehrenbach/purescript-unordered-collections#8 (comment)

@JordanMartinez JordanMartinez added the status: blocked This issue or PR is blocked by something and cannot make progress. label Dec 1, 2021
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
status: blocked This issue or PR is blocked by something and cannot make progress.
Projects
None yet
Development

No branches or pull requests

4 participants