-
-
Notifications
You must be signed in to change notification settings - Fork 1.2k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Hash typeclass #1712
Hash typeclass #1712
Conversation
I need some help here: why does the JS version builds but the JVM version failed on serializability? |
I guess it's due to Java serialization not existing on JS. Or at least it isn't tested: https://github.com/typelevel/cats/blob/master/laws/src/main/scala/cats/laws/SerializableLaws.scala#L33. |
@durban So basically it means that the |
@@ -17,7 +17,9 @@ class BigDecimalGroup extends CommutativeGroup[BigDecimal] { | |||
override def remove(x: BigDecimal, y: BigDecimal): BigDecimal = x - y | |||
} | |||
|
|||
class BigDecimalOrder extends Order[BigDecimal] { | |||
class BigDecimalEq extends Order[BigDecimal] with Hash[BigDecimal] { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we not change the class name to preserve binary compatibility?
@@ -4,8 +4,8 @@ package instances | |||
package object bigDecimal extends BigDecimalInstances // scalastyle:ignore package.object.name | |||
|
|||
trait BigDecimalInstances { | |||
implicit val catsKernelStdOrderForBigDecimal: Order[BigDecimal] = | |||
new BigDecimalOrder | |||
implicit val catsKernelStdEqForBigDecimal: Order[BigDecimal] with Hash[BigDecimal] = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we not change this name for binary compatibility?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OK I will revert these.
@@ -4,8 +4,8 @@ package instances | |||
package object bigInt extends BigIntInstances // scalastyle:ignore package.object.name | |||
|
|||
trait BigIntInstances { | |||
implicit val catsKernelStdOrderForBigInt: Order[BigInt] = | |||
new BigIntOrder | |||
implicit val catsKernelStdEqForBigInt: Order[BigInt] with Hash[BigInt] = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same binary compat issue.
@@ -17,8 +17,9 @@ class BigIntGroup extends CommutativeGroup[BigInt] { | |||
override def remove(x: BigInt, y: BigInt): BigInt = x - y | |||
} | |||
|
|||
class BigIntOrder extends Order[BigInt] { | |||
class BigIntEq extends Order[BigInt] with Hash[BigInt] { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same.
@@ -6,14 +6,16 @@ import scala.collection.immutable.BitSet | |||
package object bitSet extends BitSetInstances | |||
|
|||
trait BitSetInstances { | |||
implicit val catsKernelStdPartialOrderForBitSet: PartialOrder[BitSet] = | |||
new BitSetPartialOrder | |||
implicit val catsKernelStdEqForBitSet: PartialOrder[BitSet] with Hash[BitSet] = |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
please don't change name
case Left(_) => false | ||
case Right(yy) => B.eqv(xx, yy) | ||
} | ||
case Left(xx) => Left(A.hash(xx)).## |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can we not allocate just to get the hashcode.
Something like Left(xx) => A.hash(xx) * 65521
Right(xx) => B.hash(xx)
seems fine:
mixing the left with a prime about half the size of the int space (taken from the largest 16 bit here):
http://www.risc.jku.at/people/hemmecke/AldorCombinat/combinatse20.html
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I want to have the default Hash instance for Either[A, B]
perform the exact same way as the universal hash (##) given the Hash instance for A
and B
are universal hashes.
So maybe I'll copy the function implementation for case classes in Scala?
Basically I want EitherHash[A, B](DefaultHashInstance[A], DefaultHashInstance[B])(Left(a)) == Left(a).##
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So maybe I'll copy the function implementation for case classes in Scala?
👍
@ctongfei Well, since |
/** | ||
* Returns the hash code of the given object under this hashing scheme. | ||
*/ | ||
def hash(x: A): Int |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
consider using Long
and move away from the JVM legacy?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@fommil What are the justifications of using Long
? JVM-default Int
works good enough I think?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Object#hashCode: Int
in Java is basically used for distribution within Set
or Map
-like data structures and for that purpose Int
will remain sufficient for some time, for one because the size of an array in Java can't exceed Int.MaxValue
either and if you instantiate an Array<Int>
of size Int.MaxValue
that's around ~ 8 GB of RAM.
So I'm not seeing the size of an array evolving to a Long
, for one because of backwards compatibility, but also it's probably counter productive to allow contiguous memory blocks bigger than that. For now at least.
I'm guessing that's the reason other languages use 32 bit integers (or sometimes less) for their hash codes as well. For example: https://hackage.haskell.org/package/hashable-1.2.6.1/docs/Data-Hashable.html
The questions are:
- besides doing distribution in data structures like
HashMap
/HashSet
, are there any other use cases? - doesn't
Long
have performance implications? it's an important question to answer, as HashMaps tend to be abused - are there any languages using
Long
for their universalhashCode
already and if so, I'd be interested in why?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
64 bit JVMs store both Int
and Long
as 64 bit numbers, so there is no advantage to use Int
for space preserving reasons. However if you want to make use of existing hashCode
algorithms that return Int
then that would be a good reason to keep it that way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think that for now we should stay with what Object#hashCode
is doing: returning an Int
. Agree with @alexandru .
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I disagree about using Int
. If you have a Long
you can easily get an Int
from int, but not the other way around. Some nice algorithms using hashes want more than 32 bits. For instance hyperloglog with only 32 bits would not be great.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
+1 to using Long
here. If you want to just give me an Int
upcast to Long
that's on you, but it would be better to enable better algorithms.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
On the whole PR seems fine to me.
trait HashInstances { | ||
implicit val catsFunctorContravariantForHash: Contravariant[Hash] = | ||
new Contravariant[Hash] { | ||
/** Derive an `Hash` for `B` given an `Hash[A]` and a function `B => A`. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Small nitpick, I obsess over details, but comment style isn't compatible with either ScalaDoc or JavaDoc. This title needs a new-line.
|
||
def hash(implicit A: Hash[A]): HashProperties = new HashProperties( | ||
name = "hash", | ||
parent = None, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Shouldn't this have EqLaws
as parent?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
OrderLaws's parent is also None. But EqLaws are listed below in the props
. I copied that behavior.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
seems like we should have EqLaws
as a parent.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I couldn't find EqLaws
in the kernel-laws
package.
/** | ||
* Returns the hash code of the given object under this hashing scheme. | ||
*/ | ||
def hash(x: A): Int |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The Object#hashCode: Int
in Java is basically used for distribution within Set
or Map
-like data structures and for that purpose Int
will remain sufficient for some time, for one because the size of an array in Java can't exceed Int.MaxValue
either and if you instantiate an Array<Int>
of size Int.MaxValue
that's around ~ 8 GB of RAM.
So I'm not seeing the size of an array evolving to a Long
, for one because of backwards compatibility, but also it's probably counter productive to allow contiguous memory blocks bigger than that. For now at least.
I'm guessing that's the reason other languages use 32 bit integers (or sometimes less) for their hash codes as well. For example: https://hackage.haskell.org/package/hashable-1.2.6.1/docs/Data-Hashable.html
The questions are:
- besides doing distribution in data structures like
HashMap
/HashSet
, are there any other use cases? - doesn't
Long
have performance implications? it's an important question to answer, as HashMaps tend to be abused - are there any languages using
Long
for their universalhashCode
already and if so, I'd be interested in why?
case Left(_) => false | ||
case Right(yy) => B.eqv(xx, yy) | ||
} | ||
case Left(xx) => Left(A.hash(xx)).## |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So maybe I'll copy the function implementation for case classes in Scala?
👍
@ctongfei If you move |
Codecov Report
@@ Coverage Diff @@
## master #1712 +/- ##
=========================================
+ Coverage 95.57% 95.6% +0.02%
=========================================
Files 248 252 +4
Lines 4454 4570 +116
Branches 115 117 +2
=========================================
+ Hits 4257 4369 +112
- Misses 197 201 +4
Continue to review full report at Codecov.
|
…s.kernel.instances.CharOrder in trait cats.kernel.instances.CharInstances has a different result type in current version, where it is cats.kernel.Order rather than cats.kernel.instances.CharOrder
It seems that the current version has some binary incompatibility issues with Scala 2.10/2.11 as is indicated by Travis CI:
etc. because these are new methods added to existing traits. |
as of now we haven't decided if we want to break kernel bin compat in 1.0.0. @johnynek thoughts? |
Apart from binary incompatibility issues with JVM Scala 2.10/2.11, I think that this PR is ready to be merged now. There are several lines not covered by tests (I don't think they should): Thanks :-) |
can you add the mima exclusions as was done in #1925 ? |
@johnynek I don't fully understand the rules of these binary checks? Could you elaborate? Thanks! |
@ctongfei in sbt console if you run
you will get a report of the violating classes as well as the filters needed to make them exceptions |
@ctongfei do you want to merge in master? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is great.
When we merge with master, I think we are green.
Thanks for carrying this one so far! I look forward to someone building HashSet
and HashMap
based on this Hash typeclass (not to mention small bloom filters or hll).
👍 |
Thanks everyone :-) |
Thank you @ctongfei ! |
* more instances to Hash: Queue/Duration * more instances to Hash: Queue/Duration
The following modifications are made:
Hash
type class that extendsEq
Hash#on
is not implemented to avoid the diamond inheritance problem. Use the methodHash.by
in the companion object instead.Hash
instances that are compatible with Scala's default are written for nearly all types incats.kernel.instances
:T
, theirHash
instances are directly coded usingT#hashCode
;List
,Stream
etc.), theirHash
instances requireHash
instances on the type of their elements, and theHash
instances on generic types is equivalent to the Scala's defaulthashCode
if theHash
instances on element types are from the universalT#hashCode
;KernelBoiler.scala
file.ForImplemented using universalSet
, it is not implemented yet because there's an issue (SetPartialOrder[A] uses the universal==
instead ofEq
#1710) regarding theOrder
/Eq
instances onSet
s.equals
/hashCode
.cats.kernel.laws
.Contravariant
instance incats-core
forHash
.cats
forHash
.Hash
results in ambiguous implicit resolution forEq
(since bothHash
andOrder
areEq
s). Resolved by introducing precedence of implicits inKernelBoiler.scala
. This file now produces 3 files with different implicit precedences.hash
method on any type that has aHash
instance.