typeclass-based collections implementations #4147
Definitely better for cats-collections. 🙂 This has been a dream for a while, but it's meaningfully hard to do correctly, and it's not clear to me that people would actually use them over the standard library. |
I thought somewhere someone suggested we could basically copy the implementations out of Scala itself. But I am very afraid of the collections hierarchy 😆
yeah, fair ... |
Precedent: |
People definitely won't use what we don't have. I have definitely sunk more time into code with zero users. I think it is worth it, even if it is niche. Someone could also learn a lot about HAMT by doing it, so it is a great learning experience. One last note: cats-collections has two immutable heaps. The standard library has zero. No one generally uses them, but I have, and I bet a few others have too. Safe immutable types are a big part of our mission. Let's not give in to "no one will use this" defeatism. They may not, but that is not how we measure success. |
A comment by @durban in typelevel/cats-effect#2464 reminded me of this issue:
Understandably, it is difficult to provide even a So even if a Cats' |
well, if you don't worry too much about performance, here is how:
import cats.kernel.Hash

class Hashed[K](implicit hash: Hash[K]) {
  // Key boxes a K and delegates hashCode and equals to the Hash instance
  class Key(val unwrap: K) {
    override val hashCode: Int = hash.hash(unwrap)
    override def equals(that: Any): Boolean =
      that match {
        case hk: Key => hash.eqv(unwrap, hk.unwrap)
        case _ => false
      }
  }
  def toMap[V](kvs: Iterable[(K, V)]): Map[Key, V] =
    kvs.iterator.map { case (k, v) => (new Key(k), v) }.toMap
}
val keySpace = new Hashed[Foo]
val myMap: Map[keySpace.Key, String] = keySpace.toMap(List(Foo(1) -> "1"))
etc.... so, in other words, you box with something that delegates hashCode and equals to Hash. |
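To make the boxing idea above concrete, here is a minimal sketch that runs without any dependencies. It assumes a small stand-in `Hash` trait in place of `cats.kernel.Hash`, and the case-insensitive `String` instance and the `key` helper are hypothetical names added only for illustration.

```scala
// Stand-in for cats.kernel.Hash (assumption: only these two methods are needed here).
trait Hash[A] {
  def hash(a: A): Int
  def eqv(x: A, y: A): Boolean
}

// Same boxing idea: Key delegates hashCode and equals to the Hash instance.
class Hashed[K](implicit hash: Hash[K]) {
  class Key(val unwrap: K) {
    override val hashCode: Int = hash.hash(unwrap)
    override def equals(that: Any): Boolean = that match {
      case hk: Key => hash.eqv(unwrap, hk.unwrap)
      case _       => false
    }
  }
  // Hypothetical helper for building lookup keys.
  def key(k: K): Key = new Key(k)
  def toMap[V](kvs: Iterable[(K, V)]): Map[Key, V] =
    kvs.iterator.map { case (k, v) => (new Key(k), v) }.toMap
}

// Hypothetical instance: strings hashed and compared case-insensitively.
implicit val caseInsensitive: Hash[String] = new Hash[String] {
  def hash(a: String): Int = a.toLowerCase.hashCode
  def eqv(x: String, y: String): Boolean = x.equalsIgnoreCase(y)
}

val keySpace = new Hashed[String]
val m = keySpace.toMap(List("Foo" -> 1, "bar" -> 2))
val hit = m.get(keySpace.key("FOO"))  // lookup goes through the Hash instance
val miss = m.get(keySpace.key("baz"))
```

The standard `Map` never sees the raw `String` keys, so its notion of equality is whatever the `Hash` instance captured by `keySpace` says it is. The cost is one wrapper allocation per key.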
I mean, we could add these methods and inner types to |
I think I am going to have a crack at this one unless someone has already taken it? My plan is to do a pretty conservative port of the HashMap and HashSet from the Scala 2.13 collections initially, e.g. for HashSet, something like this:
abstract class HashSet[A] {
  final def contains(value: A)(implicit hash: Hash[A]): Boolean
  final def add(value: A)(implicit hash: Hash[A]): HashSet[A]
  final def remove(value: A)(implicit hash: Hash[A]): HashSet[A]
  // etc.
}
Essentially, the interface of the existing Scala collections but with the Hash constraint. :) I have not really looked into any kind of caching of hash code values or the many optimizations that have been added to the 2.13.x collections since the initial CHAMP port, as I think it's probably best to do the simplest possible implementation to begin with. |
Hmm, shouldn't the |
I am not really sure - I have seen both approaches and I guess it's just as surprising both ways if the |
In general, passing it to each method is superior because it allows call sites to more correctly reflect their constraints. For example, imagine the constructor of |
Ah, ok, thanks both. That makes sense, and I see that cats-collections also uses method-level constraints. 👍 |
On the other hand, if
trait MapRef[F[_], K, V] {
  def lookup(k: K)(implicit K: Hash[K]): Ref[F, V]
}
then we may also need
trait OrderMapRef[F[_], K, V] {
  def lookup(k: K)(implicit K: Order[K]): Ref[F, V]
}
If the 2 constructors receive the |
trait GenMap[C[_], K, V] {
def get(k: K)(implicit C: C[K]): Option[V]
def updated(k: K, v: V)(implicit C: C[K]): GenMap[C, K, V]
def toMap(implicit C: C[K]): Map[K, V] // latches the constraint `C` inside a regular `Map` interface implementation
// can be implemented efficiently as a view of the original `GenMap`
// thus avoiding copying of any data
def toGenMap[CC[_]]: GenMap[CC, K, V] // changes the constraint type
// also should be possible to avoid copying of data
}
type HashMap[K, V] = GenMap[Hash, K, V]
type OrderMap[K, V] = GenMap[Order, K, V]
UPD. Disregard please. Written due to a brain fluctuation. Sorry for the disturbance :) |
how can you convert between Not only does the internal structure depend on the kind of map it is (sorted or hash), it also depends on the value of the sorting and hashing (of which there can be many, since Scala has no coherent typeclasses). I continue to think this approach (each callsite takes the dependency) is actually a Haskell cargo-cult and not a great design in Scala. The internal structure only makes sense when you use the right value for the typeclass. By not keeping it together you are just hoping that users don't pass the wrong ones. I think binding the typeclass instance to the value like Scala does with |
Oh no, disregard please. It won't work, I didn't think it through enough :) |
Yes, I agree it is a good point. It is just safer to bind internal structure and a constraint together. In fact, we cannot even construct a new instance (besides an empty one) without passing the constraint to the constructor. |
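The hazard described above can be sketched with a toy bucketed set. Everything here is hypothetical and written only for illustration: the stand-in `Hash` trait, `MethodLevelSet`, `BoundSet`, and the two `Int` instances are not the cats or cats-collections API.

```scala
// Stand-in for cats.kernel.Hash (assumption).
trait Hash[A] {
  def hash(a: A): Int
  def eqv(x: A, y: A): Boolean
}

// Constraint on every method: each call may supply a different instance.
final class MethodLevelSet[A](buckets: Map[Int, List[A]] = Map.empty[Int, List[A]]) {
  def add(a: A)(implicit H: Hash[A]): MethodLevelSet[A] = {
    val h = H.hash(a)
    val bucket = buckets.getOrElse(h, Nil)
    if (bucket.exists(H.eqv(_, a))) this
    else new MethodLevelSet[A](buckets.updated(h, a :: bucket))
  }
  def contains(a: A)(implicit H: Hash[A]): Boolean =
    buckets.getOrElse(H.hash(a), Nil).exists(H.eqv(_, a))
}

// Constraint bound at construction: the structure and its instance travel together.
final class BoundSet[A](buckets: Map[Int, List[A]] = Map.empty[Int, List[A]])(implicit H: Hash[A]) {
  def add(a: A): BoundSet[A] = {
    val h = H.hash(a)
    val bucket = buckets.getOrElse(h, Nil)
    if (bucket.exists(H.eqv(_, a))) this
    else new BoundSet[A](buckets.updated(h, a :: bucket))
  }
  def contains(a: A): Boolean =
    buckets.getOrElse(H.hash(a), Nil).exists(H.eqv(_, a))
}

// Two lawful but incompatible instances for Int.
val byValue: Hash[Int] = new Hash[Int] {
  def hash(a: Int): Int = a
  def eqv(x: Int, y: Int): Boolean = x == y
}
val byParity: Hash[Int] = new Hash[Int] {
  def hash(a: Int): Int = a % 2
  def eqv(x: Int, y: Int): Boolean = x % 2 == y % 2
}

val s = new MethodLevelSet[Int]().add(3)(byValue)
val sameInstance = s.contains(3)(byValue)    // looks in the bucket the element was stored in
val otherInstance = s.contains(3)(byParity)  // looks in a different bucket and misses

val empty = new BoundSet[Int]()(byValue)
val bound = empty.add(3).contains(3)         // no opportunity to pass a second instance
```

With the method-level version, nothing stops a caller from inserting with one instance and querying with another, and the element is silently lost. The bound version rules that out, at the price of every operation being tied to the instance chosen at construction.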
Would it make sense to convert this issue to a "discussion" (the tab next to "Pull requests" in the GitHub tabs panel)? Because it looks like a discussion rather than an issue :) |
It took me a while to recall where I had seen a proper explanation for this Haskell convention - the best I can find is from Learn You A Haskell:
So it seems that the reasoning in Haskell is not just about enabling appropriately specific constraints to be used for different functions, but that when using data type constraints in Haskell you still have to write out the constraint in every signature that uses that data type, so there is simply no benefit from doing so. The idea that using function-level constraints allows better documentation of the requirements for a particular function makes a lot of sense to me, but it seems much less beneficial in this case since we are talking about adding the same
I think your reasoning is pretty convincing overall @johnynek. The main counter-argument I can think of is code like
def contains_(v: A)(implicit ev: Eq[A], F: Foldable[F]): Boolean =
  F.exists(fa)(ev.eqv(_, v))
hashSet.contains_(1) // The type class extension method uses the `Eq[Int]` in scope at this callsite
hashSet.contains(1) // The instance method uses the `Eq[Int]` bound in the `hashSet`
I don't mean to focus too much on this specific example - just that binding type class instances at class level increases the likelihood that code which uses |
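The mismatch between the two calls above can be reproduced in a small self-contained sketch. The stand-in `Eq` trait, `BoundBag`, and `containsVia` are hypothetical names; `containsVia` only plays the role of a type class extension method such as `contains_` and is not the cats implementation.

```scala
// Stand-in for cats.kernel.Eq (assumption).
trait Eq[A] {
  def eqv(x: A, y: A): Boolean
}

// Hypothetical collection that captures its Eq at construction time.
final class BoundBag[A](items: List[A])(implicit E: Eq[A]) {
  def contains(a: A): Boolean = items.exists(E.eqv(_, a))
  def toList: List[A] = items
}

// Plays the role of an extension method: it uses the Eq supplied at the call site.
def containsVia[A](bag: BoundBag[A], a: A)(implicit ev: Eq[A]): Boolean =
  bag.toList.exists(ev.eqv(_, a))

val caseInsensitive: Eq[String] = new Eq[String] {
  def eqv(x: String, y: String): Boolean = x.equalsIgnoreCase(y)
}
val exact: Eq[String] = new Eq[String] {
  def eqv(x: String, y: String): Boolean = x == y
}

val bag = new BoundBag(List("Foo"))(caseInsensitive)
val viaInstance = bag.contains("foo")             // uses the Eq bound inside the bag
val viaCallSite = containsVia(bag, "foo")(exact)  // uses the Eq passed at the call site
```

The two lookups disagree even though they query the same collection for the same value, which is the kind of surprise the comment above is pointing at.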
I've pushed a WIP PR for the HashSet implementation under #4185, interested in all of your thoughts about it! I'd like to write a bunch more tests and add some benchmarks for what is there now, but we also need to decide on a few things:
|
Btw, we also have to do something about this law:
See discussion in #4118 (comment). |
I've pushed #4193 containing a hash map implementation along the same lines. Some things I still need to do:
|
should we close this since it was ultimately rejected and pushed to cats-collections? |
Also note since this was opened
|
The HashMap / HashSet portion of this has been contributed to cats-collections now - see typelevel/cats-collections#534 and typelevel/cats-collections#533. I am not sure if I will have an opportunity to work on a |
Do we actually need a specialized |
Yes that's a good point - I suppose the problem is that the |
Eh, I wouldn't really consider that a problem. If you use the cats |
This is a recurring idea, most recently by @johnynek in #4118 (comment).
Although the issue comes up in Cats since this is where we house instances for Scala stdlib, this might be a better PR for cats-collections.