cats.kernel.Hash port for Scala CHAMP HashMap #4193
…gations as a derived work
I think if we beef up the tests as I sketched, I am a champion of merging this. I think Map APIs are pretty much settled in Scala and we can use the existing ones with the natural generalization of SortedMap, Ordering to HashMap, Hash (which is mostly what was done here, modulo some renaming, e.g. add to updated). I think having a HashMap to work with Hash would be a really nice addition to make Hash actually safe to use, and without it, Hash is pretty uninteresting (except that other users might internally implement their own hashmaps based on it).
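To make the SortedMap/Ordering to HashMap/Hash analogy concrete, here is a minimal sketch. Everything in it is an assumption for illustration: `Hash` is a local stand-in for `cats.kernel.Hash`, and `TinyHashMap` is a toy bucketed map, not the CHAMP implementation in this PR. It only shows where the `Hash` instance is consulted, and why a map built on universal `equals`/`hashCode` cannot honour a custom instance.

```scala
import java.util.Locale

// Hypothetical stand-in for cats.kernel.Hash, so the sketch needs no dependencies.
trait Hash[A] {
  def hash(a: A): Int
  def eqv(x: A, y: A): Boolean
}

// Toy immutable map: buckets are keyed by Hash#hash and collisions are resolved with Hash#eqv.
// This is NOT CHAMP; it only illustrates how the Hash instance drives lookups.
final class TinyHashMap[K, V] private (buckets: Map[Int, List[(K, V)]])(implicit K: Hash[K]) {
  def get(k: K): Option[V] =
    buckets.getOrElse(K.hash(k), Nil).collectFirst { case (k0, v) if K.eqv(k0, k) => v }

  def updated(k: K, v: V): TinyHashMap[K, V] = {
    val h = K.hash(k)
    val rest = buckets.getOrElse(h, Nil).filterNot { case (k0, _) => K.eqv(k0, k) }
    new TinyHashMap(buckets.updated(h, (k, v) :: rest))
  }

  def removed(k: K): TinyHashMap[K, V] = {
    val h = K.hash(k)
    val rest = buckets.getOrElse(h, Nil).filterNot { case (k0, _) => K.eqv(k0, k) }
    new TinyHashMap(if (rest.isEmpty) buckets - h else buckets.updated(h, rest))
  }

  def size: Int = buckets.valuesIterator.map(_.length).sum
}

object TinyHashMap {
  def empty[K: Hash, V]: TinyHashMap[K, V] = new TinyHashMap[K, V](Map.empty)
}

object HashDemo {
  // Case-insensitive keys: hash and eqv are both defined on the lower-cased string,
  // so they are consistent with each other by construction.
  implicit val caseInsensitive: Hash[String] = new Hash[String] {
    def hash(a: String): Int = a.toLowerCase(Locale.ROOT).hashCode
    def eqv(x: String, y: String): Boolean =
      x.toLowerCase(Locale.ROOT) == y.toLowerCase(Locale.ROOT)
  }

  // "Key" and "KEY" are the same key under this Hash instance.
  val custom: TinyHashMap[String, Int] =
    TinyHashMap.empty[String, Int].updated("Key", 1).updated("KEY", 2)

  // The standard library map uses universal equality and keeps both entries.
  val universal: Map[String, Int] = Map("Key" -> 1).updated("KEY", 2)
}
```

With the custom instance the second `updated` overwrites the first entry, whereas the standard library map ends up with two entries.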
Can you post the benchmark results as a comment and discuss a comparison to the universal hash equivalents (which I assume is very hard to beat, but hopefully we could nearly match)?
I think we will also need to revoke this law:
I tend to agree. We can possibly make two sets of laws: But something is nagging in the back of my mind: Moreover, you could argue that what If we take that view, we can keep the law, but still Hash is useful because it proves you have a reproducible hashcode all the way down, no hiding of general |
Putting aside the fact this wouldn't allow for alternative |
I think The utility of Like, put another way: when would you not want to match I think we may have gotten tripped up and imagined the law means it is 1:1, but it doesn't mean that: all E.g. there would not be a |
Yes, it certainly does have
Sorry, I'm not entirely sure I follow 😕 I guess my point is, if we know that |
You seem to be arguing two contradictory points:
So, going down route 2. seems to limit |
Note that Scala has |
Thanks for the detailed review @johnynek, this is awesome! 🙇 I'm on holiday this week but will get back to this as soon as I can! |
package cats.data

private[data] trait HashMapCompat[K, +V] extends IterableOnce[(K, V)] { self: HashMap[K, V] =>
I feel like I should highlight this as it's hard to spot - I've extended IterableOnce[(K, V)] here as it gives us nice interop opportunities with the standard library. Unfortunately TraversableOnce[(K, V)] on 2.12 requires much more from us in terms of method implementations so I decided not to attempt that.
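As a rough illustration of the interop this buys on Scala 2.13+, the sketch below uses a hypothetical `Pairs` class (not part of this PR) that extends `IterableOnce[(K, V)]`. Only `iterator` has to be implemented, after which the standard library factories and `++` accept it directly.

```scala
// Hypothetical minimal collection; on Scala 2.13+ `iterator` is the only abstract member of IterableOnce.
final class Pairs[K, V](underlying: Vector[(K, V)]) extends IterableOnce[(K, V)] {
  def iterator: Iterator[(K, V)] = underlying.iterator
  // Optional: lets builders pre-size their buffers.
  override def knownSize: Int = underlying.length
}

object InteropDemo {
  val pairs: Pairs[String, Int] = new Pairs(Vector("a" -> 1, "b" -> 2))

  // Standard library factories accept any IterableOnce.
  val asMap: Map[String, Int] = Map.from(pairs)
  val asList: List[(String, Int)] = List.from(pairs)

  // Existing collections can append it without an intermediate conversion.
  val appended: Vector[(String, Int)] = Vector("z" -> 0) ++ pairs
}
```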
* Use primitive HashMap operations when generating random HashMaps
* Use a better approach to test HashMap key collisions
* Remove reverseIterator
* Foldable.iterateRight signature changed
* HashMap#add renamed to HashMap#updated
* HashMap#remove renamed to HashMap#removed
* Added HashMap.fromFoldable
* Renamed CHAMP node element count fields
* Use NonEmptyVector in CHAMP collision node
* Fixed various nits
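The commit message does not say how key collisions are tested. One possible approach, sketched here purely as an assumption, is to wrap an existing instance so that equality is unchanged while hash codes are squashed into a handful of values, which forces distinct keys into the same collision path. `Hash` below is a local stand-in for `cats.kernel.Hash`, and `colliding` is a hypothetical helper.

```scala
// Hypothetical stand-in for cats.kernel.Hash, so the sketch needs no dependencies.
trait Hash[A] {
  def hash(a: A): Int
  def eqv(x: A, y: A): Boolean
}

object CollisionDemo {
  val intHash: Hash[Int] = new Hash[Int] {
    def hash(a: Int): Int = a.hashCode
    def eqv(x: Int, y: Int): Boolean = x == y
  }

  // Keeps the underlying equality but maps every hash code into the range [0, buckets),
  // so many distinct keys share a hash code.
  def colliding[A](underlying: Hash[A], buckets: Int): Hash[A] = new Hash[A] {
    def hash(a: A): Int = java.lang.Math.floorMod(underlying.hash(a), buckets)
    def eqv(x: A, y: A): Boolean = underlying.eqv(x, y)
  }

  val twoBuckets: Hash[Int] = colliding(intHash, buckets = 2)
}
```

The wrapped instance is still lawful in the sense that equal keys hash equally, so a map test suite can be run against it unchanged.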
Dumb question, why can't we do this? |
Not a dumb question for sure - we can definitely try adding an |
Oh actually I remember why I didn't do that originally - there's no easy way to thread the |
Haha OK, e7343e8 is a total bust unfortunately 😆 |
I think in order for that approach to be viable we would have to figure out how to skip a bunch more work. |
So, status check on this one :) firstly, round of applause to David for this wonderful piece of work and Oscar for thorough reviews. I optimistically added this one to the v2.8.0 milestone :) besides this PR, I think we're approaching a good place to wrap up 2.8.0 and ship it in the next week or two. Besides the accumulated changes there have been requests to release with the Scala 3 bincompat fix and Scala 3 Native. Release planning: can this PR realistically land in 2.8.0, and should it? Even after this lands, I think there's some follow-up work that could do well to accompany it. Off the top of my head:
For the record, none of these are blockers, but they do go nicely together. Also, some folks were less enthusiastic about adding this to cats-core versus cats-collections. Now that it's taken shape, I'd like to hear if there are (strong) objections (personally I feel 👍 ). Thoughts? Thanks again everyone! |
Yes I think it can - the most recent commit was an attempt to improve the performance of concat but I think we can revert that commit and perhaps I can revive it as a follow-up PR.
I think these are good suggestions but as they are additions perhaps there is no rush to complete them for 2.8.0.
That PR needs some work to add a bunch more of the basic operations - at the moment it's very sparse with only incl / + and excl / -. However it would really be nice to land that as then we could add operations like |
Apologies, I might let this slip to 2.9.0 so we can land 2.8.0 😓 |
@johnynek do you have any thoughts on how to proceed with this one? Would you be happy to revert the most recent commit experimenting with the @armanbilge I guess the main blocker to getting this into release is the Hash consistency law? What does deprecating a law look like? |
Actually I'd say the main blocker is reviews: I am very very grateful to all the time and attention to detail you and Oscar put into this PR, since it is an immense, non-trivial contribution. It would be good to hear from a couple other Cats maintainers, especially those who were hesitant about introducing this in cats-core vs cats-collections. I don't think I'll have time to do an in-depth review, but I would like to make a pass through the public APIs :) Everything else I mentioned including the I am very excited about this contribution :) and I do think the sooner we can release it the better, because it will need additional ecosystem support (e.g. Circe encoders/decoders) to help with adoption. |
I trust your judgement, it seems like we should revert that commit for now and move forward with the PR? We can make an issue to follow-up about this. |
OK, I have reverted that commit now @armanbilge and it seems like CI is happy 🤞 |
Thanks! I will try and carve out time for a review and follow-up with Daniel about his concerns for including this in |
So following up on this: I did ask on Discord and Daniel and Michael chimed in that this is a much better candidate for cats-collections. My own thoughts/summary from that dialog follow. See also #4185 (comment). The main two issues seem to be:
I actually feel okay about (1) since I think this is a straightforward API and I am generally optimistic about the possibility for binary-compatible evolution. (2) is more subjective of course. I was also hoping this will land in Another strategy could be to work towards stable 1.0 of Thoughts? |
It sounds like the general consensus is that this belongs in |
I think so too. Unfortunately I know it's a lot more work but I think the best strategy is to make |
I still think it is better here... I think Map/Set are core, like List and NonEmptyList... these are things we want (e.g. the methods we already have that do groupBy and resort to SortedMap). I think we only need to worry about binary compatibility on the API surface area, and given that Map and Set are very old concepts, I am fairly confident we can make something that works like the standard Scala ones, so that we don't need to worry that things will change.
so... if @armanbilge and I (who actually have spent the time on the code) feel like it should be here, and folks who haven't been reviewing the code and citing specific examples of their concerns in the PR think it shouldn't, I don't see why they get the veto.
I'm here just to support this as much as I can, since |
Superseded by typelevel/cats-collections#534 |
This PR implements an immutable hash map using the CHAMP encoding and cats.kernel.Hash for hashing of elements, as per #4147. This accompanies PR #4185 which implements a hash set.
Contributed on behalf of the @opencastsoftware open source team. 👋