Remove the `raw` feature and make `RawTable` private #546
cc @clarfonthey |
This will give more freedom for the internal implementation details of hashbrown to evolve without the need for regular releases with breaking changes.

All existing users of `RawTable` should migrate to the `HashTable` API which is entirely safe while providing the same flexibility as `RawTable`.

This also removes the following features which were only exposed under `RawTable`:
- `RawTable::iter_hash`
- `RawIter::reflect_insert` and `RawIter::reflect_remove`
- `RawTable::clone_from_with_hasher`
- `RawTable::insert_no_grow` and `RawTable::try_insert_no_grow`
- `RawTable::allocation_info`
- `RawTable::try_with_capacity(_in)`
- `HashMap::raw_table(_mut)` and `HashSet::raw_table(_mut)`
I think https://github.com/Apache/arrow-datafusion uses:
|
Since you're already doing this, would you mind pushing a release after too so rust-lang/rust#128711 can go through? Please and thank you. <3 IMHO the sooner we do this, the better, since it gives people time to migrate away from the API before we start following up on the threat to improve the code. And, since it's a published crate, it's not a big deal, since the old (non-breaking) versions will continue to work correctly. |
Datafusion seem to actually require the full functionality of This isn't something that can be supported on |
We have also been discussing implementing our own custom hash table in apache/datafusion#7095, so perhaps this would be another potential reason to pursue that idea. I agree that relying on the low-level details of how the hash table in hashbrown is implemented is not ideal (e.g. it constrains how hashbrown can version releases). @Amanieu I wonder if you would consider a feature flag on hashbrown like |
This will require work in DataFusion to fix TopK aggregations, and then after it is fixed it will cause significant (40% IIRC) performance regressions. |
I don't think that's a good idea because every minor release of hashbrown could break users of your crate. However you can just keep using the 0.14 version of hashbrown which will still have the |
Another option is for us to fork the crate, which might be reasonable if we can get more performance by doing so |
I think we should create some tickets in DataFusion for moving our usage towards |
@bors r+ |
☀️ Test successful - checks-actions |
I was using |
I think |
Add `HashTable::iter_hash`, `HashTable::iter_hash_mut`

This is a follow-up to #546 (comment).

`iter_hash` from the old raw API can be useful for reading from a "bag" / "multi map" type which allows duplicate key-value pairs. Exposing it safely in `HashTable` takes a fairly small wrapper around `RawIterHash`. This PR partially reverts #546 to restore `RawTable::iter_hash` and its associated types.
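To make the "bag" use case concrete, here is a toy, std-only sketch of what iterating by hash means. It is not hashbrown's implementation and does not use its `HashTable` API. The `Bag` type and the `hash_of` and `get_all` helpers are hypothetical names introduced for illustration, and the bucket layout (a `Vec` of `Vec`s) is an assumption chosen for brevity.

```rust
use std::collections::hash_map::DefaultHasher;
use std::hash::{Hash, Hasher};

/// Toy "bag" / multi map that allows duplicate keys.
/// Hypothetical illustration only; hashbrown's real layout is different.
struct Bag<K, V> {
    // Each bucket stores (hash, key, value) triples.
    buckets: Vec<Vec<(u64, K, V)>>,
}

/// Hypothetical helper: hash a key with the standard library's default hasher.
fn hash_of<K: Hash>(key: &K) -> u64 {
    let mut hasher = DefaultHasher::new();
    key.hash(&mut hasher);
    hasher.finish()
}

impl<K: Hash + Eq, V> Bag<K, V> {
    fn new(bucket_count: usize) -> Self {
        assert!(bucket_count > 0, "need at least one bucket");
        Bag {
            buckets: (0..bucket_count).map(|_| Vec::new()).collect(),
        }
    }

    /// Always appends, so the same key may be stored several times.
    fn insert(&mut self, key: K, value: V) {
        let hash = hash_of(&key);
        let idx = (hash as usize) % self.buckets.len();
        self.buckets[idx].push((hash, key, value));
    }

    /// Yields every entry whose stored hash equals `hash`.
    /// Distinct keys can share a hash, so callers still compare keys.
    fn iter_hash<'a>(&'a self, hash: u64) -> impl Iterator<Item = (&'a K, &'a V)> + 'a {
        let idx = (hash as usize) % self.buckets.len();
        self.buckets[idx]
            .iter()
            .filter(move |(h, _, _)| *h == hash)
            .map(|(_, k, v)| (k, v))
    }

    /// Reads all values stored under `key` by filtering `iter_hash` on equality.
    fn get_all<'a>(&'a self, key: &'a K) -> impl Iterator<Item = &'a V> + 'a {
        self.iter_hash(hash_of(key))
            .filter(move |(k, _)| *k == key)
            .map(|(_, v)| v)
    }
}
```

In this sketch, inserting `("a", 1)` and `("a", 2)` and then calling `get_all(&"a")` yields both values. A map-style lookup would stop at the first match, which is why a by-hash iterator is useful for this kind of container.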
This was previously removed from `RawTable` in rust-lang#546. This is now added as a public API on `HashMap`, `HashSet` and `HashTable`.
What specific functionality do you need that isn't available though |
I took a brief look, and the main roadblock that I see will be iterators and entry structs that currently contain |
Added a comment to #545 mentioning to look into dashmap's internals when I go about ripping out the raw table API. It may be the case that we need to offer some kind of lifetime-erased version of the various types to get it to work, but I'm hoping we can get around that. |
I thought about this a bit and fundamentally it's not a problem with |
If anyone was previously relying on any of the functionality removed here, please raise a comment. It may be possible to re-introduce it as a safe API in `HashTable` and/or `HashMap`.