Tracking issue for HashMap::raw_entry #56167

Closed
sfackler opened this issue Nov 22, 2018 · 52 comments · Fixed by #138425
Labels
A-collections: Area: `std::collections`
B-unstable: Blocker: Implemented in the nightly compiler and unstable.
C-tracking-issue: Category: An issue tracking the progress of sth. like the implementation of an RFC
disposition-close: This PR / issue is in PFCP or FCP with a disposition to close it.
finished-final-comment-period: The final comment period is finished for this PR / Issue.
Libs-Tracked: Libs issues that are tracked on the team's project board.
S-tracking-design-concerns: Status: There are blocking design concerns.
T-libs-api: Relevant to the library API team, which will review and decide on the PR/issue.

Comments

@sfackler
Member

sfackler commented Nov 22, 2018

Added in #54043.


As of 6ecad33 / 2019-01-09, this feature covers:

impl<K, V, S> HashMap<K, V, S>
    where K: Eq + Hash,
          S: BuildHasher
{
    pub fn raw_entry(&self) -> RawEntryBuilder<K, V, S> {}
    pub fn raw_entry_mut(&mut self) -> RawEntryBuilderMut<K, V, S> {}
}

pub struct RawEntryBuilder<'a, K: 'a, V: 'a, S: 'a> {} // Methods return Option<(&'a K, &'a V)>
pub struct RawEntryBuilderMut<'a, K: 'a, V: 'a, S: 'a> {} // Methods return RawEntryMut<'a, K, V, S>
pub enum RawEntryMut<'a, K: 'a, V: 'a, S: 'a> {
    Occupied(RawOccupiedEntryMut<'a, K, V>),
    Vacant(RawVacantEntryMut<'a, K, V, S>),
}
pub struct RawOccupiedEntryMut<'a, K: 'a, V: 'a> {}
pub struct RawVacantEntryMut<'a, K: 'a, V: 'a, S: 'a> {}

… as well as Debug impls for each of the 5 new types, and their inherent methods.

@sfackler sfackler added T-libs-api Relevant to the library API team, which will review and decide on the PR/issue. B-unstable Blocker: Implemented in the nightly compiler and unstable. C-tracking-issue Category: An issue tracking the progress of sth. like the implementation of an RFC labels Nov 22, 2018
@Amanieu
Member

Amanieu commented Nov 26, 2018

What is the motivation for having separate from_hash and search_bucket methods? It seems that the only difference is whether the hash value is checked before calling is_match. However if the table does not store full hashes (i.e. hashbrown) then there is no difference between these methods.

Could we consider merging these methods into a single one? Or is there some use case where the difference in behavior is useful?

@Gankra
Contributor

Gankra commented Nov 27, 2018

I am also extremely confused by this distinction, as my original designs didn't include them (I think?) and the documentation that was written is very unclear.

@Amanieu
Member

Amanieu commented Nov 27, 2018

cc @fintelia

@fintelia
Contributor

fintelia commented Nov 27, 2018

The reason I added search_bucket was because I wanted to be able to delete a random element from a HashMap in O(1) time, without storing an extra copy of all the keys. Basically, instead of doing something like this:

let key = map.iter().nth(rand() % map.len()).unwrap().0.clone();
map.remove(&key);

I wanted to just be able to pick a random "bucket" and then get an entry/raw entry to the first element in it if any:

loop {
    if let Occupied(o) = map.raw_entry_mut().search_bucket(rand(), |_| true) {
        o.remove();
        break;
    }
}

(the probabilities aren't uniform in the second version, but close enough for my purposes)

@Gankra
Contributor

Gankra commented Nov 28, 2018

I continue to not want to support the "random deletion" usecase in std's HashMap. You really, really, really, should be using a linked hashmap or otherwise ordered map for that.

@Amanieu
Member

Amanieu commented Dec 9, 2018

I have removed this method in the hashbrown PR (#56241). Your code snippet for random deletion won't work with hashbrown anyways since it always checks the hash as part of the search process.

Amanieu added a commit to Amanieu/rust that referenced this issue Dec 11, 2018
It doesn't work in hashbrown anyways (see rust-lang#56167)
@gdzx

gdzx commented Mar 1, 2019

I can avoid unnecessary clones inherent to the original entry API which is nice. But unless I'm mistaken, the current raw_entry API seems to hash the keys twice in this simple use case:

#![feature(hash_raw_entry)]

use std::collections::HashMap;

fn main() {
    let mut map = HashMap::new();

    map.raw_entry_mut()
        .from_key("poneyland")
        .or_insert("poneyland", 3);
}

Currently I use the following function to hash once and automatically provide an owned key if necessary (somewhat similar to what was discussed in rust-lang/rfcs#1769):

use std::borrow::Borrow;
use std::collections::hash_map::{HashMap, RawEntryMut};
use std::hash::{BuildHasher, Hash, Hasher};

fn get_mut_or_insert_with<'a, K, V, Q, F>(
    map: &'a mut HashMap<K, V>,
    key: &Q,
    default: F,
) -> &'a mut V
where
    K: Eq + Hash + Borrow<Q>,
    // `?Sized` lets unsized borrowed keys such as `str` be used for the lookup.
    Q: Eq + Hash + ToOwned<Owned = K> + ?Sized,
    F: FnOnce() -> V,
{
    let mut hasher = map.hasher().build_hasher();
    key.hash(&mut hasher);
    let hash = hasher.finish();

    match map.raw_entry_mut().from_key_hashed_nocheck(hash, key) {
        RawEntryMut::Occupied(entry) => entry.into_mut(),
        RawEntryMut::Vacant(entry) => {
            entry
                .insert_hashed_nocheck(hash, key.to_owned(), default())
                .1
        }
    }
}

Given k1 and k2 with the same type K such that hash(k1) != hash(k2), is there a use-case for calling RawEntryBuilderMut::from_key_hashed_nocheck with hash(k1), &k1 and then inserting with RawEntryMut::or_insert using k2?

If there isn't, why not save the hash in RawVacantEntryMut and use it inside RawVacantEntryMut::insert? It would even be possible to assert in debug builds that the owned key has the same hash as the borrowed key used to look up the entry.
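The debug-build check proposed above can be sketched with stable std APIs alone. This is only an illustration, not part of the raw entry API: `hash_key` and `to_owned_checked` are hypothetical helper names, and the sketch assumes the borrowed and owned key types hash identically, as the `Borrow` contract requires.

```rust
use std::hash::{BuildHasher, Hash, Hasher};

/// Hypothetical helper: hashes `key` the same way the snippet above does,
/// by building a fresh hasher from the map's `BuildHasher`.
fn hash_key<S, Q>(build: &S, key: &Q) -> u64
where
    S: BuildHasher,
    Q: Hash + ?Sized,
{
    let mut hasher = build.build_hasher();
    key.hash(&mut hasher);
    hasher.finish()
}

/// Hypothetical helper: turns the borrowed lookup key into the owned key
/// that would be inserted, and checks (in debug builds only) that the owned
/// key hashes to the value that was used for the lookup.
fn to_owned_checked<S, Q>(build: &S, lookup_hash: u64, key: &Q) -> Q::Owned
where
    S: BuildHasher,
    Q: Hash + ToOwned + ?Sized,
    Q::Owned: Hash,
{
    let owned = key.to_owned();
    debug_assert_eq!(
        lookup_hash,
        hash_key(build, &owned),
        "owned key must hash like the borrowed key used for the lookup"
    );
    owned
}
```

For a `HashMap<String, V>`, one would compute `hash_key(map.hasher(), "poneyland")` once and pass that value along with the borrowed key. The check costs nothing in release builds.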

@timvermeulen
Contributor

I'm not yet very familiar with this API, but what @gdouezangrard suggested seems like a great idea to me. What even happens currently if the two hashes don't match? Is the key-value pair then inserted into the wrong bucket? It's not clear to me from (quickly) reading the source code.

@sujayakar

I submitted rust-lang/hashbrown#54 to support using a K that doesn't implement Hash via the raw entry API. See rust-lang/hashbrown#44 for the original motivation. Now that hashbrown is merged into std, could we expose this functionality on the std::collections::hash_map types as well?

If so, I'd be happy to submit a PR!

@thomcc
Member

thomcc commented Apr 11, 2020

This is a really great API, it's also what keeps crates (hashlink for example) using hashbrown instead of the stdlib hash map -- since hashbrown exposes this.

What could be next steps here towards stabilization?

@KodrAus KodrAus added I-nominated Libs-Tracked Libs issues that are tracked on the team's project board. labels Jul 29, 2020
@sanbox-irl

Just gonna add another ping here -- what's blocking this right now?

@Amanieu
Member

Amanieu commented Nov 12, 2020

I see a few things that need to be resolved:

I would recommend prototyping in the hashbrown crate first, which can then be ported back into the std HashMap.

@KamilaBorowska
Contributor

KamilaBorowska commented Feb 4, 2021

I find the raw_entry and raw_entry_mut methods unnecessary: unlike the entry method, they don't take any parameters; they just provide access to methods that could as well be in HashMap itself. I think I would consider getting rid of those and putting the raw entry APIs directly in HashMap. .raw_entry().from_key(...) is also unnecessary; unless I'm missing something, it's identical to the already stabilized .get_key_value(...).

I also would like to point out that RawVacantEntryMut doesn't really do much other than providing an API that allows insertion which provides a reference to inserted key and value. This structure doesn't store anything other than a mutable reference to a hash map. This particular API can be used to create unrelated keys, like in this example.

#![feature(hash_raw_entry)]

use std::collections::HashMap;

fn main() {
    let mut map = HashMap::new();
    map.raw_entry_mut().from_key(&42).or_insert(1, 2);
    println!("{}", map[&1]);
}

This is a bit like calling insert after determining an entry is vacant. I think raw_entry_mut APIs could return Options just like raw_entry APIs.

#![feature(hash_raw_entry)]

use std::collections::hash_map::{HashMap, RawEntryMut};

fn main() {
    let mut map = HashMap::new();
    if let RawEntryMut::Vacant(_) = map.raw_entry_mut().from_key(&42) {
        map.insert(1, 2);
    }
    println!("{}", map[&1]);
}

I think the raw entry API is useful, but I don't think it should be conflated with the entry API.
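For reference, the stable counterpart mentioned above, `HashMap::get_key_value`, can be used as in the minimal sketch below. `stored_entry` is a hypothetical wrapper name, and the concrete `String`/`u32` types are chosen only for illustration.

```rust
use std::collections::HashMap;

/// Hypothetical wrapper: looks up `probe` and returns references to the key
/// and value that are actually stored in the map, or `None` on a miss.
fn stored_entry<'a>(
    map: &'a HashMap<String, u32>,
    probe: &str,
) -> Option<(&'a String, &'a u32)> {
    // The borrowed `&str` is enough for the lookup; no `String` is allocated.
    map.get_key_value(probe)
}
```

The returned key reference points at the map's own `String`, not at the probe, which is the same shape of result that `raw_entry().from_key(...)` produced.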

@tkaitchuck
Contributor

tkaitchuck commented Mar 28, 2021

As discussed here: rust-lang/hashbrown#232
Allowing the user to specify the hashed value with the contract that it is generated in the same way that the map computes that hash has two drawbacks:

  1. It locks the HashMap implementation into never changing how that code is invoked. In particular, this prohibits HashMap from ever using specialization, which leaves significant performance gains on the table for fixed-length types and short strings. (This makes raw_entry a non-zero-cost abstraction, because the cost is incurred even if the feature is not used.)
  2. It creates an opportunity for bugs in applications that accidentally do something different. If, for example, an application takes advantage of this to create a memoized hash or similar, and its calculation differs in some cases, the results will be unexpected and lack a clear error message.

If the feature of a user specified hash is needed, it may be useful to instead provide a method on the raw entry to hash a key. That way the hashmap can implement this however it sees fit and the application code is less error prone because there is an unambiguous way to obtain the hash value if it is not known in advance.
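A rough sketch of that idea, using only stable std: rather than rebuilding the hash by hand, the caller asks the map's own `BuildHasher` for it. `hash_with_map` is a hypothetical free function standing in for the proposed method, and `BuildHasher::hash_one` requires Rust 1.71 or later.

```rust
use std::collections::HashMap;
use std::hash::{BuildHasher, Hash};

/// Hypothetical stand-in for a "hash this key for me" method: the map's own
/// `BuildHasher` produces the hash, so callers cannot accidentally use a
/// different hasher or a different hashing recipe.
fn hash_with_map<K, V, S, Q>(map: &HashMap<K, V, S>, key: &Q) -> u64
where
    S: BuildHasher,
    Q: Hash + ?Sized,
{
    map.hasher().hash_one(key)
}
```

A memoized hash obtained this way stays consistent for that particular map, but it is not portable: another `HashMap` with its own `RandomState` is seeded independently and will generally produce different values for the same key.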

@workingjubilee
Member

> Doesn't this decision mean binary bloat because std::hashmap will exist because some dependency somewhere uses it and a duplicate implementation would be pulled in through hashbrown for those that need HashTable? Is there a reason not to standardize HashTable?
>
> EDIT: To be clear, I'm not against this, but I'd like to understand if there's an interest in standardizing HashTable

The details of HashTable are extremely dependent on its particular lookup algorithm. So it goes with any "raw entry" API. We do not want to specify that sort of thing, because otherwise what happened to std::unordered_map will happen to std::collections::HashMap.

As for binary bloat, the concern you cited is why Rust allows function merging. If the implementations of two codepaths happen to be the same, their functions will simply point to the same code.

@vlovich

vlovich commented Feb 27, 2025

Function merging happens across crates with different versions? Is it specific to functions with the same name from the same crate name, or is there some kind of search algorithm to find all functions that are the same (e.g. if I copy-paste a function into my crate, does it get deduped as well)?

@cuviper
Member

cuviper commented Feb 27, 2025

LLVM's MergeFunctions pass doesn't care about names or crates or versions, just whatever is in the compilation unit -- which can be everything if you use full LTO.

@vlovich

vlovich commented Feb 27, 2025

I think that's the part I was trying to highlight: the odds of two hash table implementations being in the same compilation unit are low, and you're only going to see this resolved, if at all, when you use full LTO, whereas most people use ThinLTO at most. I think code bloat is a valid concern and one that's not easily dismissed by "the optimizer will handle it".

@cuviper
Member

cuviper commented Feb 27, 2025

Much of it will be duplicated across CUs anyway by monomorphization and/or #[inline], even when it is coming from a single implementation. There's very little compiled code to share from the libhashbrown.rlib in your sysroot.

matthiaskrgr added a commit to matthiaskrgr/rust that referenced this issue Mar 2, 2025
…k-Simulacrum

Stop using `hash_raw_entry` in `CodegenCx::const_str`

That unstable feature (rust-lang#56167) completed fcp-close, so the compiler needs to be
migrated away to allow its removal. In this case, `cg_llvm` and `cg_gcc`
were using raw entries to optimize their `const_str_cache` lookup and
insertion. We can change that to separate `get` and (on miss) `insert`
calls, so we still have the fast path avoiding string allocation when
the cache hits.
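
The `get`-then-`insert` pattern described in this commit message can be sketched roughly as follows. This is not the compiler's actual code: `intern` and the `usize` id scheme are hypothetical, chosen only to show that a cache hit never allocates a `String`.

```rust
use std::collections::HashMap;

/// Hypothetical string cache: returns the id already assigned to `s`,
/// or assigns the next free id on a miss.
fn intern(cache: &mut HashMap<String, usize>, s: &str) -> usize {
    // Fast path: look up with the borrowed `&str`, so a hit allocates nothing.
    if let Some(&id) = cache.get(s) {
        return id;
    }
    // Slow path: only a miss pays for the allocation (and a second hash).
    let id = cache.len();
    cache.insert(s.to_owned(), id);
    id
}
```

Compared with a raw entry, the miss path hashes the string twice, once in `get` and once in `insert`; the hit path is unchanged.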
tgross35 added a commit to tgross35/rust that referenced this issue Mar 2, 2025
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this issue Mar 3, 2025
rust-timer added a commit to rust-lang-ci/rust that referenced this issue Mar 4, 2025
Rollup merge of rust-lang#137741 - cuviper:const_str-raw_entry, r=Mark-Simulacrum

Stop using `hash_raw_entry` in `CodegenCx::const_str`

That unstable feature (rust-lang#56167) completed fcp-close, so the compiler needs to be
migrated away to allow its removal. In this case, `cg_llvm` and `cg_gcc`
were using raw entries to optimize their `const_str_cache` lookup and
insertion. We can change that to separate `get` and (on miss) `insert`
calls, so we still have the fast path avoiding string allocation when
the cache hits.
jieyouxu added a commit to jieyouxu/rust that referenced this issue Mar 10, 2025
Convert `ShardedHashMap` to use `hashbrown::HashTable`

The `hash_raw_entry` feature (rust-lang#56167) has finished fcp-close, so the compiler
should stop using it to allow its removal. Several `Sharded` maps were
using raw entries to avoid re-hashing between shard and map lookup, and
we can do that with `hashbrown::HashTable` instead.
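
To illustrate the re-hashing cost mentioned here, below is a minimal sharded map built only on stable std. It is a sketch of the problem, not of the compiler's `ShardedHashMap`: `TwoHashShardedMap` is a hypothetical type, and it deliberately hashes each key twice, once to pick the shard and once more inside the inner `HashMap`. Raw entries, and now `hashbrown::HashTable`, let the caller reuse the first hash for the inner lookup instead.

```rust
use std::collections::hash_map::RandomState;
use std::collections::HashMap;
use std::hash::{BuildHasher, Hash};

/// Hypothetical sharded map that pays for two hashes per operation.
struct TwoHashShardedMap<K, V> {
    shard_hasher: RandomState,
    shards: Vec<HashMap<K, V>>,
}

impl<K: Hash + Eq, V> TwoHashShardedMap<K, V> {
    fn new(shard_count: usize) -> Self {
        assert!(shard_count > 0, "need at least one shard");
        TwoHashShardedMap {
            shard_hasher: RandomState::new(),
            shards: (0..shard_count).map(|_| HashMap::new()).collect(),
        }
    }

    /// First hash: decides which shard owns the key.
    fn shard_index(&self, key: &K) -> usize {
        (self.shard_hasher.hash_one(key) % self.shards.len() as u64) as usize
    }

    fn insert(&mut self, key: K, value: V) -> Option<V> {
        let idx = self.shard_index(&key);
        // Second hash: the inner `HashMap` hashes the key again.
        self.shards[idx].insert(key, value)
    }

    fn get(&self, key: &K) -> Option<&V> {
        self.shards[self.shard_index(key)].get(key)
    }
}
```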
jieyouxu added a commit to jieyouxu/rust that referenced this issue Mar 11, 2025
jieyouxu added a commit to jieyouxu/rust that referenced this issue Mar 12, 2025
matthiaskrgr added a commit to matthiaskrgr/rust that referenced this issue Mar 12, 2025
rust-timer added a commit to rust-lang-ci/rust that referenced this issue Mar 12, 2025
Rollup merge of rust-lang#137701 - cuviper:sharded-hashtable, r=fmease

Convert `ShardedHashMap` to use `hashbrown::HashTable`

The `hash_raw_entry` feature (rust-lang#56167) has finished fcp-close, so the compiler
should stop using it to allow its removal. Several `Sharded` maps were
using raw entries to avoid re-hashing between shard and map lookup, and
we can do that with `hashbrown::HashTable` instead.
@bors bors closed this as completed in 883f00c Mar 13, 2025
rust-timer added a commit to rust-lang-ci/rust that referenced this issue Mar 13, 2025
Rollup merge of rust-lang#138425 - cuviper:remove-hash_raw_entry, r=jhpratt

Remove `feature = "hash_raw_entry"`

The `hash_raw_entry` feature finished [fcp-close](rust-lang#56167 (comment)) back in August, and its remaining uses in the compiler have now been removed, so we should be all clear to remove it from `std`.

Closes rust-lang#56167
github-actions bot pushed a commit to rust-lang/miri that referenced this issue Mar 14, 2025
github-actions bot pushed a commit to model-checking/verify-rust-std that referenced this issue Mar 14, 2025
@fmease fmease closed this as not planned Mar 19, 2025