Add System.Collections.Concurrent.ConcurrentHashSet<T> #39919
Comments
By the way, this issue is not a duplicate of #15326, though the requirement is almost the same.
Tagging subscribers to this area: @eiriktsarpalis
Thanks for doing this. Could you please update the proposal to be complete? No implementation is needed, but everything that would be public, eg.,
@danmosemsft Updated, thanks!
Just take reference of
Sorry, I'm not sure whether all members of the interface contract should be listed above. I deleted overloads of
Agreed. I moved the related content to 'Alternative Designs'.
Agreed & done.
I think NO, because it's a simpler collection than ConcurrentDictionary. For ConcurrentDictionary, we usually use AddOrUpdate/TryUpdate or factory delegate overloads to update
I think we can add it. Here are the reasons, from my point of view,
Thanks
We're looking for the same format as in this file
@danmosemsft Done.
Ok thanks. The owners of this area will take a look. It might be a little while since we are focused on shipping 5.0 right now.
This is a compromise that has more or less worked fine, at least for my own requirements in the past. It would be interesting if we could quantify any performance improvements that come out of a dedicated ConcurrentHashSet implementation.
Even if you just used a
Adding new classes always comes at a cost, even if they're just facades.
Well, so the situation becomes that we're all waiting for some kind of implementation by someone now, in order to compare it with the current solution... that doesn't sound good =.=
It will be more space efficient, even if it's no faster, which seems potentially worthwhile to me. I noticed this (didn't look at it): https://github.com/i3arnon/ConcurrentHashSet by @i3arnon
In practice though we'd be looking at maybe shaving one byte per
That is true.
On 64-bit it generally won't even save that.
It depends on the type being used, doesn't it?
Yes, but for almost anything other than key==int, it won't increase the size. The node stores the key, the value, a next node reference, and an int hash code, so whether the value increases the size of the object depends on whether the value can fit in the padding for the hashcode or whether it's consumed by the key. If the value is byte, then if the key is any reference type, for example, the byte value will just be in what is otherwise padding space from the hash code. The original example motivating this issue was for Guid as the T. In that case, the byte value would similarly just fit into the padding for the int, which is what I was referring to (but wasn't clear).
Closing since the benefits over using the
Background and Motivation
We already have ConcurrentDictionary, which has significant advantages over Dictionary in concurrent environments. However, HashSet has no concurrent counterpart.
Sometimes (maybe very often, in fact) we want a ConcurrentHashSet whose elements are never duplicated and whose Contains and Remove operations are O(1) in a concurrent environment. It is quite different from ConcurrentBag and offers another choice where that fits better.
See issue #16443 for details. Almost five years have passed and nobody has written an API proposal, so I'm here to do it.
Proposed API
Usage Examples
The main goal of ConcurrentHashSet is to improve performance and reduce cost by replacing the HashSet-with-lock pattern in concurrent environments. If we have it, the code can be written like below,
The difference would be very similar to the one between Dictionary and ConcurrentDictionary.
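For reference, the HashSet-with-lock pattern that the proposal aims to replace can be sketched roughly as follows. This is only an illustration: the class name LockedHashSet<T> and its member set are hypothetical and are not part of the proposed API.

```csharp
using System.Collections.Generic;

// Hypothetical wrapper illustrating the HashSet-with-lock pattern.
// The name LockedHashSet<T> is an assumption made for this sketch only.
public sealed class LockedHashSet<T>
{
    private readonly HashSet<T> _set = new HashSet<T>();
    private readonly object _gate = new object();

    // Returns true if the item was added, false if it was already present.
    public bool Add(T item)
    {
        lock (_gate) { return _set.Add(item); }
    }

    // Every read also takes the lock, so readers and writers serialize.
    public bool Contains(T item)
    {
        lock (_gate) { return _set.Contains(item); }
    }

    // Returns true if the item was found and removed.
    public bool Remove(T item)
    {
        lock (_gate) { return _set.Remove(item); }
    }

    public int Count
    {
        get { lock (_gate) { return _set.Count; } }
    }
}
```

Every operation goes through one lock, which is the cost a dedicated concurrent collection would be expected to reduce.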
Alternative Designs
Many developers say they use ConcurrentDictionary<T, byte> as a substitute when they have this requirement. They don't actually want the Value, which is meaningless and a waste of memory. However, some suggest that the first version of ConcurrentHashSet could be implemented on top of an underlying ConcurrentDictionary<T, byte>, in order to add the APIs as soon as possible. Personally, I prefer an independent implementation, which needs to be discussed between the engineers at Microsoft and our community.
Do we need it to implement ISet<T>/IReadOnlySet<T>? Though it's the concurrent version of HashSet, I'm not sure whether every HashSet API needs a counterpart. In my opinion, not all of them make sense in a concurrent environment. Of course, this needs discussion.
Do we need the API public HashSet<T> ToHashSet() to get a regular (non-concurrent) instance for easy usage?
Risks
It might not offer concurrent versions of all HashSet APIs. Because it's a new collection, no breaking changes will occur. It just needs a good design :).
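As a rough illustration of the ConcurrentDictionary<T, byte>-backed first version mentioned under Alternative Designs, a thin facade might look like the sketch below. The type name DictionaryBackedConcurrentSet<T> and the choice of members are assumptions for illustration, not the proposed API; only the ConcurrentDictionary and HashSet calls are existing .NET APIs.

```csharp
using System.Collections.Concurrent;
using System.Collections.Generic;

// Hypothetical facade over ConcurrentDictionary<T, byte>.
// The byte value is a dummy; only the keys carry information.
public sealed class DictionaryBackedConcurrentSet<T>
{
    private readonly ConcurrentDictionary<T, byte> _items =
        new ConcurrentDictionary<T, byte>();

    // Returns true if the item was added, false if it was already present.
    public bool Add(T item) => _items.TryAdd(item, 0);

    public bool Contains(T item) => _items.ContainsKey(item);

    // Returns true if the item was found and removed.
    public bool TryRemove(T item) => _items.TryRemove(item, out _);

    public int Count => _items.Count;

    // Copies the current keys into a regular HashSet<T>.
    public HashSet<T> ToHashSet() => new HashSet<T>(_items.Keys);
}
```

A facade like this adds no memory savings over using the dictionary directly, which is the trade-off raised in the comments.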