Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Extend Dictionary<K, V> with a GetOrAdd method #32679

Closed
CharlesPublic opened this issue Feb 21, 2020 · 12 comments
Closed

Extend Dictionary<K, V> with a GetOrAdd method #32679

CharlesPublic opened this issue Feb 21, 2020 · 12 comments

Comments

@CharlesPublic
Copy link

CharlesPublic commented Feb 21, 2020

One thing that always annoyed me about the System.Collections.Generic.Dictionary<TKey, TValue> class
is that we often use code like this:

public static TValue GetOrAddNew<TKey, TValue>(
    this IDictionary<TKey, TValue> dict, TKey key)
    where TValue : class, new()
{
    // 1-2 calls for reference types to
    // FindValue() / GetHashCode() / Equals()
    if (!dict.TryGetValue(key, out TValue val))
        val = dict[key] = new TValue();

    return val;
}

// for value types this is even worse:
var dict = new Dictionary<int, int>();
var rolledNumber = rnd.Next(0, 500);

// Duplicate calls to FindValue()
if (dict.TryGetValue(rolledNumber, out int valBefore))
    dict[rolledNumber] = ++valBefore;
else
    dict[rolledNumber] = 1; // some init value

Especially when implementing algorithms the double performance cost of FindValue is not great.
Often times you would want to add an Entry if it does not exist or increment / edit it if it does.

  • one option would be to add InsertionBehavior.ReturnOnExisting for TryInsert
  • another option would be to add another parameter to TryGetValue to set the value when it does not exist
  • setting the value could happen via a Func<>, new() or default
  • something with ref entry

There are probably other good options.

The ConcurrentDictionary class already has a GetOrAdd() method, but it also uses a double lookup.
The DictionarySlim.GetOrAddValueRef(TKey) method looks great.

What are the thoughts on this?

@Dotnet-GitSync-Bot Dotnet-GitSync-Bot added area-System.Collections untriaged New issue has not been triaged by the area owner labels Feb 21, 2020
@DevDrake
Copy link

I wouldn't expect that kind of method to be in public API personally. It seems like you will update this value in the end (otherwise I don't see use case for that), so you will end up with dict[key] = modified_value, which will insert keyValue into the Dictionary.
In your proposed solution you are returning val, not reference to dict[key] therefore I see no difference to dict.TryGetValue(key, out TValue val)

ConcurrentDictionary seems to be different animal in that case. As also you have AddOrUpdate which is equivalent to dict[key] = val in case of Dictionary.

@svick
Copy link
Contributor

svick commented Feb 23, 2020

There is already a proposal for adding GetOrAdd to Dictionary: #15059.

@danmoseley
Copy link
Member

Or consider Dictionaryslim. The comment at the top of it explains the tradeoffs it offers.
https://github.com/dotnet/corefxlab/blob/master/src/Microsoft.Experimental.Collections/Microsoft/Collections/Extensions/DictionarySlim.cs#L217

@CharlesPublic
Copy link
Author

@DevDrake
Maybe the example is bad because it uses default.
But even with a reference type and new() instead of default i would still have
best case 1x FindValue, worst case 2x FindValue.

DictionarySlim.GetOrAddValueRef seems to do what im looking for, just the DictionarySlim is not comparable to the normal Dictionary.

In c++ the following is possible:

std::unordered_map<char,int> freq;
std::string s = "ABCBADCA";

for (const char &c: s) {
	freq[c]++;	
}

It would be great to have something similar in c# performance wise.

@danmoseley
Copy link
Member

just the DictionarySlim is not comparable to the normal Dictionary.

It's true it has semantic differences eg it requires equation to be cheap. Which difference is blocking in your scenario?

@CharlesPublic
Copy link
Author

CharlesPublic commented Feb 26, 2020

@danmosemsft

As i understand the normal Dictionary<K,V> does store HashCodes for every entry. This makes comparisons inside a bucket a little bit faster (depending on the type). Compared to DictionarySlim<K,V> which skips GetHashCode() call for entries inside a bucket and instead calls .Equals() for every key. Correct me if I'm wrong.

What speaks against adding a GetOrAdd() method to the normal Dictionary?

I updated my examples in the initial post, i hope they are a little bit clearer now.

It looks like DictionarySlim<TKey, TValue> is also missing some very useful interfaces like ICollection and IDictionary

@danmoseley
Copy link
Member

GetOrAdd() assumes the key is not a value type. Yes you're correct that the other requirement is that equating keys is not too expensive. That's a soft requirement though as in most cases there will be no collisions and both dictionaries will equate keys exactly once per lookup

@danmoseley
Copy link
Member

If you have feedback on it I'd be interested. It might make it into the platform proper if it's clearly useful enough.

@CharlesPublic
Copy link
Author

CharlesPublic commented Feb 26, 2020

Here are some examples of why such a function would be useful.
The redundant pattern is used excessively in algorithms, in popular nuget packages like Newtonsoft.Json and all over the .NET Framework.

// Newtonsoft.Json, XmlNodeConverter
else if (!dictionary.TryGetValue(propertyName2, out value)) // FindValue
    dictionary.Add(propertyName2, xmlNode); // FindValue again

// Newtonsoft.Json, JsonSchemaModelBinder
target.TryGetValue(propertyName, out JsonSchemaNode value);  // FindValue
target[propertyName] = AddSchema(value, schema);   // FindValue again

// System.Linq.Expressions.Compiler.KeyedQueue<K,V>
internal void Enqueue(K key, V value)
{
    if (!_data.TryGetValue(key, out Queue<V> value2))  // FindValue
        _data.Add(key, value2 = new Queue<V>());  // FindValue again
    
    value2.Enqueue(value);
}

// System.Linq.Expressions.Compiler.LambdaCompiler
private LabelInfo EnsureLabel(LabelTarget node)
{
    if (!_labelInfo.TryGetValue(node, out LabelInfo value))   // FindValue
        _labelInfo.Add(node, value = new LabelInfo(_ilg, node, canReturn: false));  // FindValue again
    
    return value;
}

// System.Linq.Parallel.DistinctQueryOperator
if (!m_hashLookup.TryGetValue(key, out TKey value)...)  // FindValue
    m_hashLookup[key] = currentKey2;  // FindValue again


// System.Web.Configuration.FolderLevelBuildProviderCollection.AddMapping

if (!_buildProviderMappings.TryGetValue(appliesTo, out value)) // FindValue
{
    value = new List<Type>();
    _buildProviderMappings.Add(appliesTo, value);  // FindValue again
}

@danmoseley
Copy link
Member

@CharlesPublic does this add anything to #15059 ? If not I suggest we close this and continue there.

@danmoseley
Copy link
Member

I think this is duplicated -- let's continue there unless I'm wrong.

@eiriktsarpalis eiriktsarpalis removed the untriaged New issue has not been triaged by the area owner label Jul 7, 2020
@ghost ghost locked as resolved and limited conversation to collaborators Dec 10, 2020
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

No branches or pull requests

6 participants