Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add string keyed dictionary ReadOnlySpan<char> lookup support #27229

Closed
TylerBrinkley opened this issue Aug 24, 2018 · 145 comments · Fixed by #103191
Closed

Add string keyed dictionary ReadOnlySpan<char> lookup support #27229

TylerBrinkley opened this issue Aug 24, 2018 · 145 comments · Fixed by #103191
Assignees
Labels
api-approved API was approved in API review, it can be implemented area-System.Collections
Milestone

Comments

@TylerBrinkley
Copy link
Contributor

TylerBrinkley commented Aug 24, 2018

UPDATED 1/22/2024 by @stephentoub

Option 1 (recommended)

namespace System.Collections.Generic
{
    public static class CollectionExtensions
    {
+       public static Dictionary<TKey, TValue>.MappedLookup<TMappedKey> GetMappedLookup<TKey, TValue, TMappedKey>(
+           this Dictionary<TKey, TValue> dictionary)
+           where TKey : notnull where TMappedKey : allow ref struct;
+       public static bool TryGetMappedLookup<TKey, TValue, TMappedKey>(
+           this Dictionary<TKey, TValue> dictionary,
+           out Dictionary<TKey, TValue>.MappedLookup<TMappedKey> lookup)
+           where TKey : notnull where TMappedKey : allow ref struct;

+       public static HashSet<T>.MappedLookup<TMapped> GetMappedLookup<TMapped>(
+           this HashSet<T> set)
+           where TMapped : allow ref struct;
+       public static bool TryGetMappedLookup(
+           this HashSet<T> set, out HashSet<T>.MappedLookup<TMapped> lookup)
+           where TMapped : allow ref struct;
    }

    public class Dictionary<TKey, TValue>
    {
+       public readonly struct MappedLookup<TMappedKey> where TMappedKey : allow ref struct
+       {
+           public bool ContainsKey(TMappedKey key);
+           public bool TryAdd(TMappedKey  key, TValue value);
+           public bool TryGetValue(TMappedKey key, [MaybeNullWhen(false)] out TValue value);
+           public bool TryGetValue<TKey, TValue, TMappedKey>(TMappedKey  key, [NotNullWhen(true)] out TKey? actualKey, [MaybeNullWhen(false)] out TValue value);
+           public bool Remove(TMappedKey key);
+           public bool Remove(TMappedKey  key, [MaybeNullWhen(false)] out TValue value));
+       }
    }

    public class HashSet<T>
    {
+       public readonly struct MappedLookup<TMapped>
+       {
+           public bool Add(TMapped item);
+           public bool Contains(TMapped item);
+           public bool Remove(TMapped item);
+           public bool TryGetValue(TMapped equalValue, [MaybeNullWhen(false)] out T actualValue);
+       }
    }
}

namespace System.Runtime.InteropServices
{
    public static class CollectionsMarshal
    {
+       public static ref TValue? GetValueRefOrAddDefault<TKey, TValue, TMappedKey>(
+           Dictionary<TKey, TValue>.MappedLookup<TMappedKey> dictionary,
+           TMappedKey key, out bool exists)
+           where TKey : notnull where TMappedKey : allow ref struct;
+       public static ref TValue GetValueRefOrNullRef<TKey, TValue, TMappedKey>(
+           Dictionary<TKey, TValue>.MappedLookup<TMappedKey> dictionary,
+           TMappedKey key)
+           where TKey : notnull where TMappedKey : allow ref struct;
    }
}

namespace System.Collections.Concurrent
{
    public class ConcurrentDictionary<TKey, TValue>
+   {
+       public ConcurrentDictionary<TKey, TValue>.MappedLookup<TMappedKey> GetMappedLookup<TMappedKey>() 
+           where TMappedKey : allow ref struct;
+       public bool TryGetMappedLookup<TMappedKey>(
+           out ConcurrentDictionary<TKey, TValue>.MappedLookup<TMappedKey> lookup)
+           where TMappedKey : allow ref struct;

+       public readonly struct MappedLookup<TMappedKey>
+       {
+           public bool ContainsKey<TKey, TValue, TMappedKey>(TMappedKey key);
+           public bool TryAdd<TKey, TValue, TMappedKey>(TMappedKey key, TValue value);
+           public bool TryGetValue<TKey, TValue, TMappedKey>(TMappedKey key, [MaybeNullWhen(false)] out TValue value);
+           public bool TryGetValue<TKey, TValue, TMappedKey>(TMappedKey key, [NotNullWhen(true)] out TKey? actualKey, [MaybeNullWhen(false)] out TValue value);
+           public bool TryRemove<TKey, TValue, TMappedKey>(TMappedKey key, [MaybeNullWhen(false)] out TValue value);
+           public bool TryRemove<TKey, TValue, TMappedKey>(TMappedKey key, [NotNullWhen(true) out TKey? actualKey, [MaybeNullWhen(false)] out TValue value);
+       }
+   }
}

namespace System.Collections.Immutable
{
    public class FrozenDictionary<TKey, TValue>
    {
+       public FrozenDictionary<TKey, TValue>.MappedLookup<TMappedKey> GetMappedLookup<TMappedKey>()
+           where TMappedKey : allow ref struct;
+       public bool TryGetMappedLookup<TMappedKey>(
+           out FrozenDictionary<TKey, TValue>.MappedLookup<TMappedKey> lookup)
+           where TMappedKey : allow ref struct;

+       public readonly struct MappedLookup<TMappedKey>
+       {
+           public bool ContainsKey<TKey, TValue, TMappedKey>(TMappedKey key);
+           public bool TryGetValue<TKey, TValue, TMappedKey>(TMappedKey key, [MaybeNullWhen(false)] out TValue value);
+           public bool TryGetValue<TKey, TValue, TMappedKey>(TMappedKey key, [NotNullWhen(true)] out TKey? actualKey, [MaybeNullWhen(false)] out TValue value);
+       }
    }

    public class FrozenSet<T>
    {
+       public FrozenSet<T>.MappedLookup<TMapped> GetMappedLookup<TMapped>()
+           where TMappedKey : allow ref struct;
+       public bool TryGetMappedLookup<TMappedKey>(
+           out FrozenSet<T>.MappedLookup<TMapped> lookup)
+           where TMappedKey : allow ref struct;

+       public readonly struct MappedLookup<TMapped>
+       {
+           public bool Contains(TMapped item);
+           public bool TryGetValue(TMapped equalValue, [MaybeNullWhen(false)] out T actualValue);
+       }
    }
}

Pros of Option 1:

  • Keeps refstruct-based APIs separated out
  • Enables caching any lookup costs
  • Makes a "Try" method more tolerable, which is important in cases where you're handed a dictionary, have a span you want to use for lookup, and want to use the span if possible or else ToString or equivalent and use that

Cons of Option 1:

  • Poor type inference on With
  • Because of the above, makes working with anonymous TValues complicated

Option 2

namespace System.Collections.Generic
{
    public static class CollectionExtensions
    {
+       public static bool ContainsKey<TKey, TValue, TMappedKey>(
+           this Dictionary<TKey, TValue> dictionary, TMappedKey key)
+           where TKey : notnull where TMappedKey : allow ref struct
+       public static bool Remove<TKey, TValue, TMappedKey>(
+           this Dictionary<TKey, TValue> dictionary, TMappedKey key)
+           where TKey : notnull where TMappedKey : allow ref struct
+       public static bool Remove<TKey, TValue, TMappedKey>(
+           this Dictionary<TKey, TValue> dictionary, TMappedKey  key, [MaybeNullWhen(false)] out TValue value))
+           where TKey : notnull where TMappedKey : allow ref struct
+       public static bool TryAdd<TKey, TValue, TMappedKey>(
+           this Dictionary<TKey, TValue> dictionary, TMappedKey  key, TValue value)
+           where TKey : notnull where TMappedKey : allow ref struct
+       public static bool TryGetValue<TKey, TValue, TMappedKey>(
+           this Dictionary<TKey, TValue> dictionary, TMappedKey key, [MaybeNullWhen(false)] out TValue value)
+           where TKey : notnull where TMappedKey : allow ref struct
+       public static bool TryGetValue<TKey, TValue, TMappedKey>(
+           this Dictionary<TKey, TValue> dictionary, TMappedKey  key, [NotNullWhen(true)] out TKey? actualKey, [MaybeNullWhen(false)] out TValue value)
+           where TKey : notnull where TMappedKey : allow ref struct

+       public static bool Add<T, TMapped>(this HashSet<T> set, TMapped item) where TMapped : allow ref struct;
+       public static bool Contains<T, TMapped>(this HashSet<T> set, TMapped item) where TMapped : allow ref struct;
+       public static bool Remove<T, TMapped>(this HashSet<T> set, TMapped item) where TMapped : allow ref struct;
+       public static bool TryGetValue<T, TMapped>(this HashSet<T> set, TMapped equalValue, [MaybeNullWhen(false)] out T actualValue) where TMapped : allow ref struct;
    }
}

namespace System.Runtime.InteropServices
{
    public static class CollectionsMarshal
    {
+       public static ref TValue? GetValueRefOrAddDefault<TKey, TValue, TMappedKey>(Dictionary<TKey, TValue> dictionary, TMappedKey key, out bool exists) where TKey : notnull where TMappedKey : allow ref struct;
+       public static ref TValue GetValueRefOrNullRef<TKey, TValue, TMappedKey>(Dictionary<TKey, TValue> dictionary, TMappedKey key) where TKey : notnull where TMappedKey : allow ref struct;
    }
}

namespace System.Collections.Concurrent
{
+   public static class ConcurrentCollectionExtensions
+   {
+       public static bool ContainsKey<TKey, TValue, TMappedKey>(
+           this ConcurrentDictionary<TKey, TValue> dictionary, TMappedKey key)
+           where TKey : notnull;
+       public static bool TryAdd<TKey, TValue, TMappedKey>(
+           this ConcurrentDictionary<TKey, TValue> dictionary, TMappedKey key, TValue value)
+           where TKey : not null;
+       public static bool TryGetValue<TKey, TValue, TMappedKey>(
+           this ConcurrentDictionary<TKey, TValue> dictionary, TMappedKey key, [MaybeNullWhen(false)] out TValue value) 
+           where TKey : not null;
+       public static bool TryGetValue<TKey, TValue, TMappedKey>(
+           this ConcurrentDictionary<TKey, TValue> dictionary, TMappedKey key, [NotNullWhen(true)] out TKey? actualKey, [MaybeNullWhen(false)] out TValue value)
+           where TKey : not null where TMappedKey : allow ref struct;
+       public static bool TryRemove<TKey, TValue, TMappedKey>(
+           this ConcurrentDictionary<TKey, TValue> dictionary, TMappedKey key, [MaybeNullWhen(false)] out TValue value) 
+           where TKey : notnull;
+   }
}

namespace System.Collections.Immutable
{
    public class FrozenDictionary<TKey, TValue>
    {
+       public bool ContainsKey<TMappedKey>(TMappedKey key) where TKey : notnull where TMappedKey : allow ref struct;
+       public bool TryGetValue<TMappedKey>(TMappedKey key, [NotNullWhen(true)] out TKey? actualKey, [MaybeNullWhen(false)] out TValue value) where TKey : not null where TMappedKey : allow ref struct;
+       public bool TryGetValue<TMappedKey>(TMappedKey key, [MaybeNullWhen(false)] out TValue value) where TKey : not null where TMappedKey : allow ref struct;
    }

    public class FrozenSet<T>
    {
+       public bool Contains<TMapped>(TMapped item);
+       public bool TryGetValue<TMapped>(TMapped equalValue, [MaybeNullWhen(false)] out T actualValue);
    }
}

Pros of Option 2:

  • Fewer types
  • Can index directly into the collection with the key without needing the separate lookup
  • Inference "just works"

Cons of Option 2:

  • More expensive at runtime; every call does a type check
  • Worse usability when you have something that casts to ReadOnlySpan<T>, e.g. a Span<T>; calls bind to the instance members if not exact match to the extensions
  • Would be a lot more surface area (and confusing in some cases) to add Try variants where false means the dictionary doesn't support the mapped lookup type

Regardless of approach:

namespace System.Collections.Generic
{
    // implemented by all string-related comparers, e.g. the comparer instances returned from StringComparer.Ordinal, EqualityComparer<string>.Default, etc.
+   public interface IMappedEqualityComparer<TMapped, T> where TMapped : allow ref struct
+   {
+       bool Equals(TMapped mapped, T other);
+       int GetHashCode(TMapped span);
+       TKey Map(TMapped mapped);
+   }
}

The above is based on the assumption that we will be getting a ref struct anti-constraint in for .NET 9 / C# 13. If that ends up not happening, this will be revamped to look almost the same, except using ReadOnlySpan<TKeyElement> instead of TMappedKey.


UPDATED 1/3/2024 by @stephentoub.

namespace System.Collections.Generic
{
    // implemented by all string-related comparers, e.g. the comparer instances
    // returned from StringComparer.Ordinal, EqualityComparer<string>.Default, etc.
+   public interface IMappedSpanEqualityComparer<TSpanElement, T>
+   {
+       bool Equals(ReadOnlySpan<TSpanElement> span, T other);
+       int GetHashCode(ReadOnlySpan<TSpanElement> span);
+       T Map(ReadOnlySpan<TSpanElement> span);
+   }

    public static class CollectionExtensions
    {
+       public static bool ContainsKey<TKey, TValue, TSpanElement>(this Dictionary<TKey, TValue> dictionary, ReadOnlySpan<TSpanElement> key) where TKey : notnull;
+       public static bool Remove<TKey, TValue, TSpanElement>(this Dictionary<TKey, TValue> dictionary, ReadOnlySpan<TSpanElement> key) where TKey : notnull;
+       public static bool Remove<TKey, TValue, TSpanElement>(this Dictionary<TKey, TValue> dictionary, ReadOnlySpan<TSpanElement> key, [MaybeNullWhen(false)] out TValue value) where TKey : notnull;
+       public static bool TryAdd<TKey, TValue, TSpanElement>(this Dictionary<TKey, TValue> dictionary, ReadOnlySpan<TSpanElement> key, TValue value) where TKey : not null;
+       public static bool TryGetValue<TKey, TValue, TSpanElement>(this Dictionary<TKey, TValue> dictionary, ReadOnlySpan<TSpanElement> key, [MaybeNullWhen(false)] out TValue value) where TKey : notnull;
+       public static bool TryGetValue<TKey, TValue, TSpanElement>(this Dictionary<TKey, TValue> dictionary, ReadOnlySpan<TSpanElement> key, [NotNullWhen(true)] out TKey? actualKey, [MaybeNullWhen(false)] out TValue value) where TKey : notnull;

+       public static bool Add<T, TSpanElement>(this HashSet<T> set, ReadOnlySpan<TSpanElement> item);
+       public static bool Contains<T, TSpanElement>(this HashSet<T> set, ReadOnlySpan<TSpanElement> item);
+       public static bool Remove<T, TSpanElement>(this HashSet<T> set, ReadOnlySpan<TSpanElement> item);
+       public static bool TryGetValue<T, TSpanElement>(this HashSet<T> set, ReadOnlySpan<TSpanElement> equalValue, [MaybeNullWhen(false)] out T actualValue);
    }
}

namespace System.Runtime.InteropServices
{
    public static class CollectionsMarshal
    {
+       public static ref TValue? GetValueRefOrAddDefault<TKey, TValue, TSpanElement>(Dictionary<TKey, TValue> dictionary, ReadOnlySpan<TSpanElement> key, out bool exists) where TKey : notnull;
+       public static ref TValue GetValueRefOrNullRef<TKey, TValue, TSpanElement>(Dictionary<TKey, TValue> dictionary, ReadOnlySpan<TSpanElement> key) where TKey : notnull;
    }
}

namespace System.Collections.Concurrent
{
+   public static class ConcurrentCollectionExtensions
+   {
+       public static bool ContainsKey<TKey, TValue, TSpanElement>(this ConcurrentDictionary<TKey, TValue> dictionary, ReadOnlySpan<TSpanElement> key) where TKey : notnull;
+       public static bool TryAdd<TKey, TValue, TSpanElement>(this ConcurrentDictionary<TKey, TValue> dictionary, ReadOnlySpan<TSpanElement> key, TValue value) where TKey : not null;
+       public static bool TryGetValue<TKey, TValue, TSpanElement>(this ConcurrentDictionary<TKey, TValue> dictionary, ReadOnlySpan<TSpanElement> key, [MaybeNullWhen(false)] out TValue value) where TKey : not null;
+       public static bool TryRemove<TKey, TValue, TSpanElement>(this ConcurrentDictionary<TKey, TValue> dictionary, ReadOnlySpan<TSpanElement> key, [MaybeNullWhen(false)] out TValue value) where TKey : notnull;
+   }
}

If .NET 9 and C# 13 and up supporting ref struct constraints, we would tweak these APIs to instead take a TMappingKey : ref struct rather than a ReadOnlySpan<TElement>. See #27229 (comment) for more details.

Open issues:

  • Name of the interface.
  • Name of the element generic method parameter.
  • Extension methods vs instance methods. We could make these methods generic methods on the target type instead. Should we? Are we still concerned about lots of new methods on a type like Dictionary (e.g. asm code bloat)? Any concerns about (non-virtual) generic methods on a generic type?
  • Do we want Span overloads in addition to ReadOnlySpan overloads? If we keep these as extension methods, you have to explicitly cast to ReadOnlySpan if you have a span, or else the compiler will try and fail to use the instance methods.
  • Any other collection types really important to support initially?

Moved from corefxlab#2438 as the proposed changes could not be implemented in a separate library.

Motivation

I would like to perform dictionary lookups using a ReadOnlySpan<char> value on a string keyed dictionary without having to allocate a string.

There are two different design approaches proposed, add extension methods to Dictionary<string, TValue> or add a new Type .

Extension Method Proposed API

 namespace System.Collections.Generic {
+    public static class DictionaryExtensions {
+        public static bool ContainsKey<TValue>(this Dictionary<string, TValue> dictionary, ReadOnlySpan<char> key);
+        public static bool Remove<TValue>(this Dictionary<string, TValue> dictionary, ReadOnlySpan<char> key);
+        public static bool TryGetValue<TValue>(this Dictionary<string, TValue> dictionary, ReadOnlySpan<char> key, out TValue value);
+        public static T GetValue<TValue>(this Dictionary<string, TValue> dictionary, ReadOnlySpan<char> key);
+    }
 }
 namespace System {
      public abstract class StringComparer : IComparer<string>, IEqualityComparer<string>, IComparer, IEqualityComparer {
+        public virtual bool Equals(ReadOnlySpan<char> x, ReadOnlySpan<char> y);
+        public virtual int GetHashCode(ReadOnlySpan<char> obj);
      }
     public sealed class CultureAwareComparer : StringComparer {
+        public override bool Equals(ReadOnlySpan<char> x, ReadOnlySpan<char> y);
+        public override int GetHashCode(ReadOnlySpan<char> obj);
     }
     public class OrdinalComparer : StringComparer {
+        public override bool Equals(ReadOnlySpan<char> x, ReadOnlySpan<char> y);
+        public override int GetHashCode(ReadOnlySpan<char> obj);
     }
 }

Implementation Details

  • Internal details of Dictionary<TKey, TValue> would need to be exposed with the internal modifier to implement the ReadOnlySpan<char> overloads.
  • Only StringComparers that are passed into the constructor would be able to be optimized to not allocate a string.
  • The default implementation of the new virtual methods on StringComparer would convert the ReadOnlySpan<char> to a string and use the string methods instead.

New Type Proposed API

 namespace System.Collections.Specialized {
+    public class StringKeyedDictionary<TValue> : Dictionary<string, TValue> {
+        public StringKeyedDictionary();
+        public StringKeyedDictionary(StringComparer comparer);
+        public StringKeyedDictionary(int capacity);
+        public StringKeyedDictionary(int capacity, StringComparer comparer);
+        public StringKeyedDictionary(IDictionary<string, TValue> dictionary);
+        public StringKeyedDictionary(IDictionary<string, TValue> dictionary, StringComparer comparer);
+        public StringKeyedDictionary(IEnumerable<KeyValuePair<string, TValue>> collection);
+        public StringKeyedDictionary(IEnumerable<KeyValuePair<string, TValue>> collection, StringComparer comparer);
+        public new StringComparer Comparer { get; }
+        public T this[ReadOnlySpan<char> key] { get; }
+        public bool ContainsKey(ReadOnlySpan<char> key);
+        public bool Remove(ReadOnlySpan<char> key);
+        public bool TryGetValue(ReadOnlySpan<char> key, out TValue value);
+    }
 }
 namespace System {
      public abstract class StringComparer : IComparer<string>, IEqualityComparer<string>, IComparer, IEqualityComparer {
+        public virtual bool Equals(ReadOnlySpan<char> x, ReadOnlySpan<char> y);
+        public virtual int GetHashCode(ReadOnlySpan<char> obj);
      }
     public sealed class CultureAwareComparer : StringComparer {
+        public override bool Equals(ReadOnlySpan<char> x, ReadOnlySpan<char> y);
+        public override int GetHashCode(ReadOnlySpan<char> obj);
     }
     public class OrdinalComparer : StringComparer {
+        public override bool Equals(ReadOnlySpan<char> x, ReadOnlySpan<char> y);
+        public override int GetHashCode(ReadOnlySpan<char> obj);
     }
 }

Implementation Details

  • Internal details of Dictionary<TKey, TValue> would need to be exposed with the private protected modifier to implement the ReadOnlySpan<char> overloads.
  • The default implementation of the new virtual methods on StringComparer would convert the span to a string and use the string methods instead.
  • A null comparer passed to the constructor would default to StringComparer.Ordinal.

Possible Addition

While not specifically needed for this request a ReadOnlySpan<char> Compare overload should probably be added as well while we're updating StringComparer.

 namespace System {
     public abstract class StringComparer : IComparer, IEqualityComparer, IComparer<string>, IEqualityComparer<string> {
+        public virtual int Compare(ReadOnlySpan<char> x, ReadOnlySpan<char> y);
     }
     public sealed class CultureAwareComparer : StringComparer {
+        public override int Compare(ReadOnlySpan<char> x, ReadOnlySpan<char> y);
     }
     public class OrdinalComparer : StringComparer {
+        public override int Compare(ReadOnlySpan<char> x, ReadOnlySpan<char> y);
     }
 }

Open Questions

  • Should the additional ReadOnlySpan<char> Compare overload be included?
  • Should NonRandomizedStringEqualityComparer now derive from StringComparer in order to be supported?
  • Should a TryAdd overload be added as it has the possibility of not requiring to be converted to a string if an equal key is found?

Discussion Summary.

We have two different design approaches, add a new Type or add extension methods to Dictionary<string, TValue>.

Here's a comparison of the pro's and con's of the different design approaches.

Extension Methods StringKeyedDictionary
Pros No new type introduced. No performance penalty due to casting the comparer.
Existing code exposing a string keyed Dictionary needs no changes to use which is helpful when they can't easily be changed. Indexer support.
More scalable
Cons Performance penalty of casting the comparer. New type introduced meaning an additional staple of the .NET ecosystem to know.
No indexer support, must use a GetValue method instead. Existing code that exposes a string keyed Dictionary may not easily be changed to using the new type.
Dictionary internals exposed with the internal modifier. Dictionary internals exposed with the private protected modifier.
Somewhat opaque to user as to knowing the comparer must derive from StringComparer.

Updates

  • Updated namespaces back to System.Collections.Specialized.
  • Changed extension class name to DictionaryExtensions from StringKeyedDictionaryExtensions.
  • Removed proposed IStringEqualityComparer interface and instead relies on comparer deriving from StringComparer.
  • Reordered design approaches to better promote my preference. 😄
@svick
Copy link
Contributor

svick commented Aug 24, 2018

Would it be better to instead have a set of extension methods on Dictionary<string, T>? That way, the new methods could be used on any string-keyed Dictionary, not just on some special type.

@Wraith2
Copy link
Contributor

Wraith2 commented Aug 24, 2018

Dictionary<TKey,TValue> is implemented in Coreclr and the is a high bar for new types because of library size issues. Do you intend or this implementation not to be collision resistant or do you have a method to store the key without the span somehow? convert it into a ReadOnlyMemory?

@benaadams
Copy link
Member

Would it be better to instead have a set of extension methods on Dictionary<string, T>?

How would it perform the search (using hashcode + equals) against the current Dictionary api surface?

do you have a method to store the key without the span somehow? convert it into a ReadOnlyMemory?

ROS<char>.ToString()

@svick
Copy link
Contributor

svick commented Aug 24, 2018

@benaadams

Would it be better to instead have a set of extension methods on Dictionary<string, T>?

How would it perform the search (using hashcode + equals) against the current Dictionary api surface?

It wouldn't use the public API surface, it would use some new internal members on Dictionary. That's worse than StringKeyedDictionary, which would probably use new private protected members, but I think not by much.

@benaadams
Copy link
Member

benaadams commented Aug 24, 2018

Ah.. so something like?

static class StringKeyedDictionaryExtensions 
{
    static bool ContainsKey<TValue>(this Dictionary<string, TValue> dictionary, ReadOnlySpan<char> key);
    static bool Remove<TValue>(this Dictionary<string, TValue> dictionary, ReadOnlySpan<char> key);
    static bool TryGetValue<TValue>(this Dictionary<string, TValue> dictionary, ReadOnlySpan<char> key, out TValue value);
}

@Wraith2
Copy link
Contributor

Wraith2 commented Aug 24, 2018

Right, I think I missed the point then. The idea is to have the standard dictionary at the back end but methods which allow you to use ROS as the key to search for avoiding the allocation to check if something exists. So it needs access to the internals because the comparer needs to understand that it can do ROS-string comparisons which aren't currently possible.

@svick
Copy link
Contributor

svick commented Aug 24, 2018

@benaadams Yup. Plus probably an extension method version of the indexer:

public static T GetValue<TValue>(this Dictionary<string, TValue> dictionary, ReadOnlySpan<char> key);

Though that's an advantage of the StringKeyedDictionary design: it can have a real indexer.

@safern
Copy link
Member

safern commented Aug 24, 2018

Also if they're extension methods we would need to provide overrides to specify a custom Comparer that can compare ROS right? Or that would be achieved by adding the StringEqualityComparer type and interface and in the extension methods check if Comparer is IStringEqualityComparer then use the ROS overrides if not the string ones?

Also with override methods Adding values through the indexer or TryAdd/Add apis, we wouldn't be able to take advantage of the ROS comparisons right?

@benaadams
Copy link
Member

Also if they're extension methods we would need to provide overrides to specify a custom Comparer that can compare ROS right?

Dictionary is based on IEqualityComparer<T> so not sure if it could work... (even if default interfaces implementations pulling up the slack, not sure how it could work for other types as a ROS comparison would be exposed on the interface, which probably wouldn't make sense?)

Also with override methods Adding values through the indexer or TryAdd/Add apis, we wouldn't be able to take advantage of the ROS comparisons right?

Can covert the string to a ROS at nominal cost for the comparison; if it didn't exist the ROS<char> could be turned into a string for persistent key use; so that wouldn't necessarily be a problem.

@safern
Copy link
Member

safern commented Aug 25, 2018

Dictionary is based on IEqualityComparer so not sure if it could work... (even if default interfaces implementations pulling up the slack, not sure how it could work for other types as a ROS comparison would be exposed on the interface, which probably wouldn't make sense?)

Yes, but people could just use StringEqualityComparer and since it would implement IEqualityComparer it can be passed through the constructor. However in order to use the ROS overloads if we go through the Extension methods route we would need to cast the Comparer to IStringEqualityComparer

I think if we want to use the ROS comparison features and other of its features through a custom Comparer we should not use Extension methods, unless I’m missing something we could do in a different way without overriding implementation.

@TylerBrinkley
Copy link
Contributor Author

TylerBrinkley commented Aug 25, 2018

Here's a comparison of the pro's and con's of the different design approaches.

Extension Methods StringKeyedDictionary
Pros No new type introduced. No performance penalty due to casting the comparer.
Existing code exposing a string keyed Dictionary needs no changes to use which is helpful when they can't easily be changed. Indexer support.
Cons Performance penalty of casting the comparer. New type introduced meaning an additional staple of the .NET ecosystem to know.
No indexer support, must use a GetValue method instead. Existing code that exposes a string keyed Dictionary may not easily be changed to using the new type.
Dictionary internals exposed with the internal modifier. Dictionary internals exposed with the private protected modifier.
Somewhat opaque to user as to knowing it uses IStringEqualityComparer.

Honestly, I'm kind of torn on this. Adding a new foundational Type to the ecosystem is costly in terms of what developers need to know. I'm feeling this even more now with the addition of all the types introduced with Spans and Pipelines. Additionally, I could easily see this expanding to others such as HashSet, SortedDictionary, SortedSet, OrderedDictionary & OrderedSet if they ever get implemented, and ConcurrentDictionary.

If the performance isn't an issue I now think I'd prefer the extension method route.

@safern
Copy link
Member

safern commented Aug 28, 2018

@TylerBrinkley thanks for the great summary, I just updated the issue description and included this comparison in there. Will mark as ready for review now.

@TylerBrinkley
Copy link
Contributor Author

I've updated the first post to correct formatting and make it a little easier to understand.

@cdmihai
Copy link
Contributor

cdmihai commented Aug 28, 2018

Overloads for ConcurrentDictionary would also be nice :)

@terrajobst
Copy link
Member

terrajobst commented Aug 28, 2018

Video

On the surface, the extension approach looks more appealing because it would allow the lookups to work against any instance of a dictionary (at least in principle, the current design only works if the dictionary was instantiated with a comparer that implements IStringEqualityComparer).

However, this also complicates the implementation and might have performance implications if we have to expose more methods that only make sense when TKey happens to be string. Also, one could argue that in the hot paths where people would be inclined to lookup with spans that they do in fact control their dictionary and whether they need to pass in the correct comparer or use a derived type of Dictionary<,> doesn't matter.

I totally buy the scenario, but adding a new collection type (even if derived from Dictionary<,>) feels heavy handed, but maybe I'm overthinking this. Maybe just putting it in S.C.Specialized is good enough.

What do others think?

@jkotas
Copy link
Member

jkotas commented Aug 28, 2018

This is trying to workaround the fundamental problem that ref-like types cannot be used as keys in collections today. We will hit the same problem for all keyed collections and many ref-like types. @cdmihai mentioned that ConcurrentDictionary would be nice too. The same can be said for HashMap and other collections. And the same can be said for Utf8 strings, Spans and other ref-like types. If we take this proposal, where are we going to stop – are we going to add all combinations like StringKeyedHashMap, SpanKeyedConcurrentDictionary?

I think we need to have ref struct constraint first to make this scalable.

@TylerBrinkley
Copy link
Contributor Author

TylerBrinkley commented Aug 28, 2018

I agree scalability is an issue if we add new types which is why I'm now in favor of the extension methods route which should scale more favorably.

Even if the ref struct constraint is added I don't see how we can use that for this use case in a non-stack context such as non-ref-like class fields or static fields.

@Joe4evr
Copy link
Contributor

Joe4evr commented Aug 30, 2018

This is trying to workaround the fundamental problem that ref-like types cannot be used as keys in collections today.

It's not so much using ReadOnlySpan<char> as the dictionary key, but doing a lookup with it without copy-allocating the slice it is wrapping.

@benaadams
Copy link
Member

benaadams commented Aug 30, 2018

Is what's equatable/alternative forms; so string <=> ReadOnlySpan<char>; however what about string <=> ReadOnlySpan<byte>, Encoding.UTF8 etc

@d3v3us
Copy link

d3v3us commented Aug 30, 2018

I would change NameValueCollection, making more effective version, passing allowDublicates parameter in constructor.
For dictionary i think extension method will fit completely.

@TylerBrinkley
Copy link
Contributor Author

what about string <=> ReadOnlySpan<byte>, Encoding.UTF8 etc

Another reason the extension methods route makes better sense.

What if we instead relied on the comparer to derive from StringComparer and not define an interface? That way we can easily expand support through adding virtual methods without having to define new interfaces or else wait for default interface implementation support.

@Joe4evr
Copy link
Contributor

Joe4evr commented Aug 30, 2018

what about string <=> ReadOnlySpan<byte>, Encoding.UTF8 etc

What if we instead relied on the comparer to derive from StringComparer and not define an interface?

Sounds reasonable. I can already see use-cases for wanting to determine the value equality between a string and utf8string.

//using regular old UTF-16 strings as a key due to framework reasons,
//even though 99.9% of the time, it doesn't go outside UTF-8 range
private readonly Dictionary<string, Foo> _fooMap = new Dictionary<string, Foo>(StringComparer.OrdinalIgnoreCase);

//what this proposal already boils down to:
void M(ReadOnlySpan<char> sliceOfString)
{
    if (_fooMap.TryGetValue(sliceOfString, out var foo)
    {
        //....
    }
}

//but when incoming data could also be UTF-8:
void M(ReadOnlySpan<byte> sliceOfUtf8String) //or 'ROS<Char8>' if that becomes the more appropriate thing
{
    if (_fooMap.TryGetValue(sliceOfUtf8String, out var foo)
    {
        //....
    }
}

@TylerBrinkley
Copy link
Contributor Author

TylerBrinkley commented Sep 26, 2018

Really appreciated hearing the discussion of this in the .NET Design Reviews GitHub Triage.

I think this should go back to triage to decide the best design approach. Again, personally I'd much prefer the extension method route.

@TylerBrinkley TylerBrinkley changed the title Add a StringKeyedDictionary<T> type Add ReadOnlySpan<char> lookup support for string keyed dictionary Sep 26, 2018
@TylerBrinkley TylerBrinkley changed the title Add ReadOnlySpan<char> lookup support for string keyed dictionary Add string keyed dictionary ReadOnlySpan<char> lookup support Sep 26, 2018
@DaZombieKiller
Copy link
Contributor

What if an API was added to Dictionary<TKey, TValue> that allows you to retrieve a bucket?
For example (where DictionaryBucket is a new type, mainly to ensure that it's read-only):

DictionaryBucket<TValue> Dictionary<TKey, TValue>.GetBucket(int hashCode) or
IReadOnlyList<TValue> Dictionary<TKey, TValue>.GetBucket(int hashCode)

This way, the extension methods could operate on public APIs, and would just need to iterate over the bucket and compare values.

I'm not sure how realistic this would be, and I'm absolutely certain that there are significant problems with this that I haven't considered, but I figured I'd throw the idea out there.

@Wraith2
Copy link
Contributor

Wraith2 commented Dec 12, 2018

It would be exposing and locking implementation details of the dictionary, it would prevent reimplementing using another method if a better one were found in future.

@TylerBrinkley
Copy link
Contributor Author

As Dictionary doesn't currently support this scenario I was looking at implementing this in a custom dictionary and am having trouble getting the hashcode of the span. Are there any API's currently exposed which allows the retrieval of the hashcode of a ReadOnlySpan<char> in the Ordinal or OrdinalIgnoreCase sense without converting to a string?

@jkotas
Copy link
Member

jkotas commented Apr 22, 2019

Look at the APIs introduced in https://github.com/dotnet/corefx/issues/31302

@TylerBrinkley
Copy link
Contributor Author

@jkotas Thanks. Since .NET Standard 2.1 is including some API's introduced for .NET Core 3.0 would including this API be considered as well?

@Joe4evr
Copy link
Contributor

Joe4evr commented Mar 28, 2024

  • We discussed making IAlternateEqualityComparer : IEqualityComparer, but if TAlternate == T then Equals(TAlternate, T) conflicts with Equals(T, T) and makes the universe implode.

I quickly tried this out and could not immediately find a situation where it blows up when TAlternate == T.

Maybe it would still blow up in a situation I haven't covered, but so far it looks like inheriting IEqualityComparer<T> could just work.

Also, in int GetHashCode(TAlternate span); the parameter name should probably be changed.

@stephentoub
Copy link
Member

but so far it looks like inheriting IEqualityComparer<T> could just work.

Yeah, we investigated more after the recording and came to the same conclusion, but didn't end up adding back the inheritance. We still could, though.

@halter73
Copy link
Member

Though Wiktionary says that alternate (adj.) only means other/alternative in US English; so maybe we should s/Alternate/Alternative/ avoid a US-only definition and an ESL confusion.

I missed this API review live, but I support updating the API with s/Alternate/Alternative/. As a native English speaker from the US, I understand the alternative meaning of alternate, but even I had to do a double take to make sure it had nothing to do with alternating at first.

@PaulusParssinen
Copy link
Contributor

Was there rationale for why SortedSet<T> and SortedDictionary<TKey, TValue> were left out?

@eiriktsarpalis
Copy link
Member

@PaulusParssinen these APIs are specifically designed for hash tables, SortedSet and SortedDictionary are implemented using search trees.

@PaulusParssinen
Copy link
Contributor

@PaulusParssinen these APIs are specifically designed for hash tables, SortedSet and SortedDictionary are implemented using search trees.

Right..🤦‍♂️Thanks for quick reply!

@mikernet
Copy link
Contributor

It's out of scope for this particular feature, but it is also related - does it make sense to eventually add IMappedSpanComparer as well to support sorted collection scenarios?

@PaulusParssinen
Copy link
Contributor

PaulusParssinen commented May 27, 2024

I have been prototyping a large perf oriented PR for ILCompiler name-mangling logic for a couple weeks now and I keep hitting the need for this.

Given the API is approved, is it up-for-grabs? Or does someone on the team have this prototyped already while designing the API shape (IIRC during API reviews someone said they did?)

edit: Oh wait, we need : allows ref struct support to flow to repo first.. 🤦‍♂️ We don't have it yet, right?

@stephentoub
Copy link
Member

I have it implemented, and it needs allows ref struct.

@stephentoub
Copy link
Member

#102907 has been merged, adding the core features and implementations on Dictionary<>, HashSet<>, and ConcurrentDictionary<>.

I initially held off on FrozenDictionary<>/FrozenSet<> because of GVM concerns. I've gotten past those, but there's a bigger issue. Some of the implementations (e.g. LengthBucketFrozenDictionary) rely on knowing the length of the input in order to find the key in the collection. The TAlternateKey design and IAlternateEqualityComparer don't make this possible, as there's no way to ask the interface "how many characters long would the TKey be". The choice of which implementation we use is intended to be an implementation detail, but we can't have some of the implementations support the alternate comparer and others not, because then the implementation choice actually impacts semantics. And we can't outlaw using these implementations when the comparer implements the alternate comparer, because a) we don't know at that time what the TAlternateKey might be (there could be many), and even if we did, b) StringComparer.Ordinal{IgnoreCase} are the most popular comparers and do implement that interface. Given all of this, as long as we have such implementations, FrozenDictionary<>/FrozenSet<> aren't compatible with this feature. I'm going to close this issue then, and if we find a better alternative in the future (no pun intended), we can add the additional APIs then.

@MihaMarkic
Copy link

@stephentoub Oh, not supporting Frozen ones - that's a pity. What about ImmutableDictionary?

@stephentoub
Copy link
Member

ImmutableDictionary is rarely the right type to use. Lookups are O(log N) and in genai much more exoensive than the other dictionaries. This feature is about optimizing lookups. If you want to optimize lookups, the first step is not using ImmutableDictionary. 😉

@eiriktsarpalis
Copy link
Member

eiriktsarpalis commented Jun 6, 2024

It seems that it would be possible to support this scenario assuming alternative keys were constrained to ReadOnlySpan<T>.

@MihaMarkic
Copy link

@stephentoub Yeah, well, since Frozen is out of question, there is really not much of an option if one is after immutability.

@stephentoub
Copy link
Member

It seems that it would be possible to support this scenario assuming alternative keys were constrained to ReadOnlySpan<T>.

I believe so (though it's possible there's something else lurking). It would mean trying to get a lookup would fail for a string key with an alternate key other than ReadOnlySpan<char>. It'd be a restriction we document. Is that your preference, @eiriktsarpalis ?

@eiriktsarpalis
Copy link
Member

eiriktsarpalis commented Jun 6, 2024

I was asking if the originally proposed IMappedSpanEqualityComparer design might be a better fit for this reason, but you're suggesting if we add a runtime check or perhaps statically constrain TAlternate to always be a ROS?

It would seem like all these GetAlternateLookup methods already have the capacity to throw if passed an incorrect type argument, right? It doesn't seem like a huge deal if we did the same thing here with a more restrictive subset.

@stephentoub
Copy link
Member

I was asking if the originally proposed IMappedSpanEqualityComparer design might be a better fit for this reason

If we wanted to constrain it, we wouldn't need a new interface design; we could just change the entrypoint {Try}GetAlternateLookups for these types to themselves be constrained, such that they'd only ever work with specific keys/alternate keys. My concern there is it's elevating a current implementation-detail-based limitation to public surface area.

but you're suggesting if we add a runtime check

Yeah.

It would seem like all these GetAlternateLookup methods already have the capacity to throw if passed an incorrect type argument, right?

Yes, based on whether the specified TAlternateKey works with the collection's comparer. And the developer can fully control that: if they construct the collection with a comparer that supports TAlternateKey, then TAlternateKey will work. With this, we're adding an extra restriction which would be that even if that comparer/alternate key work together, we'd explicit reject it in situations implementation details don't currently support.

I'll code it up and we can decide.

@dotnet-policy-service dotnet-policy-service bot added the in-pr There is an active PR which will close this issue when it is merged label Jun 8, 2024
@teo-tsirpanis teo-tsirpanis removed the in-pr There is an active PR which will close this issue when it is merged label Jun 21, 2024
@github-actions github-actions bot locked and limited conversation to collaborators Jul 23, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
api-approved API was approved in API review, it can be implemented area-System.Collections
Projects
None yet
Development

Successfully merging a pull request may close this issue.