
Provide IEnumerable<T> support for Memory<T> #23950

Open
ahsonkhan opened this issue Oct 25, 2017 · 62 comments · Fixed by dotnet/corefx#26467
Labels
api-needs-work API needs work before it is approved, it is NOT ready for implementation area-System.Memory
Milestone

Comments

@ahsonkhan
Member

Rationale

Users should be able to pass Memory<T> to methods that take an IEnumerable<T>, just as they can with arrays. Currently, this requires calling ToArray() to copy the contents of the Memory<T> into an array, which is an unnecessary allocation and copy.

For example (from @eerhardt):
https://github.com/dotnet/corefxlab/blob/master/src/System.Numerics.Tensors/System/Numerics/DenseTensor.cs#L10

public void SomeMethod(DenseTensor<float> tensor)
{
   Memory<float> memory = tensor.Buffer;
   var value = CreateBatch<T>(..., memory.ToArray(), ...);
}

https://github.com/Microsoft/CNTK/blob/master/bindings/csharp/CNTKLibraryManagedDll/ShimApiClasses/ValueShim.cs#L104

public static Value CreateBatch<T>(NDShape sampleShape, IEnumerable<T> batch, DeviceDescriptor device, bool readOnly = false)
{
    T[] batchAsArray = batch.ToArray();
    return CreateBatch<T>(sampleShape, batchAsArray, 0, batchAsArray.Count(), device, readOnly);
}

If Memory<T> implemented IEnumerable<T>, then the ToArray() call would not be required. However, this would cause applications that reference System.Memory and Memory<T> for any scenario to be larger (when compiled AOT). Furthermore, if Memory<T> was an IEnumerable<T>, it might result in a pit of failure as it could lead to users unintentionally using the enumerator to iterate over the data rather than the more performant indexer on Memory.Span (especially for primitive types like Memory<byte>). To discourage unnecessary use of the enumerator while still supporting the scenarios where an IEnumerable is needed, the proposed solution is to add an adapter class and a ToEnumerable extension method on Memory<T>.

As an FYI, Span<T> cannot implement IEnumerable<T> since it is a stack-only, byref type, and casting it to an interface would cause boxing. The compiler reports an error: 'Span<T>': ref structs cannot implement interfaces

Proposed API

namespace System
{
   public static class MemoryExtensions {
     public static IEnumerable<T> ToEnumerable<T>(this Memory<T> memory);
   }
}

Usage

Taking the above example, it would look as follows:

public void SomeMethod(DenseTensor<float> tensor)
{
   Memory<float> memory = tensor.Buffer;
   var value = CreateBatch<T>(..., memory.ToEnumerable(), ...);
}

Partial Implementation

public static IEnumerable<T> ToEnumerable<T>(this Memory<T> memory) => new MemoryEnumerable<T>(memory);

internal class MemoryEnumerable<T> : IEnumerable<T>
{
    readonly ReadOnlyMemory<T> _memory;

    public MemoryEnumerable(ReadOnlyMemory<T> memory) => _memory = memory;

    public IEnumerator<T> GetEnumerator() => new MemoryEnumerator<T>(_memory);

    IEnumerator IEnumerable.GetEnumerator() => new MemoryEnumerator<T>(_memory);
}

internal class MemoryEnumerator<T> : IEnumerator<T>
{
    readonly ReadOnlyMemory<T> _memory;
    int _index = -1; // starts before the first element, per the IEnumerator contract

    public MemoryEnumerator(ReadOnlyMemory<T> memory) => _memory = memory;

    public T Current => _memory.Span[_index];

    object IEnumerator.Current => _memory.Span[_index];

    public void Dispose()
    {
        // Nothing to dispose.
    }

    public bool MoveNext()
    {
        _index++;
        return _index < _memory.Length;
    }

    public void Reset()
    {
        _index = -1;
    }
}

cc @eerhardt, @KrzysztofCwalina, @stephentoub, @jkotas, @terrajobst, @karelz, @ericstj

@svick
Contributor

svick commented Oct 25, 2017

Would other collection interfaces make sense too? Like IReadOnlyList<T>?

@stephentoub
Member

might result in a pit of failure as it could lead to users unintentionally using the enumerator to iterate over the data

How is Memory any more susceptible than IEnumerable on other types like List and T[]?

@eerhardt
Member

Why can’t Memory{T} implement IEnumerable{T} directly? That way you wouldn’t need to call a method, nor alloc a throw away object.

@ahsonkhan
Member Author

ahsonkhan commented Oct 25, 2017

How is Memory any more susceptible than IEnumerable on other types like List and T[]?

It is just as susceptible. The difference being we can change Memory<T>.

Why can’t Memory{T} implement IEnumerable{T} directly? That way you wouldn’t need to call a method, nor alloc a throw away object.

See Rationale section:

However, this would cause applications that reference System.Memory and Memory<T> for any scenario to be larger (when compiled AOT). Furthermore, if Memory<T> was an IEnumerable<T>, it might result in a pit of failure as it could lead to users unintentionally using the enumerator to iterate over the data rather than the more performant indexer on Memory.Span

@jkotas
Member

jkotas commented Oct 25, 2017

Why can’t Memory{T} implement IEnumerable{T} directly? That way you wouldn’t need to call a method, nor alloc a throw away object.

You always need to allocate a throwaway object to get an IEnumerable out of a Memory. If Memory implemented IEnumerable directly, casting it to IEnumerable would box.
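
For illustration, a minimal sketch using ArraySegment<T> (an existing struct that does implement IEnumerable<T>) as a stand-in, showing the allocations that converting a struct to an interface incurs:

var segment = new ArraySegment<int>(new[] { 1, 2, 3 });
IEnumerable<int> enumerable = segment;                    // boxes the struct: one heap allocation
IEnumerator<int> enumerator = enumerable.GetEnumerator(); // typically boxes the struct enumerator: a second allocation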

@eerhardt
Member

Oh right. I missed that Memory{T} is a struct.

@stephentoub
Member

stephentoub commented Oct 25, 2017

Partial Implementation

This is an implementation detail, but we should use the same allocated object as both the enumerable and enumerator in the common case, as is done by the compiler with iterators and in LINQ... this could potentially even just be implemented with an iterator:

public static IEnumerable<T> ToEnumerable<T>(this Memory<T> memory)
{
    for (int i = 0; i < memory.Length; i++) yield return memory.Span[i];
}

@stephentoub
Member

stephentoub commented Oct 25, 2017

Span cannot implement IEnumerable since it is a stack-only, byref type, and casting it to an interface would cause boxing

Only tangentially related to this issue, have we considered giving Span an instance or extension GetEnumerator that returns a ref struct Enumerator? And/or have there been any discussions about enabling the compiler to support foreach'ing a span and generating optimal code for that? Just as I can foreach an array and end up with asm equivalent to walking through each element, it'd be nice to be able to do that with span as well, e.g.

foreach (T item in span)
{
    ...
}

being the same as:

for (int i = 0; i < span.Length; i++)
{
    T item = span[i];
    ...
}

EDIT: C# doesn't currently support extension GetEnumerators in foreach; either it would need to be an instance method, or the language would need to be updated to either special-case span or enable extension GetEnumerators.
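
For concreteness, a self-contained sketch of the pattern being discussed; the wrapper type and names here are hypothetical (the actual proposal is an instance GetEnumerator on Span<T> itself), but it shows how foreach can bind to a ref struct enumerator with no interfaces, no boxing, and no allocations:

public readonly ref struct SpanWrapper<T>
{
    private readonly Span<T> _span;

    public SpanWrapper(Span<T> span) => _span = span;

    // foreach binds to the GetEnumerator/MoveNext/Current pattern directly,
    // so no interface (and therefore no boxing) is involved.
    public Enumerator GetEnumerator() => new Enumerator(_span);

    public ref struct Enumerator
    {
        private readonly Span<T> _span;
        private int _index;

        public Enumerator(Span<T> span)
        {
            _span = span;
            _index = -1;
        }

        public ref T Current => ref _span[_index];

        public bool MoveNext() => ++_index < _span.Length;
    }
}

Usage would then be, for example, foreach (ref int value in new SpanWrapper<int>(span)) { value *= 2; }.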

cc: @VSadov, @MadsTorgersen

@KrzysztofCwalina
Member

Language support for foreach would be great.

@terrajobst
Member

terrajobst commented Nov 7, 2017

Video

Looks good as proposed. MemoryExtensions doesn't exist yet but SpanExtensions will be renamed to it.

@stephentoub
Member

I opened dotnet/csharplang#1085 for foreach support for Span<T> from the language side of things. We still need to decide if we want to add GetEnumerator instance method to Span<T> and ReadOnlySpan<T>.

@karelz
Member

karelz commented Nov 8, 2017

FYI: The API review discussion was recorded - see https://youtu.be/HnKmLJcEM74?t=391 (13 min duration)

@KrzysztofCwalina
Member

KrzysztofCwalina commented Nov 28, 2017

@ahsonkhan, should we close this issue now? We added the enumerator.

[EDIT] Corrected to @ahsonkhan by @karelz

@stephentoub
Member

should we close this issue now? We added the enumerator.

We added one to span. I don't think we added one to memory.

@KrzysztofCwalina
Member

KrzysztofCwalina commented Nov 28, 2017

I assume the one added to Memory would be a by ref struct, i.e. it would cache the span? If yes, I think it's fine to add it. Otherwise I think it's a perf trap.

@stephentoub
Member

I would assume not, as you'd most likely want to use it with LINQ and other such consumers. And if you did want a byref one, you could instead enumerate .Span.

@VSadov
Member

VSadov commented Nov 28, 2017

We tried the same route with ImmutableArray (add an adapter class and a ToEnumerable method, or was it AsList?) because we did not like boxing of ImmutableArray. People despised that, so we ended up implementing IEnumerable anyway.

@ghost ghost self-assigned this Jan 2, 2018
@joshfree joshfree unassigned ghost Jan 3, 2018
@ianhays
Contributor

ianhays commented Jan 11, 2018

Do we still want to add public static IEnumerable<T> ToEnumerable(this Memory<T> memory); even if we also add Memory<T>.GetEnumerator() a la dotnet/coreclr#14922?

@ianhays ianhays self-assigned this Jan 11, 2018
@benaadams
Member

Memory<T> isn't getting .GetEnumerator(), and IEnumerable<T> ToEnumerable(this Memory<T> memory) is a performance hole provided for compat (as you can't put a span in an IEnumerator), so it is an explicit call. (As far as I understand.)

@ianhays
Contributor

ianhays commented Jan 11, 2018

Memory isn't getting .GetEnumerator()
as you can't put a span in an IEnumerator

isn't that exactly what we're already doing right here (though as Enumerator, not IEnumerator)? Or did you mean that you can't put a Memory into the Enumerator (which makes sense because it's not by-ref)?

@stephentoub
Member

isn't that exactly what we're already doing right here

No, that's a ref struct, and it doesn't implement IEnumerator<T>, and that's for Span<T>, not Memory<T>. That's just implementing a pattern that's bindable to foreach by the C# compiler, no interfaces involved.

even if we also add Memory.GetEnumerator() a la dotnet/coreclr#14922?

We're not adding an IEnumerable<T> implementation to Memory<T>, as that likely ends up being more expensive than a ToEnumerable method. Consider a method that accepts an IEnumerable<T>. If Memory<T> (a struct) implemented IEnumerable<T>, passing the struct to that method would box it, resulting in an allocation; then when that method calls GetEnumerator and gets back an IEnumerator<T>, that'd be a second allocation. In contrast, with a ToEnumerable method, the IEnumerable<T> that's returned can also implement IEnumerator<T>, such that the instance you get back can be the sole allocation in the 99% case where the enumerable is enumerated once.
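
A rough sketch of that single-allocation shape (an assumption of what such an implementation could look like, not the actual corefx code): the returned object implements both IEnumerable<T> and IEnumerator<T>, and GetEnumerator hands back the same instance for the first enumeration, mirroring what the compiler generates for iterators and what LINQ does.

using System;
using System.Collections;
using System.Collections.Generic;

internal sealed class SingleAllocMemoryEnumerable<T> : IEnumerable<T>, IEnumerator<T>
{
    private readonly ReadOnlyMemory<T> _memory;
    private readonly int _ownerThreadId;
    private int _index = -2; // -2 = not yet handed out as an enumerator, -1 = before the first element

    public SingleAllocMemoryEnumerable(ReadOnlyMemory<T> memory)
    {
        _memory = memory;
        _ownerThreadId = Environment.CurrentManagedThreadId;
    }

    public IEnumerator<T> GetEnumerator()
    {
        // First enumeration on the creating thread reuses this instance (one allocation total);
        // additional or concurrent enumerations get a fresh enumerator.
        if (_index == -2 && _ownerThreadId == Environment.CurrentManagedThreadId)
        {
            _index = -1;
            return this;
        }

        var separate = new SingleAllocMemoryEnumerable<T>(_memory);
        separate._index = -1;
        return separate;
    }

    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();

    public T Current => _memory.Span[_index];
    object IEnumerator.Current => Current;
    public bool MoveNext() => ++_index < _memory.Length;
    public void Reset() => _index = -1;
    public void Dispose() { }
}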

@ianhays
Contributor

ianhays commented Jan 11, 2018

Thanks @stephentoub, I've got some better context now. Your comments in The API Review video on this topic helped too :)

@ianhays
Contributor

ianhays commented Jan 18, 2018

There's pushback in dotnet/corefx#26284 about this being implemented as an extension method, so I closed that PR while we decide for sure what the API shape will look like and where the API should live (MemoryMarshal? Some new class?).

cc: @KrzysztofCwalina

ianhays referenced this issue in dotnet/corefx Jan 29, 2018
Consume Span moves
- [Move Span.DangerousCreate to MemoryMarshal.CreateSpan](#26139)
- [Move ReadOnlySpan.DangerousCreate to MemoryMarshal.CreateReadOnlySpan](#26139)
- [Move Span.NonPortableCast to MemoryMarshal.Cast](#26368)
- [Move ReadOnlySpan.NonPortableCast to MemoryMarshal.Cast](#26368)
- [Add ToString override to Span and ReadOnlySpan](#26295)
- [Add ToEnumerable function to MemoryMarshal that takes a Memory](#24854)
@msftgits msftgits transferred this issue from dotnet/corefx Jan 31, 2020
@msftgits msftgits added this to the 2.1.0 milestone Jan 31, 2020
@carlossanlop carlossanlop removed the good first issue Issue should be easy to implement, good for first-time contributors label Jan 21, 2021
@AqlaSolutions

Any news?

@terrajobst
Member

@GrabYourPitchforks are you driving this? If not, who should be?

@GrabYourPitchforks
Member

@terrajobst I'm not currently driving this because it seems to be low-priority. The salient point in this issue that people really reacted to was Steve's comment at #23950 (comment), and we did indeed eventually get foreach support over Span<T> instances.

If the consensus is that we should add this API, I'm very much partial to my most recent comments starting at #23950 (comment), as I think those largely address the performance trap that people were concerned with.

@terrajobst
Member

Sounds good.

@CodingMadness

Hey guys,

so we can do: foreach on spans, and AFAIK .NET makes a promise to use duck typing for such constructs, like GetPinnableReference() being usable in fixed(...), but why is the compiler not smart enough (.NET 6.0) to infer that Span is duck-typed to an IEnumerable without implementing it?

Because when an extension method takes a T where T : IEnumerable<T>, it could use internal duck typing to infer that the type actually behaves the same as an IEnumerable.

Would be really nice if .NET could make duck typing a thing; I mean, coding all these LINQ things will be quite ugly :D

@jkotas
Member

jkotas commented Mar 17, 2022

why is the compiler not smart enough (.NET 6.0) to infer that Span is duck-typed to an IEnumerable without implementing it

C# compiler does infer that in .NET 6. foreach works on Span for several years. More discussion on this is captured in https://github.com/dotnet/csharplang/blob/main/meetings/2017/LDM-2017-12-04.md#foreach-for-span-and-readonlyspan . The discussion on duck-typing is more appropriate for https://github.com/dotnet/csharplang repo.

@CodingMadness

C# compiler does infer that in .NET 6. foreach works on Span for several years.

I already said that, good Sir, namely below:
@jkotas

so we can do: foreach on spans

This is a good thing; it's just that we sadly cannot make use of the Enumerable LINQ extension methods, because the compiler still strictly wants the callee of these methods to be of type IEnumerable.

That is my dilemma.

@stephentoub
Member

stephentoub commented Mar 17, 2022

because the compiler still strictly wants the callee of these methods to be of type IEnumerable.

The C# compiler couldn't fabricate a wrapper IEnumerable for the same reason Span doesn't implement IEnumerable: it's a ref struct that can't be boxed or otherwise copied to the heap.

@CodingMadness

@stephentoub yea ok, wanted just to get sure on this! Thanks and have a good and healthy Week!

@Joe4evr
Contributor

Joe4evr commented Mar 17, 2022

@Shpendicus For what it's worth: SpanLinq

@CodingMadness

@Joe4evr I think they are on a really good track. I also saw a NuGet package called NoAlloc; maybe you can try that one as well 🗡️

@eiriktsarpalis eiriktsarpalis added api-ready-for-review API is ready for review, it is NOT ready for implementation and removed api-needs-work API needs work before it is approved, it is NOT ready for implementation labels Dec 6, 2022
@eiriktsarpalis
Member

eiriktsarpalis commented Dec 6, 2022

Marking as ready-for-review in light of the discussion in #77209 (comment). It has been suggested that ReadOnlyMemory<T> implementing IEnumerable<T> directly is desirable, since it would avoid the need for a dedicated

IEqualityComparer<ReadOnlyMemory<T>> CreateMemoryComparer<T>(IEqualityComparer<T>)

method and we can specialize the already approved

IEqualityComparer<TEnumerable> CreateEnumerableComparer<TEnumerable, T>(IEqualityComparer<T>) where TEnumerable : IEnumerable<T>

method instead.

@bartonjs
Member

bartonjs commented Jul 20, 2023

Video

This came up in API Review, and the main question that we had was "why is MemoryMarshal.ToEnumerable(ReadOnlyMemory<T>) not good enough?". We didn't seem to see any discussion related to that existing member.

@bartonjs bartonjs added api-needs-work API needs work before it is approved, it is NOT ready for implementation and removed api-ready-for-review API is ready for review, it is NOT ready for implementation labels Jul 20, 2023
@jhudsoncedaron

@bartonjs : I just now looked into it. It looks like this issue is an example of discovery fail. Oof. I would never think to look in namespace interopservices for something like that.

It's also a pretty terrible implementation, performance-wise. I left behind a much faster one in a comment above.

@terrajobst
Member

terrajobst commented Jul 20, 2023

I just now looked into it. It looks like this issue is an example of discovery fail. Oof. I would never think to look in namespace interopservices for something like that.

That's fair. I don't recall why we put it there but it seems to suggest the intent was to make it less discoverable.

It's also a pretty terrible implementation, performance-wise. I left behind a much faster one in a comment above.

I think this depends on what you want to achieve. If you just want to enumerate the contents in a foreach loop, then you shouldn't use this API at all. You should get the span once and foreach the span, which the compiler will basically emit as a for loop over the span, and the JIT will probably be able to hoist bounds checks outside the loop, which is as performant as you can get.
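
As a minimal illustration of that advice (the Sum helper here is hypothetical):

static long Sum(ReadOnlyMemory<int> memory)
{
    ReadOnlySpan<int> span = memory.Span; // get the span once, outside the loop
    long sum = 0;
    foreach (int value in span)           // lowered by the compiler to an indexed for loop
    {
        sum += value;
    }
    return sum;
}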

But if you need to pass the memory to an API that wants an IEnumerable<T> then you need to allocate a class. You could try to play tricks to avoid the allocation of the enumerator but that isn't necessarily safe if you want to allow multiple enumerations from the same instance. And I don't think the implementation can avoid requesting the span for each element, which isn't that cheap either. But I think all this overhead comes from the fact that the caller wants to go through IEnumerable<T>.

Come to think of it, I believe we didn't want an API like memory.ToEnumerable() because it seems like a performance trap.

@jhudsoncedaron

"And I don't think the implementation can avoid requesting the span for each element"

Yes it can. Since it is inside System.Private.CoreLib it can unwrap the Memory, which is what I posted. Oh well.

@terrajobst
Member

I can't seem to find your comment. Could you link it?

@DaZombieKiller
Contributor

The APIs for unwrapping a Memory<T> are actually public (MemoryMarshal.TryGetArray, MemoryMarshal.TryGetString and MemoryMarshal.TryGetMemoryManager) so even if the method wasn't in the BCL it could do that.

The problem is when the Memory<T> isn't backed by a string or an array but rather a MemoryManager. The implementation could be made more efficient by using Pin, but as mentioned in API review it would most likely require a finalizer which would be non-ideal.
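
A rough sketch of the kind of special-casing being described, using only the public MemoryMarshal.TryGetArray API (an assumption of how it could look, not the BCL implementation; string- and MemoryManager<T>-backed memory simply falls through to the slow path here):

using System;
using System.Collections.Generic;
using System.Runtime.InteropServices;

public static class MemoryEnumerableSketch
{
    public static IEnumerable<T> ToEnumerable<T>(ReadOnlyMemory<T> memory)
    {
        // Fast path: array-backed memory exposes its backing segment,
        // and ArraySegment<T> already implements IEnumerable<T>.
        if (MemoryMarshal.TryGetArray(memory, out ArraySegment<T> segment))
        {
            return segment;
        }

        // Slow path: re-acquire the span for each element.
        return Iterate(memory);
    }

    private static IEnumerable<T> Iterate<T>(ReadOnlyMemory<T> memory)
    {
        for (int i = 0; i < memory.Length; i++)
        {
            yield return memory.Span[i];
        }
    }
}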

@terrajobst
Member

terrajobst commented Jul 20, 2023

I see, you're suggesting an implementation for ToEnumerable() that can special case specific underlying types for a span (such as arrays and strings) and provide an implementation that wouldn't have to get the span for each element. I think that's fair and seems quite doable. For short spans, this might make it worse though, but we could measure that. For native memory or custom memory managers it would probably continue to be somewhat inefficient if we want some amount of safety. However, I can buy that these scenarios are more rare in practice as they most likely deal with byte buffers which one generally doesn't want to enumerate but parse.

Personally, I'm somewhat questioning how common of a problem memory->enumerable is in practice. To some extent, I think of these features as opposites.

@DaZombieKiller
Contributor

DaZombieKiller commented Jul 20, 2023

We're essentially entering the space of "niche within niche" here, but if you want to allow as many cases to be efficient as possible, it may make sense to add a new virtual method to MemoryManager:

namespace System.Buffers;

public abstract partial class MemoryManager<T>
{
    public virtual IEnumerable<T> ToEnumerable()
    {
        // Default implementation using 'yield return'
    }
}

An example of the kind of code that might benefit from this is in the community toolkit. There are APIs in the toolkit that offer the equivalent of MemoryMarshal.Cast for Memory<T> instances, which hand back MemoryManager<T>-backed Memory<T>s that provide MemoryMarshal.Cast-ed Span<T>s. Having that extra virtual would allow those MemoryManagers (and any others like them) to avoid the performance penalty associated with the default yield return implementation.

Is that worth it for such a niche use case though? I'm skeptical.

@stephentoub
Member

It's also a pretty terrible implementation, performance-wise.

#89274, which I put up earlier, fixes that.

@jhudsoncedaron

@stephentoub Ah good that problem's going away.

@ionoy

ionoy commented Dec 5, 2023

Is there a reason this wasn't exposed as an extension method?
