Proposal: Make it easier to iterate through an ArraySegment #19543
What is the motivation for |
That's really weird it does already have an Enumerator ( Maybe to keep it private? |
@karelz As @benaadams just mentioned it allocates since it is a class and is passed around as an |
Since the current nested enumerator type isn't exposed, can we just replace it with a struct enumerator (implementing IEnumerable) and expose that directly? |
Both |
I think we may want to create a new type, actually. For reference, here is how the enumerator is currently implemented:

```csharp
private sealed class ArraySegmentEnumerator : IEnumerator<T>
{
    private T[] _array;
    private int _start;
    private int _end;
    private int _current;

    internal ArraySegmentEnumerator(ArraySegment<T> arraySegment)
    {
        Contract.Requires(arraySegment.Array != null);
        Contract.Requires(arraySegment.Offset >= 0);
        Contract.Requires(arraySegment.Count >= 0);
        Contract.Requires(arraySegment.Offset + arraySegment.Count <= arraySegment.Array.Length);

        _array = arraySegment._array;
        _start = arraySegment._offset;
        _end = _start + arraySegment._count;
        _current = _start - 1;
    }

    public bool MoveNext()
    {
        if (_current < _end)
        {
            _current++;
            return (_current < _end);
        }
        return false;
    }

    public T Current
    {
        get
        {
            if (_current < _start) ThrowHelper.ThrowInvalidOperationException_InvalidOperation_EnumNotStarted();
            if (_current >= _end) ThrowHelper.ThrowInvalidOperationException_InvalidOperation_EnumEnded();
            return _array[_current];
        }
    }

    object IEnumerator.Current
    {
        get
        {
            return Current;
        }
    }

    void IEnumerator.Reset()
    {
        _current = _start - 1;
    }

    public void Dispose()
    {
    }
}
```

We can make it faster by removing some redundant checks/fields (eliminating

```csharp
// Proposed: nested in ArraySegment<T>, with no started/ended checks in Current.
public struct Enumerator
{
    private readonly T[] _array;
    private readonly int _end;   // Offset + Count: one past the last element
    private int _current;

    internal Enumerator(ArraySegment<T> segment)
    {
        _array = segment.Array;
        _end = segment.Offset + segment.Count;
        _current = segment.Offset - 1;
    }

    // Pre-increment, so the bound check applies to the element Current will read.
    public bool MoveNext() => ++_current < _end;

    // Unchecked: only valid after MoveNext() has returned true.
    public T Current => _array[_current];
}
```

But this would break existing behavior. |
We should just remove the checks. Also note: |
Also https://msdn.microsoft.com/en-us/library/system.collections.ienumerator.reset(v=vs.110).aspx
|
The |
Need to additionally store |
Ah yes. |
I am not sold on exposing new APIs just because we can, unless there is real data showing it matters (perf wise in this case). Saving 1 boxing is not IMO worth it, unless there is real-world code (not a made up example) where the saving would make a noticeable difference. Or unless we believe it is a pattern we should have everywhere ... |
@karelz It is kind of a chicken-and-egg problem...
*every time the ArraySegment is iterated over. |
I don't think it is a chicken-and-egg problem. I think it is a premature optimization problem, i.e. optimization before we measure. The problem is that we could spend decades optimizing one boxing here and another instruction there, and we may or may not make a noticeable difference with all that effort (and if we did, it would be by chance). If there are other reasons (e.g. it is a general pattern to have Anyway, I may be wrong as well, so I am fine presenting the option at API review. |
Regarding the enumerator, we should expose the internal enumerator as a struct Enumerator and have a public GetEnumerator return it. It's little work, it fixes a gaping hole in the API, and such boxing does add up. This is part of the peanut butter effect across .NET... we've historically not cared about this or that little allocation because on its own it's not consequential, but when you add all of those up across all such APIs, they can be impactful. As for the indexer, we should add that, too... but it's already there as part of the IList explicit implementation: |
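A minimal sketch of the approach suggested above, assuming a hypothetical `SegmentView<T>` that stands in for `ArraySegment<T>` (it is not the BCL type). The enumerator becomes a public nested `struct Enumerator`, and a public `GetEnumerator()` returns it, so `foreach` over the concrete type binds to it without allocating. The `IEnumerable<T>` implementations still work but box the struct. The started/ended checks from the current class are kept, so misuse still throws.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

// Usage: foreach over the concrete type binds to the public struct enumerator.
var view = new SegmentView<int>(new[] { 1, 2, 3, 4, 5 }, 1, 3);
int sum = 0;
foreach (int x in view) sum += x;   // no enumerator allocation on this path
Console.WriteLine(sum);             // 2 + 3 + 4

// Hypothetical stand-in for ArraySegment<T>; not the real BCL type.
public readonly struct SegmentView<T> : IEnumerable<T>
{
    private readonly T[] _array;
    private readonly int _offset;
    private readonly int _count;

    public SegmentView(T[] array, int offset, int count)
    {
        if (array is null) throw new ArgumentNullException(nameof(array));
        if (offset < 0 || count < 0 || offset > array.Length - count)
            throw new ArgumentOutOfRangeException(nameof(offset));
        _array = array;
        _offset = offset;
        _count = count;
    }

    // Public pattern-based GetEnumerator: foreach picks this one.
    public Enumerator GetEnumerator() => new Enumerator(_array, _offset, _count);

    // Interface paths still work, but they box the struct enumerator.
    IEnumerator<T> IEnumerable<T>.GetEnumerator() => GetEnumerator();
    IEnumerator IEnumerable.GetEnumerator() => GetEnumerator();

    public struct Enumerator : IEnumerator<T>
    {
        private readonly T[] _array;
        private readonly int _start;
        private readonly int _end;
        private int _current;

        internal Enumerator(T[] array, int offset, int count)
        {
            _array = array;
            _start = offset;
            _end = offset + count;
            _current = offset - 1;
        }

        // Same guarded logic as the existing class: never advances past _end.
        public bool MoveNext()
        {
            if (_current < _end)
            {
                _current++;
                return _current < _end;
            }
            return false;
        }

        public T Current
        {
            get
            {
                if (_current < _start) throw new InvalidOperationException("Enumeration has not started.");
                if (_current >= _end) throw new InvalidOperationException("Enumeration already finished.");
                return _array[_current];
            }
        }

        object IEnumerator.Current => Current;

        // Reset as an explicit interface implementation; Dispose is public and a no-op.
        void IEnumerator.Reset() => _current = _start - 1;
        public void Dispose() { }
    }
}
```

Keeping the checks means the only change callers observe is that the allocation disappears.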
That sounds reasonable. I was afraid we would have to add yet another public inner enumerator class (I didn't realize the existing one is private). Exposing the existing one follows the general enumerator pattern on perf-sensitive classes. |
@karelz @stephentoub Please see https://github.com/dotnet/corefx/issues/14170#issuecomment-264604632 and the subsequent comments. If we expose the existing implementation, we will want to change behavior in some corner cases in |
I don't think we need to change the behavior as you describe above. Your simplified code without checks is also incorrect - it will allow misuse of the API (when people forget to call |
@karelz There is similar code in ImmutableArray's enumerator. In the real world 99% of use cases of this API are through foreach, and the compiler will always generate the correct code and put
|
That is an unrelated request. Even if we didn't expose the existing implementation and instead exposed a new one and a new GetEnumerator method, it would still be a breaking change for that newly exposed enumerator to behave differently: recompiling code that called GetEnumerator would shift it from targeting the interface method to targeting the instance method, and it would get the new behavior. |
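The rebinding described above can be illustrated with a small sketch. It uses a hypothetical `Bag` type, not `ArraySegment<T>`. `Bag` exposes a public `GetEnumerator()` that returns a new `FastEnumerator`, while its explicit `IEnumerable<int>` implementation still returns the older `LegacyEnumerator`. Any call whose static receiver type is `Bag`, including `foreach`, binds to the instance method after recompiling. Only calls made through the interface keep the old enumerator. Both enumerator types are assumptions for illustration.

```csharp
using System;
using System.Collections;
using System.Collections.Generic;

var bag = new Bag(new[] { 1, 2, 3 });

// Static type Bag: overload resolution picks the public instance method
// (foreach over `bag` binds to it the same way).
string bound = bag.GetEnumerator().GetType().Name;

// Static type IEnumerable<int>: the call goes through the explicit interface method.
string viaInterface = ((IEnumerable<int>)bag).GetEnumerator().GetType().Name;

Console.WriteLine(bound);
Console.WriteLine(viaInterface);

// Hypothetical type used only for this illustration.
public sealed class Bag : IEnumerable<int>
{
    private readonly int[] _items;

    public Bag(int[] items) => _items = items;

    // Newly exposed pattern-based enumerator (different behavior).
    public FastEnumerator GetEnumerator() => new FastEnumerator(_items);

    // Pre-existing interface path, unchanged.
    IEnumerator<int> IEnumerable<int>.GetEnumerator() => new LegacyEnumerator(_items);
    IEnumerator IEnumerable.GetEnumerator() => new LegacyEnumerator(_items);

    public struct FastEnumerator
    {
        private readonly int[] _items;
        private int _index;

        internal FastEnumerator(int[] items) { _items = items; _index = -1; }

        public bool MoveNext() => ++_index < _items.Length;
        public int Current => _items[_index]; // no started/ended checks
    }

    private sealed class LegacyEnumerator : IEnumerator<int>
    {
        private readonly int[] _items;
        private int _index = -1;

        internal LegacyEnumerator(int[] items) => _items = items;

        public bool MoveNext()
        {
            if (_index < _items.Length) _index++;
            return _index < _items.Length;
        }

        public int Current =>
            _index >= 0 && _index < _items.Length
                ? _items[_index]
                : throw new InvalidOperationException("Enumeration not started or already finished.");

        object IEnumerator.Current => Current;
        public void Reset() => _index = -1;
        public void Dispose() { }
    }
}
```

The source of existing callers does not change. Only the recompile moves them from `LegacyEnumerator` to `FastEnumerator`.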
Can you give an example? Since currently |
Hmm, I see, I think you're right. That said, I'm not convinced anything should change in the implementation, i.e. I'm not convinced it's worthwhile either taking a breaking change for existing usage of ArraySegment as IEnumerable or introducing yet another enumerator. Size-wise, the best you could do would be to get rid of one Int32 field (_start). And the proposed changes not only result in not properly validating and throwing for misuse, but can actually result in MoveNext returning true when it shouldn't, in that usage could end up wrapping around. I agree that it'd be rare to see resulting breaks, but what's the benefit of the change? The recommended path forward will be to use Span, not ArraySegment, so we're really only doing this to make existing usage better, and while I've not measured, I'd bet that exposing the Enumerator and making it a struct (eliminating the interface calls, making things more easily inlinable, avoiding the allocation when foreach'd, etc.) addresses >= 90% of the possible gain. If someone really cared about throughput, they wouldn't be using the enumerator at all, and instead would just use a for loop, looping over the array directly from offset to offset+count, which is going to be better than anything we can do in the struct enumerator. So we'd be taking a breaking change (or adding additional code) to account for maybe a 10% extra improvement, if that, for a scenario where if someone actually cared about throughput, they wouldn't have written that code anyway. Again, though, I'm guessing on the actual costs... someone should measure. I'm also not entirely against such a change, I just want to make sure if we make it it's for the right reasons. I doubt such a change would be ported back to desktop, making it yet another difference between desktop and core, something we strive to avoid unless there's a strong reason for it. |
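For the throughput case mentioned above, the loop can skip the enumerator entirely and index the backing array from `Offset` to `Offset + Count`. The sketch below uses only the real `ArraySegment<T>` properties `Array`, `Offset`, and `Count`. The data values are made up.

```csharp
using System;

int[] data = { 10, 20, 30, 40, 50, 60 };
var segment = new ArraySegment<int>(data, 2, 3); // views 30, 40, 50

// Loop over the backing array from Offset to Offset + Count:
// no enumerator object and no interface calls.
int[] array = segment.Array;
int end = segment.Offset + segment.Count;
long sum = 0;
for (int i = segment.Offset; i < end; i++)
{
    sum += array[i];
}

Console.WriteLine(sum); // 120
```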
API review: |
Updated proposal on the top. |
@karelz I am working on this. But I have 2 questions:
|
Good questions - @jamesqo @stephentoub @weshaggard can you guys advise here? |
Dispose should be public. Reset can be an explicit interface implementation. |
Thanks @stephentoub. Spec on the top updated. |
@stephentoub Thanks. I made Dispose public. |
@karelz The issue can be closed; see dotnet/coreclr#8559. |
@AlexRadch There have to be tests added for the new API and it has to be exposed in corefx. |
@AlexRadch do you expect to have time to expose this? Otherwise it can't be used. |
No response for 1.5 months, unassigning the issue - it is again up for grabs, available for anyone to pick it up and finish it. Who's up for it? |
@karelz I'm exposing/adding tests for this along with another ArraySegment API. |
Thanks for heads up, assigned to you. |
Approved API shape
ArraySegment<T>.ArraySegmentEnumerator to struct (see above).
Value: Saves 1 boxing per enumerator:
Note: I know that Span is supposed to supplant ArraySegment, that this was added way back in 2.0, etc. However, the indexer seems so trivial to add and could help increase the readability of existing code.