Add Enumerable.SkipLast, TakeLast to get the last N elements of a sequence. #19431
Comments
The difference in outputs isn't too big a concern IMO, especially since variations in behaviour along similar lines already exist. I've wanted a The most memory-efficient approach to |
I don't think we would need to do any copying with a buffer of 4 elements, actually. We could just implement it as a circular buffer of length N that automatically overwrote the most stale items, if I understand correctly. You're right though that we may need to do some dynamic growth in case |
@JonHanna I don't understand, why would buffering only 4 elements require copying? The way I imagine the implementation, it would use
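In code:
public static IEnumerable<T> TakeLast<T>(this IEnumerable<T> source, int count)
{
    // Nothing to take; this also avoids indexing an empty buffer and `% 0` below.
    if (count <= 0)
    {
        yield break;
    }
    var buffer = new List<T>();
    int pos = 0;
    foreach (var item in source)
    {
        if (buffer.Count < count)
        {
            // phase 1: fill the buffer
            buffer.Add(item);
        }
        else
        {
            // phase 2: overwrite the oldest item
            buffer[pos] = item;
            pos = (pos+1) % count;
        }
    }
    // pos now points at the oldest buffered item (or 0 if the buffer never filled)
    for (int i = 0; i < buffer.Count; i++)
    {
        yield return buffer[pos];
        pos = (pos+1) % count;
    }
}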
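public static IEnumerable<T> SkipLast<T>(this IEnumerable<T> source, int count)
{
    // Nothing to skip: pass everything through. This also avoids indexing an empty buffer below.
    if (count <= 0)
    {
        foreach (var item in source)
        {
            yield return item;
        }
        yield break;
    }
    var buffer = new List<T>();
    int pos = 0;
    foreach (var item in source)
    {
        if (buffer.Count < count)
        {
            // phase 1: fill the buffer
            buffer.Add(item);
        }
        else
        {
            // phase 2: yield the oldest item, then overwrite it
            yield return buffer[pos];
            buffer[pos] = item;
            pos = (pos+1) % count;
        }
    }
} |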
Yes, we could be circular. You are both correct. |
@karelz, maybe this could be marked ready-for-review? I don't think this would be a very controversial API; as I showed in the description there are many questions on StackOverflow this would solve. |
@VSadov @OmarTawfik are in the middle of Linq area triage (see areas), they will get to it ... |
@karelz 👍 Thanks for linking me to that document, I did not know about it. |
Docs discoverability and cleanup is on my list for December :) |
The API looks good as proposed. @stephentoub reminded us that |
I'd like to see Queryable keep its parity with Enumerable on any method that doesn't materialise a collection (since that drags things into memory anyway) or attach elements like Append. |
Right, I wasn't going to try typing this on my phone. I think we should have something like this:
public static IQueryable<T> SkipLast<T>(this IQueryable<T> source, int count)
{
if (source == null)
{
throw Error.ArgumentNull(nameof(source));
}
EnumerableQuery eq = source as EnumerableQuery;
if (eq != null)
{
// Call optimised version for in-memory collections.
EnumerableQuery<T> eqt = source as EnumerableQuery<T>;
if (eqt != null)
{
return new EnumerableQuery<T>(eqt.AsEnumerable().SkipLast(count));
}
// EnumerableQuery of a different element type, passed covariantly. Call for the actual type.
return (IQueryable<T>) new Func<IQueryable<T>, int, IQueryable<T>>(SkipLast).GetMethodInfo()
.GetGenericMethodDefinition()
.MakeGenericMethod(source.ElementType)
.Invoke(null, new object[] {source, count});
}
// Create a query that does a SkipLast. QueryProviders may recognise the sequence and optimise further.
return source.Reverse().Skip(count).Reverse();
}
public static IQueryable<T> TakeLast<T>(this IQueryable<T> source, int count)
{
if (source == null)
{
throw Error.ArgumentNull(nameof(source));
}
EnumerableQuery eq = source as EnumerableQuery;
if (eq != null)
{
// Call optimised version for in-memory collections.
EnumerableQuery<T> eqt = source as EnumerableQuery<T>;
if (eqt != null)
{
return new EnumerableQuery<T>(eqt.AsEnumerable().TakeLast(count));
}
// EnumerableQuery of a different element type, passed covariantly. Call for the actual type.
return (IQueryable<T>) new Func<IQueryable<T>, int, IQueryable<T>>(TakeLast).GetMethodInfo()
.GetGenericMethodDefinition()
.MakeGenericMethod(source.ElementType)
.Invoke(null, new object[] {source, count});
}
// Create a query that does a TakeLast. QueryProviders may recognise the sequence and optimise further.
return source.Reverse().Take(count).Reverse();
} |
Do we have a static type for extension methods where we should put these? Of course: I'd like to wait on @VSadov @OmarTawfik to agree with the approach. |
@karelz they'd go into (Actually, we had one parity gap in that |
Let's make it part of this proposal then. It was just an oversight that we didn't make it part of the original proposal. Adding:
public static class System.Linq.Queryable
{
// Lots of existing methods for parity with Enumerable
public static IQueryable<T> SkipLast<T>(this IQueryable<T> source, int count);
public static IQueryable<T> TakeLast<T>(this IQueryable<T> source, int count);
} |
The alternative implementation approach is to do much as we already do for queryable methods:
public static IQueryable<TSource> SkipLast<TSource>(this IQueryable<TSource> source, int count)
{
if (source == null)
throw Error.ArgumentNull(nameof(source));
return source.Provider.CreateQuery<TSource>(
Expression.Call(
null,
// We'd cache this methodinfo, but for sake of discussion…
new Func<IQueryable<TSource>, int, IQueryable<TSource>>(Queryable.SkipLast).GetMethodInfo(),
source.Expression, Expression.Constant(count)
));
}
And then leave it up to query providers to add support. It's obviously simpler, but I wonder how long it would take the providers to catch up. |
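To make the "leave it to the provider" approach concrete, here is a minimal sketch of both sides: an operator that only quotes a call to itself into the expression tree, and the check a provider might run to recognise it. The names `QuotedQuerySketch`, `SkipLastSketch` and `TryGetSkipLastCount` are hypothetical and chosen so they cannot clash with anything in `System.Linq`; the MethodInfo is not cached, as in the comment above.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;
using System.Reflection;

public static class QuotedQuerySketch
{
    // Hypothetical operator: records a call to itself in the expression tree
    // and leaves the interpretation entirely to the query provider.
    public static IQueryable<T> SkipLastSketch<T>(this IQueryable<T> source, int count)
    {
        if (source == null) throw new ArgumentNullException(nameof(source));

        // MethodInfo for this method, already closed over the current T.
        MethodInfo self = new Func<IQueryable<T>, int, IQueryable<T>>(SkipLastSketch).Method;

        return source.Provider.CreateQuery<T>(
            Expression.Call(null, self, source.Expression, Expression.Constant(count)));
    }

    // The first thing a provider would need: spot the quoted call and read its argument.
    public static bool TryGetSkipLastCount(Expression expression, out int count)
    {
        count = 0;
        if (expression is MethodCallExpression call
            && call.Method.DeclaringType == typeof(QuotedQuerySketch)
            && call.Method.Name == nameof(SkipLastSketch)
            && call.Arguments[1] is ConstantExpression constant
            && constant.Value is int value)
        {
            count = value;
            return true;
        }
        return false;
    }
}
```

The sketch only builds and inspects the expression; it does not enumerate the query, because executing the quoted call is exactly the part each provider would have to supply.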
OK, I got offline approval on the Queryable part. Next step: We need someone to implement it. Any takers? |
They should really go in together, but if you make a start on the enumerable bit I can help with the queryable. The question is whether to take the first approach I mention (rewrites itself to a query current technology supports) or the second (describes itself like other queryable methods do and lets the provider deal with it). @VSadov @bartdesmet opinions? (Input from people involved with Entity Framework and other query providers would be good too, but I've no idea who to ping on that). |
I would stick with the second traditional approach for a variety of reasons.
First, I wouldn't force the need to write a decompiler to recognize e.g.
Second, even if existing query providers work well with an expanded form of the query operator into a composition of existing operators, it may lead to very bad performance. This is a similar concern as the first reason above. To fix it, a query provider author would now be faced with recognizing the expanded pattern rather than simply start to recognize
Third, if we expand the pattern today, there's no way back to do the simplest quoting possible at some point in the future, for it'd be a breaking change. I'd rather bite the bullet now and state that the set of query operators is open-ended and subject to future extension. In fact, most query providers only support a subset of query operators and throw for the ones they don't recognize.
Fourth, I've done a fair amount of query provider stuff myself, but @divega from EF could chime in here as well. I think we should keep |
FWIW, Ix.NET (part of Rx.NET, providing the The |
I agree with @bartdesmet on sticking to the traditional approach for |
Or indeed they could do what the first suggestion does very quickly and have producing a more optimal implementation as a TODO. |
@bartdesmet Thanks for the links; the reference implementations were useful to look at. @JonHanna I made an implementation + some tests here, so you are free to continue where I left off. There is a caveat, though: I was unsure how to get the stuff under netcoreapp1.1 to build (I tried passing |
Grand so. Pull https://github.com/jamesqo/corefx/pull/3 into that branch and you've got queryable support done, including tests. The tests don't need to be as extensive, since considering all the possible permutations of source types and optimised cases is for the provider to worry about (which to start off is just you, since Enumerable is the basis of most of the work for linq-to-objects on queryable). I've ignored the versioning-related issues. I have put in a P2P reference in queryable/tests so that the tests can call into the current version of S.Linq to run (they pass!) but not done the reverse so that the consistency tests in S.Linq can pass. FYI, the actual methods here are quite simple, aside from a little bit of effort into reducing the cost of reflection. The methods do that reflection on themselves so that e.g. a call to |
@JonHanna Thanks for the explanation.
Does this mean you got the
Notice that all of the netcoreapp1.1 tests are not being included (even after I tried passing |
Background
Currently, LINQ does not offer an API for getting the last N elements of a sequence. This has resulted in no fewer than 5 StackOverflow questions (with ~300 votes) popping up as the first result when you Google "skip last n elements of enumerable."
Proposal
We should add SkipLast and TakeLast to Enumerable, which will skip/take the last N elements of the sequence, respectively. If there are fewer than N elements, SkipLast will return an empty sequence, and TakeLast will return a sequence with contents equivalent to the original enumerable.
Remarks
In SkipLast, instead of evaluating the whole thing at once, we will read the first count items into a circular buffer. Then we will alternate between yield returning the oldest element and overwriting it with a new one.
In TakeLast we will still evaluate the whole sequence during the first iteration, but again maintain a circular buffer of length count and overwrite the oldest elements with newer ones.
Original (incorrect) - Implementation remarks
These overloads will have subtly different semantics from Skip and Take. Since for lazy enumerables we can't determine the count of the sequence in advance, we will have to capture it into an array during our first iteration (like Reverse does today), then skip/take the last N items from that array between yields.
As a result, these will have different outputs:
Another consequence is that this will lead to more allocations for lists/arrays. Perhaps this is worth it for added convenience, though.
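For comparison with the array-capturing idea, the buffering described under Remarks can be sketched roughly as below. This is only an illustration under stated assumptions: it uses `Queue<T>` in place of a hand-rolled circular buffer, and the names `LastNSketch`, `SkipLastSketch` and `TakeLastSketch` are hypothetical, picked so they do not collide with any real `Enumerable` methods.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LastNSketch
{
    // Hypothetical SkipLast: holds back at most `count` items while streaming.
    public static IEnumerable<T> SkipLastSketch<T>(this IEnumerable<T> source, int count)
    {
        var pending = new Queue<T>();
        foreach (T item in source)
        {
            pending.Enqueue(item);
            // Once more than `count` items are buffered, the oldest one
            // cannot be among the last `count`, so it is safe to yield.
            if (pending.Count > count)
            {
                yield return pending.Dequeue();
            }
        }
    }

    // Hypothetical TakeLast: reads the whole source, keeping only the newest `count` items.
    public static IEnumerable<T> TakeLastSketch<T>(this IEnumerable<T> source, int count)
    {
        if (count <= 0)
        {
            yield break;
        }
        var window = new Queue<T>();
        foreach (T item in source)
        {
            if (window.Count == count)
            {
                window.Dequeue(); // drop the oldest item to make room
            }
            window.Enqueue(item);
        }
        foreach (T item in window)
        {
            yield return item;
        }
    }
}
```

With these definitions, `Enumerable.Range(1, 5).SkipLastSketch(2)` yields 1, 2, 3 and `Enumerable.Range(1, 5).TakeLastSketch(2)` yields 4, 5. For a two-element source and a count of 5, the first returns an empty sequence and the second returns both elements, matching the Proposal section.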