Proposal: "while" as a LINQ query expression clause #15638
Comments
Must be a C# thing, as VB.NET supports it.
Dim q = (From i In N0()
         Where i Mod 2 = 0
         Take While i < 10
         Select 2 * i).ToArray |
The former, since the |
I think the general solution here is #100 (comment):
var q = from i in numbers
        where i % 2 == 0
        do TakeWhile(i < 10)
        select i * 2; |
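Until a clause like the proposed `do` exists, the usual workaround is to finish the query and continue in method syntax. The following is a minimal sketch of that bridging, assuming `numbers` is the sequence 0..19 (the thread does not say what it contains); the class and method names are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class BridgeDemo
{
    // Hypothetical helper: query syntax for the filter, method syntax for TakeWhile.
    public static int[] EvenDoubledBelowTen(IEnumerable<int> numbers)
    {
        var evens = from i in numbers
                    where i % 2 == 0
                    select i;

        // TakeWhile has no query clause in C#, so we leave query syntax here.
        return evens.TakeWhile(i => i < 10)
                    .Select(i => i * 2)
                    .ToArray();
    }

    public static void Main()
    {
        var result = EvenDoubledBelowTen(Enumerable.Range(0, 20));
        Console.WriteLine(string.Join(", ", result)); // 0, 4, 8, 12, 16
    }
}
```

The switch between the two styles in the middle of the pipeline is the awkwardness several commenters describe.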
I'd like to see an analysis of C# source in the wild to see how many people actually use the query comprehension syntax. More and more I feel that adding it was a mistake: it tries to be like SQL and hide the how behind what it's doing. But this is C#, not SQL. There is no query optimizer. Understanding the how is important. And the query language is and always will be inferior in functionality to what you can do with method calls. In the end it just causes more confusion when you inevitably can't do what you need to with it and then need to switch to method call syntax. |
@MgSam Yeah it's outright wrong. But it looks cool and it sells. I'd rather have a general syntax for monads instead of an incomplete sugar designed for a specific use case. And it's not even optimized for hell's sake. |
I know that it's used frequently in the projects I work with. I use both interchangeably, depending on which clauses I need. I fully admit that bridging query syntax and method syntax is awkward, and with the limitations of the query syntax in C# you have to do it more often than I think is necessary. I think if C# implemented a few additional clauses it could be much more pleasant. There is more potential for the query syntax to transform the results in useful ways than with the method syntax, such as projecting range variables from |
@MgSam I use it and like it when it is less verbose, aka when I'd have a
I don't think it's hiding anything. I've always understood it just as well as the method calls which it is syntactic sugar for. I really don't think you have a point here. If you're talking about EF and query providers, the how is just as hidden if you use method call syntax.
ReSharper makes it braindead easy going back and forth, and they can also be mixed. Again I'd like to see LINQ expanded. It would be awesome if it was generalized to interact with extensions like ToList. |
LINQ expressions are great. Readability goes way up when there is more than a single projection or a single filter. Compare |
We use it fairly extensively in Roslyn.
Not really no. Indeed, it tries to be very clear that it's just a rough syntactic translation. Outside of any clause that uses transparent-identifiers, you can just translate the linq-query-clause to the equivalent method-call syntax.
Right. And that's a good thing. Linq-queries enable a convenient way to write queries for any domain. There's no way we could optimize that as any optimization might be incorrect in a particular domain. That's why linq enables creating expression trees that you then pass to the domain specific API. That domain specific API can then optimize as it sees fit.
I agree. Which is why we made the 'how' of linq-queries super simple. How does this work:
from x in y
where x > 21
select x * x
Well, we just simply translate that to:
y.Where(x => x > 21).Select(x => x * x)
Very straightforward and intuitive.
'functionality' is not the only consideration when designing language features. Factors like 'ease of use', 'verbosity' and others come into play. For example, i really dislike doing any sorts of grouping/joining using the method syntax. I feel like it gets much less clear as to what's going on. |
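The `from x in y where x > 21 select x * x` translation described above can be checked directly. This is a small sketch, with made-up class and method names, that runs both forms over the same input; the sample data is an assumption chosen for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class TranslationDemo
{
    // Query-expression form from the comment above.
    public static int[] QueryForm(IEnumerable<int> y)
    {
        return (from x in y
                where x > 21
                select x * x).ToArray();
    }

    // The method-call form the compiler translates it to.
    public static int[] MethodForm(IEnumerable<int> y)
    {
        return y.Where(x => x > 21).Select(x => x * x).ToArray();
    }

    public static void Main()
    {
        var data = new[] { 5, 22, 30 };
        Console.WriteLine(string.Join(", ", QueryForm(data)));  // 484, 900
        Console.WriteLine(string.Join(", ", MethodForm(data))); // 484, 900
    }
}
```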
What does this mean? To be clear, specific domains optimize queries when they can**. The C# compiler cannot optimize any of this because any such optimizations may be incorrect in the final domain where this code executes.
-- ** Examples of optimizations:
public static IEnumerable<TSource> Where<TSource>(this IEnumerable<TSource> source, Func<TSource, bool> predicate) {
    if (source == null) throw Error.ArgumentNull("source");
    if (predicate == null) throw Error.ArgumentNull("predicate");
    if (source is Iterator<TSource>) return ((Iterator<TSource>)source).Where(predicate);
    if (source is TSource[]) return new WhereArrayIterator<TSource>((TSource[])source, predicate);
    if (source is List<TSource>) return new WhereListIterator<TSource>((List<TSource>)source, predicate);
    return new WhereEnumerableIterator<TSource>(source, predicate);
}
i.e. it optimizes for arrays, lists, and its own Iterator type. |
That means if it is supposed to help with 'ease of use' and 'verbosity' it should not come at the cost of efficiency. I read somewhere that performance was not one of the goals of linq, but I don't know what that means. The thing that I like about Rust is that it doesn't matter if you use iterator combinators or just write the code yourself because they all compile away. |
I just don't see how 'ease of use' and 'verbosity' outweigh performance. Since it is a "language feature" I should be able to use it wherever makes sense and don't worry about performance. When I have to consider implementation details, it loses the value. |
Linq query operators are no less efficient than using the linq methods. |
I don't know what this means. Does this mean you can't use interfaces? Since they impose a perf hit when you call an interface method? Does it mean you can't use a delegate, since an allocation occurs when it is created? Does it mean you can't use classes, because they have many extra bytes allocated for them in the runtime? Can you not use c# 7 patterns because they incur a type-check? |
Then you are not the target customer. C# provides many features for all sorts of developers. Developers who cannot take any sort of additional perf hit will likely find that some of our features are not for them. However, most developers are not in that boat. Most developers can take tiny perf hits in the majority of their code that isn't perf sensitive. And in the rarer perf sensitive areas, they tend to avoid those features. -- This is what we do in Roslyn itself. We use pretty much every C# feature under the sun (ok... maybe not stack-allocs...). However, in some parts of the code, we avoid certain features because they're so perf critical that even a single allocation would negatively affect things. Does that mean we can't allocate in our code? Of course not. Does it mean we can't produce a C# language feature unless it doesn't allocate? Of course not. We recognize that in the real world, non-perf-critical code outweighs perf critical code for nearly all developers. As such, we're fine with making features that help out most developers most of the time, even if it's not universally applicable to all code scenarios. |
Oh I was talking about the whole idea of query operators/methods, in general. Sorry if that was off-topic.
There is no alternative for those primitive constructs as the API design directly depends on them. But when we talk about 'ease of use' and 'verbosity' [of the code], there shouldn't be much difference between manually written loops and linq operators. It's not like if I care about 'ease of use' and 'verbosity' I don't care about performance. |
Simple. C# cares about many things. And 'performance' is only one of those things. We balance a whole host of factors when creating and implementing language features. -- The other issue is that 'performance' is exceptionally difficult to define. For some people the cost of a virtual method call has unacceptable performance. Does that mean we should have never done interfaces? For some people, an allocation would be unacceptable to their system. Does that mean we should have never done lambdas? For some people, a branch is unacceptable. Does that mean we should not have provided 'if' statements? At some point you have to decide that there is sufficient value to a feature, even if some part of the perf equation may make it unsuitable for some user, or may make it unsuitable for many users in some code cases. In general, most of our interesting features fall into this bucket. There will be a user out there (perhaps you), who can't use it at all because of the perf impact of the feature. But that user is very rare. Less rare is the user that can use it most of the time, but has to avoid it in some critical paths. The Roslyn team itself falls into that group. And finally, there are the users that can use it all the time. We design the language around the latter two groups (which make up nearly the majority of our user base). We don't design it for the user that cannot accept any perf hit whatsoever. To do so would be far too limiting for what we want to do with this language. |
I'm going to assume you meant that you do care about performance. Sure. That's fine. So don't use interfaces in your own code. Don't use branching structures in your own code. Don't allocate in your own code. Don't invoke methods in your own code. etc. etc. etc. You're free to avoid all the parts of the language that end up impacting perf. But you are not our only customer. And your needs are not necessarily representative of the greater C# community that we're serving. Please don't assume that we should cater just to you at the expense of everyone else. |
Query syntax is probably the most distinctive C# feature. It makes programs easy to read and helps to produce fewer bugs. Personally, LINQ was the main reason why I switched from Java to C#. Of course, if someone doesn't like queries for any reason, it's no problem to use plain methods, or any other alternative. |
I'm not! I'm thinking that the compiler's job is to introduce human-friendly constructs and generate machine code that is as good as it can be. Perhaps when there is an intermediate representation (i.e. IL) this doesn't apply? |
It was off topic because we're having a conversation about the benefits of linq query syntax. And why it's beneficial to have it, even if you can accomplish the same thing with method-call syntax. --
I think you're conflating topics. Manually written loops and linq operators should not be compared. Manually written loops should be compared to the linq-API. The linq query operators are just syntactic niceties over the linq-API. Now, if we compare manually written loops to the linq-API, i would agree that it would be nice if the linq API was something with no overhead over manually written loops. And we could have possibly gotten that if the only purpose of the linq API was to replace manually written loops. But it wasn't. The purpose of Linq was to introduce a first class Query API. A generalized way to express querying of data in an expressive, lazy and composable manner. And, because of those goals, linq took on a few aspects that made it less fast than pure loop iterations. For example:
So, for people who have a loop, and need it to be as fast as possible, linq is the wrong choice. But for people who use loops, and don't need it to be as fast as possible, linq is fine. And for people who are not using loops, but are doing generalized queries, then linq can be great. The problem i see is that you think of Linq as usable for one task only, and you dismiss it because it's not the best thing for that task. But, like with many things in C#, we've designed the feature to be really good at many things, even if it's not the best at any one single thing. |
I have no idea what this means. What does "as good as it could" mean? The compiler tries to generate machine code that is as good as we can given what we know about the language. We optimize when the language allows for it (see vsadov's recent work on 'this' capturing: #14959). In the case of linq, the compiler, AFAIK, is optimizing things as best as it can given what it knows about the language, and given the machine code that it can compile into. If you feel like that's not the case, please open a bug explaining where the compiler can generate better code, and why you feel like that optimization would be legal to do (i.e. it would depend only on things the compiler knows about for certain). Before doing so, you're welcome to give a brief sketch of what you think the compiler can do. And i can let you know if that seems reasonable or not.
IL is just machine-code. It's just machine-code for a specific type of machine (the .Net Runtime). We do what we can to optimize when we're 100% certain that such optimizations would not violate anything about the language, or the final runtime upon which the program will run. The last bit is important. C# doesn't run just on the Microsoft CLR. It runs on many different runtimes. And those runtimes do not all load the same libraries. Different runtimes load different libraries with different implementations of APIs. So the C# compiler cannot assume that a library will behave a certain way. That would break things if you then ran on a system where you had a different implementation of that library. Note: this is not hypothetical. We have many customers who are running on custom systems with custom libraries. Libraries where core CLR types (like System.Type, System.Int32, and System.String) are replaced with their own versions. So we cannot make assumptions about how those types are going to behave as they may literally not behave that way on those systems. |
I agree. |
So, i think having query operators for 'TakeWhile' makes sense. But i can't justify why we'd have TakeWhile but not SkipWhile. As such, i think we'd probably want
If we could only do one thing it would be to have |
Agreed on both counts. A general-purpose |
|
I'd like to add that it's perfectly possible to define your own set of query operators that are implemented even more performantly and still target them with both method syntax and query syntax. This is just a small part of what makes LINQ such a beautiful abstraction pattern. Furthermore, with respect to performance, I think no one has mentioned the usefulness of PLINQ which, if you're writing stateless code, can provide a massive, almost linear performance increase with the addition of a single method call. |
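To illustrate the point about defining your own query operators: the C# query pattern binds to any accessible `Where`/`Select` methods with a suitable shape, not only to `System.Linq.Enumerable`. The sketch below uses a hypothetical `ArrayQuery<T>` type (not part of any real library) whose operators work eagerly on arrays. Whether such an implementation is actually faster depends on the workload, and nothing here is measured.

```csharp
using System;

// Hypothetical array-backed source; the name and design are assumptions for illustration.
public sealed class ArrayQuery<T>
{
    public T[] Items { get; private set; }

    public ArrayQuery(T[] items) { Items = items; }

    // Eager filter: copies matching elements into a new array.
    public ArrayQuery<T> Where(Func<T, bool> predicate)
    {
        var buffer = new T[Items.Length];
        var count = 0;
        foreach (var item in Items)
        {
            if (predicate(item)) buffer[count++] = item;
        }
        Array.Resize(ref buffer, count);
        return new ArrayQuery<T>(buffer);
    }

    // Eager projection: maps every element into a new array.
    public ArrayQuery<TResult> Select<TResult>(Func<T, TResult> selector)
    {
        var result = new TResult[Items.Length];
        for (var i = 0; i < Items.Length; i++) result[i] = selector(Items[i]);
        return new ArrayQuery<TResult>(result);
    }
}

public static class CustomOperatorsDemo
{
    public static int[] Run(int[] data)
    {
        // Query syntax binds to ArrayQuery<T>.Where and ArrayQuery<T>.Select here.
        var q = from x in new ArrayQuery<int>(data)
                where x > 21
                select x * x;
        return q.Items;
    }

    public static void Main()
    {
        Console.WriteLine(string.Join(", ", Run(new[] { 5, 22, 30 }))); // 484, 900
    }
}
```

Note that no `using System.Linq;` is needed: the query expression resolves entirely against the instance methods on `ArrayQuery<T>`.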
The problem with discussing "performance" is that it's incredibly domain specific. Roslyn, for example, gets massive performance boosts in some areas, precisely by targeting more 'heavy' abstractions. It's amazing how much faster we can do things thanks to having an immutable API model. For one thing, we pick up the ability to offload work to other threads 'for free'. Something that was staggeringly difficult back when we had a 'mutable/locking' model for our models. On the other hand, some areas of our code are so perf sensitive that even having an indirection, or range-check, or allocation, can end up with a noticeable impact on real-world runtimes**. The streaming-linq pattern is, in itself, a perf optimization for domains where being lazy means you can avoid most of your work. But it can be a perf hit in other domains. Performance is not a 'one size fits all' situation. And you're commonly having to deal with broad customer performance needs. -- ** This is why we try to have a lot of possible tools and patterns at our disposal. Depending on the specific constraints of any single area, devs can pick and choose the right set of options to get the perf they need. |
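The remark that streaming LINQ lets you avoid work when being lazy pays off can be made concrete by counting predicate calls. This is a minimal sketch with made-up names and an arbitrary input range of 1..1000: it compares asking for the first match on a lazy `Where` against materializing the whole filtered list first.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LazyDemo
{
    // Returns how many elements the predicate inspected before First() returned.
    public static int CountInspected(bool materializeFirst)
    {
        var inspected = 0;
        IEnumerable<int> filtered = Enumerable.Range(1, 1000)
            .Where(x => { inspected++; return x % 7 == 0; });

        // ToList forces the filter to run over the entire source.
        if (materializeFirst) filtered = filtered.ToList();

        filtered.First();
        return inspected;
    }

    public static void Main()
    {
        Console.WriteLine(CountInspected(false)); // 7: stops at the first multiple of 7
        Console.WriteLine(CountInspected(true));  // 1000: the whole range was filtered
    }
}
```

The lazy version does less work here only because the consumer stops early; a consumer that enumerates everything pays the iterator overhead with no such saving.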
Closing this out. We're doing all language design now at dotnet/csharplang. If you're still interested in this idea let us know and we can migrate this over to a discussion in that repo. Thanks! |
Low effort/reward proposal:
I propose that the existing C# keyword "while" be added as a LINQ query expression clause. Unlike "where", which filters out elements that do not meet the condition, "while" would include all elements as long as the condition is met. Since "while" is already a keyword this would not break any existing syntax.
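The difference between the existing `where` clause and the proposed `while` clause matches the difference between `Enumerable.Where` and `Enumerable.TakeWhile`. The sketch below assumes the new clause would map to `TakeWhile`; the class name, method names, and sample data are made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class WhileClauseDemo
{
    // Existing behaviour: 'where' keeps every element that satisfies the condition.
    public static int[] WithWhere(IEnumerable<int> source)
    {
        return (from i in source where i < 3 select i).ToArray();
    }

    // Assumed meaning of the proposed clause (does not compile today):
    //     from i in source while i < 3 select i
    // Today this has to be written with TakeWhile, which stops at the first failure.
    public static int[] WithTakeWhile(IEnumerable<int> source)
    {
        return source.TakeWhile(i => i < 3).ToArray();
    }

    public static void Main()
    {
        var data = new[] { 1, 2, 5, 1, 2 };
        Console.WriteLine(string.Join(", ", WithWhere(data)));     // 1, 2, 1, 2
        Console.WriteLine(string.Join(", ", WithTakeWhile(data))); // 1, 2
    }
}
```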