Proposal: "while" as a LINQ query expression clause #15638

Closed
HaloFour opened this issue Dec 2, 2016 · 29 comments

Comments

@HaloFour

HaloFour commented Dec 2, 2016

Low effort/reward proposal:

I propose that the existing C# keyword while be added as a LINQ query expression clause. Unlike where, which filters out every element that does not meet the condition, while would include elements only as long as the condition is met, stopping at the first element that fails it. Since while is already a keyword, this would not break any existing syntax.

var q = from i in numbers
        where i % 2 == 0
        while i < 10
        select i * 2;

// translated into
var q = numbers
    .Where(i => i % 2 == 0)
    .TakeWhile(i => i < 10)
    .Select(i => i * 2);
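A quick way to see how the proposed clause differs from where is to run both underlying methods on the same input. This is only a sketch: the array is made up, and it assumes while maps to TakeWhile exactly as in the translation above.

```csharp
using System;
using System.Linq;

// Hypothetical input, deliberately not sorted so the two operators give different results.
int[] numbers = { 2, 4, 12, 6, 8 };

// where i < 10: keep every element that satisfies the condition.
int[] filtered = numbers.Where(i => i < 10).ToArray();

// while i < 10 (assumed to map to TakeWhile): stop at the first element that fails.
int[] prefix = numbers.TakeWhile(i => i < 10).ToArray();

Console.WriteLine(string.Join(", ", filtered)); // 2, 4, 6, 8
Console.WriteLine(string.Join(", ", prefix));   // 2, 4
```

On sorted input the two agree, which is why the distinction is easy to miss; on unsorted input, TakeWhile never looks past the first failing element.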
@AdamSpeight2008
Contributor

AdamSpeight2008 commented Dec 2, 2016

If numbers = Enumerable.Range(0, Int32.MaxValue), is the resultant sequence 0 4 8 12 16 or 0 4 8?

Must be a C# thing, as VB.NET supports it.

        Dim q = (From i In N0()
                 Where i Mod 2 = 0
                 Take While i < 10
                 Select 2 * i).ToArray

@HaloFour
Author

HaloFour commented Dec 2, 2016

@AdamSpeight2008

The former, since the i * 2 follows the while clause, which should be the same behavior as your VB.NET example. And yes, I know that VB.NET has a bunch of other query clauses and it would be nice if C# might catch up. Having to go from query syntax to method syntax and back is pretty jarring.
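The answer can be checked with today's method syntax. A minimal sketch, assuming the proposed query desugars to the Where/TakeWhile/Select chain from the proposal; because the operators are lazy, only the first eleven elements of the huge range (0 through 10) are ever pulled.

```csharp
using System;
using System.Linq;

// AdamSpeight2008's source: effectively unbounded, so only laziness keeps this cheap.
var numbers = Enumerable.Range(0, int.MaxValue);

// Method-syntax equivalent of the proposed query.
var q = numbers
    .Where(i => i % 2 == 0)   // 0, 2, 4, 6, 8, 10, ...
    .TakeWhile(i => i < 10)   // stops at 10, the first even number that fails
    .Select(i => i * 2);      // doubles whatever is left

int[] result = q.ToArray();
Console.WriteLine(string.Join(" ", result)); // 0 4 8 12 16
```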

@alrz
Contributor

alrz commented Dec 3, 2016

I think the general solution here is #100 (comment):

var q = from i in numbers
        where i % 2 == 0
        do TakeWhile(i < 10)
        select i * 2;

@MgSam

MgSam commented Dec 3, 2016

I'd like to see an analysis of C# source in the wild to see how many people actually use the query comprehension syntax.

More and more I feel that adding it was a mistake; it tries to be like SQL and hide the how behind what it's doing. But this is C#, not SQL. There is no query optimizer. Understanding the how is important. And the query language is and always will be inferior in functionality to what you can do with method calls.

In the end it just causes more confusion when you inevitably can't do what you need to with it and then need to switch to method call syntax.

@alrz
Contributor

alrz commented Dec 3, 2016

@MgSam Yeah, it's outright wrong. But it looks cool and it sells. I'd rather have a general syntax for monads instead of an incomplete sugar designed for a specific use case. And it's not even optimized, for hell's sake.

@HaloFour
Author

HaloFour commented Dec 3, 2016

I know that it's used frequently in the projects I work with. I use both interchangeably, depending on which clauses I need. I fully admit that bridging query syntax and method syntax is awkward, and with the limitations of the query syntax in C# you have to do it more often than I think is necessary. I think if C# implemented a few additional clauses it could be much more pleasant. There is more potential for the query syntax than for the method syntax to transform the results in useful ways, such as projecting range variables from out declarations and variable patterns, which I seriously hope happens.

@jnm2
Copy link
Contributor

jnm2 commented Dec 3, 2016

@MgSam I use it and like it when it is less verbose, i.e. when I'd have a .Select(...) anyhow. I wish it were more flexible so I could use it more.

> More and more I feel that adding it was a mistake; it tries to be like SQL and hide the how behind what it's doing. But this is C#, not SQL. There is no query optimizer. Understanding the how is important.

I don't think it's hiding anything. I've always understood it just as well as the method calls it is syntactic sugar for. I really don't think you have a point here. If you're talking about EF and query providers, the how is just as hidden if you use method-call syntax.

> And the query language is and always will be inferior in functionality to what you can do with method calls. In the end it just causes more confusion when you inevitably can't do what you need to with it and then need to switch to method call syntax.

ReSharper makes going back and forth braindead easy, and the two syntaxes can also be mixed. Again, I'd like to see LINQ expanded. It would be awesome if it were generalized to interact with extensions like ToList.

@aluanhaddad

LINQ expressions are great. Readability goes way up when there is more than a single projection or a single filter. Compare GroupBy, Join, GroupJoin, OrderBy, and OrderByDescending. These are all much more readable using query syntax.
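As an illustration of the grouping case, here is the same query written both ways. The data is hypothetical, and which version reads better is a matter of taste; the sketch only shows that the two forms produce the same result.

```csharp
using System;
using System.Linq;

// Hypothetical sample data: (Name, Team) tuples.
var people = new[]
{
    (Name: "Ann", Team: "Red"),
    (Name: "Bob", Team: "Blue"),
    (Name: "Cid", Team: "Red"),
};

// Query syntax: group names by team, then sort and project each group.
var byTeamQuery =
    from p in people
    group p.Name by p.Team into g
    orderby g.Key
    select $"{g.Key}: {string.Join(",", g)}";

// Equivalent method syntax.
var byTeamMethod = people
    .GroupBy(p => p.Team, p => p.Name)
    .OrderBy(g => g.Key)
    .Select(g => $"{g.Key}: {string.Join(",", g)}");

foreach (var line in byTeamQuery) Console.WriteLine(line); // Blue: Bob, then Red: Ann,Cid
```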

@CyrusNajmabadi
Member

CyrusNajmabadi commented Dec 4, 2016

> I'd like to see an analysis of C# source in the wild to see how many people actually use the query comprehension syntax.

We use it fairly extensively in Roslyn.

> More and more I feel that adding it was a mistake; it tries to be like SQL and hide the how behind what it's doing.

Not really, no. Indeed, it tries to be very clear that it's just a rough syntactic translation. Outside of any clause that uses transparent identifiers, you can just translate the linq query clause to the equivalent method-call syntax.
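For a clause that does use a transparent identifier, such as let, the translation is less direct. The sketch below approximates it with an anonymous type; the real compiler-generated names differ, so treat the method-syntax version as an illustration rather than the exact compiler output.

```csharp
using System;
using System.Linq;

int[] y = { 3, 5, 8 };

// A let clause introduces a second range variable alongside x.
var withLet =
    from x in y
    let sq = x * x
    where sq > 20
    select sq;

// Roughly what the compiler does: carry both values in an anonymous type
// (the "transparent identifier"), then read them back in later clauses.
var translated = y
    .Select(x => new { x, sq = x * x })
    .Where(t => t.sq > 20)
    .Select(t => t.sq);

Console.WriteLine(string.Join(", ", withLet)); // 25, 64
```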

> But this is C#, not SQL. There is no query optimizer.

Right. And that's a good thing. Linq queries enable a convenient way to write queries for any domain. There's no way we could optimize that, as any optimization might be incorrect in a particular domain. That's why linq enables creating expression trees that you then pass to the domain-specific API. That domain-specific API can then optimize as it sees fit.
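A small sketch of the expression-tree side, using the in-memory AsQueryable() provider from System.Linq as a stand-in for a real domain-specific provider (an assumption made only so the example runs without a database). Over IQueryable<T>, the same query syntax binds to Queryable.Where and Queryable.Select, whose lambdas become Expression<Func<...>> values rather than delegates.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;

int[] data = { 10, 22, 35 };

// Same query shape as above, but the source is an IQueryable<int>.
IQueryable<int> q =
    from x in data.AsQueryable()
    where x > 21
    select x * x;

// The query is now a tree a provider could inspect or rewrite before running it.
var call = (MethodCallExpression)q.Expression;
Console.WriteLine(call.Method.Name); // Select (the outermost call)
Console.WriteLine(q.Expression);     // textual form of the Where(...).Select(...) tree
Console.WriteLine(string.Join(", ", q));
```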

> Understanding the how is important.

I agree. Which is why we made the 'how' of linq-queries super simple. How does this work:

from x in y
where x > 21
select x * x

Well, we just simply translate that to:

y.Where(x => x > 21).Select(x => x * x)

Very straightforward and intuitive.

> And the query language is and always will be inferior in functionality to what you can do with method calls.

'Functionality' is not the only consideration when designing language features. Factors like 'ease of use', 'verbosity' and others come into play. For example, I really dislike doing any sort of grouping/joining using the method syntax. I feel like it gets much less clear what's going on.
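For comparison, here is a simple join in both syntaxes, with made-up data. In method syntax the two key selectors and the result selector become separate positional lambdas, which is part of what can make joins harder to follow that way.

```csharp
using System;
using System.Linq;

// Hypothetical sample data.
var customers = new[] { (Id: 1, Name: "Ann"), (Id: 2, Name: "Bob") };
var orders = new[] { (CustomerId: 1, Total: 10), (CustomerId: 1, Total: 5), (CustomerId: 2, Total: 7) };

// Query syntax: the join condition reads as a single clause.
var querySyntax =
    from c in customers
    join o in orders on c.Id equals o.CustomerId
    select $"{c.Name} {o.Total}";

// Method syntax: outer key, inner key and result selector are positional arguments.
var methodSyntax = customers.Join(
    orders,
    c => c.Id,
    o => o.CustomerId,
    (c, o) => $"{c.Name} {o.Total}");

foreach (var line in querySyntax) Console.WriteLine(line);
```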

@CyrusNajmabadi
Member

> And it's not even optimized for hell's sake.

What does this mean? To be clear, specific domains optimize queries when they can**. The C# compiler cannot optimize any of this because any such optimizations may be incorrect in the final domain where this code executes.

--

** Examples of optimizations:

  1. Any DB back end will absolutely optimize the expression trees that are generated and passed to it.
  2. Even linq-to-in-memory-objects optimizes some cases. See https://referencesource.microsoft.com/#System.Core/System/Linq/Enumerable.cs . For example, .Where detects three cases where it produces a lower overhead enumerable:
        public static IEnumerable<TSource> Where<TSource>(this IEnumerable<TSource> source, Func<TSource, bool> predicate) {
            if (source == null) throw Error.ArgumentNull("source");
            if (predicate == null) throw Error.ArgumentNull("predicate");
            if (source is Iterator<TSource>) return ((Iterator<TSource>)source).Where(predicate);
            if (source is TSource[]) return new WhereArrayIterator<TSource>((TSource[])source, predicate);
            if (source is List<TSource>) return new WhereListIterator<TSource>((List<TSource>)source, predicate);
            return new WhereEnumerableIterator<TSource>(source, predicate);
        }

i.e. it optimizes for arrays, lists, and its own Iterator type.

@alrz
Copy link
Contributor

alrz commented Dec 4, 2016

@CyrusNajmabadi

That means if it is supposed to help with 'ease of use' and 'verbosity', it should not come at the cost of efficiency. I read somewhere that performance was not one of the goals of linq, but I don't know what that means. The thing that I like about Rust is that it doesn't matter whether you use iterator combinators or write the code yourself, because they all compile away.

@alrz
Copy link
Contributor

alrz commented Dec 4, 2016

I just don't see how 'ease of use' and 'verbosity' outweigh performance. Since it is a "language feature", I should be able to use it wherever it makes sense and not worry about performance. When I have to consider implementation details, it loses its value.

@CyrusNajmabadi
Copy link
Member

> @CyrusNajmabadi That means if it is supposed to help with 'ease of use' and 'verbosity', it should not come at the cost of efficiency.

Linq query operators are no less efficient than using the linq methods.

@CyrusNajmabadi
Copy link
Member

CyrusNajmabadi commented Dec 4, 2016

> Since it is a "language feature", I should be able to use it wherever it makes sense and not worry about performance.

I don't know what this means. Does this mean you can't use interfaces? Since they impose a perf hit when you call an interface method? Does it mean you can't use a delegate, since an allocation occurs when it is created? Does it mean you can't use classes, because they have many extra bytes allocated for them in the runtime? Can you not use C# 7 patterns because they incur a type check?

@CyrusNajmabadi
Copy link
Member

> When I have to consider implementation details, it loses its value.

Then you are not the target customer. C# provides many features for all sorts of developers. Developers who cannot take any sort of additional perf hit will likely find that some of our features are not for them. However, most developers are not in that boat. Most developers can take tiny perf hits in the majority of their code that isn't perf sensitive. And in the rarer perf sensitive areas, they tend to avoid those features.

--

This is what we do in Roslyn itself. We use pretty much every C# feature under the sun (ok... maybe not stack-allocs...). However, in some parts of the code, we avoid certain features because they're so perf critical that even a single allocation would negatively affect things.

Does that mean we can't allocate in our code? Of course not. Does it mean we can't produce a C# language feature unless it doesn't allocate? Of course not. We recognize that in the real world, non-perf-critical code outweighs perf critical code for nearly all developers. As such, we're fine with making features that help out most developers most of the time, even if it's not universally applicable to all code scenarios.

@alrz
Copy link
Contributor

alrz commented Dec 4, 2016

> Linq query operators are no less efficient than using the linq methods.

Oh I was talking about the whole idea of query operators/methods, in general. Sorry if that was off-topic.

> I don't know what this means. Does this mean you can't use interfaces? Since they impose a perf hit when you call an interface method? Does it mean you can't use a delegate, since an allocation occurs when it is created? Does it mean you can't use classes, because they have many extra bytes allocated for them in the runtime?

There is no alternative for those primitive constructs, as the API design directly depends on them. But when we talk about 'ease of use' and 'verbosity' [of the code], there shouldn't be much difference between manually written loops and linq operators. It's not like if I care about 'ease of use' and 'verbosity' I don't care about performance.

@CyrusNajmabadi
Copy link
Member

> I just don't see how 'ease of use' and 'verbosity' outweigh performance.

Simple. C# cares about many things. And 'performance' is only one of those things. We balance a whole host of factors when creating and implementing language features.

--

The other issue is that 'performance' is exceptionally difficult to define. For some people the cost of a virtual method call has unacceptable performance. Does that mean we should have never done interfaces? For some people, an allocation would be unacceptable to their system. Does that mean we should have never done lambdas? For some people, a branch is unacceptable. Does that mean we should not have provided 'if' statements?

At some point you have to decide that there is sufficient value to a feature, even if some part of the perf equation may make it unsuitable for some users, or unsuitable for many users in some code paths.

In general, most of our interesting features fall into this bucket. There will be a user out there (perhaps you), who can't use it at all because of the perf impact of the feature. But that user is very rare. Less rare is the user that can use it most of the time, but has to avoid it in some critical paths. The Roslyn team itself falls into that group. And finally, there are the users that can use it all the time.

We design the language around the latter two groups (which make up nearly all of our user base). We don't design it for the user that cannot accept any perf hit whatsoever. To do so would be far too limiting for what we want to do with this language.

@CyrusNajmabadi
Copy link
Member

> It's not like if I care about 'ease of use' and 'verbosity' I don't care about performance.

I'm going to assume you meant that you do care about performance.

Sure. That's fine. So don't use interfaces in your own code. Don't use branching structures in your own code. Don't allocate in your own code. Don't invoke methods in your own code. etc. etc. etc. You're free to avoid all the parts of the language that end up impacting perf.

But you are not our only customer. And your needs are not necessarily representative of the greater C# community that we're serving. Please don't assume that we should cater just to you at the expense of everyone else.

@gordanr
Copy link

gordanr commented Dec 4, 2016

Query syntax is probably the most distinctive C# feature. It makes programs easy to read and helps reduce bugs. Personally, LINQ was the main reason I switched from Java to C#. Of course, if someone doesn't like queries for any reason, it's no problem to use plain methods, or any other alternative.

@alrz
Copy link
Contributor

alrz commented Dec 4, 2016

> Please don't assume that we should cater just to you at the expense of everyone else.

I'm not! I'm thinking that the compiler's job is to introduce human-friendly constructs to generate machine code as good as it could. Perhaps when there is an intermediate representation (i.e. IL) this doesn't apply?

@CyrusNajmabadi
Copy link
Member

CyrusNajmabadi commented Dec 4, 2016

> > Linq query operators are no less efficient than using the linq methods.
> Oh I was talking about the whole idea of query operators/methods, in general. Sorry if that was off-topic.

It was off topic because we're having a conversation about the benefits of linq query syntax. And why it's beneficial to have it, even if you can accomplish the same thing with method-call syntax.

--

> there shouldn't be much difference between manually written loops and linq operators

I think you're conflating topics. Manually written loops and linq operators should not be compared. Manually written loops should be compared to the linq-API. The linq query operators are just syntactic niceties over the linq-API.

Now, if we compare manually written loops to the linq-API, i would agree that it would be nice if the linq API was something with no overhead over manually written loops. And we could have possibly gotten that if the only purpose of the linq API was to replace manually written loops. But it wasn't. The purpose of Linq was to introduce a first class Query API. A generalized way to express querying of data in an expressive, lazy and composable manner.

And, because of those goals, linq took on a few aspects that made it less fast than pure loop iterations. For example:

  1. To be usable in many domains, it needed to be interface based. So now you take the hit of interface calls.
  2. To be lazy, it needed to be able to capture. So now you get allocations.
  3. To be generalized, it needed to be introspectable so that different domains could consume it.
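The second point (laziness through capture) can be observed directly. A minimal sketch with hypothetical data: building the query runs nothing, the lambda captures local variables in a compiler-generated closure, and a change to a captured variable before enumeration changes the result.

```csharp
using System;
using System.Linq;

int[] source = { 1, 2, 3, 4, 5 };
int threshold = 2;
int predicateCalls = 0;

// Building the query runs nothing yet. The lambda captures threshold and
// predicateCalls, so the compiler hoists them into a heap-allocated closure.
var query = source.Where(n => { predicateCalls++; return n > threshold; });
int callsBeforeEnumeration = predicateCalls; // 0: execution is deferred

// Changing the captured variable before enumeration changes the result.
threshold = 3;
int[] result = query.ToArray();

Console.WriteLine(callsBeforeEnumeration);    // 0
Console.WriteLine(string.Join(", ", result)); // 4, 5
Console.WriteLine(predicateCalls);            // 5
```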

So, for people who have a loop, and need it to be as fast as possible, linq is the wrong choice. But for people who use loops, and don't need it to be as fast as possible, linq is fine. And for people who are not using loops, but are doing generalized queries, then linq can be great.
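To make the comparison concrete, here is the query from the original proposal written against the linq API and as a hand-written loop. This is a sketch with made-up input; the loop avoids delegates and iterator objects, at the cost of spelling out the control flow by hand.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int[] numbers = { 0, 2, 3, 4, 6, 12, 8 };

// linq API: interface-based and lazy; allocates iterators and delegates.
List<int> viaLinq = numbers
    .Where(i => i % 2 == 0)
    .TakeWhile(i => i < 10)
    .Select(i => i * 2)
    .ToList();

// Hand-written loop doing the same work eagerly, with no delegates.
var viaLoop = new List<int>();
foreach (int i in numbers)
{
    if (i % 2 != 0) continue; // where i % 2 == 0
    if (i >= 10) break;       // while i < 10: stop at the first failure
    viaLoop.Add(i * 2);       // select i * 2
}

Console.WriteLine(string.Join(", ", viaLoop)); // 0, 4, 8, 12
```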

The problem I see is that you think of Linq as usable for one task only, and you dismiss it because it's not the best thing for that task. But, like with many things in C#, we've designed the feature to be really good at many things, even if it's not the best at any single thing.

@CyrusNajmabadi
Copy link
Member

CyrusNajmabadi commented Dec 4, 2016

> I'm not! I'm thinking that the compiler's job is to introduce human-friendly constructs to generate machine code as good as it could.

I have no idea what this means. What does "as good as it could" mean?

The compiler tries to generate machine code that is as good as we can given what we know about the language. We optimize when the language allows for it (see vsadov's recent work on 'this' capturing: #14959).

In the case of linq, the compiler, AFAIK, is optimizing things as best it can given what it knows about the language, and given the machine code that it can compile into. If you feel like that's not the case, please open a bug explaining where the compiler can generate better code, and why you feel that optimization would be legal (i.e. it would depend only on things the compiler knows about for certain). Before doing so, you're welcome to give a brief sketch of what you think the compiler can do, and I can let you know if that seems reasonable or not.

> Perhaps when there is an intermediate representation (i.e. IL) this doesn't apply?

IL is just machine-code. It's just machine-code for a specific type of machine (the .Net Runtime). We do what we can to optimize when we're 100% certain that such optimizations would not violate anything about the language, or the final runtime upon which the program will run.

The last bit is important. C# doesn't run just on the Microsoft CLR. It runs on many different runtimes. And those runtimes do not all load the same libraries. Different runtimes load different libraries with different implementations of APIs. So the C# compiler cannot assume that a library will behave a certain way. That would break things if you then ran on a system where you had a different implementation of that library.

Note: this is not hypothetical. We have many customers who are running on custom systems with custom libraries. Libraries where core CLR types (like System.Type, System.Int32, and System.String) are replaced with their own versions. So we cannot make assumptions about how those types are going to behave as they may literally not behave that way on those systems.

@gordanr
Copy link

gordanr commented Dec 4, 2016

> So, for people who have a loop, and need it to be as fast as possible, linq is the wrong choice. But for people who use loops, and don't need it to be as fast as possible, linq is fine. And for people who are not using loops, but are doing generalized queries, then linq can be great.

I agree.
Regarding this proposal, 'while' would be used in a loop context. I am not completely against 'while' as a replacement for the TakeWhile method, but I am not sure I would use it. For my sense of LINQ, it is too loop-centric.
Probably there are other, more useful LINQ extensions, e.g. materialization (ToList).

@CyrusNajmabadi
Copy link
Member

So, I think having query operators for 'TakeWhile' makes sense. But I can't justify why we'd have TakeWhile but not SkipWhile. As such, I think we'd probably want take while <expr> and skip while <expr>.
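If such clauses existed, they would presumably map to TakeWhile and SkipWhile, which split a sequence at the first element that fails the predicate. The take while / skip while syntax is only a proposal here; the sketch uses the existing methods on hypothetical data.

```csharp
using System;
using System.Linq;

int[] readings = { 1, 3, 7, 2, 9 };

// Hypothetical "take while x < 5": the prefix before the first failure.
int[] taken = readings.TakeWhile(x => x < 5).ToArray();

// Hypothetical "skip while x < 5": everything from the first failure onward.
// Note that 2 is kept, because SkipWhile stops testing once 7 fails.
int[] skipped = readings.SkipWhile(x => x < 5).ToArray();

Console.WriteLine(string.Join(", ", taken));   // 1, 3
Console.WriteLine(string.Join(", ", skipped)); // 7, 2, 9
```

Because the two operators split at the same point, concatenating their results always gives back the original sequence, which is one argument for offering them as a pair.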

> Probably there are other, more useful LINQ extensions, e.g. materialization (ToList).

If we could only do one thing, it would be to have do as a linq operator. But I feel like adding more query support for these operators seems totally reasonable.

@HaloFour
Copy link
Author

HaloFour commented Dec 4, 2016

Agreed on both counts. A general-purpose do operator would scratch the itch and work for both intermediate and terminal operations, including custom extension methods. Having other clauses just makes LINQ a little more pleasing to consume. In either case the goal would be to avoid having to wrap the query expression in parentheses, or assign it to a variable, just to use the extension methods that are not mapped to LINQ query clauses.
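For reference, the two workarounds look like this today (the sample data is hypothetical):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

int[] numbers = { 5, 1, 8, 3 };

// Workaround 1: parenthesize the whole query expression, then call the method.
List<int> viaParens = (from n in numbers
                       where n > 2
                       select n * 10).ToList();

// Workaround 2: assign the query to a variable first.
var query = from n in numbers
            where n > 2
            select n * 10;
List<int> viaVariable = query.ToList();

Console.WriteLine(string.Join(", ", viaParens)); // 50, 80, 30
```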

@paulomorgado
Copy link

do or apply 😄

@aluanhaddad
Copy link

aluanhaddad commented Dec 5, 2016

I'd like to add that it's perfectly possible to define your own set of query operators that are implemented even more performantly and still target them with both method syntax and query syntax. This is just a small part of what makes LINQ such a beautiful abstraction pattern. Furthermore, with respect to performance, I think no one has mentioned the usefulness of PLINQ which, if you're writing stateless code, can provide a massive, almost linear performance increase with the addition of a single method call.
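PLINQ is also a handy example of query syntax binding to a different set of operators: over a ParallelQuery<T>, the compiler picks ParallelEnumerable's Where and Select instead of Enumerable's, because query expressions are resolved by ordinary method lookup. This is a sketch with arbitrary data; AsOrdered is used only so the parallel result can be compared element by element with the sequential one.

```csharp
using System;
using System.Linq;

var numbers = Enumerable.Range(1, 1_000);

// Same query syntax; where/select bind to ParallelEnumerable's overloads
// because the source is a ParallelQuery<int>.
ParallelQuery<int> parallel =
    from n in numbers.AsParallel().AsOrdered() // AsOrdered preserves source order in the output
    where n % 3 == 0
    select n * n;

// Sequential version binds to Enumerable's overloads.
var sequential =
    from n in numbers
    where n % 3 == 0
    select n * n;

// Materialize both before comparing.
int[] parallelResult = parallel.ToArray();
int[] sequentialResult = sequential.ToArray();

Console.WriteLine(parallelResult.Length); // 333
```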

@CyrusNajmabadi
Copy link
Member

CyrusNajmabadi commented Dec 5, 2016

The problem with discussing "performance" is that it's incredibly domain-specific. Roslyn, for example, gets massive performance boosts in some areas precisely by targeting more 'heavy' abstractions. It's amazing how much faster we can do things thanks to having an immutable API model. For one thing, we pick up the ability to offload work to other threads 'for free', something that was staggeringly difficult back when we had a 'mutable/locking' model. On the other hand, some areas of our code are so perf sensitive that even an indirection, a range check, or an allocation can have a noticeable impact on real-world runtimes**.

The streaming-linq pattern is, in itself, a perf optimization for domains where being lazy means you can avoid most of your work. But it can be a perf hit in other domains.
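A minimal sketch of laziness avoiding work, with a hypothetical counter added to the selector: First stops pulling from the pipeline as soon as an element matches, so only a handful of the million squares are ever computed.

```csharp
using System;
using System.Linq;

int selectorCalls = 0;

// A lazily streamed pipeline over a large range; the counter is for illustration only.
var squares = Enumerable.Range(1, 1_000_000)
    .Select(n => { selectorCalls++; return n * n; });

// First with a predicate pulls elements only until one matches (8 * 8 = 64).
int firstBig = squares.First(sq => sq > 50);

Console.WriteLine(firstBig);      // 64
Console.WriteLine(selectorCalls); // 8
```

An eager version that materialized all squares first would run the selector a million times for the same answer; in a domain where most of the sequence is always consumed, the per-element iterator overhead can make the lazy form the slower one instead.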

Performance is not a 'one size fits all' situation. And you're commonly having to deal with broad customer performance needs.

--

** This is why we try to have a lot of possible tools and patterns at our disposal. Depending on the specific constraints of any single area, devs can pick and choose the right set of options to get the perf they need.

@CyrusNajmabadi
Copy link
Member

Closing this out. We're doing all language design now at dotnet/csharplang. If you're still interested in this idea let us know and we can migrate this over to a discussion in that repo. Thanks!

@CyrusNajmabadi closed this as not planned (Won't fix, can't repro, duplicate, stale) on Nov 8, 2022
10 participants