C# Design Notes for Dec 7 and Dec 14, 2016 #16709

Closed
MadsTorgersen opened this issue Jan 24, 2017 · 27 comments
@MadsTorgersen
Contributor

C# Language Design Notes for Dec 7 and Dec 14, 2016

Agenda

  • Expression variables in query expressions
  • Irrefutable patterns and reachability
  • Do-while loop scope

Expression variables in query expressions

It seems desirable to allow expression variables in query clauses to be available in subsequent clauses:

from s in strings
where int.TryParse(s, out int i)
select i;

The idea is that the i introduced in the where clause becomes a sort of extra range variable for the query, and can be used in the select clause. It would even be definitely assigned there, because the compiler is smart enough to figure out that variables that are "definitely assigned when true" in a where clause expression would always be definitely assigned in subsequent clauses.

This is intriguing, but when you dig in it does raise a number of questions.

Translation

How would a query like that be translated into calls of existing query methods? In the example above we would need to split the where clause into a call to Select to compute both the boolean result and the expression variable i, then a call to Where to filter out those where the boolean result was false. For instance:

strings
	.Select(s => new { s, __w = int.TryParse(s, out int i) ? new { __c = true, i } : new { __c = false, i = default(int) } })
	.Where(__p => __p.__w.__c)
	.Select(__p => __p.__w.i);

That first Select call is pretty unappetizing. We can do better, though, by using a trick: since we know that the failure case is about to be weeded out by the Where clause, why bother constructing an object for it? We can just null out the whole anonymous object to signify failure:

strings
	.Select(s => int.TryParse(s, out int i) ? new { s, i } : null)
	.Where(__p => __p != null)
	.Select(__p => __p.i);

Much better!
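
The null-based translation can already be written by hand in method syntax, since the restriction discussed in these notes applies to query clauses only. A minimal sketch, assuming an illustrative helper named `ParseInts` (not part of the notes) and a plain `p` in place of the compiler-generated `__p`:

```csharp
using System;
using System.Linq;

public static class ExpressionVariableTranslation
{
    // Hypothetical helper: keeps only the strings that parse as integers.
    public static int[] ParseInts(string[] strings) =>
        strings
            // Failure is signalled by projecting null instead of an object.
            .Select(s => int.TryParse(s, out int i) ? new { s, i } : null)
            // Weed out the failures before anything reads i.
            .Where(p => p != null)
            .Select(p => p.i)
            .ToArray();
}
```

Calling `ParseInts(new[] { "1", "x", "22" })` should yield the two parsed values and skip `"x"`. Note that this hand-written form still allocates one anonymous object per successful element.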

Other query clauses

We haven't really talked through how this would work for other kinds of query clauses. We'd have to go through them one by one and establish what the meaning is of expression variables in each expression in each kind of query clause. Can they all be propagated, and is it meaningful and reasonable to achieve?

Mutability

One thing to note is that range variables are immutable, while expression variables are mutable. We don't have the option of making expression variables mutable across a whole query, so we would need to make them immutable either:

  • everywhere, or
  • outside of the query clause that introduces them.

Having them be mutable inside their own query clause would allow for certain coding patterns such as:

from o in objects
where o is int i || (o is string s && int.TryParse(s, out i))
select i;

Here i is introduced and then mutated in the same query clause.

The translation approaches above would accommodate this "mutable then immutable" semantics if we choose to adopt it.
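
As a rough illustration of "mutable then immutable", the same null-based trick can be applied by hand. In this sketch `ExtractInts` is a hypothetical name; `i` is declared by the pattern, possibly overwritten by the `out` argument inside the same lambda, and only read once it has been captured in the projected object:

```csharp
using System;
using System.Linq;

public static class MutableThenImmutable
{
    // Hypothetical helper: accepts boxed ints and strings that parse as ints.
    public static int[] ExtractInts(object[] objects) =>
        objects
            // i is introduced by `o is int i` and may be assigned again via `out i`;
            // it is definitely assigned whenever the whole condition is true.
            .Select(o => (o is int i || (o is string s && int.TryParse(s, out i)))
                ? new { o, i }
                : null)
            .Where(p => p != null)
            // From here on, i is only available as a read-only property.
            .Select(p => p.i)
            .ToArray();
}
```

For example, `ExtractInts(new object[] { 1, "2", "x", 3.5 })` should keep the first two elements only.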

Performance

With a naive query translation scheme, this could lead to a lot of hidden allocations even when an expression variable is not used in a subsequent clause. Today's query translation already has the problem of indiscriminately carrying forward all range variables, regardless of whether they are ever needed again. This feature would exacerbate that issue.

We could think in terms of language-mandated query optimizations, where the compiler is allowed to shed range variables once they are never referenced again, or at least if they are never referenced outside of their introducing clause.

Blocking off

We won't have time to do this feature in C# 7.0. If we want to leave ourselves room to do it in the future, we need to make sure that we don't allow expression variables in query clauses to mean something else today, that would contradict such a future.

The current semantics is that expression variables in query clauses are scoped to only the query clause. That means two subsequent query clauses can use the same name in expression variables, for instance. That is inconsistent with a future that allows those variables to share a scope across query clause boundaries.

Thus, if we want to allow this in the future we have to put in some restrictions in C# 7.0 to protect the design space. We have a couple of options:

  • Disallow expression variables altogether in query clauses
  • Require that all expression variables in a given query expression have different names

The former is a big hammer, but the latter requires a lot of work to get right - and seems at risk for not blocking off everything well enough.

Deconstruction

A related feature request is to allow deconstruction in the query clauses that introduce new range variables:

from (x, y) in points
let (dx, dy) = (x - x0, y - y0)
select Sqrt(dx * dx + dy * dy)

This, again, would simply introduce extra range variables into the query, and would sort of be equivalent to the tedious manual unpacking:

from __p1 in points
let x = __p1.Item1
let y = __p1.Item2
let __p2 = (x - x0, y - y0)
let dx = __p2.Item1
let dy = __p2.Item2
select Sqrt(dx * dx + dy * dy)

Except that we could do a much better job of translating the query into fewer calls:

points
	.Select(__p1 => new { x = __p1.Item1, y = __p1.Item2 })
	.Select(__p2 => new { dx = __p2.x - x0, dy = __p2.y - y0, * = __p2 })
	.Select(__p3 => Sqrt(__p3.dx * __p3.dx + __p3.dy * __p3.dy))
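
Until deconstruction in query clauses exists, the unpacking can be done by hand in method syntax. A rough sketch, assuming the points are `(double, double)` tuples and that `x0` and `y0` are passed in as parameters; `Distances` is a hypothetical name. Unlike the translation above, it does not carry `x` and `y` forward past the second projection, because nothing reads them afterwards:

```csharp
using System;
using System.Linq;

public static class ManualDeconstruction
{
    // Hypothetical helper: distance of each point from (x0, y0).
    public static double[] Distances((double, double)[] points, double x0, double y0) =>
        points
            // Unpack the tuple into named range variables.
            .Select(p => new { x = p.Item1, y = p.Item2 })
            // Compute the offsets; x and y are dropped here.
            .Select(p => new { dx = p.x - x0, dy = p.y - y0 })
            .Select(p => Math.Sqrt(p.dx * p.dx + p.dy * p.dy))
            .ToArray();
}
```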

Conclusion

We will neither do expression variables nor deconstruction in C# 7.0, but would like to do them in the future. In order to protect our ability to do this, we will completely disallow expression variables inside query clauses, even though this is quite a big hammer.

Irrefutable patterns and reachability

We could be smarter about reachability around irrefutable patterns:

int i = 3;
if (i is int j) {}
else { /* reachable? */ }

We could consider being smart, and realizing that the condition is always true, so the else clause is not reachable.

By comparison, though, in current C# we don't try to reason about non-constant conditions:

if (false && ...) {}
else { /* reachable today */ }

Conclusion

This is not worth making special affordances for. Let's stick with current semantics, and not introduce a new concept for "not constant, but we know it's true".
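
To make the distinction concrete, here is a small sketch of how constant conditions already feed into flow analysis, next to the pattern case from the example above. The class and method names are illustrative only. In the second method the `else` branch is written out so that the code compiles regardless of how a given compiler version treats the pattern:

```csharp
public static class ReachabilityToday
{
    public static int ConstantCondition()
    {
        int result;
        // `true` is a constant expression, so flow analysis knows this
        // assignment always runs and no else branch is required.
        if (true) { result = 1; }
        return result;
    }

    public static int IrrefutablePattern()
    {
        int i = 3;
        int result;
        // The pattern always matches, but the condition is not a constant,
        // so per the conclusion above the else branch stays reachable.
        if (i is int j) { result = j; }
        else { result = -1; }
        return result;
    }
}
```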

Do-while loop scope

In the previous meeting we decided that while loops should have narrow scope for expression variables introduced in their condition. We did not explicitly say that the same is the case for do-while, but it is.

@HaloFour

HaloFour commented Jan 24, 2017

Previous comment of mine: #15619 (comment)

TL;DR let and where cover the primary use cases, in my opinion. from might pick up some more edge cases.

Why do the translations need to nest the range variables in some kind of container? That's an unnecessary allocation. Why not just project the result of the expression as an inutterable range variable? That eliminates an allocation and a projection.

strings
    .Select(s => new { s = s, <r>_w = int.TryParse(s, out int i), i = i })
    .Where(p => p.<r>_w)

That said I do like the idea of emitting an optimized form specifically for where, but even then I don't see why you'd need the additional projection. The anonymous type should just continue to work like any normal range projection.

strings
    .Select(s => int.TryParse(s, out int i) ? new { s, i } : null)
    .Where(p => p != null)

Hopefully LINQ will adopt tuples for projections in the future which would theoretically make that moot and take care of some of the performance concerns in general.

I can see where definite assignment would be a concern. My opinion would be that the introduced variable is mutable within the expression and that it is always projected to an immutable range variable. However, if the expression that declares the variable does not definitely assign it then it would be an error to reference it further in the query. I think that would afford enough flexibility to make it useful for the majority of scenarios with an escape hatch for more hairy situations.

@MgSam

MgSam commented Jan 24, 2017

I disagree with all of the decisions here except for the do-while.

  • Expression variables in query expressions- why spend time designing special behavior for a rarely-used and underpowered feature? Query expressions are much less powerful than method chaining, as they have special syntax for only a small subset of linq methods. And I rarely if ever see query syntax used outside of the language team. I'd argue, in retrospect, it was a mistake adding it to the language in the first place. Other languages have equally successful linq-like libraries without the dedicated syntax. It certainly doesn't make sense to waste time designing more special behavior for it. Don't throw good money at this dead feature.

  • Irrefutable patterns and reachability- the conclusion says it's not worth doing but I certainly don't see why that would be the case. Lots of other compilers for other languages do this kind of thing- why is it wrong for C#? I could understand punting on the feature for now because of higher priorities, but the feature itself certainly seems worthwhile.

@jnm2
Contributor

jnm2 commented Jan 24, 2017

  • Expression variables in query expressions- why spend time designing special behavior for a rarely-used and underpowered feature? Query expressions are much less powerful than method chaining, as they have special syntax for only a small subset of linq methods. And I rarely if ever see query syntax used outside of the language team. I'd argue, in retrospect, it was a mistake adding it to the language in the first place. Other languages have equally successful linq-like libraries without the dedicated syntax. It certainly doesn't make sense to waste time designing more special behavior for it. Don't throw good money at this dead feature.

I must respectfully disagree with each statement in the paragraph. I have had very different experiences.

@HaloFour

@MgSam

Reports of query expressions' death are greatly exaggerated. My experience is quite the opposite. I see them used significantly more often than their method counterparts, especially by developers who are first embracing LINQ. Query expressions already have special syntax for projecting additional range variables; this fits right in with that. You're free to continue using the method syntax and you're free to manage and project any pattern variables or out declaration variables manually.

@iam3yal

iam3yal commented Jan 24, 2017

@MgSam I think that it really depends on what you need and who you are. People who aren't familiar with functional programming, and might not understand what projection even means, may find query expressions very appealing.

It's easy to forget that not all programmers are engineers: not all of them have a CS degree, and not all of them come from a mathematical background. Some people learnt IT and moved into programming, so they lack quite a few math courses (at least in my uni there's a big difference). Finally, there are those who are self-taught; plenty of web programmers use C# and are self-taught, so for these people query expressions might be a good starting point.

Just a simple SelectMany:

int[] X = { 1, 2 };

var A = X.SelectMany(_ => X, (a, b) => new { a, b });

var B = from a in X
	from b in X
	select new { a, b };

To many people the latter version would make a lot more sense than the former.

@DavidArno

@MgSam,

Like others here, I disagree with your views on linq. I also, to an extent, disagree with @eyalsk's "it's syntax for beginners" views. I have been using linq since it was first introduced and learned both syntax forms. I also like to think I have a reasonable understanding of functional programming. Yet I use the query syntax by default, preferring its expressiveness when compared to method chaining. I only fall back to method chaining when the resultant query becomes too cumbersome, or when I need features not offered by the query syntax.

@DavidArno

@MadsTorgersen,

Would you mind clarifying the difference between the conclusion in #16640 regarding irrefutable patterns, that "This seems harmless, and will grow more useful over time. It's a small tweak that we should do.", and these design notes?

My interpretation is that it was felt to be a good idea in October but, upon revisiting the matter, it has been decided that it "is not worth making special affordances for" these irrefutable patterns, and the language rules will remain as-is.

Have I got that right, or got myself muddled, please?

@jnm2
Contributor

jnm2 commented Jan 24, 2017

Like @DavidArno, I prefer it for its visual aesthetics and readability. I'll type whichever is quicker and simpler and that's often query syntax.

@MgSam

MgSam commented Jan 24, 2017

@Others

My point of view here is pragmatic, not ideological. It is undeniable that other languages do not have query syntax and get along fine without it. It is undeniable that query syntax has been gimped since the day it was created, as it can access only a small subset of the overall expressiveness of linq. It is undeniable that query syntax offers no additional functionality over the alternative syntax.

Whether you think it looks beautiful and elegant or not, and given these facts, how is it worth it spending time designing for it when there are far more valuable features the team could be working on? Is using expression variables in query expressions really a feature that the design team should be spending capital on?

This goes back to the argument I've been making for a long time on these forums- the prioritization of the C# team the past few years has been awful. They meander from feature to feature, seemingly without direction, and spend inordinate amounts of time working on minor features that will have almost no real world benefits.

@jnm2
Contributor

jnm2 commented Jan 24, 2017

@MgSam Far more valuable than saving time and keystrokes? If saving time and keystrokes isn't pragmatic, I don't know what is! 😆
The argument that it offers no new functionality is a slippery slope. Pattern matching and null safe dereferencing don't logically offer anything that you couldn't already build more verbosely, either. Even async/await could be included in that criticism. All these things are amazing. And if it is idiomatic and beautiful at the same time, so much the better!
I wonder if it's one of those cases where your priorities differ significantly from the average, which is fine but should be understood as such? For example I never use dynamic and can't understand for the life of me why you'd spend time developing that out, but I know that viewpoint is peculiar to me and irrelevant to others.

@bondsbw

bondsbw commented Jan 24, 2017

@MgSam I don't think anyone would disagree that query syntax can only represent a subset of the overall expressiveness of LINQ, but we could say the same about foreach and async for their counterparts.

I would like to see query syntax expanded, not deprecated. Fix the areas that are painful or otherwise come up short compared with method syntax (#100, #1938, #3486, #3571, #6877, #8221, #9273, #15638, etc.).

@svick
Contributor

svick commented Jan 24, 2017

Today's query translation already has the problem of indiscriminately carrying forward all range variables, regardless of whether they are ever needed again. This feature would exacerbate that issue.

We could think in terms of language-mandated query optimizations, where the compiler is allowed to shed range variables once they are never referenced again, or at least if they are never referenced outside of their introducing clause.

Would having these optimizations actually require them being mandated by the language?

I believe the situation with the two forms of LINQ is:

  • For IEnumerable<T> (and other delegate-based providers), removing unused range variables could be very useful, and it would also not be visible to any user-written code.
  • For IQueryable<T> (and other Expression-based providers), removing unused range variables is not very useful, since the provider can make any such optimizations itself, but it would be visible to the provider code (read: a breaking change).

It seems to me that making the optimization for IEnumerable<T>, but not for IQueryable<T>, would give the most benefit, with relatively small cost (no need to change the spec, no breaking changes). Or is there some flaw in my argument?
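
To show why the shape of the translation is observable to an Expression-based provider, here is a small sketch that walks the expression tree the compiler builds for a query. The names `RangeVariableVisibility` and `EmittedCalls` are illustrative. It uses the in-memory `AsQueryable()` provider as a stand-in for a real one:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Linq.Expressions;

public static class RangeVariableVisibility
{
    // Returns the names of the query methods in the tree, outermost call first.
    public static string[] EmittedCalls()
    {
        IQueryable<int> source = new[] { 1, 2, 3 }.AsQueryable();

        // y is not used after the where clause, yet the pair (x, y)
        // is still carried forward to the final select.
        var query = from x in source
                    let y = x * 2
                    where y > 2
                    select x;

        var names = new List<string>();
        Expression node = query.Expression;
        while (node is MethodCallExpression call)
        {
            names.Add(call.Method.Name);
            node = call.Arguments[0]; // the source sequence of an extension-method call
        }
        return names.ToArray();
    }
}
```

The provider sees a `Select`, a `Where` and another `Select`, so a compiler that started dropping unused range variables would hand the provider a different tree for the same source text.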

@svick
Contributor

svick commented Jan 24, 2017

@MgSam You might be interested to read this recent article analyzing LINQ usage in GitHub projects. My conclusion based on that data is that method syntax is used about twice as much as query syntax (compare the numbers for Select and select), but that still leaves query syntax for a significant number of queries.

@iam3yal

iam3yal commented Jan 24, 2017

@DavidArno Where did I say that it's a syntax for beginners? 😆 ❤️

I said that it might be good starting point for people that don't understand or don't know how to use the alternative!

@DavidArno

DavidArno commented Jan 24, 2017

@eyalsk,

My apologies: I misunderstood what you were saying.

@chrisaut

I think the analysis that method syntax is used more often than query syntax doesn't tell us much. The two are not equivalent. Certain things are only possible with method syntax (.ToList, First, etc), and some things are just nicer/shorter in method syntax (e.g. just do a .Where(x => x), whereas with query syntax you always need the closing select). Other things are much nicer in query syntax, mostly multiple selects as shown above and certainly let statements, which are such a pain to do with method syntax. The point is it's not just user preference.

Personally I try to use one style for each query, but I do use both. Given the choice I would prefer to use query syntax as IMO it just reads nicer.

@CyrusNajmabadi
Member

I still can't wrap my head around how to do 'joins' without using the convenient query syntax. I find the method-form incredibly difficult to wrap my head around.

@iam3yal

iam3yal commented Jan 25, 2017

@CyrusNajmabadi LINQ's motto should be when in doubt use/master SelectMany. 😆

Didn't check performance but these should be similar in terms of results:

firstNames.Join(
    lastNames,
    firstName => firstName.Key,
    lastName => lastName.Key,
    (firstName, lastName) => new { FirstName = firstName.Value, LastName = lastName.Value });

firstNames.SelectMany(
    firstName => lastNames.Where(lastName => firstName.Key == lastName.Key),
    (firstName, lastName) => new { FirstName = firstName.Value, LastName = lastName.Value });
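
For comparison, here is a self-contained sketch of the two method forms next to the query-syntax join. It assumes `firstNames` and `lastNames` are sequences of `KeyValuePair<int, string>`, which the snippet above does not specify; the class and method names are illustrative:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class JoinForms
{
    public static string[] WithJoin(
        IEnumerable<KeyValuePair<int, string>> firstNames,
        IEnumerable<KeyValuePair<int, string>> lastNames) =>
        firstNames
            .Join(lastNames, f => f.Key, l => l.Key, (f, l) => f.Value + " " + l.Value)
            .ToArray();

    // Nested-loop form: re-scans lastNames for every first name.
    public static string[] WithSelectMany(
        IEnumerable<KeyValuePair<int, string>> firstNames,
        IEnumerable<KeyValuePair<int, string>> lastNames) =>
        firstNames
            .SelectMany(f => lastNames.Where(l => f.Key == l.Key), (f, l) => f.Value + " " + l.Value)
            .ToArray();

    // Query syntax; the compiler turns this into a call to Join.
    public static string[] WithQuerySyntax(
        IEnumerable<KeyValuePair<int, string>> firstNames,
        IEnumerable<KeyValuePair<int, string>> lastNames) =>
        (from f in firstNames
         join l in lastNames on f.Key equals l.Key
         select f.Value + " " + l.Value).ToArray();
}
```

All three should return the same pairs in the same order for in-memory sequences; they differ in how much work is done per element.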

@jnm2
Contributor

jnm2 commented Jan 25, 2017

@eyalsk meh, the join (with the opportunity to switch to hash matching) will scale far better than the nested loops version. At least that's true for the join algos I've written.

@iam3yal

iam3yal commented Jan 25, 2017

@jnm2 Yeah probably. :)

@bondsbw

bondsbw commented Jan 26, 2017

@eyalsk I ran that exact query and consistently get around 2-4x better performance with Join than SelectMany.

@jnm2
Contributor

jnm2 commented Jan 26, 2017

@bondsbw

@eyalsk I ran that exact query and consistently get around 2-4x better performance with Join than SelectMany.

To be clear there's nothing wrong with what @eyalsk suggested if that happens to be the easiest way for you to think about it. No point in wasting time optimizing for perf unless it ends up on a hot path. Personally, I find the .Join with key selectors to be quite intuitive and that (rather than perf) is the reason I use it.

@iam3yal

iam3yal commented Jan 26, 2017

Guys, it was meant to be a joke, kinda but thanks for the elaboration. :)

@jnm2
Contributor

jnm2 commented Jan 26, 2017

What's a joke
(source: am programmer)
kidding ;-)

@MadsTorgersen
Contributor Author

@DavidArno regarding irrefutable patterns: the difference between the decision in #16640 and here is that the one in #16640 was about definite assignment, and was a small tweak to it. Furthermore it enables useful code that was otherwise prohibited.

The one here is about reachability, which currently very clearly only takes the value of constant expressions into account. We don't want to break with that principle just for this example of limited usefulness. Furthermore it would introduce a new diagnostic, not allow more code to work.

@DavidArno

Thanks for the clarification, @MadsTorgersen. Makes sense to me now.

@jcouv
Member

jcouv commented Jul 29, 2017

LDM notes for Dec 7 and Dec 14 2016 are available at https://github.com/dotnet/csharplang/blob/master/meetings/2016/LDM-2016-12-07-14.md
I'll close the present issue. Thanks

jcouv closed this as completed Jul 29, 2017