Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Collection expressions: inline collections in spreads and foreach #7864

Open
wants to merge 17 commits into
base: main
Choose a base branch
from
124 changes: 124 additions & 0 deletions proposals/collection-expressions-inline.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,124 @@
# Collection expressions: inline collections

## Summary

Support collection expressions *inline* in expression contexts where the *collection type* is not observable.

## Motivation

Inline collection expressions could be used with spreads to allow adding elements *conditionally* to the containing collection:
```csharp
int[] items = [x, y, .. b ? [z] : []];
```

Inline collection expressions could be used directly in `foreach`:
```csharp
foreach (var b in [false, true]) { }
```
In these cases, the *collection type* of the inline collection expression is unspecified. The choice of how or whether to instantiate the collection is left to the compiler.

## Detailed design

### Conversions

To support conversions for inline collection expressions, an abstract *collection_type* is introduced.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

abstract is a bit ambiguous. I would say anonymous.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Replaced "abstract" with "compile time".


A *collection_type* represents an enumerable type with a specific element type and an *unspecified* collection type.
A *collection_type* with element type `E` is referred to here as *`col<E>`*.

A *collection_type* exists at compile time only; a *collection_type* cannot be referenced in source or metadata.

The [*iteration type*](https://github.com/dotnet/csharpstandard/blob/standard-v6/standard/statements.md#1295-the-foreach-statement) of a *collection_type* `col<E>` is `E`.

An implicit *collection_type conversion* exists from a type with an [*iteration type*](https://github.com/dotnet/csharpstandard/blob/standard-v6/standard/statements.md#1295-the-foreach-statement) `Tₑ` to the *collection_type* `col<Tₑ>`.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
An implicit *collection_type conversion* exists from a type with an [*iteration type*](https://github.com/dotnet/csharpstandard/blob/standard-v6/standard/statements.md#1295-the-foreach-statement) `Tₑ` to the *collection_type* `col<Tₑ>`.
An implicit *collection type conversion* exists from a type with an [*iteration type*](https://github.com/dotnet/csharpstandard/blob/standard-v6/standard/statements.md#1295-the-foreach-statement) `Tₑ` to the *collection_type* `col<Tₑ>`.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

do you want to refer to iteration type? or should we say element type? (just thinking upon the last few ldm talks).


The collection expression [*conversions*](https://github.com/dotnet/csharplang/blob/main/proposals/csharp-12.0/collection-expressions.md#conversions) section is updated as follows:

> An implicit *collection expression conversion* exists from a collection expression to the following types:
> * **A *collection_type* `col<T>`**
> * A single dimensional *array type* `T[]`
> * ...
>
> The implicit *collection expression conversion* exists if the type has an [*iteration type*](https://github.com/dotnet/csharpstandard/blob/standard-v6/standard/statements.md#1295-the-foreach-statement) `Tₑ` where for each *element* `Eᵢ` in the collection expression:
> * If `Eᵢ` is an *expression element*, there is an implicit conversion from `Eᵢ` to `Tₑ`.
> * If `Eᵢ` is a *spread element* `..Sᵢ`, there is an implicit conversion **from `Sᵢ` to the *collection_type* `col<Tₑ>`**.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nice!


### Construction

The elements of a nested collection expression within a spread can be added to the containing collection instance directly, without instantiating an intermediate collection.

The elements of the nested collection expression are evaluated in order within the nested collection. There is no implied evaluation order between the elements of the nested collection and other elements within the containing collection expression.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this seems wrong to me. there should def be an implied evaluation order. if i have

[a, b, .. c ? [d] : [e], f, g]

i would expect that d and e would be evaluated before f and g.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changed to:

The elements of a collection expression within a spread are evaluated in order as if the spread elements were declared in the containing collection expression directly.


### Type inference

No changes are made for [*type inference*](https://github.com/dotnet/csharplang/blob/main/proposals/csharp-12.0/collection-expressions.md#type-inference).
Since *type inference* from a spread element relies on the *iteration type* of the spread element expression, and since collection expressions do not have a *type* or an *iteration type*, there is no type inference from a collection expression nested within a spread element.

The relevant part of the [*type inference*](https://github.com/dotnet/csharplang/blob/main/proposals/csharp-12.0/collection-expressions.md#type-inference) section is included here for reference:

> An *input type inference* is made *from* an expression `E` *to* a type `T` in the following way:
>
> * If `E` is a *collection expression* with elements `Eᵢ`, and `T` is a type with an [*iteration type*](https://github.com/dotnet/csharpstandard/blob/standard-v6/standard/statements.md#1295-the-foreach-statement) `Tₑ` or `T` is a *nullable value type* `T0?` and `T0` has an [*iteration type*](https://github.com/dotnet/csharpstandard/blob/standard-v6/standard/statements.md#1295-the-foreach-statement) `Tₑ`, then for each `Eᵢ`:
> * If `Eᵢ` is an *expression element*, then an *input type inference* is made *from* `Eᵢ` *to* `Tₑ`.
> * If `Eᵢ` is a *spread element* `..Sᵢ`, then a [*lower-bound inference*](https://github.com/dotnet/csharpstandard/blob/standard-v6/standard/expressions.md#116310-lower-bound-inferences) is made *from* the [*iteration type*](https://github.com/dotnet/csharpstandard/blob/standard-v6/standard/statements.md#1295-the-foreach-statement) of `Sᵢ` *to* `Tₑ`.
> * *[existing rules from first phase]* ...

### Foreach

The collection in a `foreach` statement may be a *collection expression*.

If the `foreach` statement has an *explicitly typed iteration variable* of type `Tₑ`, the compiler verifies the following for each element `Eᵢ` in the collection expression and reports an error otherwise:
* If `Eᵢ` is an *expression element*, there is an implicit conversion from `Eᵢ` to `Tₑ`.
* If `Eᵢ` is a *spread element* `..Sᵢ`, there is an implicit conversion from `Sᵢ` to the *collection_type* `col<Tₑ>`.

If the `foreach` statement has an *implicitly typed iteration variable*, the type of the *iteration variable* is the [*best common type*](https://github.com/dotnet/csharpstandard/blob/standard-v6/standard/expressions.md#116315-finding-the-best-common-type-of-a-set-of-expressions) of the types of the elements. If there is no *best common type*, an error is reported.

For each element `Eᵢ` in the collection expression, the type if any contributing to the *best common type* is the following:
* If `Eᵢ` is an *expression element*, the type of `Eᵢ`.
* If `Eᵢ` is a *spread element* `..Sᵢ`, the [*iteration type*](https://github.com/dotnet/csharpstandard/blob/standard-v6/standard/statements.md#1295-the-foreach-statement) of `Sᵢ`.

Since collection expressions do not have a *type* or an *iteration type*, the *best common type* does not consider elements from nested collection expressions.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

pity. mght be something to reconsider later. for example:

[a, b, .. c ? [d] : [e] f, g]. I could imaginet he type is the bct of a, c, d, e, f, g


For a collection expression used as the collection in a `foreach` statement, the compiler may use any conforming representation for the collection instance, including eliding the collection.

The elements of the collection are evaluated in order, and the loop body is executed for each element in order. The compiler *may* evaluate subsequent elements before executing the loop body for preceding elements.
Copy link

@KevinCathcart KevinCathcart Feb 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The compiler may evaluate subsequent elements before executing the loop body for preceding elements.

Is that really something that should be left unspecified? The compiler cannot easilly change it later, as that would be an observable breaking change, and one that feels pretty likely to break real programs, whenever somebody does foreach (Foo foo in [myFoo, ..someFunctionWithSideEffects()]), especially if someFunctionWithSideEffects might mutate myFoo.

Edit: Actually an even better example of a case where users would absolutely notice a change if one is made later is if some element throws an exception when evaluated. That is a case where it could make a really big difference to users as to whether the exception occurs before any loop iterations, or not.

I'd honestly expect all the elements to be evaluated first before any loop executions, simply because that it what would happen if the expression were converted to literally any runtime type.

The only sensible implementation strategies I can even think of that would allow for delayed evaluation would be unrolling the loop body (even if an unusual unrolling like: loop and process these initial literal elements, then loop and process these elements from a spread), or the creation of a generator that yields the elements one by one, and then iterating over that generator's output.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is that really something that should be left unspecified?

Yes. It allows the compiler great leeway in the impl strategy for converting the expressions to a collection.

The compiler cannot easilly change it later, as that would be an observable breaking change

It's not a breaking change as it's explicitly marked as unspecified. Taking a dependency on it puts one in unsupported territory, with changes free to happen.

and one that feels pretty likely to break real programs,

I would definitely hope not.

whenever somebody does foreach (Foo foo in [myFoo, ..someFunctionWithSideEffects()]), especially if someFunctionWithSideEffects might mutate myFoo.

NOte: it's fine for someFunctionWithSideEffects to have side effects. Those will happen after myFoo is evaluated, and before any other elements are evaluated. What is unspecified is what it means is if the iteration of that evaluated value itself mutates. That would be a pattern we consider too 'out there', and we're willing to say that if your iterators do this, then collection expressions are not a good fit for you. We want to optimzie for the 99.99% case, where iteration order will not have an impact.

Note: to reiterate. Evaluation-order of the elements is still guaranteed to be left-to-right, just as it always has been in C#. So side effects from prior element evaluations will be seen by later element evaluations.

I'd honestly expect all the elements to be evaluated first before any loop executions

That would require stack space for all their values. That would def be undesirable in many cases. Simply imagine a collection expr with 1000 elements. If we require all that evaluation up front, we need to store all those values somewhere (like teh stack) then copy them into the dest. However, if we give the compiler freedom, it can do something like use a single space on the stack, evaluate each element it it, add to the dest, and the move on to the next (reusing that single space).

Copy link

@KevinCathcart KevinCathcart Feb 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and one that feels pretty likely to break real programs,
I would definitely hope not.

A better example is if evaluation of any element throws. I can predict with high certainty that users will notice and care about the difference between processing the loop body for the first handful elements and then throwing vs throwing before processing any elements. Imagine if say the loop body is writing to a file and the 15th element throws when evaluated. Pretty significant difference if the file ends up empty, vs if it has lines for the previously processed elements.

If initially implemented with one at a time evaluation, then users may come to depend on the file having the lines from all elements before the throwing was was evaluated. If initially pre-evaluating all elements then users could easily become reliant on that behavior.

And files are hardly the only such scenarios. This could also apply to GUI scenarios like adding elements to listbox, or whatever.

Obviously users understand they can get partial execution of the loop if the body throws. And users quickly learn about lazy Enumerables, like those created by generators and or LINQ, where they can get partial execution from the enumerator throwing. But users are used to any type of foreach-able construct either being eagerly evaluated or lazily evaluated.

But here you are trying to reserve the right for the compiler to sometimes treat the collection expression as eagerly evaluated (like arrays or spans or most collection classes), and other times to be lazily evaluated like a generator or LINQ Enumerables, and possibly even mix the two within one expression in undefined ways that may change in the future. Which is very much not something users are used to, and something which will be hard to explain to users.

It's not a breaking change as it's explicitly marked as unspecified. Taking a dependency on it puts one in unsupported territory, with changes free to happen.

Average users don't read the compiler specification closely enough to have any idea what is marked as unspecified or not. They see the new feature announcement which gives the basic overview (which probably won't have a giant warning about eagerness being unspecified) or they discover the feature in some codebase. Either way they play around with it a bit, and check (experimentally) to see the current behavior for things they might care about (like eager or lazy evaluation) and then assume it won't change in the future.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A better example is if evaluation of any element throws

As mentioned previously, evaluation order of elements is strictly defined.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and something which will be hard to explain to users

For the tiny set of people that run into an issue here, the situation can be explained.

and then assume it won't change in the future.

Then that is the consequence of behaving that way. We have made a decision balancing the needs of the ecosystem as a whole. If developers run into an issue because of that, we are ok with it on balance.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This could also apply to GUI scenarios like adding elements to listbox, or whatever.

It shouldn't. You can only get the final collection if no throwing occurred during the creation of the collectino. It is all assigned to a temp. So you either get it all, or you get nothing (since the actual code that cn reference that newly created collection doesn't run due to the exception).

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

which probably won't have a giant warning about eagerness being unspecified

For nearly all users, there's no need to specify. Nearly all users with nearly all cases will simply get collections that work well including wrt to the efficiency of producing the result. Users that absolutely depend not only on element evaluation order, but enumeration order of those elements are going to be extremely slim. To the point that we literally haven't heard of a single user being affected by this in the entire beta and release period since release.

This indicates to me that this is not the sort of footgun being warned about. But is rather just a small part of the design that most people won't ever need to know about or be affected by. This is in line with many other features we have where there's some niche case that might result in slightly unexpected or undefined behavior if the user is doing something very far out of the norm. As such, designing and documenting it in the same vein seems totally ok :)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@CyrusNajmabadi I really don't think we are talking about the same thing, and that may explain some confusion.

As best I can tell, you seem to be talking about part of the already shipped collection expression feature for building collections where iteration order of spreads when building a collection may vary relative to the evaluation of elements. If that is what you are talking about then yes, I fully agree that it is really unlikely that changes there break people in practice.

But what I am talking about here is part of proposed specification for foreach loops over collection expressions. What I am looking at seems to say executions of the body of the foreach loop can interleave in unspecified ways with evaluating the elements of collection expression passed to the foreach loop:

The elements of the collection are evaluated in order, and the loop body is executed for each element in order. The compiler may evaluate subsequent elements before executing the loop body for preceding elements.

Which says that foreach (int i in [1, myfunc()]) { foo(i); } might do:

  • evaluate 1
  • run foo(1)
  • evaluate myfunc() (which happens to throw)

or it might do:

  • evaluate 1.
  • evaluate myfunc() (which happens to throw).
  • Would now try to run loop body for each, which won't happen because of the throw.

And furthermore which one it does might change in the future. That is what I have concerns about.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I understand your concerns. We'll def consider them during the ldm discussions on this topic.


## Alternatives

### Natural type

Add collection expression *natural type*.

The natural type would be a combination of the *best common type* for the element type (see [foreach](#foreach) above), and a choice of collection type, such as one of the following:

|Collection type|Mutable|Allocations|Async code|Returnable
|:---:|:---:|:---:|:---:|:---:|
|T[]|Items only|1|Yes|Yes|
|List&lt;T&gt;|Yes|2|Yes|Yes|
|Span&lt;T&gt;|Items only|0/1|No|No/Yes|
|ReadOnlySpan&lt;T&gt;|No|0/1|No|No/Yes|
|Memory&lt;T&gt;|Items only|1|Yes|Yes|
|ReadOnlyMemory&lt;T&gt;|No|1|Yes|Yes|

Natural type would allow use of collection expressions in cases where there is no target type:
```csharp
var a = [1, 2, 3]; // var
var b = [x, y].Where(e => e != null); // extension methods
var c = Identity([x, y]); // type inference

static T Identity<T>(T t) => t;
```

Natural type would likely only apply when there is a *best common type* for the elements.
```csharp
var d = []; // error
var e = [default]; // error: no type for default
var f = [1, null]; // error: no common type for int and <null>
```

Natural type would also allow a subset of the scenarios that are supported above. But since the above proposal relies on target typing and natural type relies on *best common type* of the elements within the collection expression, there will be some limitations if we don't also support target typing with nested collection expressions.
```csharp
byte[] x = [1, .. b ? [2] : []]; // error: cannot convert int to byte
int[] y = [1, .. b ? [default] : []]; // error: no type for [default]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
int[] y = [1, .. b ? [default] : []]; // error: no type for [default]
int[] y = [1, .. b ? [default] : []]; // error: no type for [default]
string?[] y = [1, .. b ? [null] : []]; // error: no type for [null]

int?[] z = [1, .. b ? [2, null] : []]; // error: no common type for int and <null>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
int?[] z = [1, .. b ? [2, null] : []]; // error: no common type for int and <null>
int?[] z = [1, .. b ? [2, null] : []]; // error: no common type for int and <null>
Predicate<int>[] predicates = [a, .. b ? [i => i.ToString()] : []]; // No type for lambda.
Predicate<int>[] predicates = [a, .. b ? [(int i) => i.ToString()] : []]; // Func<int, string> not compatible with Predicate<int>

```
10 changes: 5 additions & 5 deletions proposals/csharp-12.0/collection-expressions.md
Original file line number Diff line number Diff line change
Expand Up @@ -118,9 +118,9 @@ An implicit *collection expression conversion* exists from a collection expressi
* `System.Collections.Generic.ICollection<T>`
* `System.Collections.Generic.IList<T>`

The implicit conversion exists if the type has an [*iteration type*](https://github.com/dotnet/csharpstandard/blob/standard-v6/standard/statements.md#1295-the-foreach-statement) `U` where for each *element* `Eᵢ` in the collection expression:
* If `Eᵢ` is an *expression element*, there is an implicit conversion from `Eᵢ` to `U`.
* If `Eᵢ` is an *spread element* `Sᵢ`, there is an implicit conversion from the *iteration type* of `Sᵢ` to `U`.
The implicit conversion exists if the type has an [*iteration type*](https://github.com/dotnet/csharpstandard/blob/standard-v6/standard/statements.md#1295-the-foreach-statement) `Tₑ` where for each *element* `Eᵢ` in the collection expression:
* If `Eᵢ` is an *expression element*, there is an implicit conversion from `Eᵢ` to `Tₑ`.
* If `Eᵢ` is a *spread element* `..Sᵢ`, there is an implicit conversion from the *iteration type* of `Sᵢ` to `Tₑ`.

There is no *collection expression conversion* from a collection expression to a multi dimensional *array type*.

Expand Down Expand Up @@ -379,7 +379,7 @@ The existing rules for the [*first phase*](https://github.com/dotnet/csharpstand
>
> * If `E` is a *collection expression* with elements `Eᵢ`, and `T` is a type with an [*iteration type*](https://github.com/dotnet/csharpstandard/blob/standard-v6/standard/statements.md#1295-the-foreach-statement) `Tₑ` or `T` is a *nullable value type* `T0?` and `T0` has an [*iteration type*](https://github.com/dotnet/csharpstandard/blob/standard-v6/standard/statements.md#1295-the-foreach-statement) `Tₑ`, then for each `Eᵢ`:
> * If `Eᵢ` is an *expression element*, then an *input type inference* is made *from* `Eᵢ` *to* `Tₑ`.
> * If `Eᵢ` is an *spread element* with an [*iteration type*](https://github.com/dotnet/csharpstandard/blob/standard-v6/standard/statements.md#1295-the-foreach-statement) `Sᵢ`, then a [*lower-bound inference*](https://github.com/dotnet/csharpstandard/blob/standard-v6/standard/expressions.md#116310-lower-bound-inferences) is made *from* `Sᵢ` *to* `Tₑ`.
> * If `Eᵢ` is a *spread element* `..Sᵢ`, then a [*lower-bound inference*](https://github.com/dotnet/csharpstandard/blob/standard-v6/standard/expressions.md#116310-lower-bound-inferences) is made *from* the [*iteration type*](https://github.com/dotnet/csharpstandard/blob/standard-v6/standard/statements.md#1295-the-foreach-statement) of `Sᵢ` *to* `Tₑ`.
> * *[existing rules from first phase]* ...

> 11.6.3.7 Output type inferences
Expand All @@ -388,7 +388,7 @@ The existing rules for the [*first phase*](https://github.com/dotnet/csharpstand
>
> * If `E` is a *collection expression* with elements `Eᵢ`, and `T` is a type with an [*iteration type*](https://github.com/dotnet/csharpstandard/blob/standard-v6/standard/statements.md#1295-the-foreach-statement) `Tₑ` or `T` is a *nullable value type* `T0?` and `T0` has an [*iteration type*](https://github.com/dotnet/csharpstandard/blob/standard-v6/standard/statements.md#1295-the-foreach-statement) `Tₑ`, then for each `Eᵢ`:
> * If `Eᵢ` is an *expression element*, then an *output type inference* is made *from* `Eᵢ` *to* `Tₑ`.
> * If `Eᵢ` is an *spread element*, no inference is made from `Eᵢ`.
> * If `Eᵢ` is a *spread element*, no inference is made from `Eᵢ`.
> * *[existing rules from output type inferences]* ...

## Extension methods
Expand Down