[Proposal]: Collection Expressions Next (C#13 and beyond) #7913

Open
1 of 4 tasks
CyrusNajmabadi opened this issue Feb 5, 2024 · 82 comments

@CyrusNajmabadi
Member

CyrusNajmabadi commented Feb 5, 2024

Collection Expressions Next

  • Proposed
  • Prototype: Not Started
  • Implementation: Not Started
  • Specification: Not Started

Summary

This issue is intended to be the umbrella tracking item for all collection expression designs and work following the core design (#5354) that shipped in C#12.

As this is likely to be a large item with many constituent parts, it will link out to respective discussions and designs as they occur.

Roughly, here are the items we would like to consider, as well as early notes on the topic: https://github.com/dotnet/csharplang/blob/main/meetings/working-groups/collection-literals/CL-2024-01-23.md

  1. Dictionary expressions. [Proposal]: Dictionary expressions #7822
  2. Natural type
  3. Immediately enumerated collections. e.g. ["a", .. b ? ["c"] : []] and foreach (bool? b in [true, false, null]). Collection expressions: inline collections in spreads and foreach #7864
  4. Extension methods on collections
  5. Missing collection-creatable types (e.g. Memory<T>, ArraySegment<T> etc.)
  6. Supporting non-generic core collections (IEnumerable, etc.)
  7. Relaxing restrictions. Open issue: relax requirement that type be enumerable to participate in collection expressions #7744

Design meetings

https://github.com/dotnet/csharplang/blob/main/meetings/2024/LDM-2024-01-08.md - Iteration types of collections
https://github.com/dotnet/csharplang/blob/main/meetings/working-groups/collection-literals/CL-2024-01-23.md - WG meetup
https://github.com/dotnet/csharplang/blob/main/meetings/2024/LDM-2024-01-10.md - Conversions vs construction
https://github.com/dotnet/csharplang/blob/main/meetings/2024/LDM-2024-02-05.md#collection-expressions-inline-collections
https://github.com/dotnet/csharplang/blob/main/meetings/2024/LDM-2024-09-06.md#collection-expressions-next-c13-and-beyond

@KennethHoff

I'd like to see some way of calling the constructor of a type and/or setting some property during initialization. The first thing that came to mind is the comparer of a dictionary, but I'm sure there are other use-cases.

Either way, collection expressions are amazing, and support for natural types and inline expressions would make them that little bit better!

@CyrusNajmabadi
Member Author

@KennethHoff that's in the list, as part of the dictionary-expression exploration work. Thanks :)

@BlinD-HuNTeR

I'm not sure if the following proposal is too crazy, so I will describe it here quickly, as it's related to this topic:

Imagine that I have an int[] ints = [1, 2, 3], and I would like to have a string[] with the representations of those ints.
One way to do that would be string[] strings = [ints[0].ToString(), ints[1].ToString(), ints[2].ToString()], but that only works because I know the length.

Ideally I would like to spread the array of ints into the array of strings while also calling ToString on each element.

If int had a user-defined implicit conversion operator to string, then I could write string[] strings = [.. ints], and it would be valid C# code, invoking the op_Implicit method on each int.

But since there is no such conversion, I would like to write something like string[] strings = [ (..ints).ToString()], and it would do exactly that. Hence, the proposal:

Spread operator as a first-class citizen.

So basically the spread operator .. becomes a regular unary-operator of the C# language, which at first may only be used inside collection expressions, but possibly could have its use expanded to everywhere.

The spread unary operator may be applied to any other expression, resulting in a spread_expression. When that happens, an implicit foreach loop is inserted for the expression, following the usual rules. It is, therefore, a compile error if the operator is applied to something that isn't enumerable.

The spread_expression then behaves, for the purposes of member lookup and overload resolution, as an expression whose type is the element type of the original enumerable. Because of that, one may invoke any members the element type might have, as well as pass the spread_expression to a method that takes the element type as an argument. These invocations will then be inserted inside the invisible foreach loop, and the spread_expression will be substituted for the loop variable.

The result of a member invocation performed on, or taking the spread_expression as an argument, is also itself a spread_expression, whose element type is the return type of the invoked member, if not void. Method invocations can be chained on a spread_expression, and they will all be moved inside the foreach loop.

Finally, one will want to capture the result of all these method invocations on each element of the original collection. Therefore, any spread_expression can be used regularly as a spread_element in a collection expression, with the existing rules.

Problems:

  • There is a known ambiguity between the spread operator and the range operator. Originally this was addressed by forcing the range operator to be inside parentheses, but with this proposal, it will also be necessary to put the spread inside parentheses when invoking members of the element type. This is not an impediment per se, because the range operator may only be applied to int or System.Index, which are not (usually) enumerable, but the complexity of parsing would increase.

@HaloFour
Contributor

HaloFour commented Feb 6, 2024

@BlinD-HuNTeR

C# already has a query comprehension syntax, LINQ.

@BlinD-HuNTeR

C# already has a query comprehension syntax, LINQ.

True, but, that's not really an argument. If you would argue against all proposals saying "it can already be done in some way", then none would ever be accepted. Lists could already be created with LINQ or with initializers, yet collection expressions were introduced. And they have the added benefits of duck typing/nice syntax/good performance.

LINQ, on the other hand, is interface-based and makes use of delegates and anonymous objects. If one could perform some simple transformations through the use of the spread operator, everything would be inserted directly in the caller method, with no delegates or closures. There isn't any optimization better than that; LINQ would almost be obsolete.

@HaloFour
Contributor

HaloFour commented Feb 6, 2024

@BlinD-HuNTeR

See: #7634

@CyrusNajmabadi
Member Author

@BlinD-HuNTeR

Imagine that I have an int[] ints = [1, 2, 3], and I would like to have a string[] with the representations of those ints.

That's where supporting extension methods would help. As you could write:

[1, 2, 3].Select(i => i.ToString()).ToArray()

@KennethHoff

KennethHoff commented Feb 6, 2024

[1, 2, 3].Select(i => i.ToString()).ToArray()

I assume you could also do this?

[ ..[1, 2, 3].Select(i => i.ToString()) ]

@CyrusNajmabadi
Member Author

@KennethHoff yes.

@jnm2
Contributor

jnm2 commented Feb 6, 2024

@KennethHoff Yes, or [.. from i in [1, 2, 3] select i.ToString()]
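
For comparison, here is a minimal sketch of the same projection in a form that should already compile under C# 12, assuming the source is a typed local rather than an inline literal (the inline [1, 2, 3] receiver depends on the natural type and extension method work tracked in this issue). The variable names are illustrative only.

```csharp
using System;
using System.Linq;

int[] ints = [1, 2, 3];

// Spread a LINQ projection into a target-typed collection expression.
string[] viaMethodSyntax = [.. ints.Select(i => i.ToString())];

// The same projection written with query syntax.
string[] viaQuerySyntax = [.. from i in ints select i.ToString()];

Console.WriteLine(string.Join(", ", viaMethodSyntax)); // 1, 2, 3
Console.WriteLine(string.Join(", ", viaQuerySyntax));  // 1, 2, 3
```

Both forms still go through LINQ delegates, so they do not address the inlining argument made earlier in the thread; they only show what is expressible today.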

@En3Tho

En3Tho commented Feb 7, 2024

I suggest widening support for collection builders to accept other buildable collections.

For example I can choose to wrap a HashSet or a List or an Array or whatever. It feels so weird to accept a Span for such types.

@CyrusNajmabadi
Member Author

@En3Tho Can you give an example?

@En3Tho

En3Tho commented Feb 7, 2024

@CyrusNajmabadi

public class ArrayWrapperBuilder
{
    // This works
    public static ArrayWrapper<T> Create<T>(ReadOnlySpan<T> values)
    {
        return new(values.ToArray());
    }

    // I want this to work instead. Array is already a creatable collection, so just let compiler create it and use directly here
    // Imagine if it was a List<T> or HashSet<T> wrapper. Even more allocations while compiler is perfectly able to create this kind of collection directly.
    public static ArrayWrapper<T> Create<T>(T[] values)
    {
        return new(values);
    }
}

[CollectionBuilder(typeof(ArrayWrapperBuilder), nameof(ArrayWrapperBuilder.Create))]
public readonly struct ArrayWrapper<T>(T[] array) : IEnumerable<T>
{
    public T[] Array => array;
    public IEnumerator<T> GetEnumerator()
    {
        // Enumerate the wrapped array through its generic interface.
        return ((IEnumerable<T>)array).GetEnumerator();
    }

    IEnumerator IEnumerable.GetEnumerator()
    {
        return GetEnumerator();
    }
}

public static class ArrayWrapperCreator
{
    public static ArrayWrapper<T> GetWrapper<T>() => [default, default, default];
}

@koszeggy

koszeggy commented Feb 7, 2024

I'm glad the lack of foreach support is covered in the plans.

Something like

foreach (int i in [1, 2, 3])

was literally the first thing I tried with collection literals, and I was surprised that it failed to compile with CS9176, saying there was no target type. It basically provides exactly the same amount of information as

foreach (var i in (IEnumerable<int>)[1, 2, 3])

which happily compiles today (though I would rather use some old-school array initialization instead of the bulky cast).

I think this could be added even independently from natural types.
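
Until that is supported, a minimal sketch of two workarounds is shown below: giving the expression a target type through a typed local, or through the cast quoted above. The variable names are illustrative.

```csharp
using System;
using System.Collections.Generic;

int sum = 0;

// Option 1: a typed local supplies the target type.
int[] values = [1, 2, 3];
foreach (int i in values)
{
    sum += i;
}

// Option 2: the cast supplies the target type inline.
foreach (var i in (IEnumerable<int>)[1, 2, 3])
{
    sum += i;
}

Console.WriteLine(sum); // 12
```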

@eiriktsarpalis
Member

One addition that we would like to see is support for multi-dimensional collection literals. These are important when working with tensor libraries, e.g. here's how simple tensors are defined using the pytorch library:

data = [[1, 2],[3, 4]]
x_data = torch.tensor(data)

The above form isn't feasible using today's C# collection literals, so supporting it without some kind of language support seems unlikely.

One possibility is that we could reuse nested collection literal syntax to construct multi-dimensional collections. TL;DR it should be possible to extend the collection builder pattern to recognize factory methods such as

public static T[,] Create<T>(ReadOnlySpan<T> values, int nDim, int mDim);

and then be able to define 2-D arrays like so

T[,] values = [[1, 0], [0, 1]];

In principle, it should be possible for the compiler to infer the rank and dimensions and detect shape mismatches (e.g. something like [[1], [2, 3]]).

What's more interesting though is that by reusing nested collection syntax there is no inherent limit on the supported number of dimensions, and the number of dimensions doesn't need to be fixed for a given type. We could for instance support builder methods of shape

public static Tensor<T> Create<T>(ReadOnlySpan<T> values, params ReadOnlySpan<int> dimensions);

Which should let you specify the following 2x2x2 tensor:

Tensor<int> tensor = [[[0, 0], [0, 0]], [[1, 0], [0, 1]]];
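
As a rough illustration of the 2-D case, the sketch below shows what such a factory could look like and how a compiler might call it after flattening the nested literal in row-major order. This is an assumption for illustration: no such builder pattern exists today, and the local function here simply mirrors the signature proposed above.

```csharp
using System;

// Hypothetical factory: copies a flat, row-major span into an n x m array.
static T[,] Create<T>(ReadOnlySpan<T> values, int nDim, int mDim)
{
    if (values.Length != nDim * mDim)
        throw new ArgumentException("Element count does not match the requested shape.");

    var result = new T[nDim, mDim];
    for (int i = 0; i < values.Length; i++)
    {
        // Row index is i / mDim, column index is i % mDim.
        result[i / mDim, i % mDim] = values[i];
    }
    return result;
}

// What a compiler might emit for: int[,] identity = [[1, 0], [0, 1]];
int[,] identity = Create<int>([1, 0, 0, 1], 2, 2);
Console.WriteLine(identity[0, 0] + identity[1, 1]); // 2
```

A shape mismatch such as [[1], [2, 3]] would ideally be rejected at compile time; the runtime length check above is only a fallback in this sketch.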

@CyrusNajmabadi
Member Author

Thanks @eiriktsarpalis . The working group is discussing this. I'll add you to that.

@NetMage

NetMage commented Feb 14, 2024

Having read the (hundreds) of comments on the original Collection Expression proposal, and noting the large number of natural type immutable versus mutable comments, what about if the type was immutable if the Collection Expression consisted only of immutable literals?

var ss = [ "a", "b", "c" ]; // creates ImmutableArray<string> (or ideally a native immutable array type, `(const T)[]`)

ss[1] = "z"; // compiler error

but

var tt = [ ..ss ];

tt[1] = "z"; // fine

@jnm2
Contributor

jnm2 commented Feb 14, 2024

@NetMage Why would [.. expr] create a mutable collection while [expr] creates an immutable one?

@SENya1990

Greetings to everyone.

Sorry in advance if the topic I am asking about was already discussed somewhere. In that case I would be very grateful if you share the link to such discussion with me. I usually do not participate in the proposal discussions, this is a first one for me. So, please excuse me if I don't follow some workflow for proposal discussions.

I decided to reach out after I worked a bit with collection expressions in C# 12, investigated the generated code for them, and found two cases that I personally found confusing. Here is the first one:

Infinite generators.

Suppose that you have some kind of a collection generator that can generate an infinite IEnumerable<T> collection.
Here is an example:

private static IEnumerable<int> OddNumbers()
{
	int current = 1;

	while (true)
	{
		yield return current;
		current+=2;
	}
}

This can be used with LINQ: OddNumbers().Take(3).
Now I have to prepend the collection with -1:

var oddNumbers = OddNumbers();
IEnumerable<int> oddNumberWithMinus1 = [-1, .. oddNumbers];
// we will never get here

Unexpectedly, this program hangs. The investigation of the generated code shows that the compiler generates the code that internally fully materializes the collection in memory:

IEnumerable<int> ints1 = Program.OddNumbers();
List<int> items = new List<int>();
items.Add(-1);

foreach (int num in ints1)
     items.Add(num);
     
IEnumerable<int> ints2 = (IEnumerable<int>) new <>z__ReadOnlyList<int>(items);

Because of this the program hangs - it's impossible to materialize an infinite collection. So my problems are:

  1. It is unexpected to see that infinite collections are not supported. Maybe it's my fault, but I totally missed that the collection expression feature works only on finite collections and requires their materialization. I saw the terms "collection" and "sequence" used in the discussion and in the articles describing the feature (like this one from MSDN: https://learn.microsoft.com/en-us/dotnet/csharp/language-reference/operators/collection-expressions).

However, the term collection was used for IEnumerable<T> and nothing mentions that they have to be finite. If I missed it then I'm sorry about this rant, please share the link with me.

I think it will be difficult, if even possible, to write an analyzer to warn about such cases, but at least the documentation should be very clear about them.

  2. It is unexpected to see an intermediate collection materialization of IEnumerable<T> completely hidden from the developer.
    Moreover, the usage of a collection expression implicitly breaks lazy evaluation. Is it expected that square brackets act as some sort of collection materializer like ToList() and ToArray() in LINQ?

Could this scenario receive some extra support in C# 13? Because the extra allocation here could be removed with standard LINQ methods like Prepend:

var oddNumbers = OddNumbers();
IEnumerable<int> x = oddNumbers.Prepend(-1);
Console.WriteLine(string.Join(" ", x.Take(3)));             // prints -1 1 3

Sorry if such proposal was already considered and thanks in advance.

@HaloFour
Contributor

@SENya1990

Collection expressions aren't a comprehension language or intended as an alternative to LINQ, they always fully materialize spreads. I do agree that the documentation should probably call this out more clearly.

@CyrusNajmabadi
Member Author

Is it expected that square braces act as some sort of collection materializer like ToList() and ToArray() in LINQ?

Yes. It's a core part of the design. Linq is already there for comprehensions. Collection expressions exist intentionally to produce fully materialized collections.

@CyrusNajmabadi
Member Author

Could this scenario receive some extra support in C# 13? Because the extra allocation here could be removed with standard LINQ methods like Prepend:

Thanks for the feedback, but this was an intentional decision. We do not want these collections to be lazy (and then have expensive enumeration semantics, or have them redo computation each time you enumerate). We have query comprehensions for that already. These collections are intended to be fully materialized, so you know that the final collection you get is cheap, efficient and finite.
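
To make the distinction concrete, here is a minimal sketch contrasting the two approaches with the generator from the example above: bounding the sequence before spreading it keeps materialization finite, while Prepend stays lazy. The local names are illustrative.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Same infinite generator as in the example above.
static IEnumerable<int> OddNumbers()
{
    int current = 1;
    while (true)
    {
        yield return current;
        current += 2;
    }
}

// Bounded spread: Take(3) ends the enumeration, so materialization terminates.
int[] firstFew = [-1, .. OddNumbers().Take(3)];

// Lazy alternative: Prepend does not enumerate until the result is consumed.
IEnumerable<int> lazy = OddNumbers().Prepend(-1);

Console.WriteLine(string.Join(" ", firstFew));     // -1 1 3 5
Console.WriteLine(string.Join(" ", lazy.Take(3))); // -1 1 3
```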

@SENya1990

The second scenario I have encountered is somewhat artificial and definitely does not follow best design practices.
Still, I believe it is useful to bring it up, because in this scenario the C# compiler subtly breaks the code through its use of duck typing.

Misuse of the "Count" property with duck typing

The scenario involves the collection expression like this:

public static IEnumerable<int> PrependOne(<Some integer collection type> s) => [1, ..s];

The generated code for the method looks like this:

      int num1 = 1;
      <collection type> myCollection = s;
      int index1 = 0;
      int[] items = new int[1 + myCollection.Count];
      items[index1] = num1;
      int index2 = index1 + 1;
      
      foreach (int num2 in myCollection)
      {
        items[index2] = num2;
        ++index2;
      }
      
      return (IEnumerable<int>) new <>z__ReadOnlyArray<int>(items);

The C# compiler is very clever, it attempts to generate the most optimized code relying on the Count property if such property is present in the collection type and deducing the required size of the created temporary array used to materialize the collection.

However, the C# compiler cheats when doing this size calculation. Everything is OK when the collection explicitly states that it can provide a Count for its elements by implementing one of the common interfaces for collections with a known number of elements, such as IReadOnlyCollection<T>. But when the collection does not implement such an interface, the C# compiler uses duck typing to access the Count property, if the collection has such a property accessible in the scope of the collection expression.

Now consider this example. Suppose I have the following custom collection:

public class MyCollection : IEnumerable<int>, IEnumerable<string>
{
	private readonly int[] _ints;
	private readonly string[] _strings;

	internal int Count => _strings.Length;

	public MyCollection(int[] ints, string[] strings) => (_ints, _strings) = (ints, strings);

	public IEnumerator<int> GetEnumerator() => ((IEnumerable<int>)_ints).GetEnumerator();

	IEnumerator IEnumerable.GetEnumerator() => _ints.GetEnumerator();

	IEnumerator<string> IEnumerable<string>.GetEnumerator() => ((IEnumerable<string>)_strings).GetEnumerator();
}

The example is artificial and it's clearly not the best design. But similar code appears sometimes, or can live in an old legacy code base.

Now, suppose I'm using this custom collection in a collection expression:

public static IEnumerable<int> PrependOne(MyCollection s) => [1, ..s];

//...
var myCollection = new MyCollection([1, 2, 3, 4], ["string"]);
IEnumerable<int> modified = PrependOne(myCollection);

And unexpectedly I received an IndexOutOfRangeException. Why? I never accessed any indexes. This custom collection does not even have them! This error happens because the C# compiler assumes that there is a contract where none exists.

This looks confusing to me; no written code has any access by index. It requires the developer to know what code is generated by the compiler behind the nice syntactic sugar.

I can't say that I really like this compiler trick, even if it is legal and probably brings some performance benefits and fewer allocations in the case of value types. The compiler assumes a contract in a place where none exists (the whole idea of duck typing). I know that C# already did this before, for example with foreach. But in this place, in my opinion, it is easier to misinterpret the intention behind the Count property. And C# feels much less statically typed at this moment (and I'm not talking about the dynamic feature).

@SENya1990

SENya1990 commented Feb 17, 2024

@HaloFour , @CyrusNajmabadi thank you for your response!

You both very clearly confirmed that the feature was designed to materialize collections and works only on finite collections.
Could you please point to a place in the Microsoft documentation, or at least an easy-to-find blog post (although not ideal, because they are frequently lost after several years), that contains as clear an explanation as the one you provided to me?

@CyrusNajmabadi
Member Author

The example is artificial and it's clearly not the best design.

We made explicit choices with collection expressions to assume that people write well-behaved and sensible types. We want the optimizations to hit the broadest set of cases. It is understood that a non-well-behaved type may then have problems. But we're optimizing the lang design for the literal 99.999% case, at the cost of these strange outliers. Our recommendation if you do have types like these is to write analyzers to block their use in collection expressions.

@SENya1990

SENya1990 commented Feb 17, 2024

@CyrusNajmabadi that is understandable. I do understand at least part of the reasons why the duck typing was used.
However, shouldn't this be described very thoroughly in the official documentation?

For example, you just introduced a new way to classify types - well-behaved. Is there a formal definition for a well behaved type? I may consider the collection from my example to be well-behaved, why not? It does not break any contracts it explicitly states, only some implicit contracts that were later imposed by the new version of compiler.

@CyrusNajmabadi
Member Author

However, shouldn't this be described very thoroughly in the official documentation?

Sure. Feel free to file doc bugs. :)

For example, you just introduced a new way to classify types - well-behaved. Is there a formal definition for a well behaved type?

Yes. If you are a collection type (defined in our spec), and you supply a .Count, then enumerating you should produce the same number of elements. Similarly, if you have an indexer, and you index into the type from 0 to Count-1 then those should produce the same values in the same order as if you just did the enumerator.
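
That definition can be sketched as a small check. This is a hypothetical helper, not something from the spec or the compiler; it uses IReadOnlyList<T> for simplicity, whereas the compiler discovers Count by pattern rather than through an interface.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: verifies that enumeration agrees with Count and the indexer.
static bool IsWellBehaved<T>(IReadOnlyList<T> list)
{
    var comparer = EqualityComparer<T>.Default;
    int index = 0;
    foreach (T item in list)
    {
        // Enumeration must not yield more items than Count reports,
        // and each item must match what the indexer returns at that position.
        if (index >= list.Count || !comparer.Equals(item, list[index]))
            return false;
        index++;
    }
    // Enumeration must not yield fewer items than Count reports.
    return index == list.Count;
}

List<int> numbers = [1, 2, 3];
Console.WriteLine(IsWellBehaved(numbers)); // True
```

A type like the MyCollection example, whose Count describes a different sequence than its enumerator, would fail the equivalent check.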

@CyrusNajmabadi
Member Author

I may consider the collection from my example to be well-behaved, why not?

Because the Count and GetEnumerator refer to totally different sequences. Again, this is vastly out of the normal case for collection use in reality. :)

We are being pragmatic here. The 99.999% case is well behaved collections. Sacrificing the value we get on the normal case for the ecosystem for strange types like this would be cutting off our nose to spite our face.

@SENya1990

Yes. It's a core part of the design. Linq is already there for comprehensions. Collection expressions exist intentionally to produce fully materialized collections.

And there is nothing in the documentation that clarifies this and prevents the feature from being misused. Moreover, collection expressions definitely overlap with IEnumerable collections. Before I learned about the collection materialization I would definitely use collection expressions instead of LINQ Append, Prepend, Concat because of the new short and more elegant syntax.

Sure. Feel free to file doc bugs. :)

I definitely will. I have read the documentation and feature specs more thoroughly, and I'm sorry to say this, but I feel that the current documentation is in an awful state. A lot of things, like collection materialization or duck typing, are implicit or not mentioned at all. Many are described in "feature specs", which contain too much detail about implementation and at the same time explicitly state that the actual implementation may be different, which undermines their value.

@CyrusNajmabadi
Member Author

And there is nothing in the documentation that clarifies this and prevents the feature from being misused.

Feel free to contribute doc fixes or file issues. This is all open source :-)

This is the repo for the design and specification of the feature. Docs are not done here. Feedback on that should go to the docs team. Thanks!

@jnm2
Contributor

jnm2 commented Jul 17, 2024

You can always be consistent and put the type on the RHS, if that's the goal.

@Mrxx99
Contributor

Mrxx99 commented Jul 20, 2024

If we won't get natural type in C# 13 (I hope we will), can at least this be supported in C# 13?:

foreach (var number in [1, 2, 3])
{
    Console.WriteLine(number);
}

Here the compiler could decide whether to use stackalloc and Span or an array.

@koszeggy

If we won't get natural type in C# 13 (I hope we will), can at least this be supported in C# 13?:

foreach (var number in [1, 2, 3])
{
    Console.WriteLine(number);
}

I would be happy even if I had to specify int instead of var, which isn't supported now either.

@scalablecory

I wonder if natural type enables slightly weird EF IN translation...

var results =
   from x in items
   where ["cat", "dog"].Contains(x.name)
   select x;

@BlinD-HuNTeR

I wonder if natural type enables slightly weird EF IN translation...

var results =
   from x in items
   where ["cat", "dog"].Contains(x.name)
   select x;

If they decide that natural type is List<>, then that would definitely work. But for your example, I suppose it would be better to write where x.name is "cat" or "dog"

@colejohnson66

Why is it intuitive to be an array? Why not List<int>, like Python and JavaScript? If you want a dedicated type, just write the type instead of var.

@SinxHe

SinxHe commented Aug 21, 2024

As expected, I'm not the only one confused about why there is no natural type.

@zhyy2008z

Why is it intuitive to be an array? Why not List<int>, like Python and JavaScript? If you want a dedicated type, just write the type instead of var.

  1. Since arrays are first class citizens of the C# language, with specialised syntax rules (T[]), and are the basis for the vast majority of other collections, the collection literal should of course use this most basic type when the type is unqualified, the simpler the better;
  2. Arrays are better performing and lighter than List<T>, and it is also consistent with general first instinct to infer a more resource-efficient type rather than a more complex one when the type is unknown.
  3. To support List<T>, it's recommended to use this syntax as @OJacot-Descombes says ‘var b = [1, 2, 3, ..] ; // List<int>’.

@CyrusNajmabadi
Member Author

What would the corresponding natural type be for a dictionary literal?

@theunrepentantgeek

At least, var a=[1,2,3]; should be int[] type, because, it's more intuitive!

So here's the thing. That's your intuition, and it's not universal.

I think that List<T> is the most intuitive type to use.

Why? List<T> is the common workhorse of the vast majority of C# code, it has a much richer and more powerful API, and using it encourages people to write code that's easier to read and more maintainable.

The performance difference between List<T> and Array is literally on the order of a few nanoseconds per access - it's so small that it defies measurement, being much much smaller than the statistical noise from most benchmarking tools. Performance is not a good reason to prefer array.

Performance is an important metric - but it's far from the only one, and in many situations not the most important by far.

Aside: I've achieved many significant performance increases by rewriting array laden code to use List<T> instead of array - the benefits of a more powerful API allowing smarter algorithmic choices, resulting in runtime benefits that far outweighed any impact of member access. Of course, YMMV.

At the risk of misquoting the aphorism:

Make it work
Make it beautiful
Make it fast (if you need to)
In precisely that order.

My point here is not that the natural type should be List<T> - though I personally think that would be the better choice - rather it's that there is no one easy answer to this.

Your intuition says array.
Mine says List<T>.
I'm sure there are those who would advocate for an immutable type as that allows for greater optimization and is naturally thread-safe.

@zhyy2008z

At least, var a=[1,2,3]; should be int[] type, because, it's more intuitive!

So here's the thing. That's your intuition, and it's not universal.

I think that List<T> is the most intuitive type to use.

Why? List<T> is the common workhorse of the vast majority of C# code, it has a much richer and more powerful API, and using it encourages people to write code that's easier to read and more maintainable.

The performance difference between List<T> and Array is literally on the order of a few nanoseconds per access - it's so small that it defies measurement, being much much smaller than the statistical noise from most benchmarking tools. Performance is not a good reason to prefer array.

Performance is an important metric - but it's far from the only one, and in many situations not the most important by far.

Aside: I've achieved many significant performance increases by rewriting array laden code to use List<T> instead of array - the benefits of a more powerful API allowing smarter algorithmic choices, resulting in runtime benefits that far outweighed any impact of member access. Of course, YMMV.

At the risk of misquoting the aphorism:

Make it work
Make it beautiful
Make it fast (if you need to)
In precisely that order.

My point here is not that the natural type should be List<T> - though I personally think that would be the better choice - rather it's that there is no one easy answer to this.

Your intuition says array. Mine says List<T>. I'm sure there are those who would advocate for an immutable type as that allows for greater optimization and is naturally thread-safe.

Isn't that better? more readable?

@theunrepentantgeek

var b = [1, 2, 3, ..]; // List<int>

This syntax simply involves lying to the reader.
It looks like it has 4 elements, but it actually has only 3.

Good code doesn't require the reader to overcome deception to understand it.

@zhyy2008z

It looks like it has 4 elements, but it actually has only 3.

Good code doesn't require the reader to overcome deception to understand it.

Most people would think .. means indeterminate number, I think.

@yaakov-h
Member

I assume that is only an illustrative example and not shorthand for var b = [1, 2, 3, Range.All];?

@julealgon

@zhyy2008z

Most people would think .. means indeterminate number, I think.

I don't disagree but I also see another interpretation that could be bad, which is to think that the array is just an infinite sequence and we are using a shorthand there to mean

var b = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10 .....;

Yes, I know the symbol is not an ellipsis but still... just sharing this potential interpretation that just came to my mind when looking at it. You could even make a relation to how Excel generates values based on a previous sequence.

@SENya1990

@julealgon earlier in this thread it was already established that infinite sequences are not supported by collection expressions. They always materialize collections and this is by design:
#7913 (comment)

@julealgon

@SENya1990

@julealgon earlier in this thread it was already established that infinite sequences are not supported by collection expressions. They always materialize collections and this is by design: #7913 (comment)

I never implied infinite sequence expressions were supported. I was just making a point that, from the outside, one could look into that syntax and think it was an infinite sequence. I was making the argument for readability only, in the sense that it could be misleading potentially.

@smkanadl

In C++ there is the std::initializer_list which is the resulting type of an expression like this: auto test = { 1, 2, 3 };

I can imagine that due to the static type system a similar approach could also work for C#.

@KennethHoff

KennethHoff commented Aug 21, 2024

This might not fit into how C# natural types normally work, but here's a wild suggestion:

How about making the natural type of a collection expression into just "a collection expression of the element type" in the sense that it is the type that it's used in. Here's an example:

var coll = [1,2,3]; // Currently typed as simply "collection expression of ints" - basically just a recipe.

PrintNumbers(coll); // From this point on, because it was used as a `int[]` it's now an `int[]`.

void PrintNumbers(int[] numbers) { ... }

The previous example is identical to the following, except you also get to keep the reference in the coll local.

PrintNumbers([1,2,3]);

void PrintNumbers(int[] numbers) { ... }

var coll = [1,2,3]; // Also just a "collection expression of ints"

foreach (var num in coll) // First used in a foreach loop, so we'll use the most efficient type here, which is `ReadOnlySpan<int>`.
{
	...
}

var coll = [1,2,3]; // Never used, so this never materializes.

This natural type would be decided at compile time, so you can always hover over it in the IDE to see what it chose. I'm not sure what it means for the last example though, where the local is never used.
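For comparison, here is a minimal sketch of what already compiles in C# 12 when each local is given an explicit target type instead of var. It does not implement the retroactive typing suggested above; it only shows the two usages (passing to an int[] parameter, iterating as a ReadOnlySpan<int>) in their current form. The class and method names are made up for illustration.

```csharp
using System;

public static class TargetTypedToday
{
    // Hypothetical stand-in for PrintNumbers: sums instead of printing.
    private static int Sum(int[] numbers)
    {
        int total = 0;
        foreach (int n in numbers) total += n;
        return total;
    }

    // The local is explicitly int[], so the collection expression
    // materializes as an array and can be passed along.
    public static int SumViaArray()
    {
        int[] coll = [1, 2, 3];
        return Sum(coll);
    }

    // The local is explicitly ReadOnlySpan<int>, the type the comment
    // above suggests the compiler would pick for a foreach-only usage.
    public static int SumViaSpan()
    {
        ReadOnlySpan<int> coll = [1, 2, 3];
        int total = 0;
        foreach (int item in coll) total += item;
        return total;
    }
}
```

The difference from the suggestion is only in who writes the type: here the author states it at the declaration, whereas the suggestion would have the compiler infer it from the first usage.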

@julealgon

@KennethHoff what if you never pass it to another method where there is a target type? What happens when you declare the var and then try to access its members/methods? What will show there, then?

I don't think your proposal would work well because you don't always have an obvious target-type to base the decision when coding.

@jnm2
Contributor

jnm2 commented Aug 21, 2024

@KennethHoff That's an approach we've considered, and it still might come in handy. However, it brings up questions of teleporting the materialization to a specific type when there are multiple usages, which can be hard to reason about.

@KennethHoff

KennethHoff commented Aug 21, 2024

@KennethHoff what if you never pass it to another method where there is a target type?

I don't understand this case. Could you give an example? Assuming you mean "what if you never pass it to another method" then it's the last example in my original comment; It's a noop similar to how primary constructors work for non-records.

What happens when you declare the var and then try to access its members/methods? What will show there, then?

If you don't use it anywhere it doesn't have a type and therefore doesn't have any members.

If you do use it elsewhere then it has the type of whatever "elsewhere" is, and you could use those members.

var coll1 = [1,2,3];
var coll2 = [2,4,6];
var coll3 = [3,6,9];

coll1.ToString(); // Compile error. Cannot call ToString on an object of type "collection expression of ints"
coll2.ToString(); // Calls ToString on List<int>.
coll3.ToString(); // Calls ToString on IEnumerable<int>, which forwards to the synthesized type for IEnumerable<int> that currently exists. Partially UB.

PrintNumbers1(coll2); // As this is the first time coll2 is referenced after its declaration, coll2 is now retroactively typed as List<int>.
PrintNumbers2(coll3); // As this is the first time coll3 is referenced after its declaration, coll3 is now retroactively typed as IEnumerable<int>. 

PrintNumbers1(coll3); // Compile error. Cannot implicitly convert IEnumerable<int> to List<int>

void PrintNumbers1(List<int> numbers) { ... }
void PrintNumbers2(IEnumerable<int> numbers) { ... }

This does suffer from the ol' "spooky action at a distance" problem where changing something can break something seemingly unrelated. Say you changed the PrintNumbers1 method to instead take IEnumerable<int> as well, then the call to coll2.ToString() would change to call a different ToString() impl.

This interpretation also naturally adds some optimizations that you maybe otherwise couldn't've done, like simply never materializing it if it's not needed:

var coll = [1,2,3];

foreach (var item in coll)
{
	Console.WriteLine(item);
}

Because this was only used inside a foreach loop, we can simply compile this as this:

Console.WriteLine(1);
Console.WriteLine(2);
Console.WriteLine(3);

@CyrusNajmabadi
Member Author

@KennethHoff

var v = [1, 2, 3];
v.Add(4);

@KennethHoff

KennethHoff commented Aug 21, 2024

var v = [1, 2, 3];
v.Add(4);

Compile error. Cannot call Add on "Collection expression of ints".

This however would work:

List<int> DoThing()
{
	var v = [1,2,3];
	v.Add(4);

	return v; // By virtue of being the return value - which is clearly defined - `v` is retroactively typed as List<int>
}

I'm not saying this is particularly clear semantics - especially for human readers - but I do think it's at least consistent; First unambiguous reference defines the type, retroactively.

That would also mean that changing the previous example to any of the following would indeed make it no longer compile:

List<int> DoThing()
{
	var v = [1,2,3];
	v.Add(4);

	DoThingWithArray(v);

	return v; // int[] is not implicitly convertible to List<int>
}

void DoThingWithArray(int[] numbers) { ... }

IEnumerable<int> DoThing()
{
	var v = [1,2,3];
	v.Add(4); // There is no `Add` on `IEnumerable<int>`

	return v;
}

@julealgon

@KennethHoff

Compile error. Cannot call Add on "Collection expression of ints".

This would lead to terrible dev experience.

I would be ok if the compiler used context to decide the type, but there should be a type right away for when context is not known upfront.

If someone is just typing code, like this:

var stuff = [1, 2, 3];
stuff.

They should get intellisense for something. stuff needs to have a type even when a target type is not provided via usage.

Sometimes, context will just be completely insufficient:

var stuff = [1, 2, 3];
var something = stuff.ToString();

What is the type of stuff now?

You are making assumptions that there will always be some target collection type that can be inferred from usage, which is really not true at all.

@nuiva

nuiva commented Aug 21, 2024

@KennethHoff: Would this work in your model?

void Imply(this List<int> a, List<int> b) {} // Public extension method visible to DoThing
List<int> DoThing()
{
	var a = [1];
	var b = [2];
	b.Imply(a); // Method not known yet
	var c = [3];
	c.Imply(b); // Method not known yet
	return c; // Implies c: List<int>
		// -> c.Imply(b) implies b: List<int>
		// -> b.Imply(a) implies a: List<int>
}

As much as I like type inference, this just feels like a half-assed solution that feels bad for both the Hindley-Milner crowd and the Grug brained devs.

@KennethHoff

Just want to say; I do not think my suggestion is good. It has terrible DX.

When it comes to @nuiva's question:

void Imply(this List<int> a, List<int> b) {} // Public extension method visible to DoThing
List<int> DoThing()
{
	var a = [1]; // a is unknown.
	var b = [2]; // b is unknown.
	b.Imply(a); // b is unknown. a will be List<int> if b turns out to be List<int>.
	var c = [3]; // c is unknown.
	c.Imply(b); // c is unknown. b will be List<int> if c turns out to be List<int>.
	return c; // c is List<int>, so b is List<int>, so a has to be List<int>.
}

So yes, you understood my (way too implicit/magical) thought experiment correctly :s

@jnm2
Contributor

jnm2 commented Aug 23, 2024

var stuff = [1, 2, 3];
var something = stuff.ToString();

What is the type of stuff now?

You're (presumably) calling System.Object.ToString(), which implicitly involves a conversion to System.Object. Because of that, it's really the same question as asking what this does:

object stuff = [1, 2, 3];
// stuff.GetType(), stuff.ToString(), etc

If the type is determined later and the only thing you're doing later is .ToString(), we might say: sorry, the untyped collection expression can't be converted to object.

I'm not terribly bothered by var x = []; x.Add(...) not working. I can't do var x = null; either and assign it later. I can't do var num = cond ? 2 : 3; num += 0.1m; and have it retroactively realize I wanted the type to be decimal and not int. I can't do var animal = new Cat(); animal = new Dog(); and have it realize I want var to pick the Animal type.

My hope is for reasonable defaults, and then in cases where you want something else, you say what you want just like with every other var-declared local.
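Three of the cases above can be checked against today's compiler. The following is a minimal sketch in which each rejected var form is left as a comment and the explicitly typed form that compiles follows it. The Animal, Cat, and Dog types and the method names are hypothetical, added only for illustration.

```csharp
using System;
using System.Collections.Generic;

public class Animal { }
public class Cat : Animal { }
public class Dog : Animal { }

public static class ExplicitTypes
{
    public static decimal AddFraction(bool cond)
    {
        // var num = cond ? 2 : 3;   // num is inferred as int
        // num += 0.1m;              // error: decimal does not implicitly convert to int
        decimal num = cond ? 2 : 3;  // stating the type makes the intent explicit
        num += 0.1m;
        return num;
    }

    public static Animal SwapAnimal()
    {
        // var animal = new Cat();   // animal is inferred as Cat
        // animal = new Dog();       // error: Dog does not convert to Cat
        Animal animal = new Cat();
        animal = new Dog();
        return animal;
    }

    public static List<int> EmptyThenAdd()
    {
        // var x = [];               // error in C# 12: the collection expression has no target type
        List<int> x = [];
        x.Add(1);
        return x;
    }
}
```

In each case the fix is the same one-word change at the declaration, which is the sense in which var with a collection expression would behave like every other var-declared local.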

@julealgon

@jnm2 My main concern is with the bad experience of zero intellisense until the IDE/compiler can figure out what the type will be. It will behave the same as if the type was dynamic.

Not a big fan.

The only way I would support something like this is if it were used to improve the selected type. This would mean it would start with a simple native type, say [], and be "promoted" to other types later based on usage, like List<T>, Span<T>, etc. That way at least you'd get a decent intellisense experience while coding right after defining the variable.

@jnm2
Contributor

jnm2 commented Aug 23, 2024

@julealgon Another thing we explored was defining basic members for the "collection expression <T>" type. So intellisense would show Add, indexing, count, etc. It kept bringing us back to List<T> though. The language would have to encode the members of this new type, and then there would be requests to add stuff that List<T> has over time, and List<T> extension methods would not be available, etc. It seemed not likely worth the complexity.
