Top level statements and member declarations (embrace scripting dialect) (VS 16.8, .NET 5) #2765

Open
MadsTorgersen opened this issue Aug 30, 2019 · 42 comments
Labels: Design Review; Implemented Needs ECMA Spec (this feature has been implemented in C#, but still needs to be merged into the ECMA specification); Proposal champion


@MadsTorgersen
Contributor

The C# compiler currently understands a dialect of the language used for various scripting and interactive purposes. In this dialect, statements can be written at the top level (without enclosing member bodies) and non-virtual members (methods etc) can be written at the top level (without enclosing type declarations).

Usage of the scripting dialect has been relatively minor, and in some ways it hasn't kept up with the "proper" C# language. However, usage is picking up through Try.NET and other technologies, and I am concerned with being stuck with two incompatible dialects of C#.

I think there is merit to allowing top-level program code instead of requiring the "program" to be in a Main method. And I think there is merit to allowing functions to be declared at the top level of a program, without needing to be enclosed in a class. I think these allow small programs to be meaningfully simpler, and make the language easier to learn.

public int Fac(int x) => x > 1 ? x * Fac(x-1) : 1; // Top-level function
Console.WriteLine(Fac(5)); // Top-level statement

As opposed to:

public class Functions
{
    public static int Fac(int x) => x > 1 ? x * Fac(x-1) : 1; // Static method wrapped in a class
}
public class Program
{
    static void Main()
    {
        Console.WriteLine(Functions.Fac(5)); // Statement wrapped in Main
    }
}

I believe we should consider putting these extensions into C# "proper" and do away with the separate scripting dialect. In doing so we should not feel too bound by design details of the scripting dialect, but make sure we resolve those details in a way that's best for C# as a whole. I'm not so worried about "breaking" the C# scripting dialect. Since its use is mostly interactive, there probably isn't going to be all that much source code around that depends deeply on its semantics.

I hope that adding these features can help create a more continuous growth path from someone experimenting in a Jupyter workbook to having a full-blown C# application, and that sort of scenario.

I don't have a concrete fleshed-out proposal at this point. I think we should start by reviewing the current state of affairs and brainstorming what would work well and less well from the scripting dialect.

@HaloFour
Contributor

If the top-level members are only accessible within the current assembly (my preference) I'd be in favor of all top-level members being implicitly static members of a partial internal static class with an unspeakable name. Other source files would automatically using static to import those top-level members. If the top-level members are defined within a namespace, then there would be additional static classes defined within those namespaces, and those static members would only be auto-imported if the namespace is imported.
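
A rough sketch of how this lowering might look if written by hand in today's C#. The class name TopLevelMembers is made up for illustration (the suggestion above calls for an unspeakable name), and the using static line stands in for the import the compiler would hypothetically add to every file:

```csharp
using System;
using static TopLevelMembers; // hypothetical: the import the compiler would add automatically

// Hypothetical lowering target: top-level members from every file land in one partial class.
internal static partial class TopLevelMembers
{
    public static int Fac(int x) => x > 1 ? x * Fac(x - 1) : 1;
}

public static class Program
{
    public static void Main()
    {
        Console.WriteLine(Fac(5)); // resolved through the using static import; prints 120
    }
}
```

Because the class is partial and internal, other files in the same assembly could contribute more members without exposing any of them to other assemblies.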

If you want these top-level members to be accessible between assemblies, you can take a cue from Kotlin which offers this same feature. It's similar to above but the name of the static class is predictable and based on the package/namespace. You can apply attributes/annotations to cause the compiler to emit a different type name if you want, which is useful for Java interop scenarios in Kotlin and I think would be similarly useful here with the .NET ecosystem.

But to be honest, is it worth doing at all? Is there actually that much (or any) usage of C# script, enough to warrant that C# adopt its syntax and idioms? To be honest I'd probably like to use it, but as far as I can tell it's not available on Mac. I've never seen it used in any Windows shop. To me it only functions as a slightly nicer Immediate Window. It could be more than that.

@yaakov-h
Member

At the risk of stating the obvious, how would this work if multiple source files in a single compilation declare top-level statements?

@orthoxerox

@yaakov-h The simplest option would be to raise an error when there are multiple source files in a single compilation and at least one has top-level statements.

@canton7

canton7 commented Aug 31, 2019

(What follows is a brainstorm around the topic. I'm not invested in any of it - please feel free to shoot it down).

I'm a little bit worried about allowing people to interleave top-level code with things like class definitions (although I know plenty of scripting languages allow this). I can imagine finding unexpected surprises hidden in larger applications.

One thought is introducing something like a script block:

script
{
    public int Fac(int x) => x > 1 ? x * Fac(x-1) : 1; // Top-level function
    Console.WriteLine(Fac(5)); // Top-level statement
}

(You could either allow top-level functions only inside a script block, or both inside and out).

This makes it a bit more obvious that this code should be interpreted as a script. If there are features which are only available to top-level statements (I'm thinking of things like #r), it might make it a bit easier to explain that "these particular scripting features are only available in a scripting context, as denoted by a script block", rather than trying to explain why top-level code has access to some features that code in a "normal" method does not.

It also might make it a bit more intuitive that a script can't be split across multiple files (if that is indeed something that's forbidden). It might also make the error messages a bit nicer, since it gives an explicit name to a block of top-level code: "application already contains a 'script' definition" might be a bit clearer than "top-level code found in more than one file".

One could also imagine being able to name scripts (script FooBar { }), and then execute them (FooBar.Execute()) or access functions inside them (FooBar.Fac). They would conceptually map to a class which contains methods, and an Execute method containing code. You could take this further and allow remapping things like Console for a script, at which point you're starting to reach the functionality provided by things like ScriptCS.

Once your script's got a name, that makes importing scripts across assemblies more intuitive. No attributes required.

A natural step is then allowing files with a particular extension (e.g. .csx) to be implicitly placed in a script block by the compiler (perhaps with the same name as the filename), or providing APIs for importing text as a script at runtime, iterating over all scripts in an application, etc...

Or maybe this is just piling on a lot of unnecessary complexity...


Scripting languages get away with having top-level code in multiple files, because there's a clear "entry" file and a tree of files included recursively from that entry file in a deterministic and defined order. A file's top-level statements are run when that file is first included. C# doesn't have that luxury, so "which bit of top-level code is executed when" is always going to be a bit harder to explain.

If we want C# to behave like a scripting language (in certain contexts), maybe we need to mimic this. That would mean giving script files a separate file extension (e.g. .csx), and allowing them to include each other (e.g. #load). One such script file could then be designated as the "entry" one (e.g. following some naming convention, and/or by allowing "normal" C# code to execute a particular script file). Top-level statements would then be executed across different script files in the order that they are #loaded.

The downside is that there are now two distinct paradigms for people to learn, with the inevitable transition between them. There's also the challenge of explaining "this snippet of code only works if you put it in a file ending with .csx".

@YairHalberstadt
Contributor

I think what's more important for C# scripting than the syntax is the tooling. There are countless times when I would like to spin up a REPL to test my C# code, but unfortunately I find the tooling for C# Interactive to be lacking, and I'm forced to experiment via unit tests instead.

@pie-flavor

@canton7 For all that C# lets you do, it is still a thoroughly unambiguous language; things like duplicate names are checked aggressively by the compiler more so than other languages. What could it be possible to screw up with top-level members that isn't possible to screw up without them? And aside from that, what utility would there be in keeping a script separate? Top level code isn't much of a sticking point, void Main() is easy to rationalize, but there's no reason you shouldn't be able to define any function (including Main) outside of a class.

@orthoxerox

If we want to provide a smooth ramp-up for language learners, then I can envision the following approach:

  • top level statements, single file
  • top level statements, local functions, single file
  • Main, local functions, type declarations, load instructions, multiple files

That is, as soon as you want to declare a type or include a file, you need Main.
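
For illustration, the second rung of that ladder might look like the following single file. This is only a sketch and assumes top-level statements behave as they eventually shipped in C# 9; the function name Square is made up for the example:

```csharp
using System;

// Stage 2: top-level statements plus a local function, one file, no Main.
int Square(int n) => n * n;

for (var i = 1; i <= 3; i++)
{
    Console.WriteLine($"{i} squared is {Square(i)}");
}
```

Under the suggested ramp, the next step (adding a type declaration or a second file) would be the point at which an explicit Main becomes necessary.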

@gulshan

gulshan commented Sep 1, 2019

Personally I like the idea of top-level functions and top-level const members. I think top-level variables should be avoided. Also, instead of top-level statements, I prefer a specific entry point or Main function. This means the C# script will still be a separate dialect, with special syntax for top-level statements as well as for referencing other scripts and assemblies (and a separate file extension, .csx). But then C# proper will be much more aligned with C# script. I think this is similar to the path chosen by F# and Kotlin.

@ghord

ghord commented Sep 1, 2019

How about, in addition to disallowing multiple files with top-level statements, we also treat such statements as an implicit Main function? This way, if the user declares a separate Main, it will error out because of multiple declarations.

I think top level function declarations will be fine spread across multiple files.

@yaakov-h
Member

yaakov-h commented Sep 1, 2019

Something else that might be useful for scripting would be treating the first line of a file as trivia if it starts with #!.

@svick
Contributor

svick commented Sep 2, 2019

@yaakov-h That's already supported in the current scripting dialect.

@Thaina

Thaina commented Sep 3, 2019

I wish we could write top-level functions and members, but personally I don't like top-level statements. It would be ambiguous which ones execute first in the project. By contrast, if void Main() is declared in more than one file, the compiler reports a name collision, which guarantees that the one thing that should be unique in a project really is unique.

Top-level statements should be limited to .csx files.

MadsTorgersen modified the milestones: 10.0 candidate, 8.0 candidate, 9.0 candidate (Sep 11, 2019)

@tmat
Member

tmat commented Sep 18, 2019

do away with the separate scripting dialect

@MadsTorgersen I don't think this is desirable. In fact, we already have three dialects of the C# language, not two. We have C# proper, C# script and C# debug expressions. That last one is used in Expression Evaluators (Watch and Immediate windows). It has additional syntax and semantics that are implemented via customized binders.

I believe we can unify C# script with C# debug expressions into a single dialect, at least on syntactic level, by bringing some of the C# debugging syntax extensions to C# script. For example, the C# debugging dialect allows for special identifiers starting with $ - $1, $2, ..., $exception, etc. These refer to special values exposed by the debugger. In C# interactive these would be used to access host defined variables, if we allowed such syntax. Another example is that a debug expression may be suffixed with a formatter specifier. E.g. 1+1, h - h here is an identifier that specifies that the result should be formatted as hexadecimal number when printed out. This would indeed be very useful in Interactive Window, especially if the host can define custom named visualizers. Ultimately, there is no reason why the syntax in C# Interactive Window should differ from the Immediate Window experience. The semantics of some constructs are necessarily different though. For example, local variables defined in Immediate Window are emitted as real local variables (because the debugger can do magic with an IL interpreter), while in Interactive Window they are emitted as fields (because we need to access the value in subsequent submissions). This imposes some limitations on the types of these variables - they can't use stack-only types.
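
To make the stack-only limitation concrete, here is a small sketch. The Submission class is a hypothetical stand-in for whatever a script host actually emits, not the real generated type; it only illustrates that a variable lifted to a class field cannot have a ref struct type such as Span<int>, while a true local can:

```csharp
using System;

// Hypothetical stand-in for a host-generated submission type.
class Submission
{
    public int Counter = 1;        // a script "local" kept alive as a field
    // public Span<int> Window;    // would not compile: a ref struct cannot be a field of a class
}

static class Program
{
    static void Main()
    {
        Span<int> window = stackalloc int[3]; // fine as a real local variable
        window[0] = 42;

        var state = new Submission();
        state.Counter += window[0];
        Console.WriteLine(state.Counter);     // prints 43
    }
}
```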

Re unification of C# proper and C# scripting: There are C# scripting features that wouldn't make much sense to merge into C# proper. For example, C# proper uses projects to list metadata references and all source files that are part of the compilation. Script C# uses #r and #load directives. It doesn't make much sense to complicate C# proper by adding #r and #load.

So, I am very skeptical that we will be able to unify all these three dialects into one. That said, I think we can bring them closer together.

@SamPruden

@HaloFour raised the issue of whether C# script is really used much.

But to be honest, is it worth doing at all? Is there actually that much (or any) usage of C# script, enough to warrant that C# adopt its syntax and idioms?

The main way I've used C# scripting is with Dave Glick's Scripty, which is a tool that allows C# scripts to act as an alternative to T4 templates. I find it much better to use than T4, and would actually like to see something like it become part of the compiler. Perhaps that's relevant to #12505. Templating and other build tools seem like a great usage for scripts.

I like the idea of making this a native feature, but I think it should be disabled by default in each file unless explicitly enabled, either by a file-extension or pragma.

@jonsequitur

Another example is that a debug expression may be suffixed with a formatter specifier. E.g. 1+1, h - h here is an identifier that specifies that the result should be formatted as hexadecimal number when printed out. This would indeed be very useful in Interactive Window, especially if the host can define custom named visualizers.

We have API-level support for this kind of thing in the .NET kernel. It can be used by the host or by an extension author. It's very customizable and flexible enough to support different output types, e.g. plain text vs. HTML. This is an approach to custom output formatting for interactive coding that doesn't require language-level support.

@tmat
Member

tmat commented Oct 4, 2019

This is an approach to custom output formatting for interactive coding that doesn't require language-level support.

Sure, you can always call some pre-defined function that converts the result of the expression to some type that has a custom visualizer. However, the EE is already doing this via a pseudo-language feature (<expr>, <idf>) and customers are used to it. By adding this to interactive C# we'd unify the experience across debugging and interactive.

@jonsequitur

However, the EE is already doing this via a pseudo-language feature (<expr>, <idf>) and customers are used to it.

Can you provide more details on this? I'm not familiar with it.

@tmat
Member

tmat commented Oct 5, 2019

@redradist

redradist commented Oct 13, 2019

Thank you @MadsTorgersen for writing this proposal; there has been a long fight on GitHub for top-level statements and top-level functions!

I personally wrote a proposal for top-level functions (#2156), but faced huge pressure from a stubborn group of people... :(

I like this idea, because it would make it possible to use C# as a replacement for Python, and we would have the best of both worlds!

Again, thank you @MadsTorgersen!

@jiggyswift

@MadsTorgersen This is a fantastic suggestion and would both be useful, and help gain the interest of a new generation in C#. I have two children (8 and 11) and would seriously consider introducing them to C# if this was implemented (instead of say Python or whatever they are learning at code club). It is just too unreasonable to expect children this age to grasp the large project benefits of formal object oriented programming and understand why the default template for a line of code starts like namespace ConsoleApp { class Program { static void Main(string[] args) { // My Code Goes Here} } }. Even the indentation alone kills this for small 5-10 line programs. The existing complexity is a great way to drive children (or any new programmers) away from the language and towards other languages that have a lower barrier to entry.

I have noted some commentators consider that usage of the C# scripting dialect has been relatively minor (and thus it might not be worth implementing this feature). This view is misguided as we have a chicken and egg problem. I have been programming in C# since the first alpha 19 years ago, and have always wanted something like this for 1-10 line programs. But other than "playing", I have never used the scripting dialects because:

  1. They were never a first-class part of the language and I didn't trust they would continue to be supported; plus they needed separate commands or environments to run the scripts, etc.
  2. I couldn't cut and paste any of that code into a normal C# project without "fixing things".

The simplest implementation, but least flexible, would be to merely allow this syntax for the content of the Main method, such as shown below (everything after the using):

using System;
string Greeting(string name) => $"Hello {name}";
void Print(string msg) { Console.WriteLine(msg); }
Print($"{Greeting("David")}. Starting program to print HTML");
var client = new System.Net.Http.HttpClient();
var page = await client.GetStringAsync("https://microsoft.com");
Print(page);

In this example "Greeting" and "Print" are not members of the class, but just defined inside the emitted Main method (thanks for these recent additions to C#). The class name could simply be the name of the file, so MyApp.cs would generate a class MyApp.
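
As a sketch of that suggestion, the file above (assumed here to be named MyApp.cs) might be emitted roughly as follows. This is an illustration rather than what any compiler actually generates, and the network request is replaced by a short Task.Delay so the example runs without network access:

```csharp
using System;
using System.Threading.Tasks;

class MyApp // hypothetical: class name taken from the file name MyApp.cs
{
    static async Task Main(string[] args)
    {
        // The former top-level functions become local functions of Main.
        string Greeting(string name) => $"Hello {name}";
        void Print(string msg) { Console.WriteLine(msg); }

        Print($"{Greeting("David")}. Starting program");
        await Task.Delay(10); // stand-in for client.GetStringAsync(...)
        Print("Done");
    }
}
```

An async Task Main also covers the asynchronous entry point and the access to args that are asked about further down in this comment.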

Some people have commented about concerns of clashing with a previously declared Main method in the project. But how is that a problem? The compiler should just generate the error message it would already generate today (shown below):

  • Program has more than one entry point defined. Compile with /main to specify the type that contains the entry point.

I don't think you should add the script specific ways of referencing assemblies (and other similar extensions). The scripting environments can keep doing whatever they are doing in this respect. Remember there have always been various ways to reference assemblies. For example with csc.exe you might use /reference; or they can be referenced in a project file that is used by dotnet build.

One request would be, can you make the entry point asynchronous, such as:
async static Task Main(string[] args)

Which leads to the next question: Should we, by convention, allow access to "args"? I think you probably should, to make this useful.

Finally, I am not sure if you by default include any usings like using System; but it is fine if you require these to be explicitly stated.

This is exciting @MadsTorgersen and now we need a concrete proposal.

@Grauenwolf

Grauenwolf commented Jan 13, 2020

This seems like a feature more appropriate for Visual Basic, which tends to attract learners and casual programmers.

And speaking of which, is VB as a language still being developed? Or is it just in maintenance mode for legacy applications that need .NET Core support?

@Grauenwolf

Which leads to the next question: Should we, by convention, allow access to "args". I think you probably should to make this useful.

Might be better to have people use Environment.CommandLine.

@tonygiang

I share the concerns about naming conflicts, but as long as C# remains a compiled language, I don't see a huge problem. This proposal would result in a huge reduction in boilerplate code in many cases. The C# compiler already aggressively checks for class name conflicts across namespaces.
Speaking of which, we already have namespaces in case a top-level free function absolutely has to keep its name to make semantic sense. The syntax for calling a free function inside a namespace may have to differ from that of a public static class method, but a lack of syntax ideas has never been a problem.

@MadsTorgersen
Contributor Author

More detailed proposal: #3117

@jnm2
Contributor

jnm2 commented Feb 2, 2020

Which leads to the next question: Should we, by convention, allow access to "args". I think you probably should to make this useful.

Might be better to have people use Environment.CommandLine.

On Windows, following the standard argv parsing convention is hard. https://docs.microsoft.com/en-us/archive/blogs/twistylittlepassagesallalike/everyone-quotes-command-line-arguments-the-wrong-way. Providing string[] rather than string is a starting point that every app will need.

On other OSs, arguments are not passed between processes as a single string the way Windows does it. An array of separate strings is passed. It wouldn't make sense to join them into a single string.

@Grauenwolf

It would be trivial to add Environment.CommandLineArgs

@tmat
Member

tmat commented Feb 3, 2020

@Grauenwolf This wouldn't work for script hosts. The host needs to be able to pass customized arguments to the script.

csi /u:System.Runtime.CompilerServices script.csx scriptArg1 scriptArg2

script.csx:

Print(Args);
Print(Environment.CommandLine);

Output:

List<string>(2) { "scriptArg1", "scriptArg2" }
"csi /u:System.Runtime.CompilerServices script.csx scriptArg1 scriptArg2"

It's true that C# proper could behave differently, but then it might get confusing.

@Grauenwolf

Environment.CommandLineArgs doesn't exist yet. You can put whatever you want in it.

@tmat
Member

tmat commented Feb 3, 2020

@Grauenwolf Sure, but if you make the content different from Environment.CommandLine it would also be confusing. To make them consistent the script host would need to be able to set the underlying values of both, which has other problems. It'd be better to just have a different API.

@Grauenwolf

Well I feel foolish, we already have GetCommandLineArgs.

https://docs.microsoft.com/en-us/dotnet/api/system.environment.getcommandlineargs?view=netframework-4.8

You just have to know which items to ignore from the array.
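
A minimal sketch of what "knowing which items to ignore" could mean for an ordinary compiled program: the first element returned by Environment.GetCommandLineArgs() is the executing program's file name, so the user-supplied arguments start at index 1. The variable names are made up, and this does not address the script-host case raised above, where the host wants to pass its own argument list:

```csharp
using System;
using System.Linq;

// Element 0 is the program name or path; the rest are the user-supplied arguments.
string[] all = Environment.GetCommandLineArgs();
string[] userArgs = all.Skip(1).ToArray();

Console.WriteLine($"program: {all[0]}");
Console.WriteLine($"arguments: {string.Join(", ", userArgs)}");
```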

@redradist

redradist commented May 12, 2020

Hi all, @MadsTorgersen

I have read the proposal https://github.com/dotnet/csharplang/blob/master/proposals/top-level-statements.md and I have one suggestion:

Do not allow the user to use a top-level return statement, because it is very confusing for the programmer!

await System.Threading.Tasks.Task.Delay(1000);
System.Console.WriteLine("Hi!");
return 0;

This example is confusing.
It would be better if the compiler generated, for code such as:

await System.Threading.Tasks.Task.Delay(1000);
System.Console.WriteLine("Hi!");

the following:

static class $Program
{
    static async Task<int> $Main(string[] args)
    {
        await System.Threading.Tasks.Task.Delay(1000);
        System.Console.WriteLine("Hi!");
        return 0; // This line added implicitly
    }
}

If the user wants to exit the program from a top-level statement:

await System.Threading.Tasks.Task.Delay(1000);
System.Console.WriteLine("Hi!");
Environment.Exit(1);

will yield the following code:

static class $Program
{
    static async Task<int> $Main(string[] args)
    {
        await System.Threading.Tasks.Task.Delay(1000);
        System.Console.WriteLine("Hi!");
        Environment.Exit(1);
        return 0; // This line added implicitly
    }
}

or after optimization:

static class $Program
{
    static async Task<int> $Main(string[] args)
    {
        await System.Threading.Tasks.Task.Delay(1000);
        System.Console.WriteLine("Hi!");
        return 1; // This line added instead of Environment.Exit(1) statement
    }
}

@SolalPirelli

This looks awesome from a teaching perspective, since "public static void Main" currently has to be treated like a magic incantation for programs by beginners.

Why not allow class definitions between top-level statements, though?
This constraint, AFAIK, is not present elsewhere in the language: one is free to declare classes before or after anything else without changing the program's meaning. It means there is still an irregularity that must be explicitly taught to language (or programming in general) beginners. :(

@amdav

amdav commented May 21, 2020

It doesn't mean it is an irregularity. It means top level statements are exactly the same as if you put the same code inside a public static void Main. You could have a simple refactor that moved it inside.
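
A short illustration of the ordering rule being discussed, assuming the behavior that shipped in C# 9: type declarations in the file have to come after all top-level statements. The Point class is made up for the example:

```csharp
using System;

// Top-level statements come first...
var p = new Point(3, 4);
Console.WriteLine(p.X + p.Y); // prints 7

// ...and type declarations must follow them in the same file.
class Point
{
    public int X { get; }
    public int Y { get; }
    public Point(int x, int y) { X = x; Y = y; }
}
```

Moving the class above the statements would be rejected by the compiler, which is the restriction questioned in the preceding comment.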

@SolalPirelli

SolalPirelli commented May 22, 2020

If top-level statements have to be explained in terms of what public static void Main is, that makes them a lot less useful.

IMHO it should ideally be the opposite: first teach people about basic expressions in a REPL, then move to a file with top-level statements, teach them about adding classes to that file (anywhere!), then finally explain that for real projects it makes more sense to contain those statements in a special method named Main (using the knowledge that classes exist).

@amdav

amdav commented May 22, 2020

then finally explain that for real projects it makes more sense to contain those statements in special method named Main

But that is the problem. You cannot "just move those statements into a special method Main" the way you are thinking, because you cannot include a class definition inside Main. Or were you going to move them inside the Program class, in which case they would be inner classes, etc.? I personally prefer this to be a simple and predictable feature, so I like what the team is doing. But I understand this is something where "reasonable" people will have different opinions. Some people will want this feature to make C# into a REPL/scripting language, and I get that. I just personally agree with everything the team has presented.

@jnm2
Contributor

jnm2 commented May 22, 2020

And then there's me, who asked to be able to define a class within a method a long time ago :D

@redradist

redradist commented May 28, 2020

@MadsTorgersen
Also, I think that if there are several files with top-level statements:
Program.cs

using System;

Console.WriteLine("Hello World!");

and

SomeFile.cs

using System;

Console.WriteLine("Hello World!");

The compiler should generate the following code:
For Program.cs

using System;

class $Program
{
    static void Main(string[] args)
    {
        Console.WriteLine("Hello World!");
    }
}

For SomeFile.cs

using System;

class $SomeFile
{
    static void Main(string[] args)
    {
        Console.WriteLine("Hello World!");
    }
}

It should be possible to specify StartUpObject/StartUpFile:

<StartupFile>Program.cs</StartupFile>

jcouv changed the title from "Top level statements and member declarations (embrace scripting dialect)" to "Top level statements and member declarations (embrace scripting dialect) (VS 16.8, .NET 5)" (Sep 1, 2020)
MadsTorgersen modified the milestones: 9.0 candidate, 9.0 (Sep 9, 2020)
333fred added the Implemented Needs ECMA Spec label (Oct 16, 2020)
dotnet locked as resolved and limited conversation to collaborators (Dec 12, 2024)