Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Proposal]: Auto-default structs #5737

Open
RikkiGibson opened this issue Feb 9, 2022 · 27 comments
Open

[Proposal]: Auto-default structs #5737

RikkiGibson opened this issue Feb 9, 2022 · 27 comments
Assignees
Labels
Implemented Needs ECMA Spec This feature has been implemented in C#, but still needs to be merged into the ECMA specification Proposal champion Proposal
Milestone

Comments

@RikkiGibson
Copy link
Contributor

RikkiGibson commented Feb 9, 2022

This proposal is raised as a possible mitigation for usability issues found in #5552 and #5635, as well as addressing #5563 (all fields must be definitely assigned, but field is not accessible within the constructor).

Speclet: https://github.com/dotnet/csharplang/blob/main/proposals/csharp-11.0/auto-default-structs.md


Since C# 1.0, struct constructors have been required to definitely assign this as if it were an out parameter.

public struct S
{
    public int x, y;
    public S() // error: Fields 'S.x' and 'S.y' must be fully assigned before control is returned to the caller
    {
    }
}

This presents issues when setters are manually defined on semi-auto properties, since the compiler can't treat assignment of the property as equivalent to assignment of the backing field.

public struct S
{
    public int X { get => field; set => field = value; }
    public S() // error: struct fields aren't fully assigned. But caller can only assign 'this.field' by assigning 'this'.
    {
    }
}

We assume that introducing finer-grained restrictions for setters, such as a scheme where the setter doesn't take ref this but rather takes out field as a parameter, is going to be too niche and incomplete for some use cases.

One fundamental tension we are struggling with is that when struct properties have manually implemented setters, users often have to do some form of "repetition" of either repeatedly assigning or repeating their logic:

struct S
{
    private int _x;
    public int X
    {
        get => _x;
        set => _x = value >= 0 ? value : throw new ArgumentOutOfRangeException();
    }

    // Solution 1: assign some value in the constructor before "really" assigning through the property setter.
    public S(int x)
    {
        _x = default;
        X = x;
    }

    // Solution 2: assign the field once in the constructor, repeating the implementation of the setter.
    public S(int x)
    {
        _x = x >= 0 ? x : throw new ArgumentOutOfRangeException();
    }
}

Previous discussion

A small group has looked at this issue and considered a few possible solutions:

  1. Require users to assign this = default when semi-auto properties have manually implemented setters. We agree this is the wrong solution since it blows away values set in field initializers.
  2. Implicitly initialize all backing fields of auto/semi-auto properties.
    • This solves the "semi-auto property setters" problem, and it squarely places explicitly declared fields under different rules: "don't implicitly initialize my fields, but do implicitly initialize my auto-properties."
  3. Provide a way to assign the backing field of a semi-auto property and require users to assign it.
    • This could be cumbersome compared to (2). An auto property is supposed to be "automatic", and perhaps that includes "automatic" initialization of the field. It could introduce confusion as to when the underlying field is being assigned by an assignment to the property, and when the property setter is being called.

We've also received feedback from users who want to, for example, include a few field initializers in structs without having to explicitly assign everything. We can solve this issue as well as the "semi-auto property with manually implemented setter" issue at the same time.

struct MagnitudeVector3d
{
    double X, Y, Z;
    double Magnitude = 1;
    public MagnitudeVector3d() // error: must assign 'X', 'Y', 'Z' before returning
    {
    }
}

Adjusting definite assignment

Instead of performing a definite assignment analysis to give errors for unassigned fields on this, we do it to determine which fields need to be initialized implicitly. Such initialization is inserted at the beginning of the constructor.

struct S
{
    int x, y;

    // Example 1
    public S()
    {
        // ok. Compiler inserts an assignment of `this = default`.
    }

    // Example 2
    public S()
    {
        // ok. Compiler inserts an assignment of `y = default`.
        x = 1;
    }

    // Example 3
    public S()
    {
        // valid since C# 1.0. Compiler inserts no implicit assignments.
        x = 1;
        y = 2;
    }

    // Example 4
    public S(bool b)
    {
        // ok. Compiler inserts assignment of `this = default`.
        if (b)
            x = 1;
        else
            y = 2;
    }

    // Example 5
    void M() { }
    public S(bool b)
    {
        // ok. Compiler inserts assignment of `y = default`.
        x = 1;
        if (b)
            M();

        y = 2;
    }
}

In examples (4) and (5), the resulting codegen sometimes has "double assignments" of fields. This is generally fine, but for users who are concerned with such double assignments, we can emit what used to be definite assignment error diagnostics as disabled-by-default warning diagnostics.

struct S
{
    int x;
    public S() // warning: 'S.x' is implicitly initialized to 'default'.
    {
    }
}

Users who set the severity of this diagnostic to "error" will opt in to the pre-C# 11 behavior. Such users are essentially "shut out" of semi-auto properties with manually implemented setters.

struct S
{
    public int X
    {
        get => field;
        set => field = field < value ? value : field;
    }

    public S() // error: backing field of 'S.X' is implicitly initialized to 'default'.
    {
        X = 1;
    }
}

At first glance, this feels like a "hole" in the feature, but it's actually the right thing to do. By enabling the diagnostic, the user is telling us that they don't want the compiler to implicitly initialize their fields in the constructor. There's no way to avoid the implicit initialization here, so the solution for them is to use a different way of initializing the field than a manually implemented setter, such as manually declaring the field and assigning it, or by including a field initializer.

Currently, the JIT does not eliminate dead stores through refs, which means that these implicit initializations do have a real cost. But that might be fixable. dotnet/runtime#13727

It's worth noting that initializing individual fields instead of the entire instance is really just an optimization. The compiler should probably be free to implement whatever heuristic it wants, as long as it meets the invariant that fields which are not definitely assigned at all return points or before any non-field member access of 'this' are implicitly initialized.

For example, if a struct has 100 fields, and just one of them is explicitly initialized, it might make more sense to do an initobj on the entire thing, than to implicitly emit initobj for the 99 other fields. However, an implementation which implicitly emits initobj for the 99 other fields would still be valid.

Changes to language specification

We adjust the following section of the standard:

https://github.com/dotnet/csharpstandard/blob/draft-v6/standard/expressions.md#11712-this-access

If the constructor declaration has no constructor initializer, the this variable behaves exactly the same as an out parameter of the struct type. In particular, this means that the variable shall be definitely assigned in every execution path of the instance constructor.

We adjust this language to read:

If the constructor declaration has no constructor initializer, the this variable behaves similarly to an out parameter of the struct type, except that it is not an error when the definite assignment requirements (§9.4.1) are not met. Instead, we introduce the following behaviors:

  1. When the this variable itself does not meet the requirements, then all unassigned instance variables within this at all points where requirements are violated are implicitly initialized to the default value (§9.3) in an initialization phase before any other code in the constructor runs.
  2. When an instance variable v within this does not meet the requirements, or any instance variable at any level of nesting within v does not meet the requirements, then v is implicitly initialized to the default value in an initialization phase before any other code in the constructor runs.

Design meetings

https://github.com/dotnet/csharplang/blob/main/meetings/2022/LDM-2022-02-14.md#definite-assignment-in-structs

@TahirAhmadov
Copy link

Aren't we going to have the same questions for classes and semi-auto props and fields WRT nullability analysis?

@RikkiGibson
Copy link
Contributor Author

Yes, I wrote a bit about nullable analysis of semi-auto properties but decided to remove it from here and raise it separately, since it seems like it doesn't really depend on the containing type kind.

@RikkiGibson
Copy link
Contributor Author

Regarding the conclusion from LDM:

Proposal is accepted, contingent on following up with emeritus LDT members on original reasoning around struct definite assignment.

We have followed up. We didn't find any new reasons, beyond what we've already considered, that the 'out this' behavior was chosen originally for struct constructors.

@oising

This comment was marked as off-topic.

@333fred

This comment was marked as off-topic.

@oising

This comment was marked as off-topic.

@333fred

This comment was marked as off-topic.

@oising

This comment was marked as off-topic.

@HaloFour

This comment was marked as off-topic.

@RikkiGibson

This comment was marked as off-topic.

@jnm2

This comment was marked as off-topic.

@CyrusNajmabadi

This comment was marked as off-topic.

@RikkiGibson
Copy link
Contributor Author

We recently explored the interaction between this feature and dotnet/roslyn#30194. An example of the interaction:

// ClassLibrary1.csproj
public struct S1
{
    private object x;
}

// ConsoleApp1.csproj (references ClassLibrary1.csproj)
public struct S2
{   
    public S1 s1;

    // assume all scenarios use the same compiler here.
    // C# 8: no diagnostics. implicit 's1 = default'.
    // C# 9: LangVersion warning diagnostic. implicit 's1 = default'.
    // C# 11: disabled-by-default warning diagnostic. implicit 's1 = default'.
    public S2(bool unused) { }
}

Our conclusions are:

  • The same emit is used in all language versions. This means that when the user updates their compiler version, the emit of existing code could change to add implicit default field assignments, where previously the user may have been using uninitialized memory. We think this is OK to do in all language versions because the behavior of using uninitialized memory is not defined, and this emit change simply moves the user from using an undefined behavior to a defined behavior when they update the compiler version.
  • If an old compiler gives a definite-assignment warning in some scenario, and the new compiler with preview LangVersion does not give that same warning, then the new compiler with old LangVersion should give a LangVersion warning.
  • Similarly: If an old compiler gives a definite-assignment error in some scenario, and the new compiler with preview LangVersion does not give that same error, then the new compiler with old LangVersion should give a LangVersion error.

@KathleenDollard
Copy link
Collaborator

It looks like we are preserving the ability of a team to have different members on different compiler versions avoid breaking each other. So, I think this is a good place to land on this.

@RikkiGibson
Copy link
Contributor Author

This feature will be available in preview in VS 17.3.

@HopefulFrog
Copy link

This is cool. I never really understood why structs had the explicit assignment requirement in the first place, though I may have read about it sometime in the past and just forgot.

@Youssef1313
Copy link
Member

Youssef1313 commented Apr 13, 2022

@HopefulFrog Because:

var x = new S();
x.F = 10;
x = new S();
Console.WriteLine(x.F); // will print 10 if F isn't initialized in struct constructor.

struct S
{
    public S() { }

    public int F;
}

Now, with this proposal implemented, the compiler will add F = 0 in the constructor for you (unless you already have F = something in the constructor in an always-reachable code path)

@HopefulFrog
Copy link

@HopefulFrog Because:

var x = new S();
x.F = 10;
x = new S();
Console.WriteLine(x.F); // will print 10 if F isn't initialized in struct constructor.

struct S
{
    public S() { }

    public int F;
}

Now, with this proposal implemented, the compiler will add F = 0 in the constructor for you (unless you already have F = something in the constructor in an always-reachable code path)

I'm not sure I understand your example. Are you saying that with structs, a new call somehow reuses the variable on the left side of the assignment? My assumption would have been that the second call of new S(); creates an entirely new instance that then overwrites the original one, meaning the value of F would no longer be 10, but would be default(int), if we're assuming that a value type that hasn't been explicitly assigned has its default value.

@Youssef1313
Copy link
Member

Youssef1313 commented Apr 13, 2022

My assumption would have been that the second call of new S(); creates an entirely new instance that then overwrites the original one, meaning the value of F would no longer be 10,

By default, that's not correct. The compiler now explicitly assign fields to their default values so that you see the behavior you want. Meaning the compiler adds some code so that F would no longer be 10.

To clarify more, for the example above, here is the IL of the struct constructor per SharpLab:

    .method public hidebysig specialname rtspecialname 
        instance void .ctor () cil managed 
    {
        // Method begins at RVA 0x2087
        // Code size 8 (0x8)
        .maxstack 8

        IL_0000: ldarg.0
        IL_0001: ldc.i4.0
        IL_0002: stfld int32 S::F
        IL_0007: ret
    } // end of method S::.ctor

Note that if the compiler didn't produce the instructions to assign 0 to F, then the code above will print 10.

@RikkiGibson
Copy link
Contributor Author

Yeah, structs are weird. The constructor takes an 'out' ref to a variable as an argument and the compiler is free to use refs to uninitialized variables as arguments or to reuse variables that contained something else before as arguments.

@HopefulFrog
Copy link

I had no idea. So, what happens in the constructor if it's not part of an assignment, like if your statement is just new S();?

@RikkiGibson
Copy link
Contributor Author

The compiler creates a temporary variable.

@paulozemek
Copy link

What is the order of initialization of those fields?
In fact, my question is about an odd situation that we have right now... if, for example, I create an ARGB struct using [FieldOffset], I can have something similar to a union, where I might have an intValue at offset 0 and the fields A, R, G, B from offsets 0 to 3.
So, setting just the intValue should naturally set all the fields... but the compiler complains that not all fields were set. If the compiler "auto-set" them to 0, but after intValue was set, we will be introducing a bug there.

@PathogenDavid
Copy link

@paulozemek The default assignment occurs before your intValue assignment:

2. When an instance variable v within this does not meet the requirements, or any instance variable at any level of nesting within v does not meet the requirements, then v is implicitly initialized to the default value in an initialization phase before any other code in the constructor runs.

(Emphasis mine)


As an aside:

but the compiler complains that not all fields were set.

In cases like this you can use Unsafe.SkipInit to skip initializing A/R/G/B since as a human you know they were all set when you assigned intValue.

@paulozemek
Copy link

@paulozemek The default assignment occurs before your intValue assignment:

  1. When an instance variable v within this does not meet the requirements, or any instance variable at any level of nesting within v does not meet the requirements, then v is implicitly initialized to the default value in an initialization phase before any other code in the constructor runs.

(Emphasis mine)

As an aside:

but the compiler complains that not all fields were set.

In cases like this you can use Unsafe.SkipInit to skip initializing A/R/G/B since as a human you know they were all set when you assigned intValue.

Thanks! It would be nice if the compiler identified that those bytes were already set and didn't even try to initialize them first... but there is a valid solution, so this is great.

@jcouv jcouv added this to the 11.0 milestone Sep 26, 2022
@333fred 333fred added the Implemented Needs ECMA Spec This feature has been implemented in C#, but still needs to be merged into the ECMA specification label Jan 9, 2023
@IS4Code
Copy link

IS4Code commented Mar 3, 2023

I don't really see much of a reason or benefit in this. I have been saved by the kind of error this seems to want to remove a considerable number of times in the past, when I actually forgot to assign a member of a struct. Why have this when the go-to solution in such a case has always been : this()? Or is there a particular reason why the following shouldn't work?

public struct S
{
    public int X { get => field; set => field = value; }
    public S() : this()
    {
    }
}

Sure, you might have a parameterless constructor, but then a syntax like : default could be introduced instead.

@jamesford42
Copy link

This sounds great! And would eliminate one of the big annoying differences between defining a type as a struct versus a class.

@jcouv jcouv added the Proposal label Sep 17, 2024
@dotnet dotnet locked as resolved and limited conversation to collaborators Nov 22, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
Implemented Needs ECMA Spec This feature has been implemented in C#, but still needs to be merged into the ECMA specification Proposal champion Proposal
Projects
None yet
Development

No branches or pull requests