Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Avoid boxing in string concatenation #35006

Merged
merged 12 commits into from
Jul 15, 2019

Conversation

canton7
Copy link
Contributor

@canton7 canton7 commented Apr 16, 2019

This changes the order of evaluation for string concatenation to follow
the C# specification. See #522.

Previously, when the compiler lowered a string concatenation containing
any objects which were not strings (or implicitly convertible to
strings), it would generate a call to an appropriate overload of
string.Concat taking objects. This meant that all expressions being
concatenated were evaluated, then string.Concat called, which called
.ToString() on all of them.

E.g.

"x" + y + z

got compiled to :

string.Concat("x", y, z)

Here, y and z are evaluated to object (boxing them if necessary),
and then they have .ToString() called on them by string.Concat.

This is technically against the C# specification, which defines string
concatenation as a sequence of left-associative calls to one of:

string operator +(string x, string y);
string operator +(string x, object y);
string operator +(object x, string y);

So the above expression should conceptually be compiled to:

op_Addition(op_Addition("x", y), z)

That is, y is evaluated to object and concatenated with "x"
before z is evaluated to object.

The decision was made to change this behaviour in #522.

This change brings a very important advantage: we can now fully evaluate
y to a string before we evaluate z, which means we're allowed to
compile the above to:

string.Concat("x", y.ToString(), z.ToString())

This means that:

  • We don't box y and z if they're value types
  • We don't call ToString() on "x" (which is harmless but pointless)

This commit implements this change.

Calls are no longer emitted to overloads of string.Concat which take
object - instead, all parameters are evaluated to a string, and then
an appropriate overload of string.Concat which takes strings is
called.

Care is taken to deal with different types of expressions in different
ways:

  • Strings or types which are implicitly convertible to string never have
    .ToString called on them
  • Reference types (and unconstrained generics) have ?.ToString() called
    on them
  • A copy is made of non-readonly non-constant value types (and
    unconstrained generics), before.ToString() is called on the copy. This
    is to preserve the previous behaviour where value types were boxed, and
    so any side-effects of ToString() weren't reflected in the original.
  • A copy isn't made of readonly value types, as we know that ToString()
    can't have side-effects.

We emit call to special struct types, as we know these have their own
ToString implementation which won't be removed. For other value types,
we emit a constrained virtual call. This follows the precedent set in

For unary concatenation (e.g. thing + "") we now emit
thing?.ToString() ?? "" instead of string.Concat(thing) ?? "".

This does increase the size of generated code. My impression is that the
unit tests which were affected by this changed increased in size by
about 10%. This is somewhat offset by the fact that we can now call

string.Concat(string, string, string, string)

instead of

string.Concat(object[])

(since there is no overload of string.Concat which takes 4 objects),
which reduces the size of some methods significantly.

I haven't removed the SpecialMember versions of string.Concat which take object - let me know if you want me to.

@canton7 canton7 force-pushed the feature/string-concat-order branch from 3f233de to f00e6b7 Compare April 16, 2019 12:57
@@ -3492,6 +3492,7 @@ void M(string[] args)
{
// Code size 7 (0x7)
.maxstack 1
.locals init (int V_0)
Copy link
Contributor Author

@canton7 canton7 Apr 16, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm assuming these are harmless? They're present in most of the other tests in this file - I'm not sure why my change triggered this. #Resolved

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suspect what happened here is that you transformed x + y + z into something with a temporary variable, which then got dropped (i.e. left unused) in a subsequent optimization/transformation. It is a bit worrying, but not incorrect.


In reply to: 275783989 [](ancestors = 275783989)

@canton7 canton7 force-pushed the feature/string-concat-order branch from f00e6b7 to b00bd1f Compare April 16, 2019 13:26
@@ -2260,33 +2252,6 @@ struct X
);
}

[Fact]
public void System_String__ConcatObject()
Copy link
Contributor Author

@canton7 canton7 Apr 16, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We genuinely never emit this any more, so we can't trigger this diagnostic.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe please add the test back with whatever actual behavior it displays (no errors) verified, and also check that it does the right thing at runtime.


In reply to: 275796563 [](ancestors = 275796563)

@canton7 canton7 force-pushed the feature/string-concat-order branch 2 times, most recently from f06b1b8 to 1c08ab7 Compare April 16, 2019 13:38
canton7 added 2 commits April 16, 2019 14:59
This changes the order of evaluation for string concatenation to follow
the C# specification.

Previously, when the compiler lowered a string concatenation containing
any objects which were not strings (or implicitly convertible to
strings), it would generate a call to an appropriate overload of
`string.Concat` taking `objects`. This meant that all expressions being
concatenated were evaluated, then `string.Concat` called, which called
`.ToString()` on all of them.

E.g.

    "x" + y + z

got compiled to :

    string.Concat("x", y, z)

Here, `y` and `z` are evaluated to `object` (boxing them if necessary),
and then they have `.ToString()` called on them by `string.Concat`.

This is technically against the C# specification, which defines string
concatenation as a sequence of left-associative calls to one of:

    string operator +(string x, string y);
    string operator +(string x, object y);
    string operator +(object x, string y);

So the above expression should conceptually be compiled to:

    op_Addition(op_Addition("x", y), z)

That is, `y` is evaluated to `object` and concatenated with `"x"`
before `z` is evaluated to `object`.

The decision was made to change this behaviour in dotnet#522.

This change brings a very important advantage: we can now fully evaluate
`y` to a string before we evaluate `z`, which means we're allowed to
compile the above to:

    string.Concat("x", y.ToString(), z.ToString())

This means that:

 - We don't box `y` and `z` if they're value types
 - We don't call `ToString()` on `"x"` (which is harmless but pointless)

This commit implements this change.

Calls are no longer emitted to overloads of `string.Concat` which take
`object` - instead, all parameters are evaluated to a string, and then
an appropriate overload of `string.Concat` which takes `strings` is
called.

Care is taken to deal with different types of expressions in different
ways:

 - Strings or types which are implicitly convertible to string never have
   .ToString called on them
 - Reference types (and unconstrained generics) have ?.ToString() called
   on them
 - A copy is made of non-readonly non-constant value types (and
   unconstrained generics), before.ToString() is called on the copy. This
   is to preserve the previous behaviour where value types were boxed, and
   so any side-effects of ToString() weren't reflected in the original.
 - A copy isn't made of readonly value types, as we know that ToString()
   can't have side-effects.

We emit `call` to special struct types, as we know these have their own
ToString implementation which won't be removed. For other value types,
we emit a constrained virtual call. This follows the precedent set in

For unary concatenation (e.g. `thing + ""`) we now emit
`thing?.ToString() ?? ""` instead of `string.Concat(thing) ?? ""`.

This does increase the size of generated code. My impression is that the
unit tests which were affected by this changed increased in size by
about 10%. This is somewhat offset by the fact that we can now call

    string.Concat(string, string, string, string)

instead of

    string.Concat(object[])

(since there is no overload of string.Concat which takes 4 objects),
which reduces the size of some methods significantly.

Fixes dotnet#522
This excludes `string.Concat(object, object)` as that's emitted inside
expressions.

After the previous commit, these weren't being used.

This commit does not remove these special members entirely, although
they're currently not referenced from anywhere else.

This change means that we will no longer recognise and unpack a call to
an overload of `string.Concat` which takes `objects`. Previously if a
user compiled:

    string.Concat("x", y) + z

(where `y` and `z` are not convertible to string) we would compile it to:

    string.Concat("x", y, z)

This will no longer happen, instead we will output:

    string.Concat(string.Concat("x", y), z)

This is probably for the best. We've made a decision to change the
semantics of string concatenation to be in line with the specification,
and rewriting this case would go against the spec.
@canton7 canton7 force-pushed the feature/string-concat-order branch from 1c08ab7 to 089d0d1 Compare April 16, 2019 14:00
@canton7 canton7 marked this pull request as ready for review April 16, 2019 16:00
@canton7 canton7 requested a review from a team as a code owner April 16, 2019 16:00
@gafter gafter added Area-Compilers Community The pull request was submitted by a contributor who is not a Microsoft employee. labels Apr 16, 2019
@gafter
Copy link
Member

gafter commented Apr 16, 2019

@canton7 I don't think the description of this PR is correct. That looks like the description of your previous PR. #Resolved

@canton7
Copy link
Contributor Author

canton7 commented Apr 16, 2019

@gafter My previous PR was about string.Concat(object) returning null if an object's ToString method returns null.

This one changes the semantics of string concatenation so that each parameter is evaluated to an object and then converted to a string in turn, rather than all of them being evaluated to objects, then all converted to strings. We do this by calling .ToString() on each parameter in turn, and then calling a string.Concat(string, ...) overload, which also avoid boxing value types.

This PR undoes a lot of the complexity from my previous PR - I wasn't aware of #522 when I wrote that!

If you think the title is misleading, I won't complain if you change it 😄 #Resolved

@gafter
Copy link
Member

gafter commented Apr 16, 2019

@canton7 No, you're right, the description does describe what this PR does. I didn't look closely enough. #Resolved

@gafter gafter self-assigned this Apr 16, 2019
}

int currentConditionalAccessID = ++_currentConditionalAccessID;
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I saw one place in code where this was ++_currentConditionalAccessID, and another where it was _currentConditionalAccessID++. I'm assuming that the second is buggy (although I tried and failed to trigger it), and copied the pattern from the first.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Anyone able to comment about why the increments vary in this way in different locations?

Copy link
Contributor Author

@canton7 canton7 Apr 25, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

No, but can you please change it so that they are all preincrement?


In reply to: 278340949 [](ancestors = 278340949)

….ToString

From review comments.

Readonly methods have landed, so we don't need to create a copy of a
struct when calling a readonly ToString methods.

If we're calling object.ToString on a struct, because the struct doesn't
have its own ToString methods, then we don't need a copy because we know
that object.ToString doesn't have side-effects. There is a versioning
issue here if the user later adds a ToString method to that struct, but
the guidance is that we're not worried about this.
@canton7
Copy link
Contributor Author

canton7 commented Apr 18, 2019

@RikkiGibson I've added a commit which changes the behaviour for readonly ToString methods, and structs which don't define their own ToString method. Thanks for your comments! #Resolved

@jcouv jcouv added this to the 16.2 milestone Apr 23, 2019
@RikkiGibson
Copy link
Contributor

RikkiGibson commented Apr 24, 2019

Thanks for making the revisions. Will try and finish reviewing today. #Resolved

Copy link
Member

@gafter gafter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am just requesting a couple of small changes:

  1. Restore the test System_String__ConcatObject and verify the behavior when it is run.
  2. Modify the uses of _currentConditionalAccessID so all uses are preincrement (or postincrement, whichever has the least effect on tests).

After that I think this PR is in good shape.

@canton7
Copy link
Contributor Author

canton7 commented May 14, 2019

Thanks for your review! I'll make those changes in the next couple of days.

canton7 added 3 commits May 15, 2019 12:47
Verify that code which would previously have emitted a call to
String.Concat(object) does not emit any diagnostics if this member is
missing, and functions correctly.
@canton7
Copy link
Contributor Author

canton7 commented May 15, 2019

Right, that should be everything addressed. Let me know if you want me to do anything more.

@RikkiGibson RikkiGibson requested a review from gafter May 16, 2019 23:45
@gafter gafter modified the milestones: 16.2, Compiler.Next Jul 9, 2019
Copy link
Member

@gafter gafter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

:shipit:

@gafter gafter closed this Jul 9, 2019
@gafter gafter reopened this Jul 9, 2019
@gafter
Copy link
Member

gafter commented Jul 9, 2019

@canton7 Could you please merge in the latest bits from master to see if that resolves the test failure?

@canton7
Copy link
Contributor Author

canton7 commented Jul 9, 2019

Sure, I'll give that a shot. When I looked into the integration test failure a couple of months ago, it looked like it was a one-off, caused by an access denied error on the build agent - there's a wiki page on how to deal with it, but I obviously don't have permission.

…ncat-order

* upstream/master: (1532 commits)
  Ensure we have stack spilling support for the recently-added expression node `BoundReadOnlySpanFromArray` (dotnet#37057)
  Review feedback
  Avoid generating or emitting NullablePublicOnlyAttribute when no other nullable attributes are emitted (dotnet#37019)
  Respond to more feedback
  Fixes from feedback
  Call `.WithoutNullability()` less
  Fix ngen for assets from nuget
  Make NullableWalker aware of calls to Interlocked.CompareExchange
  fix error
  Update AnonymousObjectCreation implementation to be more robust to errors by using MemberIndexOpt. Address minor PR comments.
  Add support to ngen assets included from nuget package
  Modified node removal to keep original leading trivia
  Fix Solution.WithDocumentFilePath not updating the file path of the tree
  Improve docs.
  PR Feedback cleanup.
  Use pattern-matching in MetadataWriter for readability and possibly performance. (dotnet#36219)
  Renamed helpers in SyntaxFactsService.
  More RefactoringHelpers tests (mostly for extraction).
  Add general tests for RefactoringHelpersService.
  Optimize flow analysis assembly
  ...
@RikkiGibson
Copy link
Contributor

The milestone on this is Compiler.Next. Does that mean we want to delay merging?

@gafter
Copy link
Member

gafter commented Jul 9, 2019

@RikkiGibson I believe we're willing to take this in 16.3 if its ready in time (about 10 days)

@canton7
Copy link
Contributor Author

canton7 commented Jul 10, 2019

@gafter that's passing now. Is there anything else I can do?

Copy link
Member

@agocke agocke left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

var method = boundCall.Method;
if (method.IsStatic && method.ContainingType.SpecialType == SpecialType.System_String)
{
if ((object)method == (object)_compilation.GetSpecialTypeMember(SpecialMember.System_String__ConcatStringString) ||
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In general we prefer to compare symbols using regular == instead of reference equality, even though it doesn't matter in this case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Noted. I'll have copied that from somewhere else in that file.

@agocke agocke merged commit 78942c7 into dotnet:master Jul 15, 2019
@canton7
Copy link
Contributor Author

canton7 commented Jul 15, 2019

Thanks everyone! 🎉

@gafter
Copy link
Member

gafter commented Jul 15, 2019

@canton7 Thank you for your contribution... and your persistence!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Area-Compilers Community The pull request was submitted by a contributor who is not a Microsoft employee.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants