-
Notifications
You must be signed in to change notification settings - Fork 4.1k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Avoid boxing in string concatenation #35006
Conversation
3f233de
to
f00e6b7
Compare
@@ -3492,6 +3492,7 @@ void M(string[] args) | |||
{ | |||
// Code size 7 (0x7) | |||
.maxstack 1 | |||
.locals init (int V_0) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm assuming these are harmless? They're present in most of the other tests in this file - I'm not sure why my change triggered this. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suspect what happened here is that you transformed x + y + z
into something with a temporary variable, which then got dropped (i.e. left unused) in a subsequent optimization/transformation. It is a bit worrying, but not incorrect.
In reply to: 275783989 [](ancestors = 275783989)
src/Compilers/CSharp/Portable/Lowering/LocalRewriter/LocalRewriter_StringConcat.cs
Outdated
Show resolved
Hide resolved
f00e6b7
to
b00bd1f
Compare
@@ -2260,33 +2252,6 @@ struct X | |||
); | |||
} | |||
|
|||
[Fact] | |||
public void System_String__ConcatObject() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We genuinely never emit this any more, so we can't trigger this diagnostic.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe please add the test back with whatever actual behavior it displays (no errors) verified, and also check that it does the right thing at runtime.
In reply to: 275796563 [](ancestors = 275796563)
f06b1b8
to
1c08ab7
Compare
This changes the order of evaluation for string concatenation to follow the C# specification. Previously, when the compiler lowered a string concatenation containing any objects which were not strings (or implicitly convertible to strings), it would generate a call to an appropriate overload of `string.Concat` taking `objects`. This meant that all expressions being concatenated were evaluated, then `string.Concat` called, which called `.ToString()` on all of them. E.g. "x" + y + z got compiled to : string.Concat("x", y, z) Here, `y` and `z` are evaluated to `object` (boxing them if necessary), and then they have `.ToString()` called on them by `string.Concat`. This is technically against the C# specification, which defines string concatenation as a sequence of left-associative calls to one of: string operator +(string x, string y); string operator +(string x, object y); string operator +(object x, string y); So the above expression should conceptually be compiled to: op_Addition(op_Addition("x", y), z) That is, `y` is evaluated to `object` and concatenated with `"x"` before `z` is evaluated to `object`. The decision was made to change this behaviour in dotnet#522. This change brings a very important advantage: we can now fully evaluate `y` to a string before we evaluate `z`, which means we're allowed to compile the above to: string.Concat("x", y.ToString(), z.ToString()) This means that: - We don't box `y` and `z` if they're value types - We don't call `ToString()` on `"x"` (which is harmless but pointless) This commit implements this change. Calls are no longer emitted to overloads of `string.Concat` which take `object` - instead, all parameters are evaluated to a string, and then an appropriate overload of `string.Concat` which takes `strings` is called. Care is taken to deal with different types of expressions in different ways: - Strings or types which are implicitly convertible to string never have .ToString called on them - Reference types (and unconstrained generics) have ?.ToString() called on them - A copy is made of non-readonly non-constant value types (and unconstrained generics), before.ToString() is called on the copy. This is to preserve the previous behaviour where value types were boxed, and so any side-effects of ToString() weren't reflected in the original. - A copy isn't made of readonly value types, as we know that ToString() can't have side-effects. We emit `call` to special struct types, as we know these have their own ToString implementation which won't be removed. For other value types, we emit a constrained virtual call. This follows the precedent set in For unary concatenation (e.g. `thing + ""`) we now emit `thing?.ToString() ?? ""` instead of `string.Concat(thing) ?? ""`. This does increase the size of generated code. My impression is that the unit tests which were affected by this changed increased in size by about 10%. This is somewhat offset by the fact that we can now call string.Concat(string, string, string, string) instead of string.Concat(object[]) (since there is no overload of string.Concat which takes 4 objects), which reduces the size of some methods significantly. Fixes dotnet#522
This excludes `string.Concat(object, object)` as that's emitted inside expressions. After the previous commit, these weren't being used. This commit does not remove these special members entirely, although they're currently not referenced from anywhere else. This change means that we will no longer recognise and unpack a call to an overload of `string.Concat` which takes `objects`. Previously if a user compiled: string.Concat("x", y) + z (where `y` and `z` are not convertible to string) we would compile it to: string.Concat("x", y, z) This will no longer happen, instead we will output: string.Concat(string.Concat("x", y), z) This is probably for the best. We've made a decision to change the semantics of string concatenation to be in line with the specification, and rewriting this case would go against the spec.
1c08ab7
to
089d0d1
Compare
@canton7 I don't think the description of this PR is correct. That looks like the description of your previous PR. #Resolved |
@gafter My previous PR was about This one changes the semantics of string concatenation so that each parameter is evaluated to an object and then converted to a string in turn, rather than all of them being evaluated to objects, then all converted to strings. We do this by calling This PR undoes a lot of the complexity from my previous PR - I wasn't aware of #522 when I wrote that! If you think the title is misleading, I won't complain if you change it 😄 #Resolved |
@canton7 No, you're right, the description does describe what this PR does. I didn't look closely enough. #Resolved |
} | ||
|
||
int currentConditionalAccessID = ++_currentConditionalAccessID; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I saw one place in code where this was ++_currentConditionalAccessID
, and another where it was _currentConditionalAccessID++
. I'm assuming that the second is buggy (although I tried and failed to trigger it), and copied the pattern from the first.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Anyone able to comment about why the increments vary in this way in different locations?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@VSadov? You wrote the original _currentConditionalAccessId++
. #Resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, but can you please change it so that they are all preincrement?
In reply to: 278340949 [](ancestors = 278340949)
src/Compilers/CSharp/Portable/Lowering/LocalRewriter/LocalRewriter_StringConcat.cs
Outdated
Show resolved
Hide resolved
….ToString From review comments. Readonly methods have landed, so we don't need to create a copy of a struct when calling a readonly ToString methods. If we're calling object.ToString on a struct, because the struct doesn't have its own ToString methods, then we don't need a copy because we know that object.ToString doesn't have side-effects. There is a versioning issue here if the user later adds a ToString method to that struct, but the guidance is that we're not worried about this.
@RikkiGibson I've added a commit which changes the behaviour for readonly ToString methods, and structs which don't define their own ToString method. Thanks for your comments! #Resolved |
src/Compilers/CSharp/Portable/Lowering/LocalRewriter/LocalRewriter_StringConcat.cs
Outdated
Show resolved
Hide resolved
Thanks for making the revisions. Will try and finish reviewing today. #Resolved |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am just requesting a couple of small changes:
- Restore the test
System_String__ConcatObject
and verify the behavior when it is run. - Modify the uses of
_currentConditionalAccessID
so all uses are preincrement (or postincrement, whichever has the least effect on tests).
After that I think this PR is in good shape.
Thanks for your review! I'll make those changes in the next couple of days. |
Verify that code which would previously have emitted a call to String.Concat(object) does not emit any diagnostics if this member is missing, and functions correctly.
Right, that should be everything addressed. Let me know if you want me to do anything more. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@canton7 Could you please merge in the latest bits from master to see if that resolves the test failure? |
Sure, I'll give that a shot. When I looked into the integration test failure a couple of months ago, it looked like it was a one-off, caused by an access denied error on the build agent - there's a wiki page on how to deal with it, but I obviously don't have permission. |
…ncat-order * upstream/master: (1532 commits) Ensure we have stack spilling support for the recently-added expression node `BoundReadOnlySpanFromArray` (dotnet#37057) Review feedback Avoid generating or emitting NullablePublicOnlyAttribute when no other nullable attributes are emitted (dotnet#37019) Respond to more feedback Fixes from feedback Call `.WithoutNullability()` less Fix ngen for assets from nuget Make NullableWalker aware of calls to Interlocked.CompareExchange fix error Update AnonymousObjectCreation implementation to be more robust to errors by using MemberIndexOpt. Address minor PR comments. Add support to ngen assets included from nuget package Modified node removal to keep original leading trivia Fix Solution.WithDocumentFilePath not updating the file path of the tree Improve docs. PR Feedback cleanup. Use pattern-matching in MetadataWriter for readability and possibly performance. (dotnet#36219) Renamed helpers in SyntaxFactsService. More RefactoringHelpers tests (mostly for extraction). Add general tests for RefactoringHelpersService. Optimize flow analysis assembly ...
The milestone on this is Compiler.Next. Does that mean we want to delay merging? |
@RikkiGibson I believe we're willing to take this in 16.3 if its ready in time (about 10 days) |
@gafter that's passing now. Is there anything else I can do? |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM
var method = boundCall.Method; | ||
if (method.IsStatic && method.ContainingType.SpecialType == SpecialType.System_String) | ||
{ | ||
if ((object)method == (object)_compilation.GetSpecialTypeMember(SpecialMember.System_String__ConcatStringString) || |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In general we prefer to compare symbols using regular ==
instead of reference equality, even though it doesn't matter in this case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Noted. I'll have copied that from somewhere else in that file.
Thanks everyone! 🎉 |
@canton7 Thank you for your contribution... and your persistence! |
This changes the order of evaluation for string concatenation to follow
the C# specification. See #522.
Previously, when the compiler lowered a string concatenation containing
any objects which were not strings (or implicitly convertible to
strings), it would generate a call to an appropriate overload of
string.Concat
takingobjects
. This meant that all expressions beingconcatenated were evaluated, then
string.Concat
called, which called.ToString()
on all of them.E.g.
got compiled to :
Here,
y
andz
are evaluated toobject
(boxing them if necessary),and then they have
.ToString()
called on them bystring.Concat
.This is technically against the C# specification, which defines string
concatenation as a sequence of left-associative calls to one of:
So the above expression should conceptually be compiled to:
That is,
y
is evaluated toobject
and concatenated with"x"
before
z
is evaluated toobject
.The decision was made to change this behaviour in #522.
This change brings a very important advantage: we can now fully evaluate
y
to a string before we evaluatez
, which means we're allowed tocompile the above to:
This means that:
y
andz
if they're value typesToString()
on"x"
(which is harmless but pointless)This commit implements this change.
Calls are no longer emitted to overloads of
string.Concat
which takeobject
- instead, all parameters are evaluated to a string, and thenan appropriate overload of
string.Concat
which takesstrings
iscalled.
Care is taken to deal with different types of expressions in different
ways:
.ToString called on them
on them
unconstrained generics), before.ToString() is called on the copy. This
is to preserve the previous behaviour where value types were boxed, and
so any side-effects of ToString() weren't reflected in the original.
can't have side-effects.
We emit
call
to special struct types, as we know these have their ownToString implementation which won't be removed. For other value types,
we emit a constrained virtual call. This follows the precedent set in
For unary concatenation (e.g.
thing + ""
) we now emitthing?.ToString() ?? ""
instead ofstring.Concat(thing) ?? ""
.This does increase the size of generated code. My impression is that the
unit tests which were affected by this changed increased in size by
about 10%. This is somewhat offset by the fact that we can now call
instead of
(since there is no overload of string.Concat which takes 4 objects),
which reduces the size of some methods significantly.
I haven't removed the SpecialMember versions of
string.Concat
which take object - let me know if you want me to.