[suggestion] Compiler should call ToString on non-string types used with string.operator+ #10966
Comments
Here is a similar issue about avoiding allocations for … I think the solution proposed there, adding generic overloads, would help here too, even if only for cases where the number of concatenated objects is small. It could potentially be even more efficient; for example, it could convert some common types (like …)
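Here is a minimal sketch of the generic-overload idea. It assumes a hypothetical helper named `GenericConcat`; no such overload exists in the BCL, and the name and shape are only illustrative. Because each argument keeps its static type, the C# compiler emits a constrained call for `ToString` on `T`. For a value type that overrides `ToString`, such as `int`, that call does not box. Null handling mirrors what concatenation already does: a null operand contributes an empty string.

```csharp
using System;

// Hypothetical generic overload (not a real BCL API). Each argument keeps its
// static type, so ToString on a value-type argument is a constrained call and
// an int is not boxed just to be converted to a string.
static string GenericConcat<T1, T2, T3>(T1 a, T2 b, T3 c) =>
    string.Concat(AsString(a), AsString(b), AsString(c));

// Same rule as concatenation today: a null operand contributes "".
static string AsString<T>(T value) => value == null ? "" : value.ToString() ?? "";

int age = 17;
string message = GenericConcat("You are ", age, " years old");
Console.WriteLine(message); // You are 17 years old
```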
Be careful here with operators:
class Test
{
    public static implicit operator string(Test t)
    {
        return "op";
    }

    public override string ToString()
    {
        return "to";
    }
}

class Program
{
    static void Main(string[] args)
    {
        Test someClass = null;
        string a = "baz" + someClass;                           // what was written: "bazop" (the implicit conversion is chosen)
        string b = string.Concat("baz", someClass?.ToString()); // what the proposal would emit: "baz"
        someClass = new Test();
        string c = "baz" + someClass;                           // what was written: "bazop"
        string d = string.Concat("baz", someClass?.ToString()); // what the proposal would emit: "bazto"
    }
}
Unfortunately, that optimization can only be applied to certain types (specifically these), since the string representation of numbers depends on things at run time (e.g. …)
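This small sketch shows why a number cannot simply be turned into a string constant at compile time. The `"~"` negative sign is an arbitrary choice for the demonstration. The same kind of concatenation produces different text once the current culture changes at run time.

```csharp
using System;
using System.Globalization;

int n = -5;

// Formatted with the invariant culture, which uses "-" as the negative sign.
string invariant = "n = " + n.ToString(CultureInfo.InvariantCulture);

// A writable copy of the invariant culture with a different negative sign.
var custom = (CultureInfo)CultureInfo.InvariantCulture.Clone();
custom.NumberFormat.NegativeSign = "~";
CultureInfo.CurrentCulture = custom;

// Same kind of expression, but int formatting now uses the current culture.
string current = "n = " + n;

Console.WriteLine(invariant); // n = -5
Console.WriteLine(current);   // n = ~5
```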
I understand where you're coming from, but why would we need generic overloads in this case? If something like this:
int age = 17;
string message = "You are " + age + " years old";
was transformed to this:
int age = 17;
string message = string.Concat("You are ", age.ToString(), " years old");
then we wouldn't even need to worry about avoiding boxing; it would call the …
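This side-by-side sketch uses the `age` example above. The first line spells out by hand the object-based form that the issue describes; the explicit `(object)` cast stands in for the implicit box. The second line is the proposed string-based form. Both produce the same text, but only the first allocates a box for `age`.

```csharp
using System;

int age = 17;

// Object-based form: age is boxed before string.Concat calls ToString on it.
string viaObject = string.Concat("You are ", (object)age, " years old");

// Proposed form: ToString is called on the int directly and the string-only
// overload is used, so no box is allocated.
string viaString = string.Concat("You are ", age.ToString(), " years old");

Console.WriteLine(viaObject == viaString); // True
```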
@miloush Good point; isn't that true for the current way it's handled in the compiler as well, though? I think detection of operator overloading comes before the …
@jamesqo I didn't mean that Roslyn itself could convert the … So, your solution saves boxing the … But as long as that doesn't exist, calling …
Ah, I see what you mean now. That would probably be tricky with how it's currently implemented, though; all of the …
@jamesqo Once we can pattern-match on type, a lot of these …
@AdamSpeight2008 Yeah, but the point still stands, as it'd be very tricky to get the theoretical length of what …
template <typename... Args>
void myfun(Args... params) { }
then yeah, maybe it would be worth it. For now, though, I think this is the best we can do, since it's a pure win over how it's currently implemented.
We could get the maximum …
// Matches on the runtime type of obj, so it also works when T is object.
static int ApproxMaxCharLength<T>(T obj) => obj switch
{
    string s => s.Length,
    bool => 5,      // "False"
    byte => 3,      // "255"
    sbyte => 4,     // "-128"
    char => 1,      // a single UTF-16 code unit
    ushort => 5,    // "65535"
    short => 6,     // "-32768"
    uint => 10,     // "4294967295"
    int => 11,      // "-2147483648"
    ulong => 20,    // "18446744073709551615"
    long => 20,     // "-9223372036854775808"
    _ => 0,         // anything else, including null; floating-point types not done yet
};
An initial approximation of the string length could be calculated thus:
var initialSize = args.Sum(arg => ApproxMaxCharLength(arg)) + TextOfFormatString.Length;
I think this would over-allocate. Note that this excludes any variation caused by the culture and by additional alignment and format specifiers on the argument hole, e.g. … At what point (I haven't done any deep analysis) would the extra processing needed to get a good estimate of the string's size outweigh the cost of just blindly building the string?
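The per-type figures in the table above can be checked by formatting each type's `MinValue` and `MaxValue` with the invariant culture. The sketch below uses a hypothetical helper `MaxLen`. Other cultures may use a longer negative sign, so these results are maxima for the invariant culture only.

```csharp
using System;
using System.Globalization;
using System.Linq;

// Length of the longest string among the given values, formatted with the invariant culture.
static int MaxLen(params IFormattable[] values) =>
    values.Max(v => v.ToString(null, CultureInfo.InvariantCulture).Length);

int boolLen = Math.Max(bool.TrueString.Length, bool.FalseString.Length);
int byteLen = MaxLen(byte.MinValue, byte.MaxValue);
int sbyteLen = MaxLen(sbyte.MinValue, sbyte.MaxValue);
int ushortLen = MaxLen(ushort.MinValue, ushort.MaxValue);
int shortLen = MaxLen(short.MinValue, short.MaxValue);
int uintLen = MaxLen(uint.MinValue, uint.MaxValue);
int intLen = MaxLen(int.MinValue, int.MaxValue);
int ulongLen = MaxLen(ulong.MinValue, ulong.MaxValue);
int longLen = MaxLen(long.MinValue, long.MaxValue);

Console.WriteLine($"bool {boolLen}, byte {byteLen}, sbyte {sbyteLen}, ushort {ushortLen}, short {shortLen}");
Console.WriteLine($"uint {uintLen}, int {intLen}, ulong {ulongLen}, long {longLen}");
```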
@AdamSpeight2008 I wonder how useful approximating the length would be. If you can figure out all the lengths accurately, you only need one allocation. If you approximate them, you need two: the initial one based on the approximation, and then the final one of the right length, into which you copy the characters from the initial one. Getting accurate lengths would be relatively expensive (you basically run the same algorithm as …).
On the other hand, that temporary string could be allocated on the stack (as long as it's relatively short), which could make the approximate approach worth it again.
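Here is a minimal sketch of the stack-allocated temporary for a single `int` operand. It assumes .NET Core 3.0 or later, for `int.TryFormat` and the `ReadOnlySpan<char>` overloads of `string.Concat`, and uses a hypothetical helper `ConcatWithInt`. The number is formatted into a stack buffer, and then exactly one heap string of the final length is allocated.

```csharp
using System;

// Format the int into a stack buffer, then allocate the final string once.
static string ConcatWithInt(string prefix, int value, string suffix)
{
    // 11 chars fit "-2147483648" with a one-character negative sign; 32 is an
    // assumed margin for cultures whose negative sign is longer.
    Span<char> buffer = stackalloc char[32];
    if (!value.TryFormat(buffer, out int written))
    {
        // Buffer too small: fall back to an ordinary heap-allocated temporary.
        return string.Concat(prefix, value.ToString(), suffix);
    }
    // Single heap allocation: the result, sized from the exact span lengths.
    return string.Concat(prefix.AsSpan(), buffer.Slice(0, written), suffix.AsSpan());
}

Console.WriteLine(ConcatWithInt("You are ", 17, " years old")); // You are 17 years old
```

The fixed-size buffer trades a bounded amount of stack space for avoiding the intermediate heap string. The fallback keeps the helper correct when the formatted value does not fit.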
@svick Altering the algorithm to pick the larger of the …
https://gist.github.com/AdamSpeight2008/ef6834483148afcac59234eef36ef36f
@jaredpar I don't know why you labeled this Language Design. It appears to be a suggestion for improved IL generation in the compiler (i.e. an "optimization").
@gafter I added language design because the spec is specific about how the conversion should work here:
My original reading made me worry that this was potentially violating the spec, because it moves from a virtual call to a non-virtual one (the struct case). Thinking it through, though, I don't think that's actually a problem. But there are two other problems that need to be addressed.
Mutations
The optimization cannot be to …
Side Effects
Consider this code:
"hello" + e1 + (new Widget()) + "world"
// translates to
string.Concat("hello", e1, new Widget(), "world")
This cannot be optimized to the following:
string.Concat("hello", e1.ToString(), (new Widget()).ToString(), "world")
Imagine the case where … In order to do this optimization, side effects need to be taken into account. It has roughly the same problem space as …
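This runnable sketch shows the ordering problem. `e1` and `Widget` are placeholders in the comment above, so here a `StringBuilder` plays `e1`. A hypothetical local function `Touch` stands in for `new Widget()` by mutating `e1` when it is evaluated. The first call mirrors today's shape: operands are evaluated left to right, and `ToString` runs inside `string.Concat`. The second call is the naive rewrite, which calls `e1.ToString()` too early.

```csharp
using System;
using System.Text;

var e1 = new StringBuilder("e1");

// Hypothetical stand-in for `new Widget()`: evaluating it mutates e1.
string Touch() { e1.Append("!"); return "w"; }

// Today's shape: e1 is passed as an object and stringified only inside
// string.Concat, after Touch() has already run.
string today = string.Concat(new object[] { "hello", e1, Touch(), "world" });

e1.Clear().Append("e1"); // reset for the second run

// Naive rewrite: e1.ToString() is evaluated before Touch(), so the mutation is missed.
string naive = string.Concat("hello", e1.ToString(), Touch(), "world");

Console.WriteLine(today); // helloe1!wworld
Console.WriteLine(naive); // helloe1wworld
```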
Background
There's a lot of code out there that uses string concatenation to append arbitrary objects to strings, like so:
At compile time, Roslyn transforms calls to string.operator+ into calls to string.Concat to avoid allocating intermediate strings, so what we end up with is this:
Unfortunately, the string.Concat overloads used here only accept object parameters, meaning that Age (which is an int) is implicitly boxed when it is passed into the function. Of course, boxing is no good for performance, so if the developer wants to avoid this, he/she has to call ToString on the variable explicitly, e.g.
and the compiler will do the right thing and omit the box instruction.
An example of where this really became a pain was dotnet/corefx#8025, where a bunch of ToString calls had to be added to avoid boxing on each item.
Proposal
Instead of calling the string.Concat overloads that accept objects, the compiler should exclusively generate code targeting the ones accepting a string. For value types, the generated code would coerce the variable into a string via ToString:
Existing strings could simply be concatenated directly, without a ToString call (same behavior as today):
And for other reference types, a ToString call would be emitted with an extra null check:
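This hand-written sketch covers the three cases in the proposal. It uses a hypothetical helper `Describe`, with `System.Version` as an arbitrary example of "other reference types". It shows the shape of code the proposal describes, not what Roslyn actually emits.

```csharp
using System;

// Hand-lowered form of: "Age " + age + ", name " + name + ", version " + version
static string Describe(int age, string name, Version? version) =>
    string.Concat(
        "Age ", age.ToString(),              // value type: call ToString, no box
        ", name ", name,                     // string: passed through unchanged
        ", version ", version?.ToString());  // other reference type: null check first

Console.WriteLine(Describe(17, "Ann", new Version(1, 2))); // Age 17, name Ann, version 1.2
```

When `version` is null, `?.` yields null and `string.Concat` treats it as an empty string. This matches today's behavior for null operands.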