- "Doesn't everyone love the native compiler?"
https://github.com/dotnet/csharplang/blob/main/proposals/utf8-string-literals.md#unresolved-questions
#184
Today, we looked to answer some initial questions on the UTF-8 prototype implementation. It's important to note that this prototype is intended to be handed off to partners to play with so we can get some feedback on what works and what doesn't: the rules we're deciding today are not intended to be the final LDM rulings on the questions. We can and should revisit some or all of these questions after initial impressions are gathered from users.
We do have precedent in the language for conversions that depend on the constant value being converted: for example, integer constants can be converted
to bytes if the constant value would fit in a byte. However, we don't think that it's necessary to special case here. We already allow (string)null
to
be converted to Span<char>
via a user-defined conversion, so we don't think there's any new ground being tread here. It does marginally widen the
breaking change surface, but we think that's ok for the prototype.
The conversion will apply to null string constants.
After examining the existing types of implicit conversions, we think conversion is best specified as a new kind of conversion. It doesn't fit well into any existing conversion: constant conversions, for example, have both constant input and output values. This conversion won't have a constant output value.
We will introduce a new conversion kind for string constant to UTF-8 bytes.
The next question after deciding on a separate kind is whether this conversion will be a standard conversion or not. This impacts whether it can be used as an input or output conversion for user-defined conversions, and is also where history complicates our decision a bit. The native C# compiler (used for C# 5 and earlier) implemented the spec incorrectly and considered more conversions to be standard than actually are, leading to an experience that is potentially inconsistent, no matter what we do. For now, though, our initial gut reaction is that there's too much involved in one of these conversions to be considered in the list of standard conversions. We will revisit this question after prototype feedback.
Not a standard conversion, for now.
In theory, this conversion could be represented in an expression tree just as the final bytes wrapping an array creation. However, the conversion itself has semantic meaning, and we think that it's important for expression trees to show this meaning. Therefore, we will block this conversion in expression trees, and future modernization efforts will consider this case well.
Blocked.
Since this is a prototype, we think that the proposed byte[]
is fine for now. We will likely want to have a deeper debate about whether string literals
should have a natural type of a mutable array, but we don't think that debate is necessary for now.
The natural type of u8
literals will be byte[]
, for the prototype.
There is some interplay here with the standard conversion rules: if the conversion is a standard conversion, then any place that takes a byte[]
as a
user defined conversion would be able to take a string literal. That leaves things like IEnumerable<byte>
, which standard conversions have no impact
on.
There's a slippery slope here. If we don't make the conversion a standard conversion, then there's a potentially-infinite number of places to make the
conversion work. We'll err on the side of getting feedback for now: the current set of conversion targets (byte[]
, Span<byte>
, and ReadOnlySpan<byte>
),
and we'll see what users think after using the prototype.
No new conversion targets added for now.
This one is definitely the trickiest of the issues we're discussing. We just made a big breaking change in C# with lambda natural types. This was
successful, but it took a lot of users using the feature and telling us what was broken for us to hammer down the rules. The better function member
rule as proposed will address some of the breaking changes, but not all: new instance methods could be called instead of extension methods, for example.
Additionally, the justification we used for such changes with lambdas does not apply here. The justification for lambdas was that any extension methods
almost certainly delegated to the original method, just providing a target type. This reasoning almost certainly does not hold up for byte[]
vs string
.
We think the best way to gather feedback is again to be bold and have the prototype make no adjustments at all. This will almost certainly break some
people, but based on the lambda feature we have real evidence that people use our previews and report back to us when things break. This feedback will be
used to help determine how agressive we need to be around breaking changes, or whether we need to abandon the conversion form entirely and only do a u8
syntax form.
The prototype will not adjust any rules here, so we can hopefully see what breaks in practice.
We should be consistent with other constant suffixes.
Consistency means that u8
or U8
will be allowed.