[Proposal]: UTF8 literal support for nameof() expressions #6235

Sergio0694 · 2022-06-24T14:06:03Z

Note: extracted minimal proposal from #184 (comment).

UTF8 literal support for nameof() expressions

Proposed
Prototype: Not Started
Implementation: Not Started
Specification: Not Started

Summary and motivation

This proposal is a small follow up to the UTF8 literal feature. There are many scenarios where developers would want to move to passing UTF8 literals to get better performance and not having to transcode the text at runtime, but the way the feature is currently specced, it would mean that replacing all callsites where today developers are using nameof, [CallerMemberName], [CallerArgumentExpression] etc. would not be possible. This would mean that developers would be forced to choose between less error prone code with worse performance (what they have today), or more error prone code (due to hardcoded literals) with faster performance. Having to make a compromise here is not ideal, and this proposal is meant to offer minimal support to solve this issue.

Specifically, the proposal is only to enable nameof expressions to be used as UTF8 arguments. This would allow developers to get robust code by using the nameof operator, and callsites currently using [CallerMemberName] and [CallerArgumentExpression] could just move to nameof instead to still have build time checks for the literals, instead of having hardcoded strings.

Essentially, the proposal is to allow u8 to be appended to a nameof expression, like so:

void Test(int parameter)
{
    var testUtf8 = nameof(Test)u8;
    var parameterUtf8 = nameof(parameter)u8;
}

This would have the same flexibility as nameof today, but with UTF8 support. Roslyn would just lower this exactly the same as it would have had if a literal had been used in place of that nameof expression, with the u8 suffix at the end.

Use case example

As mentioned in #184 (comment), we have a custom, high-performance, reflection-free ETW logger in the Microsoft Store, which exposes a builder pattern to allow callers to construct their structured events to submit, like so (simplified):

public void SomeEvent(string someText, string someOtherText, ...)
{
    using TracingDataBuilder builder = TracingDataBuilder.Create();
 
    // ...
   
    builder.AppendEventTagAndName(SOME_TAG.BAR); // [CallerMemberName]
    builder.AppendWStringKeyValuePair(pSomeText, someTextLength, nameof(someText));
    builder.AppendWStringKeyValuePair(pSomeOtherText, someOtherTextName, nameof(someOtherText));
    builder.AppendWStringKeyValuePair(pSomeId, someIdLength, nameof(someId));
    builder.AppendWStringKeyValuePair(pSomeType, someTypeLength, nameof(someType));
    builder.AppendBoolKeyValuePair(&someBoolParameter, nameof(someBoolParameter));
    builder.AppendBoolKeyValuePair(&someOtherBoolParameter, nameof(someOtherBoolParameter));
    builder.AppendInt32KeyValuePair(&someIntParameter, nameof(someIntParameter));
    
    _ = builder.EventWriteTransfer(_traceLogger, descriptor, null, null);
}

This currently uses Encoding.UTF8 to get the UTF8 data for those event and field names and build the event metadata. This is not ideal for performance, and we'd like to migrate to just using UTF8 literals. The issue with that is that due to the current design, there would be no way to use them without having to fall back to having hardcoded strings everywhere, which is not great.

With the proposed feature instead, we could just drop the attributes and simply do the following:

public void SomeEvent(string someText, string someOtherText, ...)
{
    using TracingDataBuilder builder = TracingDataBuilder.Create();
 
    // ...
   
    builder.AppendEventTagAndName(SOME_TAG.BAR, nameof(SomeEvent)u8);
    builder.AppendWStringKeyValuePair(pSomeText, someTextLength, nameof(someText)u8);
    builder.AppendWStringKeyValuePair(pSomeOtherText, someOtherTextName, nameof(someOtherText)u8);
    builder.AppendWStringKeyValuePair(pSomeId, someIdLength, nameof(someId)u8);
    builder.AppendWStringKeyValuePair(pSomeType, someTypeLength, nameof(someType)u8);
    builder.AppendBoolKeyValuePair(&someBoolParameter, nameof(someBoolParameter)u8);
    builder.AppendBoolKeyValuePair(&someOtherBoolParameter, nameof(someOtherBoolParameter)u8);
    builder.AppendInt32KeyValuePair(&someIntParameter, nameof(someIntParameter)u8);
    
    _ = builder.EventWriteTransfer(_traceLogger, descriptor, null, null);
}

The .NET runtime also has some examples of where this could be useful. For instance, consider this: https://github.com/dotnet/runtime/blob/2a01ceb2d004a125757a4bb95a9341cc283c5afd/src/libraries/System.Reflection.MetadataLoadContext/src/System/Reflection/TypeLoading/General/Utf8Constants.cs#L41-L62. This is exactly the kind of scenario where developers used to hardcode literals before we had the nameof operator, and then switched to Foo => nameof(Foo) when that became available. Having to go back to literals once again just to be able to get UTF8 encoding is not great. This could just be written as Foo => nameof(Foo)u8 instead to avoid the use of hardcoded string literals whenever possible.

Detailed design

The design allows using the u8 suffix on a nameof expression to have it lowered to an UTF8 ReadOnlySpan<byte>. All the same other rules as normal UTF8 literals (see here) apply here. A nameof expression with the u8 suffix would be lowered exactly the same as a literal with the same content of the nameof expression, with the u8 suffix.

The same conversion rules and other general rules around UTF8 literals would apply here as well, with no changes.

Drawbacks

It's a new, although small, bit of syntax, and it would likely need some mapping in the public code analysis APIs as well.

Alternatives

Force developers to go back to hardcoded strings in all of these scenarios. This works, but it's far from ideal.

The text was updated successfully, but these errors were encountered:

teo-tsirpanis · 2022-06-24T14:10:12Z

The u8 suffix after a call-like expression does not look nice to me. I would prefer either nameof8(x), or making the suffix optional if it is unambiguous that we want the name in UTF-8.

Sergio0694 · 2022-06-24T14:20:55Z

The proposal uses u8 as a suffix mostly for consistency, if we consider a nameof expression to be conceptually the same as a string literal. I would personally be fine with omitting the suffix and just allowing a nameof expression as a ReadOnlySpan<byte> parameter, but if I remember correctly, the LDM was in general against having implicit UTF8 conversions, which is why u8 was also made mandatory, so I don't see that happening (and I agree u8 should be explicit).

Integrating some comments from Discord by @sylveon:

maybe be C++ bias but i think nameof(Test)u8 makes sense
since u8 is a s suffix to the string
and if the mental model is nameof(T) -> "T", then appending u8 should give "T"u8

That is exactly what I had in mind too and why it looks consistent to me 🙂

Another issue with eg. nameof8 (or the nameofu8() proposed by @EgorBo): that's effectively a new keyword, so it's automatically -100000 starting points and basically a complete non-starter. The proposed syntax just has the same existing u8 suffix, but after a nameof expression. The nameof operator specifically is equivalent to a literal in that it returns a string constant, so it'd keep the same current "string literal + u8" structure that normal UTF8 string literals have.

HaloFour · 2022-06-24T14:42:54Z

Feels like a lot of this could be handled through some target typing?

void Foo([CallerMemberName] ReadOnlySpan<byte> memberName = default) {
    ReadOnlySpan<byte> paramName = nameof(memberName);
}

Sergio0694 · 2022-06-24T14:58:01Z

This was something I thought about too, but after talking with @333fred and @jaredpar, they mentioned that those attributes are just not specced to support UTF8 spans, so enabling that would require more complex changes that they likely wouldn't consider to be worth it. Additionally, there's the point about the LDM deciding to always use u8 explicitly, which would make this a somewhat weird scenario (there's an implicit literal passed here but the fact the resulting data is UTF8 is not specified anywhere). That's where the current proposal comes from: it's simply an extension of the existing nameof expression, it uses the same "literal + u8" explicit suffix structure to be consistent, and it would allow supporting all of these scenarios with minimal changes. Sure, you wouldn't have the implicit caller attributes, but at least there would be no need to use hardcoded literals anymore in any of these scenarios, and by only making a change to the nameof syntax, and nothing else.

dotnet locked and limited conversation to collaborators Jun 24, 2022

333fred converted this issue into discussion #6238 Jun 24, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

This issue was moved to a discussion.

[Proposal]: UTF8 literal support for nameof() expressions #6235

[Proposal]: UTF8 literal support for nameof() expressions #6235

Sergio0694 commented Jun 24, 2022 •

edited

Loading

teo-tsirpanis commented Jun 24, 2022

Sergio0694 commented Jun 24, 2022 •

edited

Loading

HaloFour commented Jun 24, 2022

Sergio0694 commented Jun 24, 2022

This issue was moved to a discussion.

This issue was moved to a discussion.

[Proposal]: UTF8 literal support for nameof() expressions #6235

[Proposal]: UTF8 literal support for nameof() expressions #6235

Comments

Sergio0694 commented Jun 24, 2022 • edited Loading

UTF8 literal support for nameof() expressions

Summary and motivation

Use case example

Detailed design

Drawbacks

Alternatives

teo-tsirpanis commented Jun 24, 2022

Sergio0694 commented Jun 24, 2022 • edited Loading

HaloFour commented Jun 24, 2022

Sergio0694 commented Jun 24, 2022

This issue was moved to a discussion.

Sergio0694 commented Jun 24, 2022 •

edited

Loading

Sergio0694 commented Jun 24, 2022 •

edited

Loading