IUtf8SpanFormattable and IUtf8SpanParsable #81500
Comments
Tagging subscribers to this area: @dotnet/area-system-memory
Issue Details
Background and Motivation
We currently support UTF-16 based formatting and parsing and even expose common interfaces through which any developer can declare their types as supporting the same. However, we have no such support for the same around UTF-8.
With UTF-8 being ever more prevalent for various scenarios, it would be ideal if similar interfaces could be exposed so users can express that their own types support the functionality.
As such, I propose we expose two new interfaces that support parsing/formatting types using UTF-8. These interfaces would only support Span<byte> today, as we do not have a corresponding Utf8String type that would make exposing IUtf8Formattable or IUtf8Parsable viable.
Proposed API
namespace System;
public interface IUtf8SpanFormattable : IUtf8Formattable
{
    bool TryFormat(Span<byte> destination, out int bytesWritten, ReadOnlySpan<byte> format, IFormatProvider? provider);
}
public interface IUtf8SpanParsable<TSelf> : IUtf8Parsable<TSelf>
    where TSelf : ISpanParsable<TSelf>?
{
    static abstract TSelf Parse(ReadOnlySpan<byte> s, IFormatProvider? provider);
    static abstract bool TryParse(ReadOnlySpan<byte> s, IFormatProvider? provider, [MaybeNullWhen(returnValue: false)] out TSelf result);
}
Additional Considerations
It may be desirable to provide some API that lets users know the longest potential format string so they can have a "fail safe" way of formatting their value. For many types this is a well-defined upper bound or can be trivially computed.
These APIs operate like ISpanFormattable and ISpanParsable and not like Utf8Formatter or Utf8Parser. There are both pros and cons to this approach, but I believe that the latter's functionality is better expressed via a different API and one that could also apply to UTF-16.
|
Tagging subscribers to this area: @dotnet/area-system-runtime
|
Hi @tannergooding, would the new Decimal128 type from #81376 be able to support this API? You mentioned that this API proposal doesn't account for number parsing, so I'm assuming that more work is needed for this to be compatible with the new decimal types, right? Is there anything I can do to help with this? |
It specifically doesn't account for the overloads that take |
namespace System;
public interface IUtf8SpanFormattable
{
    bool TryFormat(Span<byte> utf8Destination, out int bytesWritten, ReadOnlySpan<char> format, IFormatProvider? provider);
}
public interface IUtf8SpanParsable<TSelf>
    where TSelf : IUtf8SpanParsable<TSelf>?
{
    static abstract TSelf Parse(ReadOnlySpan<byte> utf8, IFormatProvider? provider);
    static abstract bool TryParse(ReadOnlySpan<byte> utf8, IFormatProvider? provider, [MaybeNullWhen(returnValue: false)] out TSelf result);
}
namespace System.Numerics;
public interface INumberBase<TSelf>
{
    static virtual TSelf Parse(ReadOnlySpan<byte> utf8Text, NumberStyles style, IFormatProvider? provider);
    static virtual bool TryParse(ReadOnlySpan<byte> utf8Text, NumberStyles style, IFormatProvider? provider, [MaybeNullWhen(false)] out TSelf result);
}
|
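For anyone wondering what adopting this shape on a user-defined type might look like, here is a minimal sketch, assuming .NET 8 or later and C# 11. The `Celsius` type and its "integer followed by C" text format are hypothetical and exist only for illustration; the calls to `int.TryFormat` and `int.TryParse` use the UTF-8 overloads that shipped on Int32.

```csharp
using System;
using System.Globalization;
using System.Text;

// Format into a stack buffer without allocating a string.
Span<byte> buffer = stackalloc byte[16];
var temperature = new Celsius(21);
if (temperature.TryFormat(buffer, out int written, default, CultureInfo.InvariantCulture))
{
    Console.WriteLine(Encoding.UTF8.GetString(buffer[..written])); // 21C
}

// Parse straight from UTF-8 bytes.
Celsius parsed = Celsius.Parse("37C"u8, CultureInfo.InvariantCulture);
Console.WriteLine(parsed.Degrees); // 37

// Hypothetical example type; not part of the proposal.
public readonly record struct Celsius(int Degrees)
    : IUtf8SpanFormattable, IUtf8SpanParsable<Celsius>
{
    public bool TryFormat(Span<byte> utf8Destination, out int bytesWritten,
                          ReadOnlySpan<char> format, IFormatProvider? provider)
    {
        // Let Int32 write the digits, then append the one-byte unit suffix.
        if (Degrees.TryFormat(utf8Destination, out int digits, format, provider)
            && utf8Destination.Length - digits >= 1)
        {
            utf8Destination[digits] = (byte)'C';
            bytesWritten = digits + 1;
            return true;
        }

        bytesWritten = 0;
        return false; // destination too small
    }

    public static Celsius Parse(ReadOnlySpan<byte> utf8Text, IFormatProvider? provider)
        => TryParse(utf8Text, provider, out Celsius result)
            ? result
            : throw new FormatException("Expected an integer followed by 'C'.");

    public static bool TryParse(ReadOnlySpan<byte> utf8Text, IFormatProvider? provider,
                                out Celsius result)
    {
        // The whole input must match: digits first, then a trailing 'C'.
        if (utf8Text.Length > 1 && utf8Text[^1] == (byte)'C'
            && int.TryParse(utf8Text[..^1], provider, out int degrees))
        {
            result = new Celsius(degrees);
            return true;
        }

        result = default;
        return false;
    }
}
```

The methods are implemented implicitly (public on the type) rather than explicitly, so callers do not need an interface cast.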
Is the parameter name here meant to just be "utf8" or shouldn't it be "utf8Text" like in
public interface IUtf8SpanParsable<TSelf>
    where TSelf : IUtf8SpanParsable<TSelf>?
{
    static abstract TSelf Parse(ReadOnlySpan<byte> utf8, IFormatProvider? provider);
    static abstract bool TryParse(ReadOnlySpan<byte> utf8, IFormatProvider? provider, [MaybeNullWhen(returnValue: false)] out TSelf result);
}
|
Does this make Utf8Formatter and Utf8Parser obsolete? |
I expect there will be little need for Utf8Formatter. Utf8Parser diverged from the standard number parsing behavior. When it encounters something that's not part of the number and stops parsing, it returns what it has so far rather than failing. Analogous to StartsWith rather than Equals. That behavior is sometimes what you want, so it still has use, and I expect at some point we'll want to actually add the char equivalent, though I think it more likely we'd do so via NumberStyles so that it's integrated with generic math... at which point Utf8Parser would also no longer have much value. |
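To make the StartsWith-versus-Equals difference concrete, a small sketch, assuming .NET 8 or later (for the UTF-8 overload of `int.TryParse`) and C# 11 (for the `u8` literal). The input value is made up for illustration.

```csharp
using System;
using System.Buffers.Text;
using System.Globalization;

ReadOnlySpan<byte> input = "123abc"u8;

// Utf8Parser stops at the first byte that is not part of the number,
// succeeds, and reports how many bytes it consumed.
bool lenient = Utf8Parser.TryParse(input, out int prefix, out int bytesConsumed);
Console.WriteLine($"{lenient} {prefix} {bytesConsumed}"); // True 123 3

// The UTF-8 TryParse overload on Int32 requires the entire input
// to be a valid number, so the trailing "abc" makes it fail.
bool strict = int.TryParse(input, CultureInfo.InvariantCulture, out int value);
Console.WriteLine(strict); // False
```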
Suggestion: include the System.Numerics vector, matrix, and quaternion types as well. From a gamedev perspective it would be very nice to format these types directly to a Utf8 buffer without allocating. |
We'll plan on expanding the list of types as necessary. We're just looking at covering the most core types in the first pass. Please feel free to open API proposals for other types as appropriate. |
This issue covers adding UTF8 to things already implementing ISpanFormattable. Please open separate issues for other types. #83201 exists for PhysicalAddress. |
This came up in review to discuss whether the types implementing this interface should implement the methods implicitly (public on the type) or explicitly (requiring the interface cast/coercion). The answer was "match what we did for ISpanParsable/ISpanFormattable", which seems to be implicit everywhere except System.Char (explicit there). |
We landed There are some types that didn't get support for both interfaces which we'll hopefully land early in .NET 9 |
Pending work here
IUtf8SpanFormattable implementations:
IUtf8SpanParsable implementations:
|
Background and Motivation
We currently support UTF-16 based formatting and parsing and even expose common interfaces through which any developer can declare their types as supporting the same. However, we have no such support for the same around UTF-8.
With UTF-8 being ever more prevalent for various scenarios, it would be ideal if similar interfaces could be exposed so users can express that their own types support the functionality.
As such, I propose we expose two new interfaces that support parsing/formatting types using UTF-8. These interfaces would only support Span<byte> today, as we do not have a corresponding Utf8String type that would make exposing IUtf8Formattable or IUtf8Parsable viable. We could express those as byte[], but that is "less ideal" and blocks us from supporting any future UTF-8 string type.
Proposed API
Initial types that will implement the interface
System.Enum, System.Rune, and System.Version all implement ISpanFormattable today. They could optionally implement IUtf8SpanFormattable as well.
We should ideally have System.Numerics.INumberBase<TSelf> implement both IUtf8SpanFormattable and IUtf8SpanParsable<TSelf>. Doing this would require a DIM (default interface method) that defers to the UTF-16 variant.
Additional Considerations
It may be desirable to provide some API that lets users know the longest potential format string so they can have a "fail safe" way of formatting their value. For many types this is a well-defined upper bound or can be trivially computed.
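Absent such an upper-bound API, callers have to guess a size and retry. A rough sketch of that fallback, assuming .NET 8 or later; `FormatToUtf8` is a hypothetical helper, not a proposed API, and it relies on the convention that TryFormat returns false only when the destination is too small.

```csharp
using System;
using System.Globalization;
using System.Text;

// DateTime's round-trip ("O") format is longer than the initial 16-byte guess,
// so this call exercises the retry path.
byte[] utf8 = FormatToUtf8(DateTime.UnixEpoch, "O", CultureInfo.InvariantCulture);
Console.WriteLine(Encoding.UTF8.GetString(utf8));

// Hypothetical helper: keep doubling the buffer until TryFormat succeeds.
static byte[] FormatToUtf8<T>(T value, ReadOnlySpan<char> format = default,
                              IFormatProvider? provider = null)
    where T : IUtf8SpanFormattable
{
    int size = 16; // arbitrary starting guess
    while (true)
    {
        byte[] buffer = new byte[size];
        if (value.TryFormat(buffer, out int written, format, provider))
        {
            // Return only the bytes that were actually written.
            return buffer.AsSpan(0, written).ToArray();
        }

        size = checked(size * 2); // too small: grow and try again
    }
}
```

A known maximum length would let the caller size the buffer once (for example with stackalloc) and skip the loop entirely.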
These APIs operate like ISpanFormattable and ISpanParsable and not like Utf8Formatter or Utf8Parser. That is, they fail if they encounter unrecognized or unsupported data, whereas the latter instead treat it as effectively "end of data to parse". There are both pros and cons to this approach, but I believe that the latter's functionality is better expressed via a different API and one that could also apply to UTF-16.
This doesn't account for number parsing, which would likely entail extending INumberBase<TSelf> with new UTF-8 APIs as well. If we expose such APIs, we'd also extend INumberBase<TSelf> with the following methods (which would be DIM and defer to the UTF-16 variants):
Should we take ReadOnlySpan<byte> format or string format? There are pros/cons to each approach.
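The "defer to the UTF-16 variant" fallback mentioned in this proposal could plausibly work by transcoding and then calling the existing span-based parser. The following is only a sketch of that idea, assuming .NET 8 or later; `TryParseViaUtf16` is a hypothetical helper and not how the runtime's default interface methods are actually written.

```csharp
using System;
using System.Buffers;
using System.Diagnostics.CodeAnalysis;
using System.Globalization;
using System.Text.Unicode;

bool ok = TryParseViaUtf16("42"u8, CultureInfo.InvariantCulture, out int number);
Console.WriteLine($"{ok} {number}"); // True 42

// Hypothetical helper: transcode UTF-8 to UTF-16, then reuse ISpanParsable<T>.
static bool TryParseViaUtf16<T>(ReadOnlySpan<byte> utf8Text, IFormatProvider? provider,
                                [MaybeNullWhen(false)] out T result)
    where T : ISpanParsable<T>
{
    // Each UTF-8 byte yields at most one UTF-16 code unit,
    // so a char buffer of the same length is always large enough.
    char[] rented = ArrayPool<char>.Shared.Rent(utf8Text.Length);
    try
    {
        OperationStatus status = Utf8.ToUtf16(
            utf8Text, rented, out _, out int charsWritten,
            replaceInvalidSequences: false);

        if (status != OperationStatus.Done)
        {
            result = default; // invalid UTF-8 is treated as a parse failure
            return false;
        }

        return T.TryParse(rented.AsSpan(0, charsWritten), provider, out result);
    }
    finally
    {
        ArrayPool<char>.Shared.Return(rented);
    }
}
```

The transcoding step costs an extra pass over the input, which is why types that care about throughput would want to override such a default with a native UTF-8 implementation.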