[API Proposal]: ISpanScannable, IUtf8SpanScannable #93339
Comments
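using System;
using System.Buffers.Text;
using System.Globalization;
using System.Linq;
using System.Text;
using BenchmarkDotNet.Attributes;
public class ParsableVsScannable
{
    public ParsableVsScannable()
    {
        // Sanity check: both approaches must produce the same result.
        if (Parsable() != Scannable())
            throw new Exception();
    }
    // 100000 space-separated integers, with a trailing space.
    readonly static byte[] utf8Text =
        Encoding.UTF8.GetBytes(string.Join(" ", Enumerable.Range(100000, 100000).Select(x => x.ToString())) + " ");
    [Benchmark]
    public int Parsable()
    {
        var span = utf8Text.AsSpan();
        var xor = 0;
        do
        {
            // First pass: find the delimiter. Second pass: parse the slice.
            var delim = span.IndexOf((byte)' ');
            if (int.TryParse(span.Slice(0, delim), CultureInfo.InvariantCulture, out var x))
                xor ^= x;
            span = span.Slice(delim + 1);
        } while (!span.IsEmpty);
        return xor;
    }
    [Benchmark]
    public int Scannable()
    {
        var span = utf8Text.AsSpan();
        var xor = 0;
        do
        {
            // Single pass: the parser reports how many bytes it consumed.
            if (Utf8Parser.TryParse(span, out int x, out var bytesConsumed))
            {
                xor ^= x;
                span = span.Slice(bytesConsumed + 1);
            }
            else
            {
                // Stop on invalid input instead of looping forever.
                break;
            }
        } while (!span.IsEmpty);
        return xor;
    }
}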
|
Tagging subscribers to this area: @dotnet/area-system-runtime
Issue Details
API Proposal
namespace System.Buffers
{
    public interface ISpanScannable<TSelf>
        where TSelf : ISpanScannable<TSelf>?
    {
        static abstract OperationStatus TryParse(
            ReadOnlySpan<char> text,
            IFormatProvider? provider,
            out int bytesConsumed,
            out TSelf result);
    }
    public interface IUtf8SpanScannable<TSelf>
        where TSelf : IUtf8SpanScannable<TSelf>?
    {
        static abstract OperationStatus TryParse(
            ReadOnlySpan<byte> utf8Text,
            IFormatProvider? provider,
            out int bytesConsumed,
            out TSelf result);
    }
}
namespace System.Buffers
{
    public enum OperationStatus
    {
        ...,
        /// <summary>
        /// The entire input is processed, up to the end of the buffer.
        /// However, caller can stitch the remaining unprocessed input with more data, retry, and get different result.
        /// </summary>
        PartiallyDone
    }
}
API Usage
class Utf8Reader(Stream stream)
{
    byte[] buffer = new byte[16];
    int position = 0; // start of the unconsumed data
    int end;          // end of the valid data in the buffer
    public TValue Read<TValue>() where TValue : IUtf8SpanScannable<TValue>
    {
        TValue value;
        SkipDelimiters();
        while (true)
        {
            // Parse only the valid, unconsumed part of the buffer.
            OperationStatus status = TValue.TryParse(buffer.AsSpan(position, end - position), null, out int bytesConsumed, out value);
            switch (status)
            {
                case OperationStatus.Done:
                    position += bytesConsumed;
                    return value;
                case OperationStatus.PartiallyDone:
                    // The value ran up to the end of the buffer: retry if more data arrives.
                    if (FetchMoreData())
                        continue;
                    position += bytesConsumed;
                    return value;
                case OperationStatus.NeedMoreData:
                    if (FetchMoreData())
                        continue;
                    throw new FormatException();
                case OperationStatus.InvalidData:
                    throw new FormatException();
            }
        }
    }
    bool FetchMoreData()
    {
        // Move the unconsumed tail to the start of the buffer.
        var unconsumedLength = end - position;
        buffer.AsSpan(position, unconsumedLength).CopyTo(buffer);
        position = 0;
        end = unconsumedLength;
        if (end == buffer.Length)
            GrowBuffer();
        // Append new data after the unconsumed tail.
        var bytesRead = stream.Read(buffer.AsSpan(end));
        if (bytesRead <= 0)
            return false;
        end += bytesRead;
        return true;
    }
    void SkipDelimiters() { ... }
    void GrowBuffer() { ... }
}
Alternative Designs
Without OperationStatus. Caller should analyze buffer data on failure itself.
namespace System
{
    public interface ISpanScannable<TSelf>
        where TSelf : ISpanScannable<TSelf>?
    {
        static abstract bool TryParse(
            ReadOnlySpan<char> text,
            IFormatProvider? provider,
            out int bytesConsumed,
            out TSelf result);
    }
    public interface IUtf8SpanScannable<TSelf>
        where TSelf : IUtf8SpanScannable<TSelf>?
    {
        static abstract bool TryParse(
            ReadOnlySpan<byte> utf8Text,
            IFormatProvider? provider,
            out int bytesConsumed,
            out TSelf result);
    }
}
Risks
sealed class MaybeNullWhenAttribute<TEnum>(params TEnum[] returnValues) : Attribute where TEnum : Enum
{
    public TEnum[] ReturnValues { get; } = returnValues;
}
|
Would appreciate some additional input from @stephentoub and @GrabYourPitchforks here. |
Background and motivation
This is a successor to the IUtf8SpanParsable<TSelf> API proposal, where this scenario was already raised but deferred to a later time. (IUtf8SpanParsable was designed to be in sync with the existing ISpanParsable.) It proposes not a replacement for IUtf8SpanParsable<TSelf>, but an alternative for advanced scenarios.
Currently, the ISpanParsable<TSelf> and IUtf8SpanParsable<TSelf> interfaces can't be used to parse a value when the size of the serialized data is unknown. When a caller wants to deserialize text from some buffer, it must scan the buffer up to the delimiter, and only then pass that piece of data to the Parse/TryParse methods. Thus, each byte/character is processed twice: the first time to find the size and the second time to parse the value.
Currently, this can be solved with a bunch of static TryParse methods in the Utf8Parser class. There is also another API proposal to add an option to stop parsing after the first invalid character.
These approaches have some disadvantages:
- Utf8Parser static methods and the NumberStyles.AllowTrailingInvalidCharacters option are limited to BCL types.
- NumberStyles also does not cover non-numeric types such as TimeSpan, DateTime, DateTimeOffset, DateOnly, TimeOnly, ...
- They return bool, which may lead to some quirks when parsing, e.g., floating-point values. Utf8Parser.TryParse("1."u8, out double value, out int bytesConsumed) will successfully parse the double value 1, and the caller has to perform additional complicated checks to determine whether this is the final result or whether the TryParse call should be retried with more data (fetched from a Stream), such as Utf8Parser.TryParse("1.2"u8, ...)
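To see the ambiguity described in the last item, the following small probe prints what the existing Utf8Parser reports for a few chunks that a reader might hold after a single Stream.Read call. This is only an illustrative sketch: the Utf8ParserProbe class and its Probe method are names made up for this example, and the values printed for the edge cases ("1." and "1.e") depend on the runtime's Utf8Parser behavior, so no expected output is listed here.

```csharp
using System;
using System.Buffers.Text;
using System.Text;

public static class Utf8ParserProbe
{
    // Hypothetical helper: returns what Utf8Parser reports for one buffered chunk.
    public static (bool Ok, double Value, int BytesConsumed) Probe(string chunk)
    {
        byte[] utf8 = Encoding.UTF8.GetBytes(chunk);
        bool ok = Utf8Parser.TryParse(utf8, out double value, out int bytesConsumed);
        return (ok, value, bytesConsumed);
    }

    public static void Main()
    {
        // The same logical value may be split differently across reads.
        foreach (string chunk in new[] { "1.", "1.2", "1.e", "1e7" })
        {
            var (ok, value, consumed) = Probe(chunk);
            Console.WriteLine($"\"{chunk}\": ok={ok}, value={value}, bytesConsumed={consumed}");
        }
    }
}
```

A bool result alone cannot tell the caller whether a successful parse that stops near the end of the buffer is final, which is the gap the proposal tries to close.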
.NET already has an OperationStatus enum that expresses this behavior with the OperationStatus.NeedMoreData value. For this purpose, however, it may be useful to extend the OperationStatus enum with a new value that distinguishes "data is parsed, but the call may be retried with more data" from "data is not parsed, but may be parsed with more data". E.g. Utf8Parser.TryParse("1.e"u8, ...) returns false, but Utf8Parser.TryParse("1e7"u8, ...) will be successful.
If adding a new value to this enum is a breaking change, the new method may return bool with an OperationStatus out parameter, or a new enum with similar semantics.
API Proposal
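As a rough illustration of the intended semantics, here is a minimal sketch of how a type might implement a scannable interface. Everything in it is an assumption made for the example: ScanStatus is a stand-in for "a new enum with similar semantics", the interface is declared locally and omits the IFormatProvider parameter for brevity, and UnsignedNumber is a made-up type. It requires C# 11 for static abstract interface members.

```csharp
using System;

// Hypothetical stand-in for OperationStatus plus the proposed PartiallyDone value.
public enum ScanStatus { Done, PartiallyDone, NeedMoreData, InvalidData }

// Hypothetical local declaration mirroring the proposed interface (provider omitted).
public interface IUtf8SpanScannable<TSelf> where TSelf : IUtf8SpanScannable<TSelf>
{
    static abstract ScanStatus TryParse(ReadOnlySpan<byte> utf8Text, out int bytesConsumed, out TSelf result);
}

// Illustrative type: a non-negative decimal integer.
public readonly record struct UnsignedNumber(uint Value) : IUtf8SpanScannable<UnsignedNumber>
{
    public static ScanStatus TryParse(ReadOnlySpan<byte> utf8Text, out int bytesConsumed, out UnsignedNumber result)
    {
        bytesConsumed = 0;
        result = default;
        if (utf8Text.IsEmpty)
            return ScanStatus.NeedMoreData; // nothing to parse yet

        ulong value = 0;
        int i = 0;
        for (; i < utf8Text.Length; i++)
        {
            byte b = utf8Text[i];
            if (b < (byte)'0' || b > (byte)'9')
                break; // first non-digit ends the value
            value = value * 10 + (uint)(b - '0');
            if (value > uint.MaxValue)
                return ScanStatus.InvalidData; // overflow
        }
        if (i == 0)
            return ScanStatus.InvalidData; // no digits at all

        bytesConsumed = i;
        result = new UnsignedNumber((uint)value);
        // Digits ran to the end of the buffer: more digits may follow in the next chunk.
        return i == utf8Text.Length ? ScanStatus.PartiallyDone : ScanStatus.Done;
    }
}
```

With this sketch, UnsignedNumber.TryParse("123 456"u8, ...) reports Done with 3 bytes consumed, while "123"u8 reports PartiallyDone, so a stream reader knows it should try to fetch more data before accepting 123 as the final value.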
API Usage
Alternative Designs
Without OperationStatus. The caller has to analyze the buffer data itself on failure.
Risks
- This API is only for advanced scenarios and may not be useful for regular users; TryParse may be too complex to implement correctly.
- The nullability of the out TSelf result parameter may require a new nullability attribute. This is not needed for the bool-returning alternative implementation.
- IUtf8SpanScannable<TSelf>. E.g. with vectorization