Skip to content
This repository has been archived by the owner on Jan 23, 2023. It is now read-only.
/ corefx Public archive

Add ref APIs and unit tests for System.Text.Unicode.Utf8 #34538

Merged

Conversation

GrabYourPitchforks
Copy link
Member

These are the unit tests and System.Runtime refasm changes for dotnet/coreclr#21948.

The BoundedMemory type (for internal unit test use) is a slightly improved and simplified version of the AppVerifier-like code introduced into corefxlab at dotnet/corefxlab#1881. It provides a framework for testing unsafe C# code to make sure the code under test doesn't read outside the bounds of the allowed buffers.

@stephentoub
Copy link
Member

Was this API reviewed? Can you share the issue? Thanks.

@GrabYourPitchforks
Copy link
Member Author

@stephentoub This particular API has not yet been approved. We reviewed related APIs (the OperationStatus-based transcoding APIs, not yet part of this PR) at https://github.com/dotnet/corefx/issues/34094, however no mention of them is made in the notes at https://github.com/dotnet/apireviews/blob/master/2018/System.Text.Utf8Char/README.md.

In the meeting notes it was said that IsWellFormedSequence is not useful on its own, but there are some scenarios (like WebSockets) where it matters. In those scenarios, the protocol requires that the application validate that the incoming data is well-formed UTF-8, but there's no transcoding / fixup process. The data remains in UTF-8 representation the entire time.

@stephentoub
Copy link
Member

but there are some scenarios (like WebSockets) where it matters

How does this IsWellFormedSequence work for WebSockets? This appears to not be stateful from one request to another, but the web sockets case requires being able to support carry over from one packet to the next.

@stephentoub
Copy link
Member

This particular API has not yet been approved.

In general we ask people not to submit PRs until APIs are reviewed. When is it going to be reviewed?

@GrabYourPitchforks
Copy link
Member Author

@stephentoub I was thinking that WebSockets would use something along the lines of the type https://github.com/dotnet/corefxlab/blob/master/src/System.Text.Primitives/System/Text/Encoders/Utf8Validator.cs, which would use IsWellFormedSequence as a backing API after rewrite.

@stephentoub
Copy link
Member

I was thinking that WebSockets would use something along the lines of the type

Ok, thanks. This is why starting with the API review is useful. It's not clear to me how that gets built on top of this public API as is, or if a different API is needed for that called-out use case then what this API is useful for, and a PR isn't really a good place to design the APIs.

@GrabYourPitchforks
Copy link
Member Author

Pushed a new iteration that has the unit tests for ToChars and ToBytes included.

The unit tests are basically complete at this point. The only thing that remains is to figure out what should be exposed via the ref assemblies, per @stephentoub's comment. We had an API review the other day for ToChars and ToBytes, and they were accepted with some minor API tweaks. I'm waiting for the notes from that API review to be published so that I can make the corresponding ref asm changes.

I'll also delete the IsWellFormed API from the next iteration since we're not going to expose it for now.

@GrabYourPitchforks GrabYourPitchforks changed the title Unit tests for UTF-8 validation APIs Unit tests for UTF-8 validation and chunked transcoding APIs Jan 29, 2019
@karelz
Copy link
Member

karelz commented Mar 4, 2019

@GrabYourPitchforks what is status of this PR? There seems to be no update for last 4 weeks.
If you can't finish it now, we can close it and reopen it later, when you do have time ... Thanks!

@GrabYourPitchforks
Copy link
Member Author

@karelz The corresponding coreclr PR is held up in review (and partially blocked by https://github.com/dotnet/coreclr/pull/2251, on which I've been working offline with Jan for the past while). This corefx PR is essentially finished but cannot be committed until those prereqs are committed to coreclr.

@karelz
Copy link
Member

karelz commented Mar 6, 2019

If the prereqs are more than 1-2 weeks out, let's close this PR to avoid "bugging us" ;)
Thanks!

@karelz
Copy link
Member

karelz commented Mar 18, 2019

@GrabYourPitchforks any update on ETA?

@GrabYourPitchforks GrabYourPitchforks changed the title Unit tests for UTF-8 validation and chunked transcoding APIs Add ref APIs and unit tests for System.Text.Unicode.Utf8 Mar 18, 2019
@GrabYourPitchforks
Copy link
Member Author

@karelz As soon as CI passes it's going in. Thanks for keeping on me for this. :)

@GrabYourPitchforks
Copy link
Member Author

I need to make a quick fix: The out parameters should have the num prefix dropped. We don't use that prefix in our other APIs like Utf8Formatter.

@GrabYourPitchforks
Copy link
Member Author

Unit tests are passing even though Helix seems to be having a bad day.

@GrabYourPitchforks GrabYourPitchforks merged commit cd3936e into dotnet:master Mar 18, 2019
@GrabYourPitchforks GrabYourPitchforks deleted the utf8_validation_apis branch March 18, 2019 22:05
@karelz karelz added this to the 3.0 milestone Apr 1, 2019
picenka21 pushed a commit to picenka21/runtime that referenced this pull request Feb 18, 2022
…fx#34538)

Also provides the implementation of BoundedMemory for checking for buffer overruns during unit testing

Commit migrated from dotnet/corefx@cd3936e
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants