Refactor Encoding to split fast-path and fallback logic #23098

GrabYourPitchforks · 2019-03-07T07:49:36Z

This is related to #22516, where it was suggested I work in another branch to mechanically separate all of the fallback infrastructure out of the derived Encoding classes and into a common base class. This PR represents what the change would look like for ASCIIEncoding, though UTF8 / UTF16 / UTF32 would look similar.

In particular, note that all fallback buffer related logic has been removed from ASCIIEncoding and pushed down to the base type. Additionally, the ASCIIEncoding class doesn't concern itself with EncoderNLS / DecoderNLS (aside from instantiating them). All of this coordination has been pushed down so that it can be shared across all Encoding-derived types.

This is marked WIP because I'm still making minor iterations to it to answer some lingering questions. Is it worthwhile to add methods directly to ASCIIEncodingSealed, for instance? Doing so allows us to make assumptions about what optimizations are valid for the Encoding instance.

I've not yet had an opportunity to perform benchmarking against this. I hope to get to this Thursday afternoon or Friday. My expectation is that it will perform better than baseline for calls directly to ASCIIEncoding.GetBytes and related APIs, since they immediately dispatch to fast-path logic. Calls to EncoderNLS.Convert may have more invocation overhead compared to baseline due to span shuffling and an extra virtual dispatch. I don't think this extra overhead will affect many applications in practice because I don't believe EncoderNLS.Convert appears frequently in hot paths, but even so the pending vectorization of these code paths (see #22516) should make up for it.

/cc @jkotas, who has given significant feedback over the past week as this was being developed; and @tarekgh and @layomia as FYI

GrabYourPitchforks · 2019-03-07T20:52:09Z

The unit test failures in CoreFX are expected with this change because this change also resolves https://github.com/dotnet/coreclr/issues/23020. Some of the unit tests assert the buggy behavior and need to be fixed up in corefx directly.

GrabYourPitchforks · 2019-03-07T23:33:30Z

@jkotas - here are some initial perf numbers from the refactored run. (Note: no vectorization has yet been introduced.)

(Perf numbers outdated - see #23098 (comment) instead.)

As expected, for very short strings the overhead of invoking the fallback logic is high. The fewer invalid characters there are, the more progress we can make within the fast-path methods, so the actual performance depends on the ratio of valid-to-invalid chars.

This was just testing ASCIIEncoding.GetBytes. I'll run another sample comparing EncoderNLS.Convert as well.
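
For readers unfamiliar with what counts as an "invalid" character here, a minimal illustration using only the public API may help. It relies on the fact that Encoding.ASCII uses a replacement fallback by default, so every char above 0x7F is emitted as '?' (0x3F). The class and method names (AsciiFallbackDemo, IsAllAscii, Encode) are made up for this example and are not part of the PR.

```csharp
using System;
using System.Text;

// Illustrative helper only; these names do not appear in the PR.
static class AsciiFallbackDemo
{
    // True when every char is in the ASCII range, meaning the encoder
    // never has to leave the fast path for this input.
    public static bool IsAllAscii(string s)
    {
        foreach (char c in s)
        {
            if (c > 0x7F)
            {
                return false;
            }
        }
        return true;
    }

    // Encoding.ASCII uses a replacement fallback by default, so each
    // non-ASCII char is written as '?' (0x3F).
    public static byte[] Encode(string s) => Encoding.ASCII.GetBytes(s);
}
```

With this helper, Encode("hello") yields 68 65 6C 6C 6F, while Encode("h\u00E9llo") yields 68 3F 6C 6C 6F: the single non-ASCII char is what forces the fallback logic to run.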

jkotas · 2019-03-07T23:55:43Z

Looks reasonable to me. Thanks

jkotas · 2019-03-07T23:56:27Z

> The unit test failures in CoreFX are expected

You can use https://github.com/dotnet/coreclr/blob/master/tests/CoreFX/CoreFX.issues.json to disable the tests to make the PR green.

GrabYourPitchforks · 2019-03-08T01:14:35Z

And here are the EncoderNLS.Convert numbers. As expected, they're a regression from baseline, where the crossover point is somewhere in the 20-byte range. Considering the typical use case for Encoder.Convert is to transcode chunks of data (say, for streaming scenarios), I don't think it's typical for devs to provide small payloads to this routine. My gut tells me this will not regress real-world application performance, but I cannot be absolutely sure of this claim.

(Perf results outdated - see #23098 (comment).)
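
To make the "transcode chunks of data" use case concrete, here is a rough sketch of how Encoder.Convert is commonly driven in a streaming loop. It uses only the public Encoder.Convert array overload and Encoding.GetMaxByteCount; the ChunkedAsciiEncoder class, the EncodeInChunks method, and the 1024-char default chunk size are assumptions made for illustration, not code from this PR.

```csharp
using System;
using System.IO;
using System.Text;

// Illustrative only; not part of the PR.
static class ChunkedAsciiEncoder
{
    public static byte[] EncodeInChunks(string text, int chunkChars = 1024)
    {
        if (chunkChars <= 0)
        {
            throw new ArgumentOutOfRangeException(nameof(chunkChars));
        }

        // The Encoder instance carries leftover state (for example a
        // trailing high surrogate) from one Convert call to the next.
        Encoder encoder = Encoding.ASCII.GetEncoder();
        char[] source = text.ToCharArray();
        byte[] buffer = new byte[Encoding.ASCII.GetMaxByteCount(chunkChars)];

        using (var output = new MemoryStream())
        {
            int offset = 0;
            do
            {
                int count = Math.Min(chunkChars, source.Length - offset);
                bool flush = offset + count == source.Length; // last chunk

                encoder.Convert(source, offset, count,
                                buffer, 0, buffer.Length,
                                flush,
                                out int charsUsed, out int bytesUsed, out _);

                output.Write(buffer, 0, bytesUsed);
                offset += charsUsed;
            } while (offset < source.Length);

            return output.ToArray();
        }
    }
}
```

With chunks this large, the fixed per-call setup cost is paid once per 1024 chars rather than once per handful of chars, which is why a regression that only shows up below roughly 20 bytes is unlikely to matter for callers shaped like this.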

GrabYourPitchforks · 2019-03-08T01:20:50Z

PR is now out of draft stage. Thanks @jkotas for getting it this far!

jkotas · 2019-03-08T04:30:13Z

cc @stephentoub

GrabYourPitchforks · 2019-03-09T01:36:28Z

The latest iteration has this moved around a bit (for GetBytes only, not for any other code paths right now). It's based somewhat on a suggestion Jan provided last week.

Here, ASCIIEncoding.GetBytesFast contains only a direct call to ASCIIUtility. The method then returns immediately; there is no looping or other logic.

Since ASCIIEncoding specifically wants to optimize the fallback path, I removed GetBytesNoFallbackBuffer since it was a bit confusing, and now ASCIIEncoding simply overrides GetBytesWithFallback. It tries to perform its optimized fallback logic in here, and if there's a problem it just delegates to base.GetBytesWithFallback. This should make the intent of each method a bit more obvious: GetBytesFast has no fallback logic whatsoever, and GetBytesWithFallback has optimized fallback logic. (If a derived class doesn't override that method, it gets the slow fallback logic. I believe ASCIIEncoding is the only type that will override this method.)

I also tweaked GetBytes(..., EncoderNLS) a bit to reduce the overhead of using an EncoderNLS instance. Now, if there's no leftover data in the EncoderNLS instance that requires draining, the GetBytes(..., EncoderNLS) method will immediately try to call into the fast path with minimal overhead. If that doesn't work, it'll go through the normal make-a-span, drain, call the fast path again, then call the fallback code paths.
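
The GetBytesFast / GetBytesWithFallback split described above can be sketched roughly as follows. This is a simplified stand-in, not the code in the PR: the method names come from the discussion, but the signatures, the SketchEncoding and SketchAscii types, and the hard-coded '?' replacement are assumptions for illustration. The EncoderNLS leftover-data handling is left out entirely.

```csharp
using System;

// Hypothetical stand-in for the base type; not the real System.Text.Encoding.
abstract class SketchEncoding
{
    // Fast path: no fallback logic at all. Converts leading chars until the
    // first one it cannot handle (or until the destination is full) and
    // returns how many chars it consumed, which equals the bytes written.
    protected abstract int GetBytesFast(ReadOnlySpan<char> chars, Span<byte> bytes);

    // Shared slow path. Called with chars[0] being a char the fast path
    // rejected: replace it, then hand control back to the fast path.
    protected virtual int GetBytesWithFallback(ReadOnlySpan<char> chars, Span<byte> bytes)
    {
        int written = 0;
        while (!chars.IsEmpty)
        {
            if (written == bytes.Length)
            {
                throw new ArgumentException("Destination too small.", nameof(bytes));
            }

            bytes[written++] = (byte)'?';   // assumed replacement behavior
            chars = chars.Slice(1);

            int consumed = GetBytesFast(chars, bytes.Slice(written));
            chars = chars.Slice(consumed);
            written += consumed;
        }
        return written;
    }

    // Entry point: try the fast path first, fall back only if it stops early.
    public int GetBytes(ReadOnlySpan<char> chars, Span<byte> bytes)
    {
        int consumed = GetBytesFast(chars, bytes);
        if (consumed == chars.Length)
        {
            return consumed;
        }
        return consumed + GetBytesWithFallback(chars.Slice(consumed), bytes.Slice(consumed));
    }
}

// Derived type supplies only the fast path and inherits the slow one.
sealed class SketchAscii : SketchEncoding
{
    protected override int GetBytesFast(ReadOnlySpan<char> chars, Span<byte> bytes)
    {
        int limit = Math.Min(chars.Length, bytes.Length);
        int i = 0;
        for (; i < limit && chars[i] <= 0x7F; i++)
        {
            bytes[i] = (byte)chars[i];
        }
        return i;
    }
}
```

In this sketch, calling new SketchAscii().GetBytes("h\u00E9llo", buffer) with a 5-byte buffer writes 68 3F 6C 6C 6F and returns 5. A derived type that wants a cheaper fallback could override GetBytesWithFallback and call base.GetBytesWithFallback for the cases it does not handle, which is the shape described for ASCIIEncoding.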

GrabYourPitchforks · 2019-03-09T02:38:43Z

Perf results from latest iteration, using both normal Encoding.ASCII.GetBytes(...) and EncoderNLS.Convert(...). As expected, the EncoderNLS code path is regressed for small inputs when we detect we have to fall off the fast path. This is because there's overhead to setting up the slow path, but if the input is sufficiently large (~20-ish bytes) the overhead disappears. Since the typical use case of Encoder.Convert is large buffers, I'm not too concerned here.

Method	Toolchain	StringLength	Mean	Error	StdDev	Median	Ratio	RatioSD
GetBytes	baseline	0	925.4 ns	18.524 ns	44.737 ns	912.2 ns	1.00	0.00
GetBytes	exp-A	0	489.0 ns	7.760 ns	6.879 ns	488.8 ns	0.50	0.02
GetBytes	exp-B	0	496.9 ns	7.335 ns	6.861 ns	497.3 ns	0.51	0.02

GetBytes_WithFallback	baseline	0	1,369.9 ns	21.933 ns	19.443 ns	1,372.2 ns	1.00	0.00
GetBytes_WithFallback	exp-A	0	2,823.9 ns	42.172 ns	37.384 ns	2,812.5 ns	2.06	0.04
GetBytes_WithFallback	exp-B	0	4,134.6 ns	76.445 ns	71.507 ns	4,131.3 ns	3.02	0.07

GetBytes_EncoderNLS	baseline	0	1,236.4 ns	20.782 ns	18.422 ns	1,236.3 ns	1.00	0.00
GetBytes_EncoderNLS	exp-A	0	2,329.4 ns	23.346 ns	21.838 ns	2,326.4 ns	1.88	0.03
GetBytes_EncoderNLS	exp-B	0	1,300.5 ns	16.770 ns	15.687 ns	1,295.9 ns	1.05	0.02

GetBytes_EncoderNLS_WithFallback	baseline	0	1,472.3 ns	11.164 ns	9.897 ns	1,473.0 ns	1.00	0.00
GetBytes_EncoderNLS_WithFallback	exp-A	0	2,782.6 ns	55.429 ns	59.309 ns	2,776.3 ns	1.89	0.05
GetBytes_EncoderNLS_WithFallback	exp-B	0	5,951.0 ns	89.931 ns	84.121 ns	5,964.5 ns	4.03	0.05

GetBytes	baseline	4	1,402.4 ns	30.434 ns	71.138 ns	1,367.9 ns	1.00	0.00
GetBytes	exp-A	4	631.3 ns	5.725 ns	5.356 ns	631.8 ns	0.42	0.01
GetBytes	exp-B	4	661.9 ns	9.571 ns	8.484 ns	659.3 ns	0.44	0.02

GetBytes_WithFallback	baseline	4	2,847.8 ns	54.993 ns	51.440 ns	2,848.3 ns	1.00	0.00
GetBytes_WithFallback	exp-A	4	3,322.4 ns	48.273 ns	45.154 ns	3,309.0 ns	1.17	0.03
GetBytes_WithFallback	exp-B	4	4,914.8 ns	25.895 ns	22.956 ns	4,919.2 ns	1.73	0.03

GetBytes_EncoderNLS	baseline	4	1,442.6 ns	15.503 ns	14.502 ns	1,442.8 ns	1.00	0.00
GetBytes_EncoderNLS	exp-A	4	2,457.3 ns	36.998 ns	30.895 ns	2,448.7 ns	1.71	0.03
GetBytes_EncoderNLS	exp-B	4	1,388.4 ns	25.660 ns	24.002 ns	1,389.9 ns	0.96	0.02

GetBytes_EncoderNLS_WithFallback	baseline	4	2,454.3 ns	22.189 ns	18.528 ns	2,449.1 ns	1.00	0.00
GetBytes_EncoderNLS_WithFallback	exp-A	4	3,344.4 ns	66.407 ns	62.117 ns	3,334.8 ns	1.36	0.02
GetBytes_EncoderNLS_WithFallback	exp-B	4	6,822.8 ns	97.497 ns	91.199 ns	6,832.8 ns	2.78	0.04

GetBytes	baseline	14	2,186.3 ns	42.047 ns	43.180 ns	2,180.3 ns	1.00	0.00
GetBytes	exp-A	14	1,079.2 ns	20.323 ns	19.010 ns	1,083.5 ns	0.49	0.01
GetBytes	exp-B	14	1,328.2 ns	15.760 ns	14.742 ns	1,327.8 ns	0.61	0.01

GetBytes_WithFallback	baseline	14	6,570.1 ns	129.565 ns	114.856 ns	6,526.7 ns	1.00	0.00
GetBytes_WithFallback	exp-A	14	5,198.5 ns	46.995 ns	41.660 ns	5,198.6 ns	0.79	0.01
GetBytes_WithFallback	exp-B	14	6,786.0 ns	133.975 ns	196.378 ns	6,711.4 ns	1.04	0.03

GetBytes_EncoderNLS	baseline	14	2,548.3 ns	45.750 ns	42.795 ns	2,529.0 ns	1.00	0.00
GetBytes_EncoderNLS	exp-A	14	2,942.7 ns	26.906 ns	23.852 ns	2,933.1 ns	1.16	0.02
GetBytes_EncoderNLS	exp-B	14	2,020.9 ns	7.326 ns	5.720 ns	2,020.8 ns	0.79	0.01

GetBytes_EncoderNLS_WithFallback	baseline	14	6,074.4 ns	118.687 ns	204.730 ns	5,982.1 ns	1.00	0.00
GetBytes_EncoderNLS_WithFallback	exp-A	14	5,194.4 ns	43.206 ns	40.415 ns	5,187.7 ns	0.83	0.03
GetBytes_EncoderNLS_WithFallback	exp-B	14	8,472.8 ns	167.699 ns	156.866 ns	8,391.3 ns	1.36	0.05

GetBytes	baseline	29	3,788.4 ns	38.448 ns	32.105 ns	3,788.1 ns	1.00	0.00
GetBytes	exp-A	29	2,143.2 ns	22.521 ns	21.066 ns	2,140.5 ns	0.57	0.01
GetBytes	exp-B	29	1,791.7 ns	26.507 ns	24.794 ns	1,778.2 ns	0.47	0.01

GetBytes_WithFallback	baseline	29	11,832.7 ns	114.193 ns	101.230 ns	11,815.1 ns	1.00	0.00
GetBytes_WithFallback	exp-A	29	8,193.8 ns	98.728 ns	92.350 ns	8,194.0 ns	0.69	0.01
GetBytes_WithFallback	exp-B	29	9,410.8 ns	95.752 ns	89.566 ns	9,416.8 ns	0.80	0.01

GetBytes_EncoderNLS	baseline	29	3,978.9 ns	58.239 ns	51.628 ns	3,985.4 ns	1.00	0.00
GetBytes_EncoderNLS	exp-A	29	4,007.2 ns	64.372 ns	60.213 ns	4,009.3 ns	1.01	0.02
GetBytes_EncoderNLS	exp-B	29	3,031.6 ns	59.898 ns	100.076 ns	3,014.9 ns	0.77	0.03

GetBytes_EncoderNLS_WithFallback	baseline	29	12,183.5 ns	107.718 ns	100.759 ns	12,151.0 ns	1.00	0.00
GetBytes_EncoderNLS_WithFallback	exp-A	29	7,966.7 ns	144.064 ns	127.709 ns	7,922.7 ns	0.65	0.01
GetBytes_EncoderNLS_WithFallback	exp-B	29	11,293.1 ns	252.128 ns	280.239 ns	11,196.7 ns	0.93	0.02

baseline = The state of the world today.
exp-A = The first iteration of this PR.
exp-B = The latest iteration of this PR at the time this comment was posted.

jkotas · 2019-03-09T03:49:54Z

> The latest iteration has this moved around a bit

Yes, this looks better!

> Since the typical use case of Encoder.Convert is large buffers, I'm not too concerned here.

+1

GrabYourPitchforks · 2019-03-11T04:43:55Z

I merged because the PR was approved, but if anybody has additional comments, leave them here or email me and I'll open another PR to address them. Thanks all for the feedback!

…lr#23098)

This refactoring is limited to ASCIIEncoding at the moment, but it can easily be applied to UTF-8 / UTF-16 / UTF-32.

High-level changes:

- Fallback logic has been split from the fast-path, improving performance of GetBytes and similar routines.
- All of the plumbing of when to invoke the fallback logic and how to manage leftover data has been moved into the base class.
- Almost all of the logic except for the fast-path is now written in terms of verifiable code (Span and ReadOnlySpan).
- Minor bug fixes in EncoderNLS.Convert (see https://github.com/dotnet/coreclr/issues/23020).

Commit migrated from dotnet/coreclr@43a5159

Refactor Encoding to split fast-path and fallback logic

93d1a7c

GrabYourPitchforks mentioned this pull request Mar 7, 2019

Replace slow implementations in ASCIIUtility with fast implementations #22516

Merged

jkotas reviewed Mar 7, 2019

src/System.Private.CoreLib/shared/System/Text/ASCIIEncoding.cs

jkotas reviewed Mar 7, 2019

src/System.Private.CoreLib/shared/System/Text/ASCIIEncoding.cs

Remove direct dependency on JitHelpers

432173f

PR feedback - also slightly optimize EncoderNLS draining

a088a2f

jkotas reviewed Mar 8, 2019

src/System.Private.CoreLib/shared/System/Text/EncoderNLS.cs

Suppress failures in System.Text.Encoding.Tests

2e9ac17

GrabYourPitchforks changed the title ~~[Draft] [WIP] Refactor Encoding to split fast-path and fallback logic~~ Refactor Encoding to split fast-path and fallback logic Mar 8, 2019

GrabYourPitchforks marked this pull request as ready for review March 8, 2019 01:20

GrabYourPitchforks requested a review from tarekgh March 8, 2019 01:20

Remove unused fields from EncoderNLS

6402df3

jkotas reviewed Mar 8, 2019

tarekgh reviewed Mar 8, 2019

src/System.Private.CoreLib/shared/System/Text/ASCIIEncoding.cs

tarekgh reviewed Mar 8, 2019

src/System.Private.CoreLib/shared/System/Text/ASCIIEncoding.cs

tarekgh approved these changes Mar 8, 2019

Move some GetBytes implementations around

66cb897

NoInline ASCIIUtility, as inlining was throwing off benchmarks

44a0ffd

GrabYourPitchforks added 2 commits March 10, 2019 15:32

Remove move of ASCIIEncodingSealed and move it back into ASCIIEncoding

3588a2a

Rename GetBytes -> EncodeRune, add devdoc, #ifdef away non-shipping code

c98d25f

GrabYourPitchforks added 6 commits March 10, 2019 16:57

Perform same refactoring for GetByteCount we had for GetBytes

2545ee2

Refactor GetCharCount as with other methods

d834abc

Refactor GetChars same as previous functions

93a80d5

Misc PR feedback: cleanup comments, delete dead methods

d02e3e4

Minor perf optimizations to fallback logic

0333dff

Rename Encoding.New.cs -> Encoding.Internal.cs

96b3b94

GrabYourPitchforks merged commit 43a5159 into dotnet:master Mar 11, 2019

GrabYourPitchforks deleted the ascii_7 branch March 11, 2019 06:34

GrabYourPitchforks mentioned this pull request Mar 27, 2019

Add optimized UTF-8 validation and transcoding apis, hook them up to UTF8Encoding #21948

Merged
