This repository has been archived by the owner on Jan 23, 2023. It is now read-only.

Refactor Encoding to split fast-path and fallback logic #23098

Merged (15 commits) on Mar 11, 2019

GrabYourPitchforks
Member

This is related to #22516, where it was suggested I work in another branch to mechanically separate all of the fallback infrastructure out of the derived Encoding classes and into a common base class. This PR represents what the change would look like for ASCIIEncoding, though UTF8 / UTF16 / UTF32 would look similar.

In particular, note that all fallback buffer related logic has been removed from ASCIIEncoding and pushed down to the base type. Additionally, the ASCIIEncoding class doesn't concern itself with EncoderNLS / DecoderNLS (aside from instantiating them). All of this coordination has been pushed down so that it can be shared across all Encoding-derived types.
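
To illustrate the split described above, here is a minimal language-neutral sketch in Python. It is not the actual CoreCLR code: `SketchEncoding`, `SketchAsciiEncoding`, `encode_fast`, and the single-`?` replacement are all hypothetical stand-ins, and the real fallback buffer machinery is considerably richer. The point is only that the derived type supplies a fast path while the base type owns the coordination.

```python
from abc import ABC, abstractmethod


class SketchEncoding(ABC):
    """Hypothetical base type that owns all fallback coordination."""

    def __init__(self, replacement: bytes = b"?"):
        # Stand-in for the real fallback buffer machinery (an assumption).
        self.replacement = replacement

    @abstractmethod
    def encode_fast(self, text: str, start: int, out: bytearray) -> int:
        """Encode chars from `start` until the first unencodable one.

        Returns the number of chars consumed. No fallback logic lives here.
        """

    def get_bytes(self, text: str) -> bytes:
        out = bytearray()
        pos = 0
        while pos < len(text):
            pos += self.encode_fast(text, pos, out)
            if pos < len(text):
                # The fast path stopped on a bad char; the base type handles it.
                out += self.replacement
                pos += 1
        return bytes(out)


class SketchAsciiEncoding(SketchEncoding):
    """Derived type: supplies only the fast path."""

    def encode_fast(self, text: str, start: int, out: bytearray) -> int:
        consumed = 0
        for ch in text[start:]:
            if ord(ch) > 0x7F:
                break
            out.append(ord(ch))
            consumed += 1
        return consumed


print(SketchAsciiEncoding().get_bytes("h\u00e9llo"))  # b'h?llo'
```

Under this shape, another encoding would only need its own `encode_fast`; the loop in `get_bytes` would be shared.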

This is marked WIP because I'm still making minor iterations to it to answer some lingering questions. Is it worthwhile to add methods directly to ASCIIEncodingSealed, for instance? Doing so allows us to make assumptions about what optimizations are valid for the Encoding instance.

I've not yet had an opportunity to benchmark this. I hope to get to it Thursday afternoon or Friday. My expectation is that it will perform better than baseline for calls directly to ASCIIEncoding.GetBytes and related APIs, since they immediately dispatch to fast-path logic. Calls to EncoderNLS.Convert may have more invocation overhead compared to baseline due to span shuffling and an extra virtual dispatch. I don't think this extra overhead will affect many applications in practice because I don't believe EncoderNLS.Convert appears frequently in hot paths, but even so the pending vectorization of these code paths (see #22516) should make up for it.

/cc @jkotas, who has given significant feedback over the past week as this was being developed; and @tarekgh and @layomia as FYI

@GrabYourPitchforks
Member Author

The unit test failures in CoreFX are expected with this change because this change also resolves https://github.com/dotnet/coreclr/issues/23020. Some of the unit tests assert the buggy behavior and need to be fixed up in corefx directly.

@GrabYourPitchforks
Member Author

GrabYourPitchforks commented Mar 7, 2019

@jkotas - here are some initial perf numbers from the refactored run. (Note: no vectorization has yet been introduced.)

(Perf numbers outdated - see #23098 (comment) instead.)

As expected, for very short strings the overhead of invoking the fallback logic is high. The fewer invalid characters there are, the more progress we can make within the fast-path methods, so the actual performance depends on the ratio of valid to invalid chars.

This was just testing ASCIIEncoding.GetBytes. I'll run another sample comparing EncoderNLS.Convert as well.

@jkotas
Member

jkotas commented Mar 7, 2019

Looks reasonable to me. Thanks

@jkotas
Member

jkotas commented Mar 7, 2019

The unit test failures in CoreFX are expected

You can edit https://github.com/dotnet/coreclr/blob/master/tests/CoreFX/CoreFX.issues.json to disable the tests and make the PR green.

@GrabYourPitchforks
Member Author

GrabYourPitchforks commented Mar 8, 2019

And here are the EncoderNLS.Convert numbers. As expected, they're a regression from baseline, where the crossover point is somewhere in the 20-byte range. Considering the typical use case for Encoder.Convert is to transcode chunks of data (say, for streaming scenarios), I don't think it's typical for devs to provide small payloads to this routine. My gut tells me this will not regress real-world application performance, but I cannot be absolutely sure of this claim.

(Perf results outdated - see #23098 (comment).)

@GrabYourPitchforks changed the title from "[Draft] [WIP] Refactor Encoding to split fast-path and fallback logic" to "Refactor Encoding to split fast-path and fallback logic" on Mar 8, 2019
@GrabYourPitchforks GrabYourPitchforks marked this pull request as ready for review March 8, 2019 01:20
@GrabYourPitchforks
Member Author

PR is now out of draft stage. Thanks @jkotas for getting it this far!

@jkotas
Member

jkotas commented Mar 8, 2019

cc @stephentoub

@GrabYourPitchforks
Member Author

The latest iteration has this moved around a bit (for GetBytes only, not for any other code paths right now). It's based somewhat on a suggestion Jan provided last week.

Here, ASCIIEncoding.GetBytesFast contains only a direct call to ASCIIUtility. The method then returns immediately; there is no looping or other logic.

Since ASCIIEncoding specifically wants to optimize the fallback path, I removed GetBytesNoFallbackBuffer since it was a bit confusing, and now ASCIIEncoding simply overrides GetBytesWithFallback. It tries to perform its optimized fallback logic there, and if there's a problem it just delegates to base.GetBytesWithFallback. This should make the intent of each method more obvious: GetBytesFast has no fallback logic whatsoever, and GetBytesWithFallback has optimized fallback logic. (If a derived class doesn't override that method, it gets the slow fallback logic. I believe ASCIIEncoding is the only type that will override this method.)
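
A rough Python sketch of that two-method shape follows. All names are hypothetical and only mirror the roles of GetBytesFast and GetBytesWithFallback. The condition used for the optimized path (a single-byte replacement) is an assumption for illustration, not a statement of what the PR actually checks.

```python
from typing import Callable, Optional


class BaseEncoding:
    def __init__(self, fallback: Callable[[str], bytes]):
        # Generic fallback: maps one unencodable char to arbitrary bytes.
        self.fallback = fallback

    def get_bytes_fast(self, text: str, start: int, out: bytearray) -> int:
        raise NotImplementedError

    def get_bytes_with_fallback(self, text: str, start: int, out: bytearray) -> None:
        # Slow generic path: one fallback call per bad char, then resume fast path.
        pos = start
        while pos < len(text):
            out += self.fallback(text[pos])
            pos += 1
            pos += self.get_bytes_fast(text, pos, out)

    def get_bytes(self, text: str) -> bytes:
        out = bytearray()
        consumed = self.get_bytes_fast(text, 0, out)
        if consumed < len(text):
            self.get_bytes_with_fallback(text, consumed, out)
        return bytes(out)


class AsciiEncoding(BaseEncoding):
    def __init__(self, fallback, single_byte_replacement: Optional[int] = None):
        super().__init__(fallback)
        self.single_byte_replacement = single_byte_replacement

    def get_bytes_fast(self, text: str, start: int, out: bytearray) -> int:
        # No fallback logic at all: stop at the first non-ASCII char.
        consumed = 0
        for ch in text[start:]:
            if ord(ch) > 0x7F:
                break
            out.append(ord(ch))
            consumed += 1
        return consumed

    def get_bytes_with_fallback(self, text: str, start: int, out: bytearray) -> None:
        if self.single_byte_replacement is None:
            # Cannot optimize: delegate to the base implementation.
            return super().get_bytes_with_fallback(text, start, out)
        # Optimized fallback: one tight loop, no per-char callback.
        for ch in text[start:]:
            code = ord(ch)
            out.append(code if code <= 0x7F else self.single_byte_replacement)
```

For example, `AsciiEncoding(lambda ch: b"[?]").get_bytes("a\u00e9b")` takes the delegated slow path, while passing `single_byte_replacement=ord("?")` takes the optimized loop.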

I also tweaked GetBytes(..., EncoderNLS) a bit to reduce the overhead of using an EncoderNLS instance. Now, if there's no leftover data in the EncoderNLS instance that requires draining, the GetBytes(..., EncoderNLS) method will immediately try to call into the fast path with minimal overhead. If that doesn't work, it'll go through the normal make-a-span, drain, call the fast path again, then call the fallback code paths.
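
The control flow can be sketched roughly as below. This is a simplified Python model, not the real EncoderNLS: the class name is hypothetical, the leftover state is assumed to be a single pending high surrogate, every unencodable scalar is assumed to become one `?`, and there is no flush parameter.

```python
def _is_high(ch: str) -> bool:
    return 0xD800 <= ord(ch) <= 0xDBFF


def _is_low(ch: str) -> bool:
    return 0xDC00 <= ord(ch) <= 0xDFFF


class SketchAsciiEncoder:
    """Hypothetical stateful encoder that carries state between calls."""

    def __init__(self):
        self.leftover = ""  # at most one pending high surrogate

    @staticmethod
    def _fast(chunk: str, start: int, out: bytearray) -> int:
        consumed = 0
        for ch in chunk[start:]:
            if ord(ch) > 0x7F:
                break
            out.append(ord(ch))
            consumed += 1
        return consumed

    def convert(self, chunk: str) -> bytes:
        out = bytearray()
        pos = 0
        if not self.leftover:
            # Nothing to drain: try the fast path immediately.
            pos = self._fast(chunk, 0, out)
            if pos == len(chunk):
                return bytes(out)
        elif chunk:
            # Drain: the pending surrogate (paired or not) yields one '?'.
            self.leftover = ""
            out += b"?"
            if _is_low(chunk[0]):
                pos = 1
        # Slow path: alternate between the fast path and fallback handling.
        while pos < len(chunk):
            pos += self._fast(chunk, pos, out)
            if pos == len(chunk):
                break
            ch = chunk[pos]
            pos += 1
            if _is_high(ch):
                if pos == len(chunk):
                    self.leftover = ch  # wait for the next call
                    break
                if _is_low(chunk[pos]):
                    pos += 1  # consume the second half of the pair
            out += b"?"
        return bytes(out)
```

A call with pure ASCII input and no leftover state returns from the first branch without touching the slow path, which is where the reduced overhead for EncoderNLS would come from in this model.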

@GrabYourPitchforks
Member Author

Perf results from latest iteration, using both normal Encoding.ASCII.GetBytes(...) and EncoderNLS.Convert(...). As expected, the EncoderNLS code path is regressed for small inputs when we detect we have to fall off the fast path. This is because there's overhead to setting up the slow path, but if the input is sufficiently large (~20-ish bytes) the overhead disappears. Since the typical use case of Encoder.Convert is large buffers, I'm not too concerned here.

| Method | Toolchain | StringLength | Mean | Error | StdDev | Median | Ratio | RatioSD |
|---|---|---|---|---|---|---|---|---|
| GetBytes | baseline | 0 | 925.4 ns | 18.524 ns | 44.737 ns | 912.2 ns | 1.00 | 0.00 |
| GetBytes | exp-A | 0 | 489.0 ns | 7.760 ns | 6.879 ns | 488.8 ns | 0.50 | 0.02 |
| GetBytes | exp-B | 0 | 496.9 ns | 7.335 ns | 6.861 ns | 497.3 ns | 0.51 | 0.02 |
| GetBytes_WithFallback | baseline | 0 | 1,369.9 ns | 21.933 ns | 19.443 ns | 1,372.2 ns | 1.00 | 0.00 |
| GetBytes_WithFallback | exp-A | 0 | 2,823.9 ns | 42.172 ns | 37.384 ns | 2,812.5 ns | 2.06 | 0.04 |
| GetBytes_WithFallback | exp-B | 0 | 4,134.6 ns | 76.445 ns | 71.507 ns | 4,131.3 ns | 3.02 | 0.07 |
| GetBytes_EncoderNLS | baseline | 0 | 1,236.4 ns | 20.782 ns | 18.422 ns | 1,236.3 ns | 1.00 | 0.00 |
| GetBytes_EncoderNLS | exp-A | 0 | 2,329.4 ns | 23.346 ns | 21.838 ns | 2,326.4 ns | 1.88 | 0.03 |
| GetBytes_EncoderNLS | exp-B | 0 | 1,300.5 ns | 16.770 ns | 15.687 ns | 1,295.9 ns | 1.05 | 0.02 |
| GetBytes_EncoderNLS_WithFallback | baseline | 0 | 1,472.3 ns | 11.164 ns | 9.897 ns | 1,473.0 ns | 1.00 | 0.00 |
| GetBytes_EncoderNLS_WithFallback | exp-A | 0 | 2,782.6 ns | 55.429 ns | 59.309 ns | 2,776.3 ns | 1.89 | 0.05 |
| GetBytes_EncoderNLS_WithFallback | exp-B | 0 | 5,951.0 ns | 89.931 ns | 84.121 ns | 5,964.5 ns | 4.03 | 0.05 |
| GetBytes | baseline | 4 | 1,402.4 ns | 30.434 ns | 71.138 ns | 1,367.9 ns | 1.00 | 0.00 |
| GetBytes | exp-A | 4 | 631.3 ns | 5.725 ns | 5.356 ns | 631.8 ns | 0.42 | 0.01 |
| GetBytes | exp-B | 4 | 661.9 ns | 9.571 ns | 8.484 ns | 659.3 ns | 0.44 | 0.02 |
| GetBytes_WithFallback | baseline | 4 | 2,847.8 ns | 54.993 ns | 51.440 ns | 2,848.3 ns | 1.00 | 0.00 |
| GetBytes_WithFallback | exp-A | 4 | 3,322.4 ns | 48.273 ns | 45.154 ns | 3,309.0 ns | 1.17 | 0.03 |
| GetBytes_WithFallback | exp-B | 4 | 4,914.8 ns | 25.895 ns | 22.956 ns | 4,919.2 ns | 1.73 | 0.03 |
| GetBytes_EncoderNLS | baseline | 4 | 1,442.6 ns | 15.503 ns | 14.502 ns | 1,442.8 ns | 1.00 | 0.00 |
| GetBytes_EncoderNLS | exp-A | 4 | 2,457.3 ns | 36.998 ns | 30.895 ns | 2,448.7 ns | 1.71 | 0.03 |
| GetBytes_EncoderNLS | exp-B | 4 | 1,388.4 ns | 25.660 ns | 24.002 ns | 1,389.9 ns | 0.96 | 0.02 |
| GetBytes_EncoderNLS_WithFallback | baseline | 4 | 2,454.3 ns | 22.189 ns | 18.528 ns | 2,449.1 ns | 1.00 | 0.00 |
| GetBytes_EncoderNLS_WithFallback | exp-A | 4 | 3,344.4 ns | 66.407 ns | 62.117 ns | 3,334.8 ns | 1.36 | 0.02 |
| GetBytes_EncoderNLS_WithFallback | exp-B | 4 | 6,822.8 ns | 97.497 ns | 91.199 ns | 6,832.8 ns | 2.78 | 0.04 |
| GetBytes | baseline | 14 | 2,186.3 ns | 42.047 ns | 43.180 ns | 2,180.3 ns | 1.00 | 0.00 |
| GetBytes | exp-A | 14 | 1,079.2 ns | 20.323 ns | 19.010 ns | 1,083.5 ns | 0.49 | 0.01 |
| GetBytes | exp-B | 14 | 1,328.2 ns | 15.760 ns | 14.742 ns | 1,327.8 ns | 0.61 | 0.01 |
| GetBytes_WithFallback | baseline | 14 | 6,570.1 ns | 129.565 ns | 114.856 ns | 6,526.7 ns | 1.00 | 0.00 |
| GetBytes_WithFallback | exp-A | 14 | 5,198.5 ns | 46.995 ns | 41.660 ns | 5,198.6 ns | 0.79 | 0.01 |
| GetBytes_WithFallback | exp-B | 14 | 6,786.0 ns | 133.975 ns | 196.378 ns | 6,711.4 ns | 1.04 | 0.03 |
| GetBytes_EncoderNLS | baseline | 14 | 2,548.3 ns | 45.750 ns | 42.795 ns | 2,529.0 ns | 1.00 | 0.00 |
| GetBytes_EncoderNLS | exp-A | 14 | 2,942.7 ns | 26.906 ns | 23.852 ns | 2,933.1 ns | 1.16 | 0.02 |
| GetBytes_EncoderNLS | exp-B | 14 | 2,020.9 ns | 7.326 ns | 5.720 ns | 2,020.8 ns | 0.79 | 0.01 |
| GetBytes_EncoderNLS_WithFallback | baseline | 14 | 6,074.4 ns | 118.687 ns | 204.730 ns | 5,982.1 ns | 1.00 | 0.00 |
| GetBytes_EncoderNLS_WithFallback | exp-A | 14 | 5,194.4 ns | 43.206 ns | 40.415 ns | 5,187.7 ns | 0.83 | 0.03 |
| GetBytes_EncoderNLS_WithFallback | exp-B | 14 | 8,472.8 ns | 167.699 ns | 156.866 ns | 8,391.3 ns | 1.36 | 0.05 |
| GetBytes | baseline | 29 | 3,788.4 ns | 38.448 ns | 32.105 ns | 3,788.1 ns | 1.00 | 0.00 |
| GetBytes | exp-A | 29 | 2,143.2 ns | 22.521 ns | 21.066 ns | 2,140.5 ns | 0.57 | 0.01 |
| GetBytes | exp-B | 29 | 1,791.7 ns | 26.507 ns | 24.794 ns | 1,778.2 ns | 0.47 | 0.01 |
| GetBytes_WithFallback | baseline | 29 | 11,832.7 ns | 114.193 ns | 101.230 ns | 11,815.1 ns | 1.00 | 0.00 |
| GetBytes_WithFallback | exp-A | 29 | 8,193.8 ns | 98.728 ns | 92.350 ns | 8,194.0 ns | 0.69 | 0.01 |
| GetBytes_WithFallback | exp-B | 29 | 9,410.8 ns | 95.752 ns | 89.566 ns | 9,416.8 ns | 0.80 | 0.01 |
| GetBytes_EncoderNLS | baseline | 29 | 3,978.9 ns | 58.239 ns | 51.628 ns | 3,985.4 ns | 1.00 | 0.00 |
| GetBytes_EncoderNLS | exp-A | 29 | 4,007.2 ns | 64.372 ns | 60.213 ns | 4,009.3 ns | 1.01 | 0.02 |
| GetBytes_EncoderNLS | exp-B | 29 | 3,031.6 ns | 59.898 ns | 100.076 ns | 3,014.9 ns | 0.77 | 0.03 |
| GetBytes_EncoderNLS_WithFallback | baseline | 29 | 12,183.5 ns | 107.718 ns | 100.759 ns | 12,151.0 ns | 1.00 | 0.00 |
| GetBytes_EncoderNLS_WithFallback | exp-A | 29 | 7,966.7 ns | 144.064 ns | 127.709 ns | 7,922.7 ns | 0.65 | 0.01 |
| GetBytes_EncoderNLS_WithFallback | exp-B | 29 | 11,293.1 ns | 252.128 ns | 280.239 ns | 11,196.7 ns | 0.93 | 0.02 |

baseline = The state of the world today.
exp-A = The first iteration of this PR.
exp-B = The latest iteration of this PR at the time this comment was posted.
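
As a quick sanity check on the "~20-ish" crossover, the small Python snippet below finds the first measured string length at which exp-B is no slower than baseline for the two fallback benchmarks. The mean values are copied from the results above; the helper name `first_length_at_parity` is just an illustration. With only four measured lengths, this can only bracket the crossover, not pin it down.

```python
# Mean times in ns, copied from the benchmark results above.
LENGTHS = [0, 4, 14, 29]
MEANS = {
    "GetBytes_WithFallback": {
        "baseline": [1369.9, 2847.8, 6570.1, 11832.7],
        "exp-B": [4134.6, 4914.8, 6786.0, 9410.8],
    },
    "GetBytes_EncoderNLS_WithFallback": {
        "baseline": [1472.3, 2454.3, 6074.4, 12183.5],
        "exp-B": [5951.0, 6822.8, 8472.8, 11293.1],
    },
}


def first_length_at_parity(method: str, toolchain: str = "exp-B"):
    """Return the first measured length where `toolchain` is no slower than baseline."""
    rows = MEANS[method]
    for length, base, exp in zip(LENGTHS, rows["baseline"], rows[toolchain]):
        if exp <= base:
            return length
    return None  # never reached parity within the measured lengths


for method in MEANS:
    print(method, first_length_at_parity(method))
```

Both benchmarks are still slower than baseline at length 14 and faster at length 29, so the measured crossover lies somewhere between those two points.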

@jkotas
Member

jkotas commented Mar 9, 2019

The latest iteration has this moved around a bit

Yes, this looks better!

Since the typical use case of Encoder.Convert is large buffers, I'm not too concerned here.

+1

@GrabYourPitchforks GrabYourPitchforks merged commit 43a5159 into dotnet:master Mar 11, 2019
@GrabYourPitchforks
Member Author

I merged because the PR was approved, but if anybody has additional comments, leave them here or email me and I'll open another PR to address them. Thanks all for the feedback!

@GrabYourPitchforks GrabYourPitchforks deleted the ascii_7 branch March 11, 2019 06:34
picenka21 pushed a commit to picenka21/runtime that referenced this pull request Feb 18, 2022
…lr#23098)

This refactoring is limited to ASCIIEncoding at the moment, but it can easily be applied to UTF-8 / UTF-16 / UTF-32.

High-level changes:
- Fallback logic has been split from the fast-path, improving performance of GetBytes and similar routines.
- All of the plumbing of when to invoke the fallback logic and how to manage leftover data has been moved into the base class.
- Almost all of the logic except for the fast-path is now written in terms of verifiable code (Span and ReadOnlySpan).
- Minor bug fixes in EncoderNLS.Convert (see https://github.com/dotnet/coreclr/issues/23020).

Commit migrated from dotnet/coreclr@43a5159