Vectorize Convert.ToBase64CharArray and TryToBase64Chars #73320

Merged 3 commits on Aug 5, 2022

stephentoub
Member

#71795 vectorized Convert.ToBase64String for larger inputs by using Base64.EncodeToUtf8 and then widening the resulting UTF-8 bytes into a UTF-16 string. It did not touch Convert.ToBase64CharArray or Convert.TryToBase64Chars, however. The ToBase64String change uses a temporary array rented from the array pool. The expectation is that renting will rarely allocate, and if it does, it happens in a method that already allocates the resulting string, so the impact is presumed to be small. ToBase64CharArray and TryToBase64Chars, however, are intended to be entirely non-allocating, so even renting from the array pool could be problematic if the pool has no buffer available and has to allocate one.
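For illustration only, here is a minimal sketch of the rent-encode-widen shape described above. It is not the code from #71795; the class and method names (`RentedBase64Sketch`, `ToBase64StringViaPool`) are hypothetical, and `Encoding.ASCII.GetString` stands in for whatever widening step the runtime actually uses.

```csharp
using System;
using System.Buffers;
using System.Buffers.Text;
using System.Text;

// Hypothetical helper for illustration; not the runtime's implementation.
public static class RentedBase64Sketch
{
    public static string ToBase64StringViaPool(ReadOnlySpan<byte> data)
    {
        // Number of Base64 bytes needed for data.Length input bytes.
        int encodedLength = Base64.GetMaxEncodedToUtf8Length(data.Length);

        // Renting usually reuses a pooled array, but it can allocate when the pool is empty.
        byte[] rented = ArrayPool<byte>.Shared.Rent(encodedLength);
        try
        {
            OperationStatus status = Base64.EncodeToUtf8(data, rented, out _, out int written);
            if (status != OperationStatus.Done)
            {
                throw new InvalidOperationException($"Unexpected status: {status}");
            }

            // Base64 output is pure ASCII, so each byte maps to exactly one char.
            return Encoding.ASCII.GetString(rented.AsSpan(0, written));
        }
        finally
        {
            ArrayPool<byte>.Shared.Return(rented);
        }
    }
}
```

The `Rent` call is the part that makes this shape unsuitable for the non-allocating APIs: nothing guarantees the pool already holds a large enough buffer.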

This PR changes the non-allocating variants to use Base64.EncodeToUtf8 as well. But instead of renting a temporary buffer, it relies on the fact that the encoded Base64 bytes take up half the space of the resulting chars: every encoded byte is guaranteed to be ASCII, so each 1-byte value widens to exactly one 2-byte char. Thus, it can treat the destination char buffer as scratch space for the encoded UTF-8 bytes and then widen them in place. This obviates the need for a separate temporary buffer, making the approach appropriate for the non-allocating versions. And once we have the helper for those, we can use the same helper to replace the code added to ToBase64String, making it non-allocating as well (beyond, of course, the result string it has to allocate by its nature) and thus more predictable.
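The in-place idea can be sketched roughly as follows. This is a simplified scalar version written under assumptions, not the helper added by this PR (which is vectorized); `InPlaceBase64Sketch` and `TryEncodeToChars` are hypothetical names. The key point is that widening runs from the end of the buffer toward the start, so a char write never overwrites a byte that has not been read yet.

```csharp
using System;
using System.Buffers;
using System.Buffers.Text;
using System.Runtime.InteropServices;

// Hypothetical helper for illustration; the real helper in the PR is vectorized.
public static class InPlaceBase64Sketch
{
    public static bool TryEncodeToChars(ReadOnlySpan<byte> data, Span<char> destination, out int charsWritten)
    {
        int encodedLength = Base64.GetMaxEncodedToUtf8Length(data.Length);
        if (destination.Length < encodedLength)
        {
            charsWritten = 0;
            return false;
        }

        // View the destination chars as raw bytes: 2 * encodedLength bytes of scratch space.
        Span<char> chars = destination.Slice(0, encodedLength);
        Span<byte> raw = MemoryMarshal.AsBytes(chars);

        // The encoded ASCII bytes land in the first half of the scratch space.
        OperationStatus status = Base64.EncodeToUtf8(data, raw, out _, out int bytesWritten);
        if (status != OperationStatus.Done || bytesWritten != encodedLength)
        {
            charsWritten = 0;
            return false;
        }

        // Widen back to front: writing chars[i] touches raw[2i] and raw[2i + 1],
        // which lie at or beyond raw[i], so every unread byte stays intact.
        for (int i = bytesWritten - 1; i >= 0; i--)
        {
            chars[i] = (char)raw[i];
        }

        charsWritten = bytesWritten;
        return true;
    }
}
```

Because the destination is supplied by the caller, no pool or temporary array is involved at any point.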

Overall, this fixes the possible additional allocation in ToBase64String as well as a performance inversion: the allocating ToBase64String could be significantly faster (due to vectorization) than ToBase64CharArray and TryToBase64Chars, the methods intended to be the faster versions.

[Params(16, 64, 256, 1024)]
public int Length { get; set; }

private byte[] _data;
private char[] _scratch;

[GlobalSetup]
public void Setup()
{
    _data = new byte[Length];
    _scratch = new char[Length * 4];
    var r = new Random(42);
    r.NextBytes(_data);
}

[Benchmark]
public string ToBase64String() => Convert.ToBase64String(_data);

[Benchmark]
public void ToBase64CharArray() => Convert.ToBase64CharArray(_data, 0, _data.Length, _scratch, 0);

[Benchmark]
public void ToBase64Chars() => Convert.TryToBase64Chars(_data, _scratch, out _);

| Method | Toolchain | Length | Mean | Ratio | Allocated |
|---|---|---:|---:|---:|---:|
| ToBase64String | \main\corerun.exe | 16 | 34.86 ns | 1.00 | 72 B |
| ToBase64String | \pr\corerun.exe | 16 | 34.82 ns | 1.00 | 72 B |
| ToBase64CharArray | \main\corerun.exe | 16 | 26.08 ns | 1.00 | - |
| ToBase64CharArray | \pr\corerun.exe | 16 | 27.12 ns | 1.04 | - |
| ToBase64Chars | \main\corerun.exe | 16 | 25.67 ns | 1.00 | - |
| ToBase64Chars | \pr\corerun.exe | 16 | 26.55 ns | 1.03 | - |
| ToBase64String | \main\corerun.exe | 64 | 50.12 ns | 1.00 | 200 B |
| ToBase64String | \pr\corerun.exe | 64 | 49.22 ns | 0.98 | 200 B |
| ToBase64CharArray | \main\corerun.exe | 64 | 79.72 ns | 1.00 | - |
| ToBase64CharArray | \pr\corerun.exe | 64 | 31.39 ns | 0.39 | - |
| ToBase64Chars | \main\corerun.exe | 64 | 78.80 ns | 1.00 | - |
| ToBase64Chars | \pr\corerun.exe | 64 | 31.49 ns | 0.40 | - |
| ToBase64String | \main\corerun.exe | 256 | 137.63 ns | 1.00 | 712 B |
| ToBase64String | \pr\corerun.exe | 256 | 108.71 ns | 0.79 | 712 B |
| ToBase64CharArray | \main\corerun.exe | 256 | 300.65 ns | 1.00 | - |
| ToBase64CharArray | \pr\corerun.exe | 256 | 47.43 ns | 0.16 | - |
| ToBase64Chars | \main\corerun.exe | 256 | 299.34 ns | 1.00 | - |
| ToBase64Chars | \pr\corerun.exe | 256 | 46.80 ns | 0.16 | - |
| ToBase64String | \main\corerun.exe | 1024 | 392.78 ns | 1.00 | 2760 B |
| ToBase64String | \pr\corerun.exe | 1024 | 346.42 ns | 0.88 | 2760 B |
| ToBase64CharArray | \main\corerun.exe | 1024 | 1,174.50 ns | 1.00 | - |
| ToBase64CharArray | \pr\corerun.exe | 1024 | 116.84 ns | 0.10 | - |
| ToBase64Chars | \main\corerun.exe | 1024 | 1,162.84 ns | 1.00 | - |
| ToBase64Chars | \pr\corerun.exe | 1024 | 116.44 ns | 0.10 | - |

@ghost

ghost commented Aug 3, 2022

Tagging subscribers to this area: @dotnet/area-system-runtime
See info in area-owners.md if you want to be subscribed.

Issue Details

Author: stephentoub
Assignees: -
Labels: area-System.Runtime, tenet-performance
Milestone: 7.0.0

@stephentoub
Member Author

[Params(8, 16, 22, 32, 46, 64, 70, 128, 256, 1024)]
public int Length { get; set; }

private byte[] _data;
private char[] _scratch;

[GlobalSetup]
public void Setup()
{
    _data = new byte[Length];
    _scratch = new char[Length * 4];
    var r = new Random(42);
    r.NextBytes(_data);
}

[Benchmark]
public string ToBase64String() => Convert.ToBase64String(_data);

[Benchmark]
public void ToBase64CharArray() => Convert.ToBase64CharArray(_data, 0, _data.Length, _scratch, 0);

| Method | Toolchain | Length | Mean | Ratio |
|---|---|---:|---:|---:|
| ToBase64String | \main\corerun.exe | 8 | 22.60 ns | 1.00 |
| ToBase64String | \pr\corerun.exe | 8 | 22.39 ns | 0.99 |
| ToBase64CharArray | \main\corerun.exe | 8 | 17.14 ns | 1.00 |
| ToBase64CharArray | \pr\corerun.exe | 8 | 18.06 ns | 1.05 |
| ToBase64String | \main\corerun.exe | 16 | 34.52 ns | 1.00 |
| ToBase64String | \pr\corerun.exe | 16 | 34.39 ns | 1.00 |
| ToBase64CharArray | \main\corerun.exe | 16 | 26.15 ns | 1.00 |
| ToBase64CharArray | \pr\corerun.exe | 16 | 25.62 ns | 0.98 |
| ToBase64String | \main\corerun.exe | 22 | 44.58 ns | 1.00 |
| ToBase64String | \pr\corerun.exe | 22 | 36.16 ns | 0.81 |
| ToBase64CharArray | \main\corerun.exe | 22 | 32.45 ns | 1.00 |
| ToBase64CharArray | \pr\corerun.exe | 22 | 29.72 ns | 0.92 |
| ToBase64String | \main\corerun.exe | 32 | 55.07 ns | 1.00 |
| ToBase64String | \pr\corerun.exe | 32 | 40.58 ns | 0.74 |
| ToBase64CharArray | \main\corerun.exe | 32 | 43.26 ns | 1.00 |
| ToBase64CharArray | \pr\corerun.exe | 32 | 30.63 ns | 0.71 |
| ToBase64String | \main\corerun.exe | 46 | 74.12 ns | 1.00 |
| ToBase64String | \pr\corerun.exe | 46 | 48.29 ns | 0.65 |
| ToBase64CharArray | \main\corerun.exe | 46 | 63.22 ns | 1.00 |
| ToBase64CharArray | \pr\corerun.exe | 46 | 32.52 ns | 0.51 |
| ToBase64String | \main\corerun.exe | 64 | 48.63 ns | 1.00 |
| ToBase64String | \pr\corerun.exe | 64 | 48.90 ns | 1.01 |
| ToBase64CharArray | \main\corerun.exe | 64 | 79.96 ns | 1.00 |
| ToBase64CharArray | \pr\corerun.exe | 64 | 31.80 ns | 0.40 |
| ToBase64String | \main\corerun.exe | 70 | 53.71 ns | 1.00 |
| ToBase64String | \pr\corerun.exe | 70 | 53.74 ns | 1.00 |
| ToBase64CharArray | \main\corerun.exe | 70 | 85.77 ns | 1.00 |
| ToBase64CharArray | \pr\corerun.exe | 70 | 34.40 ns | 0.40 |
| ToBase64String | \main\corerun.exe | 128 | 70.16 ns | 1.00 |
| ToBase64String | \pr\corerun.exe | 128 | 69.40 ns | 0.99 |
| ToBase64CharArray | \main\corerun.exe | 128 | 149.68 ns | 1.00 |
| ToBase64CharArray | \pr\corerun.exe | 128 | 37.28 ns | 0.25 |
| ToBase64String | \main\corerun.exe | 256 | 129.82 ns | 1.00 |
| ToBase64String | \pr\corerun.exe | 256 | 108.09 ns | 0.83 |
| ToBase64CharArray | \main\corerun.exe | 256 | 296.20 ns | 1.00 |
| ToBase64CharArray | \pr\corerun.exe | 256 | 46.06 ns | 0.16 |
| ToBase64String | \main\corerun.exe | 1024 | 387.87 ns | 1.00 |
| ToBase64String | \pr\corerun.exe | 1024 | 344.63 ns | 0.89 |
| ToBase64CharArray | \main\corerun.exe | 1024 | 1,154.97 ns | 1.00 |
| ToBase64CharArray | \pr\corerun.exe | 1024 | 114.74 ns | 0.10 |

@stephentoub
Member Author

Failure is #73247

@kunalspathak
Member

linux/arm64 improvements: dotnet/perf-autofiling-issues#7250

@kunalspathak
Member

windows/arm64 improvements: dotnet/perf-autofiling-issues#7244

@ghost ghost locked as resolved and limited conversation to collaborators Sep 10, 2022