Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[API Proposal]: System.Buffers.Text.Base64Url #66841

Closed
TravisSpomer opened this issue Mar 18, 2022 · 9 comments
Closed

[API Proposal]: System.Buffers.Text.Base64Url #66841

TravisSpomer opened this issue Mar 18, 2022 · 9 comments
Labels
api-suggestion Early API idea and discussion, it is NOT ready for implementation area-System.Buffers

Comments

@TravisSpomer
Copy link

TravisSpomer commented Mar 18, 2022

Background and motivation

Along with the well-known and oft-loved base64 encoding, IETF RFC 4648 § 5 also defines "Base 64 Encoding with URL and Filename Safe Alphabet," or base64url encoding. Base64Url is identical to the more common Base64 except that two characters are replaced so that every character is suitable for use in an URL or a filename on any platform. (+ is replaced with -, and / is replaced with _.)

System.Buffers.Text.Base64 includes methods for encoding binary data into base64 and decoding from base64, including with AVX2 and SSSE3 CPU optimizations, but no base64url. The alternatives are to run the text through string.Replace first, which is silly, or reimplement the entire encoding and decoding algorithm, which is what Microsoft.AspNetCore.WebUtilities.WebEncoders.Base64UrlEncode, Microsoft.IdentityModel.Tokens.Base64UrlEncoder, and countless other implementations have done.

My proposal is to add System.Buffers.Text.Base64Url as a high-performance companion to System.Buffers.Text.Base64, using the same optimized algorithms, but with alternate encoding and decoding tables.

API Proposal

namespace System.Buffers.Text
{
    public static class Base64Url
    {
        public static System.Buffers.OperationStatus DecodeFromUtf8(System.ReadOnlySpan<byte> utf8, System.Span<byte> bytes, out int bytesConsumed, out int bytesWritten, bool isFinalBlock = true);
        public static System.Buffers.OperationStatus DecodeFromUtf8InPlace(System.Span<byte> buffer, out int bytesWritten);
        public static System.Buffers.OperationStatus EncodeToUtf8(System.ReadOnlySpan<byte> bytes, System.Span<byte> utf8, out int bytesConsumed, out int bytesWritten, bool isFinalBlock = true);
        public static System.Buffers.OperationStatus EncodeToUtf8InPlace(System.Span<byte> buffer, int dataLength, out int bytesWritten);
        public static int GetMaxDecodedFromUtf8Length(int length);
        public static int GetMaxEncodedToUtf8Length(int length);
    }
}

Note that the above is identical to the design of System.Buffers.Text.Base64.

API Usage

string inputString = "Hi?>>> This is a test string";
byte[] input = System.Text.Encoding.ASCII.GetBytes(inputString);
byte[] output;
byte[] inputBack;

// Base64 version (exists today)
output = new byte[System.Buffers.Text.Base64.GetMaxEncodedToUtf8Length(input.Length)];
System.Buffers.Text.Base64.EncodeToUtf8(input, output, out int _, out int _);
inputBack = new byte[System.Buffers.Text.Base64.GetMaxDecodedFromUtf8Length(output.Length)];
System.Buffers.Text.Base64.DecodeFromUtf8(output, inputBack, out int _, out int _);

// Base64Url version
output = new byte[System.Buffers.Text.Base64Url.GetMaxEncodedToUtf8Length(input.Length)];
System.Buffers.Text.Base64Url.EncodeToUtf8(input, output, out int _, out int _);
inputBack = new byte[System.Buffers.Text.Base64Url.GetMaxDecodedFromUtf8Length(output.Length)];
System.Buffers.Text.Base64Url.DecodeFromUtf8(output, inputBack, out int _, out int _);

Alternative Designs

The main alternative design here would be to extend the existing System.Buffers.Text.Base64 methods with an extra optional parameter that lets the caller decide between Base64 and Base64Url encoding, but that seems unnecessarily cumbersome.

In addition, this proposed design does not include an interface for Base64 and Base64Url to implement, though one could be added since their signatures are identical.

Risks

Risk for this change seems extremely low. I chose to extract most of the code from Base64 into a new private class Base64Internal and then Base64 and Base64Url simply forward calls into Base64Internal. The performance impact should simply be one function call on the stack. The algorithms used in the existing implementation are unchanged. After compilation, the binary size should be only slightly larger: just a couple of class definitions, and six new encoding and decoding tables totaling 416 bytes.

@TravisSpomer TravisSpomer added the api-suggestion Early API idea and discussion, it is NOT ready for implementation label Mar 18, 2022
@ghost
Copy link

ghost commented Mar 18, 2022

Tagging subscribers to this area: @dotnet/area-system-buffers
See info in area-owners.md if you want to be subscribed.

Issue Details

Background and motivation

Along with the well-known and oft-loved base64 encoding, IETF RFC 4648 § 5 also defines "Base 64 Encoding with URL and Filename Safe Alphabet," or base64url encoding. Base64Url is identical to the more common Base64 except that two characters are replaced so that every character is suitable for use in an URL or a filename on any platform. (+ is replaced with -, and / is replaced with _.)

System.Buffers.Text.Base64 includes methods for encoding binary data into base64 and decoding from base64, including with AVX2 and SSSE3 CPU optimizations, but no base64url. The alternatives are to run the text through string.Replace first, which is silly, or reimplement the entire encoding and decoding algorithm, which is what Microsoft.AspNetCore.WebUtilities.WebEncoders.Base64UrlEncode, Microsoft.IdentityModel.Tokens.Base64UrlEncoder, and countless other implementations have done.

My proposal is to add System.Buffers.Text.Base64Url as a high-performance companion to System.Buffers.Text.Base64, using the same optimized algorithms, but with alternate encoding and decoding tables.

API Proposal

namespace System.Buffers.Text
{
    public static class Base64Url
    {
        public static unsafe OperationStatus EncodeToUtf8(ReadOnlySpan<byte> bytes, Span<byte> utf8, out int bytesConsumed, out int bytesWritten, bool isFinalBlock = true);
        public static unsafe OperationStatus EncodeToUtf8InPlace(Span<byte> buffer, int dataLength, out int bytesWritten);
        public static int GetMaxEncodedToUtf8Length(int length);
    }
}

Note that the above is identical to the design of System.Buffers.Text.Base64.

API Usage

string inputString = "Hi?>>> This is a test string";
byte[] input = System.Text.Encoding.ASCII.GetBytes(inputString);
byte[] output;
byte[] inputBack;

// Base64 version (exists today)
output = new byte[System.Buffers.Text.Base64.GetMaxEncodedToUtf8Length(input.Length)];
System.Buffers.Text.Base64.EncodeToUtf8(input, output, out int _, out int _);
inputBack = new byte[System.Buffers.Text.Base64.GetMaxDecodedFromUtf8Length(output.Length)];
System.Buffers.Text.Base64.DecodeFromUtf8(output, inputBack, out int _, out int _);

// Base64Url version
output = new byte[System.Buffers.Text.Base64Url.GetMaxEncodedToUtf8Length(input.Length)];
System.Buffers.Text.Base64Url.EncodeToUtf8(input, output, out int _, out int _);
inputBack = new byte[System.Buffers.Text.Base64Url.GetMaxDecodedFromUtf8Length(output.Length)];
System.Buffers.Text.Base64Url.DecodeFromUtf8(output, inputBack, out int _, out int _);

Alternative Designs

The main alternative design here would be to extend the existing System.Buffers.Text.Base64 methods with an extra optional parameter that lets the caller decide between Base64 and Base64Url encoding, but that seems unnecessarily cumbersome.

In addition, this proposed design does not include an interface for Base64 and Base64Url to implement, though one could be added since their signatures are identical.

Risks

Risk for this change seems extremely low. I chose to extract most of the code from Base64 into a new private class Base64Internal and then Base64 and Base64Url simply forward calls into Base64Internal. The performance impact should simply be one function call on the stack. The algorithms used in the existing implementation are unchanged. After compilation, the binary size should be only slightly larger: just a couple of class definitions, and six new encoding and decoding tables totaling 416 bytes.

Author: TravisSpomer
Assignees: -
Labels:

api-suggestion, area-System.Buffers

Milestone: -

@TravisSpomer
Copy link
Author

TravisSpomer commented Mar 18, 2022

Here's my proposed implementation, along with tests and such.

@Tornhoof
Copy link
Contributor

Your proposed implementation only works for lengths multiple of 4, correct? so you don't need to handle the padding, which is usually stripped from base64url after encoding and calculated and added after decoding.

@gfoidl
Copy link
Member

gfoidl commented Mar 18, 2022

This duplicates #1658


The alternatives are ...

You can also use https://www.nuget.org/packages/gfoidl.Base64/ which has vectorized paths for base64url too.

@TravisSpomer
Copy link
Author

TravisSpomer commented Mar 18, 2022

Your proposed implementation only works for lengths multiple of 4, correct?

Sorry, I don't follow. Is that a comment about the original algorithm? I'm not proposing changing the existing algorithm in Base64; just a refactor so that it can be used with different encoding tables. Padding would be handled the same way as in the existing base64 conversion.

@TravisSpomer
Copy link
Author

TravisSpomer commented Mar 18, 2022

This duplicates #1658

Oh, wow, I don't know how I didn't see that... punctuation differences, I guess. I went with this implementation (a separate class) because it seemed easier to understand and more likely to maintain the original code's performance characteristics.

@TravisSpomer
Copy link
Author

@gfoidl — Honest question: Is there something productive that I can do today to help with getting this in? It seems like the design I'm proposing here isn't the one that people are leaning toward, so I should probably close this issue. I could attempt to port the work you've done in your NuGet package to the .NET codebase, but I fear that would be more annoying to you than helpful.

The one thing that I can think of that might be most helpful is if I took my existing implementation, refactored it to use an "alphabet" parameter instead of a new class name per the #1658 design, and then made a PR for that. That way an implementation would make it in for .NET 7, in case the other work slips. But I get the sense that there are still some unresolved questions about padding and so perhaps even doing that would not be helpful.

(Sorry, I've never contributed to .NET before, and I'm still unfamiliar with how things work around here. 🙂)

@gfoidl
Copy link
Member

gfoidl commented Mar 21, 2022

Is there something productive that I can do today to help with getting this in?

No. Just wait for an area-owner to take over. Have a look at API Review Process to see how things work here in regards to new APIs.

then made a PR for that. That way an implementation would make it in for .NET 7

Please wait for the API to be reviewed and potentially approved, then a PR can be created.

I should probably close this issue

If I were you, then I'd close this issue here, and comment on #1658 with your ideas on the API shape to avoid duplicated and diverging thoughts -- which will make API review harder as it's split accross issues.

There one should focus only on the API shape.
The implementation is a detail which comes later. Similar to the base64 vectorization, I'd just port the code from the above mentioned package over to here again. The implementation itself is solid and fuzz-tested, etc. So it just comes down to the API shape.

Sorry, I've never contributed to .NET before, and I'm still unfamiliar with how things work around here.

No need for "sorry". It's good you ask question, so we try to bring some light on it.

@TravisSpomer
Copy link
Author

Thanks Günther! Closing this one in favor of that one.

@ghost ghost locked as resolved and limited conversation to collaborators Apr 21, 2022
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
api-suggestion Early API idea and discussion, it is NOT ready for implementation area-System.Buffers
Projects
None yet
Development

No branches or pull requests

4 participants