Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unnecessary buffer allocations when writing base64 values with Utf8JsonWriter. #97628

Closed
habbes opened this issue Jan 29, 2024 · 5 comments · Fixed by #97687
Closed

Unnecessary buffer allocations when writing base64 values with Utf8JsonWriter. #97628

habbes opened this issue Jan 29, 2024 · 5 comments · Fixed by #97687
Labels
area-System.Text.Json help wanted [up-for-grabs] Good issue for external contributors tenet-performance Performance related issue
Milestone

Comments

@habbes
Copy link
Contributor

habbes commented Jan 29, 2024

Description

Utf8JsonWriter.WriteBase64StringValue(ReadOnlySpan<byte>) implementation calls the following method to base64-encode the input bytes and write the result to the output buffer:

[MethodImpl(MethodImplOptions.AggressiveInlining)]
private void Base64EncodeAndWrite(ReadOnlySpan<byte> bytes, Span<byte> output, int encodingLength)
{
    byte[]? outputText = null;

    Span<byte> encodedBytes = encodingLength <= JsonConstants.StackallocByteThreshold ?
        stackalloc byte[JsonConstants.StackallocByteThreshold] :
        (outputText = ArrayPool<byte>.Shared.Rent(encodingLength));

    OperationStatus status = Base64.EncodeToUtf8(bytes, encodedBytes, out int consumed, out int written);
    Debug.Assert(status == OperationStatus.Done);
    Debug.Assert(consumed == bytes.Length);

    encodedBytes = encodedBytes.Slice(0, written);
    Span<byte> destination = output.Slice(BytesPending);

    Debug.Assert(destination.Length >= written);
    encodedBytes.Slice(0, written).CopyTo(destination);
    BytesPending += written;

    if (outputText != null)
    {
        ArrayPool<byte>.Shared.Return(outputText);
    }
}

The Base64EncodeAndWrite method allocates/rents the encodedBytes buffer, uses it to perform the base64 encoding then copies the contents to the output buffer. When you have a lot of concurrent writes (from different requests) and values larger than 256 bytes, the allocations can start to add up.

I think we can skip the temporary buffer and perform the encoding directly into the output buffer like in the snippet below:

private void Base64EncodeAndWrite(ReadOnlySpan<byte> bytes, Span<byte> output, int encodingLength)
{
  Span<byte> destination = output.Slice(BytesPending);
  OperationStatus status = Base64.EncodeToUtf8(
    bytes,
    destination,
    out int consumed,
    out int written
  );
  
  // removed Debug.Assert statements for brevity
  
  BytesPending += written;
}

If you think the proposed change is okay, I can go ahead and create a PR.

Configuration

.NET 8

Regression?

Data

Here's sample call stack where Base64EncodeAndWrite contributes to byte[] allocations in the large-object-heap.

image

@habbes habbes added the tenet-performance Performance related issue label Jan 29, 2024
@ghost ghost added the untriaged New issue has not been triaged by the area owner label Jan 29, 2024
@ghost
Copy link

ghost commented Jan 29, 2024

Tagging subscribers to this area: @dotnet/area-system-text-encoding
See info in area-owners.md if you want to be subscribed.

Issue Details

Description

Utf8JsonWriter.WriteBase64StringValue(ReadOnlySpan<byte>) implementation calls the following method to base64-encode the input bytes and write the result to the output buffer:

[MethodImpl(MethodImplOptions.AggressiveInlining)]
private void Base64EncodeAndWrite(ReadOnlySpan<byte> bytes, Span<byte> output, int encodingLength)
{
    byte[]? outputText = null;

    Span<byte> encodedBytes = encodingLength <= JsonConstants.StackallocByteThreshold ?
        stackalloc byte[JsonConstants.StackallocByteThreshold] :
        (outputText = ArrayPool<byte>.Shared.Rent(encodingLength));

    OperationStatus status = Base64.EncodeToUtf8(bytes, encodedBytes, out int consumed, out int written);
    Debug.Assert(status == OperationStatus.Done);
    Debug.Assert(consumed == bytes.Length);

    encodedBytes = encodedBytes.Slice(0, written);
    Span<byte> destination = output.Slice(BytesPending);

    Debug.Assert(destination.Length >= written);
    encodedBytes.Slice(0, written).CopyTo(destination);
    BytesPending += written;

    if (outputText != null)
    {
        ArrayPool<byte>.Shared.Return(outputText);
    }
}

The Base64EncodeAndWrite method allocates/rents the encodedBytes buffer, uses it to perform the base64 encoding then copies the contents to the output buffer. When you have a lot of concurrent writes (from different requests) and values larger than 256 bytes, the allocations can start to add up.

I think we can skip the temporary buffer and perform the encoding directly into the output buffer like in the snippet below:

private void Base64EncodeAndWrite(ReadOnlySpan<byte> bytes, Span<byte> output, int encodingLength)
{
  Span<byte> destination = output.Slice(BytesPending);
  OperationStatus status = Base64.EncodeToUtf8(
    bytes,
    destination,
    out int consumed,
    out int written
  );
  
  // removed Debug.Assert statements for brevity
  
  BytesPending += written;
}

If you think the proposed change is okay, I can go ahead and create a PR.

Configuration

Regression?

Data

Here's sample where Base64EncodeAndWrite contributes to byte[] allocatons in the large-object-heap.

image

Author: habbes
Assignees: -
Labels:

area-System.Text.Encoding, tenet-performance, untriaged

Milestone: -

@eiriktsarpalis
Copy link
Member

Seems like something we could address in conjunction with #67337

@eiriktsarpalis eiriktsarpalis added help wanted [up-for-grabs] Good issue for external contributors and removed untriaged New issue has not been triaged by the area owner labels Jan 29, 2024
@eiriktsarpalis eiriktsarpalis added this to the Future milestone Jan 29, 2024
@habbes
Copy link
Contributor Author

habbes commented Jan 29, 2024

Hi @eiriktsarpalis, should I work on a PR or wait for #67337 to be addressed?

@eiriktsarpalis
Copy link
Member

Feel free to submit a PR if you can, #67337 is not being prioritized currently. Thanks.

@ghost
Copy link

ghost commented Jan 29, 2024

Tagging subscribers to this area: @dotnet/area-system-text-json, @gregsdennis
See info in area-owners.md if you want to be subscribed.

Issue Details

Description

Utf8JsonWriter.WriteBase64StringValue(ReadOnlySpan<byte>) implementation calls the following method to base64-encode the input bytes and write the result to the output buffer:

[MethodImpl(MethodImplOptions.AggressiveInlining)]
private void Base64EncodeAndWrite(ReadOnlySpan<byte> bytes, Span<byte> output, int encodingLength)
{
    byte[]? outputText = null;

    Span<byte> encodedBytes = encodingLength <= JsonConstants.StackallocByteThreshold ?
        stackalloc byte[JsonConstants.StackallocByteThreshold] :
        (outputText = ArrayPool<byte>.Shared.Rent(encodingLength));

    OperationStatus status = Base64.EncodeToUtf8(bytes, encodedBytes, out int consumed, out int written);
    Debug.Assert(status == OperationStatus.Done);
    Debug.Assert(consumed == bytes.Length);

    encodedBytes = encodedBytes.Slice(0, written);
    Span<byte> destination = output.Slice(BytesPending);

    Debug.Assert(destination.Length >= written);
    encodedBytes.Slice(0, written).CopyTo(destination);
    BytesPending += written;

    if (outputText != null)
    {
        ArrayPool<byte>.Shared.Return(outputText);
    }
}

The Base64EncodeAndWrite method allocates/rents the encodedBytes buffer, uses it to perform the base64 encoding then copies the contents to the output buffer. When you have a lot of concurrent writes (from different requests) and values larger than 256 bytes, the allocations can start to add up.

I think we can skip the temporary buffer and perform the encoding directly into the output buffer like in the snippet below:

private void Base64EncodeAndWrite(ReadOnlySpan<byte> bytes, Span<byte> output, int encodingLength)
{
  Span<byte> destination = output.Slice(BytesPending);
  OperationStatus status = Base64.EncodeToUtf8(
    bytes,
    destination,
    out int consumed,
    out int written
  );
  
  // removed Debug.Assert statements for brevity
  
  BytesPending += written;
}

If you think the proposed change is okay, I can go ahead and create a PR.

Configuration

.NET 8

Regression?

Data

Here's sample call stack where Base64EncodeAndWrite contributes to byte[] allocations in the large-object-heap.

image

Author: habbes
Assignees: -
Labels:

area-System.Text.Json, tenet-performance, help wanted

Milestone: Future

@ghost ghost added the in-pr There is an active PR which will close this issue when it is merged label Jan 30, 2024
@ghost ghost removed the in-pr There is an active PR which will close this issue when it is merged label Jan 30, 2024
@github-actions github-actions bot locked and limited conversation to collaborators Mar 1, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-System.Text.Json help wanted [up-for-grabs] Good issue for external contributors tenet-performance Performance related issue
Projects
None yet
Development

Successfully merging a pull request may close this issue.

3 participants