Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[API Proposal]: Enable reusing System.Net.Http.Headers.*HeaderValue instances to improve efficiency. #83543

Open
geeknoid opened this issue Mar 16, 2023 · 7 comments
Labels
api-suggestion Early API idea and discussion, it is NOT ready for implementation area-System.Net.Http
Milestone

Comments

@geeknoid
Copy link
Member

geeknoid commented Mar 16, 2023

Background and motivation

The various *HeaderValue types defined in System.Net.Http.Headers provide essential functionality to parse incoming headers reliably. Unfortunately, these types are one-shot only, meaning that new instances of the types need to be allocated for every header being parsed for all incoming requests. This leads to garbage and extra collections, slowing down the overall throughput of the service.

API Proposal

I recommend extending the System.Net.Http.Headers types in two ways:

  • Make the types implement the new IResettable interface to clear their state back to defaults.
  • Add an instance version of their TryParse methods which parses directly into the existing instance, as opposed to the existing static TryParse method which always returns a new instance.

So taking CacheControlHeaderValue as an example:

public class CacheControlHeaderValue : IResettable
{
    // implement the IResettable interface
    public bool TryReset();

    // add a new instance version of TryParse
    public bool TryParse(ReadOnlySpan<char> input)
    {
        // start from scratch
        TryReset();
        
        bool parsedOK = DoParsing();
        if (!parsedOK)
        {
            // clean up the mess in case of failure
            TryReset();
        }

        return parsedOK;
    }

API Usage

var cchv = new CacheControlHeaderValue();

if (cchv.TryParse(hdr1))
{
    // do something with cchv
}

if (cchv.TryParse(hdr2))
{
    // do something with cchv
}

Alternative Designs

No response

Risks

No response

@geeknoid geeknoid added the api-suggestion Early API idea and discussion, it is NOT ready for implementation label Mar 16, 2023
@dotnet-issue-labeler
Copy link

I couldn't figure out the best area label to add to this issue. If you have write-permissions please help me learn by adding exactly one area label.

@ghost ghost added the untriaged New issue has not been triaged by the area owner label Mar 16, 2023
@ghost
Copy link

ghost commented Mar 16, 2023

Tagging subscribers to this area: @dotnet/ncl
See info in area-owners.md if you want to be subscribed.

Issue Details

Background and motivation

The various *HeaderValue types defined in System.Net.Http.Headers provide essential functionality to parse incoming headers reliably. Unfortunately, these types are one-shot only, meaning that new instances of the types need to be allocated for every header being parsed for all incoming requests. This leads to garbage and extra collections, slowing down the overall throughput of the service.

API Proposal

I recommend extending the System.Net.Http.Headers types in two ways:

  • Make the types implement the new IResettable interface to clear their state back to defaults.
  • Add an instance version of their TryParse methods which parses directly into the existing instance, as opposed to the existing static TryParse method which always returns a new instance.

So taking CacheControlHeaderValue as an example:

public class CacheControlHeaderValue : IResettable
{
    // implement the IResettable interface
    public bool TryReset();

    // add a new instance version of TryParse
    public bool TryParse(ReadOnlySpan<char> input)
    {
        // start from scratch
        TryReset();
        
        bool parsedOK = DoParsing();
        if (!parsedOK)
        {
            // clean up the mess in case of failure
            TryReset();
        }

        return parsedOK;
    }

API Usage

var cchv = new CacheControlHeaderValue();

if (cchv.TryParse(hdr1))
{
    // do something with cchv
}

if (cchv.TryParse(hdr2))
{
    // do something with cchv
}

Alternative Designs

No response

Risks

No response

Author: geeknoid
Assignees: -
Labels:

api-suggestion, area-System.Net.Http, untriaged

Milestone: -

@MihaZupan
Copy link
Member

Do you happen to have any real-world data from a service showing how many of each header type you're seeing?

Taking CacheControlHeaderValue as an example, the type currently takes up 120 bytes on the heap. It would be very simple to get that down to 80 by just packing the fields better (e.g. get rid of nullable timespans). If these types are showing up in your traces this would be a good thing to do regardless of any new APIs.
The other header values are usually much smaller, e.g. 40 bytes or less.

Caching these may quickly become more expensive than just allocating them.
Some of these types hold other collections internally - are you suggesting TryReset would allow reusing those, tipping the scales somewhat?

Nit re: using ReadOnlySpan<char> as the input to TryParse:
With the current design of HttpClient, header values are always materialized as strings. Assuming the cost to allocate that value has already been paid, it's often beneficial to just use that string directly when parsing to potentially avoid allocating substrings.

@geeknoid
Copy link
Member Author

geeknoid commented Mar 16, 2023

This is peanut butter cost. 100 bytes here and there doesn't seem like much, but if you're processing a few thousand requests per second the costs mount.

Yes, all embedded collections can be reused, eliminating the incremental cost of resizing.

Regarding the suggestion to use spans rather than strings, this is driven by another objective which I have is to introduce support at the lower level for not materializing strings for the header values in the first place. Instead, the application layer can be handed fully-parsed state, considerably reducing per-request allocation burden. So this proposal would tackle a chunk of the problem and would put its API inline with many other parsing APIs which have been upgraded to spans throughout the framework.

@MihaZupan
Copy link
Member

#83640 should help with allocations here somewhat.

this is driven by another objective which I have is to introduce support at the lower level for not materializing strings for the header values in the first place. Instead, the application layer can be handed fully-parsed state, considerably reducing per-request allocation burden.

Do you have any ideas for how something like that could be done with HttpClient?
HttpClient always materialzes the name/value strings for headers it receives. We avoid allocating strings for common values, and that list could be improved for an easy allocation win (#63750).
Note that HttpClient already lazily allocates the parsed *HeaderValue objects, and you can enumerate the header collection via NonValidated without the additional parsing/allocation overhead.

I think it would be better to decide whether exposing raw Span-based access to header values is possible and beneficial with HttpClient given the constraints of existing APIs before we add TryParse(Span) APIs, or we may end up in a situation where it'd be a deoptimization to use the new span API.


If the service is okay with caching value objects and dealing with raw value buffers, would it make more sense to skip the value object allocation entirely?

Taking ViaHeaderValue as an example, it has the following shape:

public class ViaHeaderValue
{
    public string? ProtocolName { get; }
    public string ProtocolVersion { get; }
    public string ReceivedBy { get; }
    public string? Comment { get; }

While you could cache and save the allocation of the ViaHeaderValue instance, you'd still be allocating all the strings.
Would you for example consider something like this instead?

public static bool TryParse(
    ReadOnlySpan<char> value,
    out Range? protocolName,
    out Range protocolVersion,
    out Range receivedBy,
    out Range? comment);

@geeknoid
Copy link
Member Author

geeknoid commented Apr 4, 2023

@MihaZupan I think I'd go with a half-way solution.

Return a struct to get a good type to use, but replace strings with Range objects. So something like:

public class ViaHeaderValue : IResettable
{
    public bool TryReset();
    public bool TryParse(ReadOnlySpan<char> input);
    public Range ProtocolName { get; }
    public Range ProtocolVersion { get; }
    public Range ReceivedBy { get; }
    public Range Comment { get; }
}

@CarnaViire
Copy link
Member

Triage: this is a nice-to-have for perf. We would need to understand the scenarios where the API would be consumed to reach agreement on the API shape, so we would be reaching out offline @geeknoid
Also, IResettable interface is part of extensions, so it's not possible to implement it in System.Net.Http.
Given the workload on the team, this will most possibly not make its way into 8.0, so putting to Future for now.

@CarnaViire CarnaViire added this to the Future milestone Apr 6, 2023
@ghost ghost removed the untriaged New issue has not been triaged by the area owner label Apr 6, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
api-suggestion Early API idea and discussion, it is NOT ready for implementation area-System.Net.Http
Projects
None yet
Development

No branches or pull requests

3 participants