Skip to content

Add support for non-ASCII header values #26

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Merged
merged 8 commits into from
Jun 7, 2024
Merged

Add support for non-ASCII header values #26

merged 8 commits into from
Jun 7, 2024

Conversation

mmanes
Copy link
Contributor

@mmanes mmanes commented Jun 6, 2024

Add support for non-ASCII header values.
Update deps.
Switch from 'master' branch to 'main'

Fixes:

@mmanes mmanes marked this pull request as ready for review June 7, 2024 00:26
public static boolean isValueCharacter(byte ch) {
return isURICharacter(ch) || ch == ' ' || ch == '\t' || ch == '\n';
int intVal = ch & 0xFF; // Convert the value into an integer without extending the sign bit.
return isURICharacter(ch) || intVal == ' ' || intVal == '\t' || intVal == '\n' || intVal >= 0x80;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In addition to header values, this method is used when parsing the response status message. Does the RFC allow for this type of character there as well? Or is there any risk to using this same logic for both of these use cases?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That part hasn't been superseded by RFC9110, so according to RFC7230, the status reason can still (unfortunately) contain obs-text.

The reason-phrase element exists for the sole purpose of providing a
textual description associated with the numeric status code, mostly
out of deference to earlier Internet application protocols that were
more frequently used with interactive text clients. A client SHOULD
ignore the reason-phrase content.

 reason-phrase  = *( HTAB / SP / VCHAR / obs-text )

// TODO : Should this be sized with a configuration parameter?
private final StringBuilder builder = new StringBuilder();
// Allocate a 4k buffer for starters, it will grow as needed.
private final ByteArrayOutputStream byteBuffer = new ByteArrayOutputStream(4096);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose this should be ok? We will be allocating at least 4k on each request - this is quite a bit more than the previous StringBuilder - but I suppose that was likely having to re-allocate the underlying array right away.

Perhaps it is very common for the HTTP request preamble to exceed 4k? Any performance concern on this change?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I presume the previous approach did a lot of re-allocations and copying while it was parsing the prelude. This feels large enough to not need re-allocation in most cases, while not going overboard.

@mmanes mmanes merged commit 08f9b71 into main Jun 7, 2024
@mmanes mmanes deleted the mmanes/header-fix branch June 7, 2024 17:25
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants