Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Use lowercase for charsets #11741 #12347

Merged
merged 12 commits into from
Oct 16, 2024
Merged

Conversation

gregw
Copy link
Contributor

@gregw gregw commented Oct 3, 2024

Fix #11741 as per the WhatTFWG recommendations, use lower case for charset names. Took the opportunity for some minor optimizations:

  • use the already made HttpField instance in MimeTypes.Type rather than create a new one in the HttpParser.CACHE
  • keep the MimeType.Type associated with the pre encoded Content-Type fields

Fix #11741 as per the WhatTFWG recommendations, use lower case for charset names.
Took the opportunity for some minor optimizations:
 + use the already made HttpField instance in MimeTypes.Type rather than create a new one in the HttpParser.CACHE
 + keep the MimeType.Type associated with the pre encoded Content-Type fields
@gregw gregw linked an issue Oct 3, 2024 that may be closed by this pull request
@gregw gregw requested a review from joakime October 3, 2024 23:22
gregw added 3 commits October 4, 2024 11:21
Fix #11741 as per the WhatTFWG recommendations, use lower case for charset names.
Took the opportunity for some minor optimizations:
 + use the already made HttpField instance in MimeTypes.Type rather than create a new one in the HttpParser.CACHE
 + keep the MimeType.Type associated with the pre encoded Content-Type fields
Fix #11741 as per the WhatTFWG recommendations, use lower case for charset names.
Took the opportunity for some minor optimizations:
 + use the already made HttpField instance in MimeTypes.Type rather than create a new one in the HttpParser.CACHE
 + keep the MimeType.Type associated with the pre encoded Content-Type fields
Fix #11741 as per the WhatTFWG recommendations, use lower case for charset names.
Took the opportunity for some minor optimizations:
 + use the already made HttpField instance in MimeTypes.Type rather than create a new one in the HttpParser.CACHE
 + keep the MimeType.Type associated with the pre encoded Content-Type fields
gregw added 2 commits October 6, 2024 11:28
Fix #11741 as per the WhatTFWG recommendations, use lower case for charset names.
Took the opportunity for some minor optimizations:
 + use the already made HttpField instance in MimeTypes.Type rather than create a new one in the HttpParser.CACHE
 + keep the MimeType.Type associated with the pre encoded Content-Type fields
Fix #11741 as per the WhatTFWG recommendations, use lower case for charset names.
Took the opportunity for some minor optimizations:
 + use the already made HttpField instance in MimeTypes.Type rather than create a new one in the HttpParser.CACHE
 + keep the MimeType.Type associated with the pre encoded Content-Type fields
Copy link
Contributor

@joakime joakime left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Strange location for the ContentTypeField.

Also, what happens if we use (in jetty-core Handler) the following?

response.getHeaders().put(HttpHeader.CONTENT_TYPE, "text/html; charset=UTF-8");

Does that produce a UTF-8? or a utf-8 on the response headers that the client sees?

@@ -876,4 +919,25 @@ else if (' ' != b)
return value;
return builder.toString();
}

public static class ContentTypeField extends PreEncodedHttpField
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't this make more sense as a standalone class?
After all we have all of the other specialized Fields as standalone classes, why is this one special?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The only other extension of PreEncodedHttpField we have is ResponseHttpFields.PersesitentPreEncodedHttpField. Of the HttpField extensions, about half are sub-types.

So this is not unusual. It is a PreEncodedHttpField that is extended specifically to hold a MimeTypes.Type, so having it associated with MimeTypes seams reasonable.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This ContentTypeField feels like the HostPortHttpField. (also a field used in HttpParser)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I kind of agree that this would be better as standalone class, especially now that is the return value of a method: it's shorter to assign to a variable than MimeTypes.ContentTypeField.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As this is a rather special purpose class, I've just made is package protected and provided static methods to extract the info if needed. I could make it fully private if we want to give up on the optimized lookup of content-types with spaces in HttpParser

@gregw
Copy link
Contributor Author

gregw commented Oct 9, 2024

Strange location for the ContentTypeField.

Not that strange I think. I'll add some javadoc to explain the class and I think that will make the location seam reasonable

Also, what happens if we use (in jetty-core Handler) the following?

response.getHeaders().put(HttpHeader.CONTENT_TYPE, "text/html; charset=UTF-8");

Does that produce a UTF-8? or a utf-8 on the response headers that the client sees?

We will send headers as the app sets them. If they set them with a specific case, then we will send that case. But if they use APIs to setCharset etc. then we should generate as lowercase.
This PR is more about improving the parsing side, so that when we find a cached content-type, we return it as lowercase rather than the previous uppercase.

@joakime
Copy link
Contributor

joakime commented Oct 9, 2024

We will send headers as the app sets them. If they set them with a specific case, then we will send that case. But if they use APIs to setCharset etc. then we should generate as lowercase.
This PR is more about improving the parsing side, so that when we find a cached content-type, we return it as lowercase rather than the previous uppercase.

Wont this cause a change in how the HttpClient parses responses too?

@gregw gregw requested a review from joakime October 9, 2024 21:36
@@ -876,4 +919,25 @@ else if (' ' != b)
return value;
return builder.toString();
}

public static class ContentTypeField extends PreEncodedHttpField
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I kind of agree that this would be better as standalone class, especially now that is the return value of a method: it's shorter to assign to a variable than MimeTypes.ContentTypeField.

public boolean isCharsetAssumed()
{
return _assumedCharset;
}

public HttpField getContentTypeField()
public ContentTypeField getContentTypeField()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should not the method below public HttpField getContentTypeField(Charset charset) also have its return type changed?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really, because there are two aspect of ContentTypeField: it has a getMimeType method and it is a PreEncodedHttpField. For an arbitrary contentType returned from this method, it is not guaranteed that it will have a known mimeType and it is not known if it is worthwhile going to the effort of pre-encoding.

That is why I think this class belongs as a subtype of this class, as it is pretty special purpose for long held constants for known types.

@gregw gregw requested a review from sbordet October 14, 2024 21:53
@gregw
Copy link
Contributor Author

gregw commented Oct 14, 2024

@sbordet I've made the class package protected.
Can we turn this review around quickly, as I was waiting 4 days for a clean CI and now have to go back to square 1.

@gregw gregw merged commit b947b48 into jetty-12.1.x Oct 16, 2024
10 checks passed
@gregw gregw deleted the fix/jetty-12.1.x/11741/mimetypes branch October 16, 2024 04:54
@olamy olamy mentioned this pull request Oct 17, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: ✅ Done
Development

Successfully merging this pull request may close these issues.

Review case of MimeType.Type charsets
3 participants