Add ability to parse Content-Type from content type contains charset #27301
Conversation
Since this is a community submitted pull request, a Jenkins build has not been kicked off automatically. Can an Elastic organization member please verify the contents of this patch and then kick off a build manually?
I don't think it closes that issue you mentioned, but it is related to #27065.
I wish it would though :)
@elasticmachine please test this
@DaveCTurner My mistake... 👍
final String lowercaseContentType = restRequest.header("Content-Type").toLowerCase(Locale.ROOT);
final String[] elements = lowercaseContentType.split(";");
final String lowercaseMediaType = elements[0].trim();
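For example, with this splitting approach a header of `application/x-ndjson; charset=UTF-8` yields a media type of `application/x-ndjson`, and the charset parameter is silently dropped rather than validated.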
I feel we should still have some form of warning if you claim to be using a charset that isn't UTF-{8,16,32}, as per #22769. This might not be the time or the place to ask for that, however.
This change is not enough for me. It certainly fixes the errant deprecation logger warning (and that we are not obeying the content type), but now it's leniently ignoring any parameters on the content type.
@jasontedor are you suggesting that we parse the charset as part of this PR then, i.e. address #22769?
We currently don't do anything with the charset other than issuing a wrong deprecation warning because of it, right? I understand the reasons why leniency should be avoided, but why does solving the wrong deprecation warning mean parsing the charset and doing things with it? Is there some middle ground between the two so that we can remove the deprecation warning (hopefully by getting this PR in) without addressing #22769 at the same time?
Could it just accept the literal
I think splitting as is implemented in this PR is fine if the charset is checked. Since at the moment we don't actually support anything but UTF-8, we can just fail if the charset is present and not UTF-8? Actually supporting other charset values can be a follow-up, but at least we won't add leniency, and it doesn't allow odd behavior (sending a charset for, say, UTF-16 and failing in decoding).
I was going to suggest the same thing as @rjernst but now I can take a shortcut and simply register my agreement with his proposal. 😄
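A minimal sketch of the kind of check being proposed here; the class name, method name, and error type are assumptions for illustration, not code from this PR:

```java
import java.util.Locale;

// Illustrative only: fail the request when a charset parameter is present and is not UTF-8.
final class CharsetCheck {
    static void validateCharset(final String contentTypeHeader) {
        final String[] elements = contentTypeHeader.toLowerCase(Locale.ROOT).split(";");
        // elements[0] is the media type; any remaining elements are parameters such as charset=utf-8
        for (int i = 1; i < elements.length; i++) {
            final String parameter = elements[i].trim();
            if (parameter.startsWith("charset=")) {
                final String charset = parameter.substring("charset=".length()).trim();
                if ("utf-8".equals(charset) == false) {
                    throw new IllegalArgumentException("unsupported charset [" + charset + "]");
                }
            }
        }
    }
}
```

Whether such a check should throw or merely log a deprecation warning is exactly what the rest of the thread debates.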
It seems that we also accept UTF-16 and UTF-32 (both little and big endian variants) via autodetection, but nothing weirder:
We may not support them, of course :)
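The autodetection referred to here happens in the JSON parser underneath Elasticsearch rather than in the REST layer itself; purely as a generic illustration of how BOM-based detection works (this is not the code Elasticsearch uses):

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of byte-order-mark (BOM) sniffing; the real detection is done by
// the underlying JSON parser, not by code like this.
final class BomSniffer {
    static Charset detect(final byte[] body) {
        if (body.length >= 4 && body[0] == 0x00 && body[1] == 0x00
                && body[2] == (byte) 0xFE && body[3] == (byte) 0xFF) {
            return Charset.forName("UTF-32BE"); // UTF-32 charsets are available on standard JDKs
        }
        if (body.length >= 4 && body[0] == (byte) 0xFF && body[1] == (byte) 0xFE
                && body[2] == 0x00 && body[3] == 0x00) {
            return Charset.forName("UTF-32LE");
        }
        if (body.length >= 2 && body[0] == (byte) 0xFE && body[1] == (byte) 0xFF) {
            return StandardCharsets.UTF_16BE;
        }
        if (body.length >= 2 && body[0] == (byte) 0xFF && body[1] == (byte) 0xFE) {
            return StandardCharsets.UTF_16LE;
        }
        return StandardCharsets.UTF_8; // no BOM: assume UTF-8
    }
}
```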
I pushed bc424b3 to validate the charset. @DaveCTurner I'm OK to add UTF-16/UTF-32 if you both think this way of fixing the issue makes sense. Thanks for everyone's comments.
I'd like to see if we can get this fix merged in and ported to 6.x so that requests do not fail if a charset is specified. @liketic are you able to update the PR (merge 5.6)?
@DaveCTurner @jasontedor can you take a look again at the changes?
@@ -308,6 +309,23 @@ private boolean hasContentTypeOrCanAutoDetect(final RestRequest restRequest, fin
        return true;
    }

    private String parseMediaType(String rawContentType) {
        final String contentType = rawContentType.toLowerCase(Locale.ROOT);
        int firstSemiColonIndex = contentType.indexOf(';');
make it final
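For readers following the diff, a plausible shape of the finished helper with that suggestion applied; this is a sketch, not necessarily the exact code merged in the PR:

```java
// Sketch: strip any parameters (e.g. "; charset=UTF-8") and return only the media type.
// Assumes java.util.Locale is imported, as in the surrounding class.
private String parseMediaType(String rawContentType) {
    final String contentType = rawContentType.toLowerCase(Locale.ROOT);
    final int firstSemiColonIndex = contentType.indexOf(';');
    if (firstSemiColonIndex < 0) {
        return contentType.trim();
    }
    return contentType.substring(0, firstSemiColonIndex).trim();
}
```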
Thanks for picking this up. I pushed 0fabbd0. The merge target is 5.6 now. Thank you so much!
Since this is a community submitted pull request, a Jenkins build has not been kicked off automatically. Can an Elastic organization member please verify the contents of this patch and then kick off a build manually?
To echo Jay, any hope of seeing this in a near-term 6.x release (e.g. 6.3)?
I'm -½ on this change. It's a quick-and-dirty hack that means we accept a few more. I also note that we are inconsistent with our leniency in this area, comparing our handling of
I think this needs a deeper look.
@DaveCTurner I agree, which is why I believe that the validation of the charset should be separate from this PR. A portion of this PR is still important, which is the ability to accept an x-ndjson content type header that has the charset or other parameters specified. We do this already for all other content types we support, see
Again, I agree with Jay: the rejection of the
It's a valid call, and while the charset would be ignored, accepting that request will be consistent with past behavior as well as consistent with current behavior when combined with the
I don't think we should add leniency because leniency is incredibly hard to remove later (people start depending on it).
But now we are parsing the content type, thus we should parse the entire content type. If we ignore parts (i.e. charset) it is not better than the state we were already in (when we "auto detected" everything). @DaveCTurner Your curl examples aren't showing the headers. This PR, as it is currently, is only emitting deprecation warnings. I think we should actually be erroring if we are to parse the charset. Your earlier examples of supporting UTF-16 and UTF-32 only work because of the underlying parser we use. We could swap that out today and no tests would fail. Therefore, I claim we only support UTF-8 (the only encoding we use in tests, afaik). I also do not think this is a "quick and dirty hack". It is parsing the charset. We do need to also fail on other parameters and invalid charset values. I don't think this should be a deprecation warning. Bogus and unsupported charset values already won't work (aside from the ones that "work" above, which I'm torn on whether to fail hard on or give a warning).
Just for clarity, the issue to which I'm referring (previously linked) is #28123.
I'd agree in most cases, but I view this one differently because we do not use the charset at all (#22769) and it causes issues with certain clients that send the charset in 6.x. The leniency that simply dropping the charset for x-ndjson introduces leaves us in a no-worse position than we are in today, where we ignore the charset. I'd like to fix our use of charset everywhere, but that is a breaking change and should be handled separately. The change to not allow the charset when using x-ndjson is also a breaking change that wasn't documented or intended in 6.0.
The current change in this PR is incomplete, as it only parses the charset in the case where we haven't already parsed the content type and mapped it to the appropriate XContentType.
There were no interesting headers to report (unlike in 5.6.8):
The point is that in we already treat the charset leniently for
It's checking for the expected charset as a
The existence of leniency does not mean we should propagate or increase the leniency.
I wasn't aware of the quoted-strings/case-insensitive side of the spec (I see it now). I think that should be fixed here (or in a follow-up). @jaymode @DaveCTurner Can you help me understand your reluctance to parse the parameters from the content type and at least verify they match what we actually support today? Adding parsing (as in this PR) will fix the original "x-ndjson is erroneously rejected" problem. But adding it at the cost of furthering leniency (intentionally ignoring the media type when we can verify it) doesn't seem to help anyone.
I am reluctant to have parsing the parameters tied to this change, as we are holding up a change that fixes something broken in 6.x while we get the validation logic correct. Parameter values should be validated to ensure they are in the correct format (token or quoted string per RFC 7231), and we need to add tests for this validation. Additionally, we need to ensure we do this in the correct places, not just one place for x-ndjson. Are there other parameters that we should accept/validate? All of these are missing from this PR right now, and I believe this should not be tossed in as part of a small fix from a community member that was originally submitted 4 months ago.
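As a rough illustration of the token-or-quoted-string validation mentioned above; the regexes and class name are assumptions rather than existing Elasticsearch code, and quoted-pair escapes are omitted for brevity:

```java
import java.util.regex.Pattern;

// Rough sketch of RFC 7231 parameter-value validation: a value must be either a token
// (RFC 7230 tchar characters) or a quoted string.
final class ParameterValueValidator {
    // token = 1*tchar
    private static final Pattern TOKEN = Pattern.compile("[!#$%&'*+\\-.^_`|~0-9A-Za-z]+");
    // quoted-string, simplified: no quoted-pair (backslash escape) handling
    private static final Pattern QUOTED_STRING = Pattern.compile("\"[^\"\\\\]*\"");

    static boolean isValid(final String value) {
        return TOKEN.matcher(value).matches() || QUOTED_STRING.matcher(value).matches();
    }
}
```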
Looking at the discussion above, it seems like we reached no agreement on getting this in. I think there's no point in keeping it open then; feel free to reopen if you disagree.
Closes #27065
cc @javanna