Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow parsing Content-Type and Accept headers with version #61427

Merged

Conversation

pgomulka
Copy link
Contributor

@pgomulka pgomulka commented Aug 21, 2020

Content-Type and Accept headers expect a versioned form of media types
like application/vnd.elasticsearch+json;compatible-with=7
when previously it was simple application/json or (cbor, yaml..) - it is still
supported

next step after #61987
relates #60516

Content-Type and Accept headers expect a verioned form of media types
like application/vnd.elasticsearch+json;compatible-with=7
when previously it was simple application/json or similar - it is still
supported
@pgomulka pgomulka added WIP :Core/Infra/REST API REST infrastructure and utilities labels Aug 21, 2020
@pgomulka pgomulka requested a review from jakelandis August 21, 2020 15:41
@pgomulka pgomulka self-assigned this Aug 21, 2020
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-core-infra (:Core/Infra/REST API)

@elasticmachine elasticmachine added the Team:Core/Infra Meta label for core/infra team label Aug 21, 2020
@pgomulka pgomulka marked this pull request as draft August 21, 2020 15:45
@@ -114,13 +116,19 @@ public XContent xContent() {
}
};

private static final Pattern COMPATIBLE_API_HEADER_PATTERN = Pattern.compile(
"(application|text)/(vnd.elasticsearch\\+)?([^;]+)(\\s*;\\s*compatible-with=(\\d+))?",
Copy link
Contributor

@sethmlarson sethmlarson Aug 25, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Regex needs a few more escapes:

  • Escape the . within vnd.elasticsearch+
  • Escape the / between the type and subtype

(application|text)\\/(vnd\\.elasticsearch\\+)?([^;]+)(\\s*;\\s*compatible-with=(\\d+))?

Copy link
Member

@dakrone dakrone Aug 25, 2020

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Something I am curious about is the (\\d+))? at the end. If possible, it would be nice to define this as part of our "spec". This would be something we could be strict (eschew leniency!) about, such as enforcing it to be in a specific format (and throwing an exception when someone specifies a pattern that is not correct. We can also put the format into the documentation.

If you agree with the sentiment, I would be curious what we expect the final number to be, for example, which of these do we intend to support (now and in the future):

  • compatible-with=7
  • compatible-with=7.11
  • compatible-with=7.x
  • compatible-with=7x
  • compatible-with=7.11.2

Then I think we should make this strict so users find out early if they've send an illegal value

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sethmlarson Why does the forward slash needs escaping? It's not special in Java?

I wondering if we should anchor the pattern with ^ at the start and $ at the end, to be on the safe side?

In ([^;]+)(\\s*;, the [^;]+ will also consume whitespace, so I believe you can remove the \\s*.

Can we switch any of these capture groups to non-capturing if we don't actually need the contents, just the grouping? i.e (?:<pattern>)

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@pugnascotia You're right the / doesn't need an escape in Java

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sethmlarson fixed the \\.
@dakrone Only major versions are supported, hence only digits are allowed - representing the major part of the version #51816 (no dots or any other character, preventing parsing minor version)
Good point on validation - how much should be done, how precise the exception should be.. @jakelandis or @jaymode any views on this?
@pugnascotia good point on consuming a space. The subtype (json, yaml, etc) are not allowing spaces. Added the anchoring and skipped capturing when possible. I think all of the groups require capturing though (except maybe for the optional ; charset=UTF-8)

@@ -114,13 +116,19 @@ public XContent xContent() {
}
};

private static final Pattern COMPATIBLE_API_HEADER_PATTERN = Pattern.compile(
"(application|text)/(vnd.elasticsearch\\+)?([^;]+)(\\s*;\\s*compatible-with=(\\d+))?",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(vnd.elasticsearch\+)?

Am I reading this correctly that the vnd.elasticsearch part is completely optional? I thought the intent was to only add compatible-with when vnd.elasticsearch is present?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

you are right. This is possibly a bit oversimplified.
Do you think we should aim to be more strict on the regex side or should we validate the values after the parsing with more relaxed regex?

The more strict could look like

Pattern.compile(
        "(application|text)/((vnd\\.elasticsearch\\+(json|smile|yaml|cbor)(\\s*;\\s*compatible-with=(\\d+)))" +
            "|(json|smile|yaml|cbor))");

there is a simplistic validation in parseVerion method, but it still allows to add compatible-with even when subtype vnd.elasticsearch is not present. The version won't be parsed though

@pgomulka
Copy link
Contributor Author

I have reworked the regex. I tried not to be overly strict as there has to be some flexibility to allow additional subtypes defined in SQL plugin (see TextFormat enum)
the regex now allows for compatible-with parameter only to be present when vnd.elasticsearch subtype is used.

@jakelandis jakelandis requested a review from jaymode August 31, 2020 16:31
@pgomulka pgomulka marked this pull request as ready for review August 31, 2020 18:30
@pgomulka pgomulka changed the title WIP Allow parsing Content-Type and Accept headers with version Allow parsing Content-Type and Accept headers with version Aug 31, 2020
@pgomulka
Copy link
Contributor Author

pgomulka commented Sep 2, 2020

I think that if we want to make the media-type parsing more strict, we have to refactor the XContentType.fromMediaType and XContentType.fromMediaTypeOrFormat
I think this methods should be refactored to either parse mediaType (application/json) or a format (a parameter like json, csv)
I think it is at the moment not clear what method to use and there is a lot of usages of XContentType.fromMediaTypeOrFormat where just XContentType.fromMediaType should be used.

I made an attempt here, but I have some SQL test failing now. #61845

Copy link
Member

@jaymode jaymode left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I left some questions and I also wonder if we really need to use a regex for this parsing? Also, compatible-with is a vendor specific parameter if I understand correctly but I am not sure that there should be a requirement that this parameter comes first vs charset being sent in the header first?

@pgomulka
Copy link
Contributor Author

pgomulka commented Sep 2, 2020

I left some questions and I also wonder if we really need to use a regex for this parsing?

Do you think something as simple as (contains("vnd.elasticsearch") or similar would be enough? It would be harder to perform validation of the format

Also, compatible-with is a vendor specific parameter if I understand correctly but I am not sure that there should be a requirement that this parameter comes first vs charset being sent in the header first?

You are right, I think there is no need for compatible-with to be present first on the parameter list. charset, header or similar parameters can be first (I think the order can be random) - will fix

@jaymode
Copy link
Member

jaymode commented Sep 2, 2020

Do you think something as simple as (contains("vnd.elasticsearch") or similar would be enough? It would be harder to perform validation of the format

I was thinking about the possibility of having a new class for doing the parsing to move it away from XContentType and TextFormat in SQL.

public class ContentTypeParser {

    public ContentTypeParser(Set<String> acceptedMimeTypes) { // plugins should be able to provide these so we don't break sql. maybe need more than a string?

    }

   public ContentType parse(String contentTypeHeader) {
      // split string on semicolon
      // validate media type is accepted (IIRC whitespace is ok so trim it)
      // rest of strings are params. validate per RFC 7230 and use ones that we care about
     // or use a regex and we can change if necessary
   }

  public ContentType fromAcceptHeader(String acceptHeader) { // not necessary for this PR but I guess it is for SQL

   }
}

public class ContentType {
    public String mediaType() {}
    public Version compatibleWith() {} // or return the int version
    public Charset charset() {} // not necessary for this
}

Then RestRequest would need to change to use this parser and enable retrieval of the ContentType that was parsed from the request. What do you think of this?

pgomulka added a commit that referenced this pull request Sep 17, 2020
Splitting method XContentType.fromMediaTypeOrFormat into two separate methods. This will help to validate media type provided in Accept or Content-Type headers.
Extract parsing logic from XContentType (fromMediaType and fromFormat methods) to a separate MediaTypeParser class. This will help reuse the same parsing logic for XContentType and TextFormat (used in sql)

`Media-Types type/subtype; parameters` parsing is in defined https://tools.ietf.org/html/rfc7231#section-3.1.1.1

part of  #61427
Copy link
Contributor

@jakelandis jakelandis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Changes look good. A few minor comments. However, could we get someone from @elastic/es-ql to take a look a the the changes for SQL ?

@jakelandis jakelandis added v8.0.0 :Analytics/SQL SQL querying and removed WIP labels Sep 29, 2020
@elasticmachine
Copy link
Collaborator

Pinging @elastic/es-ql (:Query Languages/SQL)

@elasticmachine elasticmachine added the Team:QL (Deprecated) Meta label for query languages team label Sep 29, 2020
@jakelandis
Copy link
Contributor

jakelandis commented Sep 29, 2020

I updated the label to be only for 8.0 and >breaking

For clarity, the breaking here is that with this change we will only support requests for charset=UTF-8. Prior to this change we accepted any charset requested (or any random paramater)...but I think always encoded in UTF-8 and ignored the parameters (except for SQL). So it is non-passive in the fact we are more strict about what we accept. With this PR we will not throw an exception if we don't support it (to avoid scope creep), however if the charset (or any parameters) are not supported it will result in the default JSON return type. "Accept: application/yaml;charset=junk" results in application/json. A follow up PR will throw the necessary exception to result in the more correct 40x error.

Copy link
Contributor

@jakelandis jakelandis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, pending someone from @elastic/es-ql validating the SQL part(s).

Copy link
Member

@jaymode jaymode left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

Co-authored-by: Jay Modi <jaymode@users.noreply.github.com>
pgomulka added a commit that referenced this pull request Oct 2, 2020
7.x client can pass media type with a version which will return a 7.x
version of the api in ES 8.
In ES server 7 this media type shoulld be accepted but it serve the same
version of the API (7x)
relates #61427
Copy link
Contributor

@bpintea bpintea left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The SQL bits look good, I've left only some minor comments.

Map.of("header", "present|absent", "charset", "utf-8"))
.withMediaTypeAndParams(TextFormat.CSV.typeWithSubtype(), TextFormat.CSV,
Map.of("header", "present|absent", "charset", "utf-8",
"delimiter", "[^\"\n\r\t]+"))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the delimiter regex derived from TextFormat.CSV#delimiter()? Because it specifies what characters are allowed, inline with what that method checks, on one hand, but allows a count of multiple character, out of line with a check there, on the other.
The exception messages in the method are explanatory and I'd ever so slightly prefer to return those (and maybe don't enforce a specific set here), but if you'd still like to enforce the param format in the parser definition, I'd suggest to maybe also limit the char count to one.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

good point, I misread that method. It should only be one character.
The exception messages are indeed explanatory in this method. With the parameter validation when parsing, we will loose them - that code will not be reachable as it will fail earlier when parsing

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I changed this to expect the presence of a delimiter parameter, but the pattern will be .+ which will always pass, and the validation will be done in TextFormat.CSV#delimiter()

pgomulka and others added 2 commits October 5, 2020 11:11
Co-authored-by: Bogdan Pintea <bpintea@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
:Analytics/SQL SQL querying >breaking :Core/Infra/REST API REST infrastructure and utilities Team:Core/Infra Meta label for core/infra team Team:QL (Deprecated) Meta label for query languages team v8.0.0-alpha1
Projects
None yet
Development

Successfully merging this pull request may close these issues.

10 participants