Fix, clarify, and simplify content type schemas #2351

Merged (6 commits) on Feb 11, 2021
Changes from 4 commits
34 changes: 17 additions & 17 deletions versions/3.1.0.md
@@ -1433,14 +1433,11 @@ application/json:

##### Considerations for File Uploads

In contrast with the 2.0 specification, `file` input/output content in OpenAPI is described with the same semantics as any other schema type. In contrast with the 3.0 specification, such schemas use the `contentEncoding` JSON Schema keyword rather than the `format` keyword. This keyword supports all encodings defined in [RFC4648](https://tools.ietf.org/html/rfc4648), including "base64" and "base64url", as well as "quoted-printable" from [RFC2045](https://tools.ietf.org/html/rfc2045#section-6.7).

JSON Schema also offers a `contentMediaType` keyword. However, when the media type is already specified by the
Media Type Object's key, or by the `contentType` field of an [Encoding Object](#encodingObject), the `contentMediaType` keyword SHALL be ignored if present.
In contrast with the 2.0 specification, `file` input/output content in OpenAPI is described with the same semantics as any other schema type. In contrast with the 3.0 specification, such schemas either omit the `type` (in place of `format: binary`), or use `contentMediaType` and `contentEncoding` with `type: string`. The `contentEncoding` keyword supports all encodings defined in [RFC4648](https://tools.ietf.org/html/rfc4648), including "base64" (which replaces `format: byte`) and "base64url", as well as "quoted-printable" from [RFC2045](https://tools.ietf.org/html/rfc2045#section-6.7).
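
To make the `contentEncoding` values named above concrete, here is a minimal Python sketch of how a consumer might decode such string payloads using only the standard library. The `DECODERS` table and the `decode_content` helper are hypothetical names chosen for illustration; neither OpenAPI nor JSON Schema defines them, and real tooling may handle padding and errors differently.

```python
import base64
import quopri

# Hypothetical mapping from `contentEncoding` values to standard-library decoders.
# base64, base64url, base32 and base16 come from RFC 4648; quoted-printable from RFC 2045.
DECODERS = {
    "base64": lambda data: base64.b64decode(data, validate=True),
    "base64url": base64.urlsafe_b64decode,
    "base32": base64.b32decode,
    "base16": base64.b16decode,
    "quoted-printable": quopri.decodestring,
}


def decode_content(text: str, content_encoding: str) -> bytes:
    """Decode a string value according to its `contentEncoding` (illustrative helper)."""
    try:
        decoder = DECODERS[content_encoding]
    except KeyError:
        raise ValueError(f"unsupported contentEncoding: {content_encoding!r}")
    # All of these encodings produce ASCII text, so the string is converted to bytes first.
    return decoder(text.encode("ascii"))
```

Note that an HTTP-level encoding such as `gzip` is deliberately absent from the table: it is not a `contentEncoding` value in this sense.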

Here is my proposed rewording to clarify the priority of Content-Encoding headers (in various places) and the JSON Schema contentEncoding:

Suggested change
In contrast with the 2.0 specification, `file` input/output content in OpenAPI is described with the same semantics as any other schema type. In contrast with the 3.0 specification, such schemas either omit the `type` (in place of `format: binary`), or use `contentMediaType` and `contentEncoding` with `type: string`. The `contentEncoding` keyword supports all encodings defined in [RFC4648](https://tools.ietf.org/html/rfc4648), including "base64" (which replaces `format: byte`) and "base64url", as well as "quoted-printable" from [RFC2045](https://tools.ietf.org/html/rfc2045#section-6.7).
In contrast with the 2.0 specification, `file` input/output content in OpenAPI is described with the same semantics as any other schema type.
In contrast with the 3.0 specification, the `format` keyword has no effect on the content-encoding of the schema. The Content-Encoding for the schema can be specified explicitly with the `Content-Encoding` parameter of the operation, for schemas in request bodies, or in a `Content-Encoding` header in the `headers` of the response, for response schemas, or in a `Content-Encoding` header in the `headers` of an Encoding object associated with a request or response body property. JSON Schema also offers a `contentEncoding` keyword, which may be used to specify the `Content-Encoding` for the schema, but `contentEncoding` SHALL be ignored if a `Content-Encoding` header is defined. The `contentEncoding` keyword supports all encodings defined in [RFC4648](https://tools.ietf.org/html/rfc4648), including "base64" (which replaces `format: byte`) and "base64url", as well as "quoted-printable" from [RFC2045](https://tools.ietf.org/html/rfc2045#section-6.7). When not explicitly specified, the content-encoding
JSON Schema also offers a `contentMediaType` keyword. However, when the media type is already specified by the Media Type Object's key, or by the `contentType` field of an [Encoding Object](#encodingObject), the `contentMediaType` keyword SHALL be ignored if present.

Member Author

@mkistler the Content-Encoding HTTP header and contentEncoding JSON Schema keyword are unrelated. The HTTP header is for things like gzip that the HTTP client and server will encode/decode automatically. The JSON Schema keyword is for when both HTTP and the media type think that something is text, but the application needs to be informed that some specific piece of text needs to be encoded/decoded in a particular way.

Member Author

You could in fact have a gzipped payload that includes a base64-encoded PNG file. That would use both the header and the keyword.
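
The layering described in this comment can be sketched in Python. This is only an illustration, assuming the standard `gzip` and `base64` modules stand in for what an HTTP stack and an application would each do; the function names and the sample bytes are hypothetical.

```python
import base64
import gzip

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"  # the fixed first 8 bytes of a PNG file


def build_body(png_bytes: bytes) -> bytes:
    """Produce the bytes sent on the wire for a gzipped, base64-encoded PNG."""
    # Application layer: corresponds to the JSON Schema `contentEncoding: base64` keyword.
    text = base64.b64encode(png_bytes)
    # HTTP layer: corresponds to the `Content-Encoding: gzip` header,
    # which an HTTP client or server would normally apply automatically.
    return gzip.compress(text)


def read_body(wire_bytes: bytes) -> bytes:
    """Undo both layers in the opposite order: gzip first, then base64."""
    text = gzip.decompress(wire_bytes)
    return base64.b64decode(text, validate=True)
```

The two steps are independent: removing the `gzip` layer leaves a base64 string that the application still has to decode itself.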


@handrews I'm struggling to understand all of this. Do you mean to say that the contentEncoding keyword has no effect whatsoever on the value that should/will be passed in the Content-Encoding header? If that is the case, then it will certainly confuse some readers (like me 😄).

In the example you give, how would this look in the API doc? Maybe:

  /v1/photos:
    post:
      parameters:
        - name: Content-Encoding
          in: header
          schema:
            type: string
            enum:
              - gzip
      requestBody:
        content:
          image/png:
            schema:
              contentEncoding: base64

And the http request would specify:

  • Content-type: image/png
  • Content-encoding: gzip
  • a gzipped, base64-encoded image/png

Is that right?

If I've got this right, then I think you are saying that both the encoding specified in the Content-Encoding header and the encoding specified in the contentEncoding keyword are applied to the request body (or whichever element they pertain to).

If that's right, I'll make a stab at rewording my suggestion to describe this.

Member Author

@mkistler yeah this is the challenge of combining multiple standards that were made in different contexts at different times.

If you go back to MIME, there is a Content-Transfer-Encoding header, which is where values such as base64 from RFC 4648 (or its predecessors) are used. Perhaps we should have used mediaEncoding or something like that in JSON Schema (the naming of that keyword has a slightly convoluted history that partially predates me), but we didn't. I might have only learned about HTTP Content-Encoding while researching this PR, come to think of it.

I believe that what you have for the HTTP request is correct. At this stage with the modifications y'all have made recently, I'm a little unclear on where we ended up on type being absent vs "type": "string", but what you have (with type absent) is what I would have written. And is compatible with the latest JSON Schema draft which notes that an application MAY apply a JSON Schema to other media types, including binary media types, and notes that many keywords have no sensible meaning with binary types.

darrelmiller marked this conversation as resolved.

Examples:

Content transferred in binary (octet-stream) SHOULD omit `schema`, as no JSON Schema type is suitable:
Content transferred in binary (octet-stream) MAY omit `schema`:

```yaml
# a PNG image as a binary file:
@@ -1461,9 +1458,12 @@ content:
  image/png:
    schema:
      type: string
      contentMediaType: image/png
      contentEncoding: base64
```

Note that the `Content-Type` remains `image/png`, describing the semantics of the payload. The JSON Schema `type` and `contentEncoding` fields explain that the payload is transferred as text. The JSON Schema `contentMediaType` is technically redundant, but can be used by JSON Schema tools that may not be aware of the OpenAPI context.

These examples apply to either input payloads of file uploads or response payloads.

A `requestBody` for submitting a file in a `POST` operation may look like the following example:
@@ -1496,10 +1496,11 @@ requestBody:
          # The property name 'file' will be used for all files.
          file:
            type: array
            items:
              contentMediaType: application/octet-stream
            items: {}
```

As seen in the section on `multipart/form-data` below, the empty schema for `items` indicates a media type of `application/octet-stream`.

##### Support for x-www-form-urlencoded Request Bodies

To submit content using form url encoding via [RFC1866](https://tools.ietf.org/html/rfc1866), the following
@@ -1536,9 +1537,8 @@ When passing in `multipart` types, boundaries MAY be used to separate sections o
* If the property is a primitive, or an array of primitive values, the default Content-Type is `text/plain`
* If the property is complex, or an array of complex values, the default Content-Type is `application/json`
* If the property is a `type: string` with a `contentEncoding`, the default Content-Type is `application/octet-stream`
* If the JSON Schema keyword `contentMediaType` is used and no Encoding Object is present, then the Content-Type is that which is specified by `contentMediaType`, however if an Encoding Object is present, then `contentMediaType` SHALL be ignored

As with non-multipart request or response bodies, when using `contentMediaType` to specify a binary Content-Type without also using `contentEncoding`, the JSON Schema `type` keyword is omitted.
Per the JSON Schema specification, `contentMediaType` without `contentEncoding` present is treated as if `contentEncoding: identity` were present. While useful for embedding text documents such as `text/html` into JSON strings, it is not useful for a `multipart/form-data` part, as it just causes the document to be treated as `text/plain` instead of its actual media type. Use the Encoding Object without `contentMediaType` if no `contentEncoding` is required.
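
The default Content-Type rules for multipart parts can be sketched as a small Python function. This is a hedged illustration, not normative behavior: it follows the three bullets above plus the rule that an absent `type` means `application/octet-stream`, it assumes `type` is a single string rather than a list, and it assumes no Encoding Object overrides the result. The function name is hypothetical. The default for `type: string` with `contentEncoding` is still being discussed in this review, and the sketch uses the value from the bullet list.

```python
PRIMITIVE_TYPES = {"string", "number", "integer", "boolean"}


def default_part_content_type(schema: dict) -> str:
    """Default Content-Type of a multipart part when no Encoding Object is given."""
    schema_type = schema.get("type")
    if schema_type is None:
        # No `type`, for example the empty schema `{}`: treated as binary.
        return "application/octet-stream"
    if schema_type == "array":
        # Arrays take the default of their inner (`items`) type.
        return default_part_content_type(schema.get("items", {}))
    if schema_type == "string" and "contentEncoding" in schema:
        # Checked before the generic primitive rule, as in the third bullet.
        return "application/octet-stream"
    if schema_type in PRIMITIVE_TYPES:
        return "text/plain"
    if schema_type == "object":
        return "application/json"
    raise ValueError(f"no default defined in this sketch for type {schema_type!r}")
```

For instance, the `file` property shown earlier (`type: array` with `items: {}`) resolves to `application/octet-stream` under these assumptions.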

Examples:

@@ -1557,15 +1557,17 @@ requestBody:
            type: object
            properties: {}
          profileImage:
            # Content-Type with contentMediaType is the contentMediaType (image/png here)
            # Content-Type for application-level encoded resource is `text/plain`
            type: string
            contentMediaType: image/png
            contentEncoding: base64
          children:
            # default Content-Type for arrays is based on the `inner` type (text/plain here)
            # default Content-Type for arrays is based on the _inner_ type (`text/plain` here)
            type: array
            items:
              type: string
          addresses:
            # default Content-Type for arrays is based on the `inner` type (object shown, so `application/json` in this example)
            # default Content-Type for arrays is based on the _inner_ type (object shown, so `application/json` in this example)
            type: array
            items:
              type: object
@@ -1581,7 +1583,7 @@ A single encoding definition applied to a single schema property.
##### Fixed Fields
Field Name | Type | Description
---|:---:|---
<a name="encodingContentType"></a>contentType | `string` | The Content-Type for encoding a specific property. Default value depends on the property type: when `type` is absent and `contentMediaType` is present - the value of `contentMediaType`; when both `type` and `contentMediaType` are absent - `application/octet-stream`; for `string` with a `contentEncoding` - `application/octet-string`; for other primitive types – `text/plain`; for `object` - `application/json`; for `array` – the default is defined based on the inner type. The value can be a specific media type (e.g. `application/json`), a wildcard media type (e.g. `image/*`), or a comma-separated list of the two types.
<a name="encodingContentType"></a>contentType | `string` | The Content-Type for encoding a specific property. Default value depends on the property type: when `type` is absent - `application/octet-stream`; for primitive types - `text/plain`; for `object` - `application/json`; when `type` is `string` and `contentEncoding` is present, the default Content-Type is `text/plain`, and the media type of the encoded resource is specified in `contentMediaType`; for `array` – the default is defined based on the inner type. The value can be a specific media type (e.g. `application/json`), a wildcard media type (e.g. `image/*`), or a comma-separated list of the two types.

I thought that we agreed in the 1/28 TSC meeting that when type is string the default Content-Type should be application/octet-stream.

darrelmiller marked this conversation as resolved.
<a name="encodingHeaders"></a>headers | Map[`string`, [Header Object](#headerObject) \| [Reference Object](#referenceObject)] | A map allowing additional information to be provided as headers, for example `Content-Disposition`. `Content-Type` is described separately and SHALL be ignored in this section. This property SHALL be ignored if the request body media type is not a `multipart`.
<a name="encodingStyle"></a>style | `string` | Describes how a specific property value will be serialized depending on its type. See [Parameter Object](#parameterObject) for details on the [`style`](#parameterStyle) property. The behavior follows the same values as `query` parameters, including default values. This property SHALL be ignored if the request body media type is not `application/x-www-form-urlencoded` or `multipart/form-data`. If a value is explicitly defined, then the value of [`contentType`](#encodingContentType) (implicit or explicit) SHALL be ignored.
<a name="encodingExplode"></a>explode | `boolean` | When this is true, property values of type `array` or `object` generate separate parameters for each value of the array, or key-value-pair of the map. For other types of properties this property has no effect. When [`style`](#encodingStyle) is `form`, the default value is `true`. For all other styles, the default value is `false`. This property SHALL be ignored if the request body media type is not `application/x-www-form-urlencoded` or `multipart/form-data`. If a value is explicitly defined, then the value of [`contentType`](#encodingContentType) (implicit or explicit) SHALL be ignored.
@@ -1594,7 +1596,7 @@ This object MAY be extended with [Specification Extensions](#specificationExtens
```yaml
requestBody:
  content:
    multipart/mixed:
    multipart/form-data:
      schema:
        type: object
        properties:
@@ -1611,9 +1613,7 @@ requestBody:
            description: metadata in XML format
            type: object
            properties: {}
          profileImage:
            type: string
            contentMediaType: image/jpeg
          profileImage: {}
      encoding:
        historyMetadata:
          # require XML Content-Type in utf-8 encoding