-
Notifications
You must be signed in to change notification settings - Fork 9.1k
v3.2: Sequential/streaming media types (take 2) #4554
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
@ThomasRooney this does not address your ideas around the Encoding Object and sentinel events, mostly to avoid further complexity. I'm happy to discuss that idea further if you want to start a new thread on the discussion (#4171). We are also talking there about how streaming |
@ThomasRooney added more details in this most recent commit from reading that part of the WHATWG spec, including a |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Great work on this! I only had time for a partial review — it will take more time to get to the rest of it. Feel free to shoot down suggestions — I'm making them to put options on the table. 🙂
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I really like this, and highly appreciate the second iteration.
Structured streaming of data is rapidly growing in popularity (primarily due to the AI API wave), and I think the examples you put in here are going to help a lot of vendors grok and build tooling around these formats.
@hudlow I merged a couple of your suggestions and will sort out the other updates I mentioned soon. |
@hudlow I think I have addressed all of the comments that I did not mark as resolved in this latest commit. I particularly want to thank you for pointing out the awkwardness of the |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I found one minor wording issue that should be easy to fix.
And I left a few comments that may just reveal my lack of understanding, but I thought I'd make them anyway in case they are helpful.
###### Streaming Sequential Media Types | ||
|
||
The `itemSchema` field is provided to support streaming use cases for sequential media types. | ||
Unlike `schema`, which is applied to the complete content (treated as an array as described in the [sequential media types](#sequential-media-types) section), `itemSchema` MUST be applied to each item in the stream independently, which supports processing each item as it is read from the stream. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I feel like there is still something important missing here. Maybe it is just missing in my mental model. I don't think that using schema
prevents processing content as it comes in -- I believe there are streaming JSON parsers that already do this. And I'm not aware (but will be happy to be informed) of a requirement that if the "complete content" does not satisfy the schema
that none of it may be accepted/processed.
So from that perspective (which may be misinformed, and if so I'm happy to be corrected), the purpose of itemSchema
is to signal that, not only is it "possible" to process items as they are read, but it is "expected" that each item is completely independent and should be processed regardless of whether past or future items satisfy the itemSchema
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@hudlow this is why I had that awkward caveat about a streaming JSON Schema processor in here.
Every time someone wants me to take something out, someone else wants me to put it in. idk what to do here. We can't put every caveat in the text is so long as it is.
There are streaming JSON parsers, but streaming validation isn't really a thing (I know of one partial implementation). So you don't have to load the whole content to parse it, and in theory you might be able to validate it, but in practice you need to load the whole thing to validate it.
How much do we really need to explain here in the text?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also some JSON Schema keywords (e.g maxProperties
on a root object) do require that you parse the whole document, although at least in theory you can do those without holding the whole document in memory. Although with uniqueItems
on a root array, you might end up needing to hold it all in memory, or at least hold a hash of each item... but we are getting extremely deep in the weeds here.
@mikekistler it's not clear to me what the problem is with the current wording. The wording doesn't stop someone from implementing streaming JSON Schema validation, it just says that itemSchema
supports processing item-by-item. This is not true of schema
because in general JSON documents are not composed of "items" (unless they are arrays, but... do we really need to call that out? It would take a paragraph or two even to set up the distinction)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mikekistler Yeah, I'm also confused. Do you actually object to anything in this paragraph or do you object to something on lines 1662 or 1663?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm not objecting to anything currently here. There is nothing wrong with the current wording except that I think it does not explain the purpose if itemSchema
fully. To say that itemSchema
"supports processing each item as it is read from the stream" is accurate but does not distinguish it from schema
. I think the essential difference that that itemSchema
indicates that items are expected to be processed as they are received and the validity of any item, preceding or following, does not change that.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unlike
schema
, which is applied to the complete content...
Under this proposal, OpenAPI:
- Defines what it means for
schema
to be applied to the whole of the content in a message (represented as a JSON array). - Defines what it means for
itemSchema
to be applied to any individual item in a sequence within a message (represented in JSON without an array wrapper representing the sequence itself).
And does not:
- Define what it means for a
schema
to be applied to a subset of the items in a sequence within a message (represented as a JSON array or any other way). - Define what it means for some part of
schema
to be applied to an individual item (or any subset of the items) in a sequence within a message (represented in JSON—with or without an array wrapper representing the sequence itself—or any other way).
If an implementor wants to unpack schema
and apply some kind of reasoning to it to determine that it knows how to safely apply it or part of it (say, its items
keyword) to a part of a sequence (e.g., an individual item), this proposal doesn't really go out of its way to stop them from doing that but it certainly doesn't suggest it's a good idea or give any hints as to how to do it.
itemSchema
is a way for OpenAPI to give implementors a clear, safe way to apply a schema to individual items in a sequence within a message.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mikekistler if adding the word "expected" is what's holding up approval, sure, go ahead and add it where you think it helps. I don't understand the distinction you're making but I don't see any problem with putting that word in somewhere if that's what you're trying to get at. (I'm still not sure TBH).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mikekistler coming back to this, I still do not understand your objection. You say:
To say that
itemSchema
"supports processing each item as it is read from the stream" is accurate but does not distinguish it fromschema
.
schema
does not support processing as a stream. That is the entire reason this PR exists at all. If schema
supported this, we wouldn't need a change here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@mikekistler we don't want to require processing each item individually for itemSchema
. I expect that will be common, and is the easiest option. But implementations might choose to process them in small batches. Or even not as a stream (although why they would do that I don't know). The point is that itemSchema
gives implementations options that they do not have with schema
.
The force-push is to resolve conflicts with #4562. The only conflicts involved git not knowing what order to put the added sections in, as both PRs added in the same place. No text within any commit on this PR was changed. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good. 👍
From the TDC call today, @mikekistler and @hudlow have concerns about how much of an expectation for users the OAS sets, which @hudlow notes is a larger topic than just this one PR. They will follow-on with a discussion (#4570) and additional PR as needed. The consensus of the call was that this PR as written does not prevent applications from offering various additional guarantees, it just does not mandate any of them. #4570 will address whether an additional mandate is required here. |
The fundamental approach in this change was proposed by Karen Etheridge. This adds "itemSchema" as a schema to apply to each entry in a sequential media type instance. It also defines how to map sequential media tyes for use with "schema", and explains that "schema" applies to complete content only. JSON-based and text/event-stream media types are included. Co-authored-by: Karen Etheridge <ether@cpan.org>
Co-authored-by: Karen Etheridge <ether@cpan.org>
Co-authored-by: Greg Dennis <gregsdennis@yahoo.com>
Add a few more details.
Co-authored-by: Thomas Rooney <thomas@resiliencycollective.com>
Tighten up the wording a bit to ensure the scope of this technique is immediately clear.
Co-authored-by: Dan Hudlow <dan@hudlow.org>
Co-authored-by: Dan Hudlow <dan@hudlow.org>
Co-authored-by: Dan Hudlow <dan@hudlow.org>
This streamlines the text/event-stream discussion and makes a few other tweaks based on feedback from @hudlow.
When we did not have `itemSchema`, we needed more guidance about improper use of `schema`, but we don't really need it now.
Co-authored-by: Mike Kistler <mikekistler@microsoft.com>
It was wrapped in an array left over from the previous attempt at writing up this feature.
The format registry currently doesn't specify that double works with strings, so pick something that works with strings as of now.
Had to fix the The subsequent force-push just rebases onto v3.2-dev with the latest syntax highlighting included, which ensures the build succeeds without warnings. @mikekistler my apologies but you'll have to hit approve again. |
OK since the change that reset @mikekistler 's approval was trivial (and easily fixed in the unlikely event that I somehow messed it up), I'm going to go ahead and merge this so I can build on it in subsequent PRs. |
@ThomasRooney @disintegrator @gregsdennis @philsturgeon @hudlow here is a new attempt at supporting sequential and/or streaming media types.
The fundamental approach in this change was proposed by @karenetheridge as a comment on the previous attempt at solving this problem.
This adds
itemSchema
as a schema to apply to each entry in a sequential media type instance. It also defines how to map sequential media tyes for use withschema
, and explains thatschema
applies to complete content only.JSON-based and text/event-stream media types are included.