Support application/json-seq and similar JSON-based sequential formats #3730

Open
handrews opened this issue Apr 20, 2024 Discussed in #2707 · 3 comments

@handrews
Member

Splitting this out of issue #1576...

...which was originally about binary streaming but later drifted into a discussion of JSON streaming, where several real-world uses for JSON streaming were cited.

The binary streaming case just required a clarification and has been addressed in PR #3729 for 3.0.4, but adding proper support for a new media type will have to go into 3.2.0.

Also Discussed in #2707

Originally posted by Skeeve September 9, 2021
Is there a way to describe a json-lines with OpenAPI?

Besides the fact that there seems to be no mimetype for it yet, I'm wondering if it's possible to describe such a response.

In theory my response could be an array of objects, but I received the question whether or not I could deliver as json-lines, meaning: Just the objects, one per line.

Since I'm using OpenAPI to describe my API, I'm puzzled as to how to describe this response. I could simply define the response as being of type "string", but this is not very helpful for readers of my API spec.

P.S. I already asked at stackoverflow and was pointed to here.

Further thoughts

There are now several formats for JSON streaming:

  • RFC 7464 "JavaScript Object Notation (JSON) Text Sequences" (application/json-seq)
    • RFC 8124 "GeoJSON Text Sequences" (application/geo+json-seq)
  • JSON Lines (implementations; issue tracking proposal for application/jsonl media type with recent activity)
  • NDJSON (currently uses application/x-ndjson; there might be an effort towards application/ndjson or merging with JSON Lines)

It should be clear from the lists of implementations, requests from several different folks in both issues and discussions, and the existence of significant derivative specs (GeoJSON is widely used within the geospatial data space) that this is a real use case with many applications.

AFAICT, the distinctions among the three formats are irrelevant to modeling their contents, and only involve the choice of delimiter, the allowability of blank lines/sequences (which are skipped during parsing regardless of the format), and the details of error handling. JSON Lines and NDJSON might actually merge.
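
To illustrate how little the delimiter choice matters for the data model, here is a minimal Python sketch that parses an RS-delimited text sequence and a newline-delimited document into the same list of values. The helper names are made up for this example, both helpers simply skip empty entries, and the error-handling differences between the formats are deliberately left out.

```python
import json

RS = "\x1e"  # ASCII Record Separator; RFC 7464 places it before each JSON text


def parse_json_seq(text):
    """Parse an application/json-seq style document (hypothetical helper)."""
    # Split on RS and skip empty chunks; json.loads tolerates the trailing newline.
    return [json.loads(chunk) for chunk in text.split(RS) if chunk.strip()]


def parse_json_lines(text):
    """Parse a JSON Lines / NDJSON style document (hypothetical helper)."""
    # Split on "\n" only: str.splitlines() would also split on characters
    # such as U+2028 that may legally appear inside JSON strings.
    return [json.loads(line) for line in text.split("\n") if line.strip()]


seq_doc = '\x1e{"id": 1}\n\x1e{"id": 2}\n'
lines_doc = '{"id": 1}\n\n{"id": 2}\n'

# Both documents decode to the same logical array of objects.
print(parse_json_seq(seq_doc) == parse_json_lines(lines_doc))
```

Either way the result is an ordinary list, which is why an array-shaped schema can describe all of these formats.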

From a data modeling perspective, I think we could support these in OAS 3.2 by noting that:

  • They can be modeled as if they were a JSON array
  • Since JSON Schema implementations will not support these formats directly, tools will need to do one of:
    • convert the data into an actual JSON array prior to schema validation (easy for a finite-length document)
    • individually apply the relevant subschema(s) to each JSON text in the sequence (better for an incrementally processed stream)
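
The two options above can be sketched roughly as follows. This is only an illustration: `SCHEMA`, `check_item`, and both `validate_*` functions are hypothetical names, and `check_item` is a toy stand-in for a real JSON Schema implementation that understands only `type: object` and `required`.

```python
import json

# Hypothetical schema that models the stream as if it were a JSON array.
SCHEMA = {
    "type": "array",
    "items": {"type": "object", "required": ["id"]},
}


def check_item(value, item_schema):
    """Toy stand-in for a JSON Schema validator (object type + required only)."""
    if not isinstance(value, dict):
        # "required" only constrains objects; fail only if an object was demanded.
        return item_schema.get("type") != "object"
    return all(key in value for key in item_schema.get("required", []))


def validate_as_array(lines, schema):
    """Option 1: build an actual array first, then validate every element."""
    values = [json.loads(line) for line in lines if line.strip()]
    return all(check_item(value, schema["items"]) for value in values)


def validate_incrementally(lines, schema):
    """Option 2: apply the item subschema to each JSON text as it arrives."""
    for line in lines:
        if line.strip():
            yield check_item(json.loads(line), schema["items"])


stream = ['{"id": 1}', '', '{"name": "no id"}']
print(validate_as_array(stream, SCHEMA))
print(list(validate_incrementally(stream, SCHEMA)))
```

The incremental variant is a generator, so it can report on each item without waiting for the end of a stream that may never finish.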

Tools that directly integrate and use JSON Schema implementations would need to handle the translation, but that's the main tooling impact. We could write the requirements around any sequential JSON format rather than tying it to any specific media type, since there seem to be multiple more-or-less equivalent approaches (I'm probably missing some).

@handrews added the "enhancement" and "media and encoding" (issues regarding media type support and how to encode data, outside of query/path params) labels Apr 20, 2024
@handrews added this to the v3.2.0 milestone Apr 20, 2024
@handrews self-assigned this Apr 20, 2024
@NickG-NZ

Cohere (LLM provider) is another example of an API streaming JSON objects:

https://docs.cohere.com/docs/streaming

@zohairhadi

Has any progress been made here?

@handrews
Member Author

@zohairhadi it's proposed for 3.2. We're currently busy getting 3.0.4 and 3.1.1 out the door. 3.2 is next - there's no planned schedule but I'm hopeful we can ship it by the end of 2024. The proposed change is PR #3735, which is in draft because 3.2 is not yet the active release.
