Part 5: The limitations of JSON data types #946

Open
m-mohr opened this issue Jul 16, 2024 · 5 comments

@m-mohr
Contributor

m-mohr commented Jul 16, 2024

The document says:

To use a schema for data validation, the schema must be converted into a schema representation suitable for validating data in the specific data format. For example, an XML Schema that is a GML application schema or a JSON Schema for GeoJSON or the draft OGC Features and Geometries JSON (JSON-FG).
[...]
The main reasons for using JSON Schema are:
[...]

  • JSON data types (string, number, boolean, array, object, null) are simple and easy to understand;

I'm working on something very similar in a project called fiboa, see https://github.com/fiboa/schema
My observation from this work, mapping JSON Schema to e.g. GeoParquet, is that it's not as easy as it sounds.
Especially the number data type is difficult.

  • It's not clear how to map number to more granular data types, e.g. uint8, int8, uint64, int64, etc. Since JSON places no limit on the size of a number, you may always have to choose the largest data type available. That's not ideal.
  • There are strongly typed data formats and languages. Something like type: ["number", "string", "array"] may be difficult or impossible to represent there.
  • The number type in JSON Schema doesn't include NaN and +/-infinity. How is this meant to map to other languages?
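
To make the first and last points concrete, here is a minimal Python sketch. The helper names `narrowest_int_type` and `strict_dumps` and the `INT_RANGES` table are hypothetical and only serve as illustration. It shows that a fixed-width type can only be chosen after inspecting actual values, and that a strict JSON encoder has no way to write NaN or infinity.

```python
import json

# Candidate fixed-width integer types, narrowest first (illustrative list only).
INT_RANGES = [
    ("uint8", 0, 2**8 - 1),
    ("int8", -(2**7), 2**7 - 1),
    ("uint16", 0, 2**16 - 1),
    ("int16", -(2**15), 2**15 - 1),
    ("uint32", 0, 2**32 - 1),
    ("int32", -(2**31), 2**31 - 1),
    ("uint64", 0, 2**64 - 1),
    ("int64", -(2**63), 2**63 - 1),
]


def narrowest_int_type(values):
    """Return the first listed type that holds all values, or None if none fits."""
    lo, hi = min(values), max(values)
    for name, tmin, tmax in INT_RANGES:
        if tmin <= lo and hi <= tmax:
            return name
    return None


def strict_dumps(value):
    """Serialize to JSON and reject NaN and +/-infinity instead of emitting them."""
    return json.dumps(value, allow_nan=False)


# JSON itself does not bound integers: this value parses, but fits no 64-bit type.
too_big = json.loads("18446744073709551616")
```

Without the data at hand, the schema alone (`type: number`) gives no basis for picking any entry of the table, which is why the widest type ends up being the default.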
@fmigneault

I support this statement.
JSON Schema is indeed easier to understand than other representations, but that simplicity sometimes becomes a hindrance when mapping/piping the data to some other language/implementation, since there is insufficient information to describe the data properly.

I have had similar complex issues when mapping OGC API - Processes inputs (which can be Features) to the various CWL, WPS, JSON schema representations (see details: Weaver - Application Package - Type Correspondence).

Using JSON schema implies a heavy use of format to disambiguate types (eg: type: number, format: double). This has been highlighted on multiple occasions for OGC APIs interoperability (opengeospatial/ogcapi-processes#427, opengeospatial/ogcapi-processes#395, opengeospatial/ogcapi-processes#394, etc.)

Therefore, the standard tackling the "schema" problem to properly describe data should be more explicit and rigorous regarding the recommendations it provides. A few examples of recommendations could be:

  • List explicitly the specific format values, based on OGC https://github.com/opengeospatial/NamingAuthority/, that should be used in certain well-known cases (eg: format: geometry-point is shown in examples, but it is not clear whether that should be interpreted as https://geojson.org/schema/Geometry.json#/oneOf/0 (type: Point), https://github.com/opengeospatial/ogcapi-features/blob/master/core/openapi/schemas/pointGeoJSON.yaml, or some other interpretation)

  • Provide best practices such as reusing more explicit references (ie: prefer a narrowed reference to https://geojson.org/schema/Polygon.json over https://geojson.org/schema/GeoJSON.json if the type is known to be limited to polygons)

  • Make use of contentMediaType, contentSchema and $ref to relevant entities rather than "reinventing" common structures. The standard should also justify which of these strategies should be used, and why.

@pvretano
Contributor

pvretano commented Jul 29, 2024

SWG Meeting 29-JUL-2024: We discussed this issue and agreed that JSON data types are limited. However, it was pointed out that one can always use a string type with a bespoke format value to indicate how to interpret the string. So you can encode a uint64 value as a string and then set the format to uint64 to indicate that it should be interpreted as a uint64. Of course, this works best if a community of interest agrees on what the format values should be.
The other option, of course, is to use the schema endpoint but negotiate something other than JSON schema that is more suited to the need.
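
A rough sketch of what a consumer of such a string-encoded value might do, in Python. The function name `decode_uint64_string` is hypothetical, and the accepted lexical form (plain ASCII decimal digits, no sign) is an assumption; a community agreeing on the `uint64` format value would have to settle that.

```python
UINT64_MAX = 2**64 - 1


def decode_uint64_string(text):
    """Interpret a JSON string declared with format 'uint64' as an integer.

    Assumes the lexical form is plain ASCII decimal digits without a sign.
    """
    if not isinstance(text, str) or not (text.isascii() and text.isdigit()):
        raise ValueError(f"not a decimal uint64 string: {text!r}")
    value = int(text)
    if value > UINT64_MAX:
        raise ValueError(f"value exceeds the uint64 range: {text}")
    return value
```

A property declared as `{"type": "string", "format": "uint64"}` would then be read with `decode_uint64_string(feature["properties"]["id"])`, where `feature` and `id` are placeholder names.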

@fmigneault

The format is the appropriate solution IMO. A schema is a nice addition on top if applicable for complicated types, but it would ultimately probably need format references within it as well...

The biggest issue at the moment is that all OGC APIs seem to be tossing the problem between each other, never directly addressing this format definition. Therefore, we still lack a well-established list of format references that all APIs can refer to and interoperate with. There are parts of this list here and there, in drafts and issue comments, but not one clear listing in a centralized naming authority.

@cportele
Member

So you can encode a uint64 value as a string and then set the format to uint64 to indicate that it should be interpreted as a uint64.

Why set type to "string"? format is mostly used with strings, but it is not restricted to strings. I would represent this as { "type": "integer", "format": "uint64" }.

This also follows the approach of OpenAPI 3.1 which defines formats "int32" and "int64" for signed integers. See https://github.com/OAI/OpenAPI-Specification/blob/main/versions/3.1.0.md#data-types. We should use these as a starting point and extend this with unsigned and other bit size variants.
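
As a sketch of how a validator might act on `{ "type": "integer", "format": "uint64" }`, the Python below checks a parsed value against the range implied by the format. Only `int32` and `int64` come from OpenAPI 3.1; the other names in `INTEGER_FORMAT_BITS` and the function names are assumptions made for illustration.

```python
# format name -> (bit width, signed?). Only int32 and int64 are OpenAPI 3.1 formats;
# the remaining entries are hypothetical extensions of the kind proposed here.
INTEGER_FORMAT_BITS = {
    "int8": (8, True),
    "uint8": (8, False),
    "int16": (16, True),
    "uint16": (16, False),
    "int32": (32, True),
    "uint32": (32, False),
    "int64": (64, True),
    "uint64": (64, False),
}


def format_range(fmt):
    """Return the inclusive (min, max) range of a known integer format."""
    bits, signed = INTEGER_FORMAT_BITS[fmt]
    if signed:
        return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return 0, 2**bits - 1


def matches_integer_format(schema, value):
    """Check a parsed JSON value against an integer schema with an optional format."""
    if schema.get("type") != "integer":
        raise ValueError("this sketch only handles type: integer")
    # bool is a subclass of int in Python, but JSON true/false are not integers.
    if isinstance(value, bool) or not isinstance(value, int):
        return False
    fmt = schema.get("format")
    if fmt not in INTEGER_FORMAT_BITS:
        return True  # absent or unknown formats are not checked in this sketch
    lo, hi = format_range(fmt)
    return lo <= value <= hi
```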

@cportele
Copy link
Member

Meeting 2024-08-12: As a general rule, if more fine-grained sub-types of the JSON data types are needed, format will be used. In Part 5 we will include the ones from OpenAPI 3.1 and extend them with additional (un)signed integer variants. There should be a format vocabulary. @cportele will create a PR.
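
To illustrate what such a format vocabulary could buy an implementation, the sketch below maps (type, format) pairs to fixed-width binary encodings using Python's `struct` module. The table `FORMAT_TO_STRUCT` and the function `pack_value` are hypothetical; `int32`, `int64`, `float` and `double` are OpenAPI 3.1 formats, while the remaining names are assumed extensions and not the vocabulary that Part 5 will define.

```python
import struct

# (JSON Schema type, format) -> struct type code. Illustrative vocabulary only.
FORMAT_TO_STRUCT = {
    ("integer", "int8"): "b",
    ("integer", "uint8"): "B",
    ("integer", "int16"): "h",
    ("integer", "uint16"): "H",
    ("integer", "int32"): "i",
    ("integer", "uint32"): "I",
    ("integer", "int64"): "q",
    ("integer", "uint64"): "Q",
    ("number", "float"): "f",
    ("number", "double"): "d",
}


def pack_value(schema, value):
    """Encode a value as little-endian bytes according to the schema's type and format."""
    key = (schema.get("type"), schema.get("format"))
    if key not in FORMAT_TO_STRUCT:
        # Without a recognised format there is no unambiguous target type.
        raise KeyError(f"no fixed-width mapping for {key}")
    # struct.pack raises struct.error if an integer is out of range for the code.
    return struct.pack("<" + FORMAT_TO_STRUCT[key], value)
```

With the format present, the target type follows from the schema alone; with a bare `type: number` or `type: integer`, the lookup fails and the implementation is back to guessing.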

There is a higher-level governance issue of how to add further formats in OGC API standards beyond those that will be in Part 5. This could be discussed with the other OGC API SWGs in the common meetings at the next Member Meeting.

@cportele cportele self-assigned this Aug 12, 2024
Status: To be drafted