Generated JSON schema should contain $schema properties #1478

Closed
brot opened this issue May 5, 2020 · 17 comments
Labels
feature request schema Related to JSON Schema


@brot

brot commented May 5, 2020

Bug

Output of python -c "import pydantic.utils; print(pydantic.utils.version_info())":

             pydantic version: 1.5.1
            pydantic compiled: True
                 install path: /home/bernd/.local/share/virtualenvs/proj1/lib/python3.7/site-packages/pydantic
               python version: 3.7.6 (default, Apr 16 2020, 08:58:32)  [GCC 7.5.0]
                     platform: Linux-5.3.0-51-generic-x86_64-with-debian-buster-sid
     optional deps. installed: ['typing-extensions']
import pydantic

class MainModel(pydantic.BaseModel):
    """
    This is the description of the main model
    """
    snap: int = pydantic.Field(
        42,
        title='The Snap',
        description='this is the value of snap',
        gt=30,
        lt=50,
    )

    class Config:
        title = 'Main'


print(MainModel.schema_json(indent=2))

# output:
# {
#   "title": "Main",
#   "description": "This is the description of the main model",
#   "type": "object",
#   "properties": {
#     "snap": {
#       "title": "The Snap",
#       "description": "this is the value of snap",
#       "default": 42,
#       "exclusiveMinimum": 30,
#       "exclusiveMaximum": 50,
#       "type": "integer"
#     }
#   }
# }

But the $schema property is missing, so it's not clear which version of JSON Schema pydantic is generating. The documentation refers to the latest version, but that could change over time, so it's not clear to me.

Also the specification says:

The "$schema" keyword SHOULD be used in a resource root schema. It MUST NOT appear in resource subschemas.

The documentation of the jsonschema library says:

First, if the schema has a $schema property containing a known meta-schema then the proper validator will be used. The specification recommends that all schemas contain $schema properties for this reason. If no $schema property is found, the default validator class is the latest released draft.

But this means that you either have to specify the validator class (draft version) yourself, or the jsonschema library uses its latest known draft version, which is currently draft-07. However, the latest version of the specification is draft 2019-09.
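A small sketch of that fallback, assuming the `jsonschema` package is installed: `validator_for` reads `$schema` to choose a validator class and, when the key is absent, returns the library's newest supported draft unless a `default` is passed. The schema is a trimmed copy of the output shown above, with a `$schema` added for illustration.

```python
from jsonschema import Draft7Validator
from jsonschema.validators import validator_for

DRAFT_07 = "http://json-schema.org/draft-07/schema#"

# Trimmed copy of the MainModel schema above, with the draft declared explicitly.
declared = {
    "$schema": DRAFT_07,
    "type": "object",
    "properties": {
        "snap": {"type": "integer", "exclusiveMinimum": 30, "exclusiveMaximum": 50},
    },
}
# The same schema as pydantic 1.5 emits it: no $schema key.
undeclared = {key: value for key, value in declared.items() if key != "$schema"}

# With $schema present, the matching validator class is chosen.
print(validator_for(declared).__name__)  # Draft7Validator

# Without it, the newest draft known to the installed jsonschema version is used.
print(validator_for(undeclared).__name__)

# Pinning the class by hand removes the guesswork.
validator = Draft7Validator(undeclared)
print(validator.is_valid({"snap": 42}), validator.is_valid({"snap": 30}))  # True False
```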

So it would be great to include the JSON Schema draft/version used (the $schema property) when generating the JSON schema, to help everyone understand which version of the JSON Schema specification the schema refers to.
As far as I know the currently available versions are

and I'm not sure which version pydantic is currently using

@brot brot added the bug V1 Bug related to Pydantic V1.X label May 5, 2020
@samuelcolvin samuelcolvin added feature request and removed bug V1 Bug related to Pydantic V1.X labels May 18, 2020
@samuelcolvin
Member

Makes sense to me.

@tiangolo what do you think?

@tiangolo
Member

Yeah, I think it makes sense.


Some extra comments and info:

The current implementation is based on draft-07.

In OpenAPI, the $schema is not added/used. But that would probably not be a problem as the $schema would be added only to the JSON Schema generated for models and not fields. And for example, FastAPI uses the tools to generate the JSON Schema only for fields, not for models (I'm pretty sure 🤔 ). So the extra $schema would probably not affect those other use cases.

Some extra notes:

The currently generated schemas are also compatible with 2019-09. 🤓 🎉

Here's the migration guide: https://json-schema.org/draft/2019-09/release-notes.html

The main changes, e.g. making format more permissive by default and optionally enforced by implementations, don't really affect Pydantic, as Pydantic is one of those implementations: it adds format to some fields and validates them, in some cases with valid extensions.

Other things to note in version 2019-09: there's a new format keyword for UUID, and Pydantic already adds the format uuid for those fields, so, it implemented the new version before existing 😅 (that is also standard in previous versions of JSON Schema as the format can be extended).

There is now an official format duration with ISO 8601. That is already implemented with timedeltas, but the current format in Pydantic is time-delta. The format name could be updated to duration to support the new name in 2019-09.
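For reference, the ISO 8601 duration form behind the duration format writes a timedelta as PnDTnHnMnS. The helper below is a hypothetical sketch, not Pydantic's serializer: it covers non-negative values only and uses days as the largest unit, since a timedelta carries no months or years.

```python
from datetime import timedelta


def timedelta_to_iso8601(td: timedelta) -> str:
    """Format a non-negative timedelta as an ISO 8601 duration, e.g. P1DT2H3M4.5S.

    Hypothetical helper for illustration; days are the largest unit used.
    """
    if td < timedelta(0):
        raise ValueError("negative durations are out of scope for this sketch")

    hours, remainder = divmod(td.seconds, 3600)
    minutes, seconds = divmod(remainder, 60)

    # Seconds keep their fractional part only when there is one.
    if td.microseconds:
        second_text = f"{seconds}.{td.microseconds:06d}".rstrip("0")
    else:
        second_text = str(seconds)

    date_part = f"{td.days}D" if td.days else ""
    time_part = ""
    if hours:
        time_part += f"{hours}H"
    if minutes:
        time_part += f"{minutes}M"
    # Emit seconds when non-zero, or "0S" so that a zero duration is still well-formed.
    if seconds or td.microseconds or not (td.days or hours or minutes):
        time_part += f"{second_text}S"

    return "P" + date_part + ("T" + time_part if time_part else "")


print(timedelta_to_iso8601(timedelta(days=1, hours=2, minutes=3, seconds=4.5)))  # P1DT2H3M4.5S
```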

The last change that affects Pydantic is that definitions was renamed to $defs, although the previous definitions is still supported for backwards compatibility. So the current output is still valid, although it could be updated.
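Since definitions remains valid, updating is optional. As a sketch of what the rename involves, the hypothetical helper below moves a v1-style definitions block to $defs and rewrites local $ref pointers. It assumes Pydantic v1's flat layout (all definitions at the root) and rewrites every "#/definitions/..." $ref key it meets, including any that happen to sit inside default or examples data; the input schema is hand-written for illustration.

```python
import json
from typing import Any


def migrate_definitions(schema: dict) -> dict:
    """Return a copy of a v1-style schema that uses $defs instead of definitions.

    Hypothetical helper: assumes all definitions live at the root, as in
    Pydantic v1 output, and rewrites every "#/definitions/..." $ref it meets.
    """
    # A JSON round trip gives a deep copy of plain JSON data.
    migrated = json.loads(json.dumps(schema))
    if "definitions" in migrated:
        migrated["$defs"] = migrated.pop("definitions")

    def rewrite(node: Any) -> None:
        if isinstance(node, dict):
            ref = node.get("$ref")
            if isinstance(ref, str) and ref.startswith("#/definitions/"):
                node["$ref"] = "#/$defs/" + ref[len("#/definitions/"):]
            for value in node.values():
                rewrite(value)
        elif isinstance(node, list):
            for item in node:
                rewrite(item)

    rewrite(migrated)
    return migrated


# Hand-written example in the v1 layout.
v1_schema = {
    "title": "Outer",
    "type": "object",
    "properties": {"inner": {"$ref": "#/definitions/Inner"}},
    "required": ["inner"],
    "definitions": {
        "Inner": {"title": "Inner", "type": "object", "properties": {"x": {"type": "integer"}}}
    },
}
print(json.dumps(migrate_definitions(v1_schema), indent=2))
```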

Another note is that OpenAPI version 3.1.0 (still under development) is based on JSON Schema 2019-09. That means that once that is released, all the JSON Schemas that can be generated with Pydantic will also be valid with OpenAPI.

The two main points that currently differ are:

  • Tuples are supported by Pydantic and JSON Schema: Tuple[SubType1, SubType2, SubType3], but not yet by OpenAPI, they will be supported in OpenAPI 3.1.0.
  • exclusiveMaximum and exclusiveMinimum in all recent JSON Schemas are a number (as implemented in Pydantic) and in OpenAPI are a boolean (based on JSON Schema draft-00). That will be fully compatible with OpenAPI 3.1.0. Note that minimum and maximum don't have any conflict and work correctly in Pydantic, JSON Schema, and OpenAPI.
  • Other differences, like OpenAPI currently having example while JSON Schema has examples (in OpenAPI 3.1.0 it will all be just examples), don't really affect Pydantic, as those are added by users in their code anyway (e.g. here in the FastAPI docs: https://fastapi.tiangolo.com/tutorial/schema-extra-example/ , which include these technical details).
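To illustrate the exclusiveMinimum/exclusiveMaximum difference, here is a hypothetical helper (not part of Pydantic or FastAPI) that rewrites the numeric form into the boolean form OpenAPI 3.0 expects. It only handles the top-level keys of one schema object, and when both an inclusive and an exclusive bound are present it keeps whichever is stricter.

```python
import operator


def to_openapi30_bounds(schema: dict) -> dict:
    """Convert numeric exclusiveMinimum/exclusiveMaximum (JSON Schema draft-06+)
    into OpenAPI 3.0's minimum/maximum plus boolean exclusive flags.

    Hypothetical sketch: only the top-level keys of `schema` are rewritten.
    """
    converted = dict(schema)
    for exclusive, inclusive, is_stricter in (
        ("exclusiveMinimum", "minimum", operator.gt),
        ("exclusiveMaximum", "maximum", operator.lt),
    ):
        bound = converted.get(exclusive)
        if not isinstance(bound, (int, float)) or isinstance(bound, bool):
            continue  # absent, or already in boolean form
        current = converted.get(inclusive)
        if current is not None and is_stricter(current, bound):
            # The existing inclusive bound already implies the exclusive one.
            del converted[exclusive]
        else:
            converted[inclusive] = bound
            converted[exclusive] = True
    return converted


# The "snap" field from the report above.
print(to_openapi30_bounds({"type": "integer", "exclusiveMinimum": 30, "exclusiveMaximum": 50}))
```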

@kshitijc

The latest version in the jsonschema library is now draft 2020-12, which breaks validation against the schemas generated from Pydantic models (which are based on draft-07). Including the $schema property would have prevented this.
Are there any plans to add it any time soon?

@tobinus

tobinus commented Oct 1, 2021

As a workaround in the meantime, it is easy to add $schema yourself if you also take care of serializing to JSON yourself:

import json

# Assuming MainModel is a subclass of pydantic.BaseModel
schema_obj = MainModel.schema()
schema_obj["$schema"] = "http://json-schema.org/draft-07/schema#"
print(json.dumps(schema_obj, indent=2))

@kevindixon

I guess an alternative:

from pydantic import BaseModel

class MyModel(BaseModel):
    ...
    class Config:
        # Pydantic v1: merged into the schema generated for this model
        schema_extra = {
            '$schema': 'http://json-schema.org/draft-07/schema#'
        }

@hoffa

hoffa commented Nov 25, 2022

Any updates to this?

Why doesn’t the generated schema specify which standard it adheres to?

@JonathanPlasse

Would a PR be accepted to add draft-07 to the JSON schema generation?

@samuelcolvin
Member

We won't make any changes to JSON Schema in v1.10 now.

On V2, we're targeting JSON Schema 2020-12.

No idea how this maps to $schema, I'll need to dig further.

See #5029.

Cc @dmontagu.

@dmontagu
Contributor

dmontagu commented Feb 18, 2023

Based on https://json-schema.org/understanding-json-schema/reference/schema.html#id4 I'm assuming we'll want to add "$schema": "https://json-schema.org/draft/2020-12/schema" at the top level. I'll bring this up for discussion in #5029, not sure if we want to always include it (I don't have an opinion), but it would be very easy to expose it by overriding GenerateJsonSchema at least.

@samuelcolvin
Member

Agreed, also easier to see and remove/change it than remember to add it.

Therefore #5029 will close this.

@JonathanPlasse

Will it always be 2020-12 or will the oldest compatible draft be used?

@samuelcolvin
Member

For now it'll always be 2020-12, if someone wants to try and write a wrapper that infers the oldest compatible draft, happy to review a PR.

But honestly, that sounds like a library in itself, and doesn't need to be included in pydantic.
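The keyword scan such a wrapper would need could start roughly like the sketch below. Everything in it is hypothetical: the keyword table is deliberately short (a handful of keywords and the draft that introduced them), and the scan only finds the oldest draft that knows every keyword it sees. It does not detect keywords a newer draft changed or removed, such as array-form items, which 2020-12 replaced with prefixItems.

```python
from typing import Any

# Oldest-to-newest drafts this sketch distinguishes.
DRAFT_ORDER = ["draft-04", "draft-06", "draft-07", "2019-09", "2020-12"]

# Hypothetical, deliberately incomplete table: keyword -> draft that introduced it.
KEYWORD_DRAFT = {
    "const": "draft-06",
    "contains": "draft-06",
    "propertyNames": "draft-06",
    "if": "draft-07",
    "then": "draft-07",
    "else": "draft-07",
    "$defs": "2019-09",
    "dependentRequired": "2019-09",
    "unevaluatedProperties": "2019-09",
    "prefixItems": "2020-12",
}

# Keywords whose value maps names to subschemas (the names themselves are not keywords).
NAME_MAPS = {"properties", "patternProperties", "definitions", "$defs"}
# Keywords whose value is instance data, so nothing inside it is a keyword.
DATA_KEYWORDS = {"const", "default", "enum", "examples"}


def minimum_draft(schema: Any) -> str:
    """Return the oldest draft in DRAFT_ORDER that knows every keyword found."""
    newest = 0

    def visit(node: Any) -> None:
        nonlocal newest
        if isinstance(node, list):
            for item in node:
                visit(item)
            return
        if not isinstance(node, dict):
            return
        for key, value in node.items():
            draft = KEYWORD_DRAFT.get(key)
            # Numeric exclusive bounds arrived in draft-06; booleans are the draft-04 form.
            if key in ("exclusiveMinimum", "exclusiveMaximum") and not isinstance(value, bool):
                draft = "draft-06"
            if draft is not None:
                newest = max(newest, DRAFT_ORDER.index(draft))
            if key in DATA_KEYWORDS:
                continue
            if key in NAME_MAPS and isinstance(value, dict):
                for subschema in value.values():
                    visit(subschema)
            else:
                visit(value)

    visit(schema)
    return DRAFT_ORDER[newest]


# The pydantic 1.5 output from the report at the top of this thread.
print(minimum_draft({
    "title": "Main",
    "type": "object",
    "properties": {
        "snap": {"type": "integer", "default": 42, "exclusiveMinimum": 30, "exclusiveMaximum": 50}
    },
}))  # draft-06
```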

@dmontagu
Contributor

@JonathanPlasse is there a use case you have in mind for inferring the oldest compatible draft?

I think @samuelcolvin is right that it's probably not worth including in pydantic proper, but I'm curious whether there is a use case where this would be particularly beneficial.

@JonathanPlasse

Use the lowest possible schema draft needed, preferably Draft v4, to ensure interoperability with as many supported editors, IDEs and parsers as possible.

Schemastore.org has this recommendation in its documentation.

@samuelcolvin
Member

That's fine for them, but their schemas are mostly handwritten, so that recommendation is reasonably actionable.

@dmontagu
Contributor

I just merged #5029 which adds the JSON schema generation approach we'll use in v2.

As part of merging that, I did add a property to the GenerateJsonSchema class holding the dialect:

schema_dialect = 'https://json-schema.org/draft/2020-12/schema'

but I did not include it in the generated schemas automatically.

Considering the language in the specification says "SHOULD" not "MUST" include, I wasn't sure it was worth "polluting" the generated schemas (and tests) with that draft value everywhere by default.

For what it's worth, I did explicitly include a test showing how you can easily override the schema generation to include the draft version:

from typing import Any, Dict, Type

from pydantic import BaseModel
from pydantic.json_schema import DEFAULT_REF_TEMPLATE, GenerateJsonSchema


def test_override_generate_json_schema():
    class MyGenerateJsonSchema(GenerateJsonSchema):
        def generate(self, schema):
            json_schema = super().generate(schema)
            # Stamp the dialect on the root schema only
            json_schema['$schema'] = self.schema_dialect
            return json_schema

    class MyBaseModel(BaseModel):
        @classmethod
        def model_json_schema(
            cls,
            by_alias: bool = True,
            ref_template: str = DEFAULT_REF_TEMPLATE,
            schema_generator: Type[GenerateJsonSchema] = MyGenerateJsonSchema,
        ) -> Dict[str, Any]:
            return super().model_json_schema(by_alias, ref_template, schema_generator)

    class MyModel(MyBaseModel):
        x: int

    assert MyModel.model_json_schema() == {
        '$schema': 'https://json-schema.org/draft/2020-12/schema',
        'properties': {'x': {'title': 'X', 'type': 'integer'}},
        'required': ['x'],
        'title': 'MyModel',
        'type': 'object',
    }

It would also be straightforward to actually enable this by default, by uncommenting the last of the following lines:

# For now, we will not set the $schema key. However, if desired, this can be easily added by overriding
# this method and adding the following line after a call to super().generate(schema):
# json_schema['$schema'] = self.schema_dialect

If anyone feels strongly that the current behavior on the main branch (shown above) merits changes, please comment here; otherwise I'm inclined to leave it as is.
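For released Pydantic v2 versions, a sketch of the same override that avoids redefining model_json_schema: pass the generator through its schema_generator argument instead. This assumes GenerateJsonSchema.generate accepts a mode argument, as it does in current v2 releases; an override with the older generate(self, schema) signature may fail once model_json_schema passes mode= to it. The class name GenerateJsonSchemaWithDialect is made up for this example, and building a new dict with $schema first is only a cosmetic choice so the key leads the serialized output.

```python
from typing import Any, Dict

from pydantic import BaseModel
from pydantic.json_schema import GenerateJsonSchema, JsonSchemaMode


class GenerateJsonSchemaWithDialect(GenerateJsonSchema):
    """Adds "$schema" to the root schema only; nested $defs entries are untouched."""

    def generate(self, schema, mode: JsonSchemaMode = "validation") -> Dict[str, Any]:
        json_schema = super().generate(schema, mode=mode)
        # Rebuild the dict so "$schema" is the first key when serialized.
        return {"$schema": self.schema_dialect, **json_schema}


class MyModel(BaseModel):
    x: int


print(MyModel.model_json_schema(schema_generator=GenerateJsonSchemaWithDialect))
```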

@Kludex Kludex closed this as completed Apr 25, 2023
@Kludex Kludex added this to the Version 2 Issues milestone Apr 25, 2023
@dbarnett

If anyone feels strongly that the current behavior on the main branch (shown above) merits changes, please comment here

I'm confused, what was the motivation for not enabling $schema output by default, or at least providing a one-liner option to model_json_schema to have it included? Omitting the $schema just leads to breakages (#1478 (comment)) and AFAICT there's no possible advantage to leaving it blank vs. picking a minimum known schema that supports all features pydantic relies on, whether that's draft-07 or something more recent.

For other affected users suffering in the meantime, here's the best workaround I came up with, a slight variation on #1478 (comment) using json_schema_extra:

from pydantic import BaseModel, ConfigDict
from pydantic.json_schema import GenerateJsonSchema


class MyConfig(BaseModel):
    model_config = ConfigDict(
        json_schema_extra={
            '$schema': GenerateJsonSchema.schema_dialect,
        },
    )

Still a little strange, though, that I need to manually reach in to feed pydantic its own metadata when it already knows which schema_generator it defaults to and which schema_dialect that declares.
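One caveat with the json_schema_extra approach, sketched below under Pydantic v2: the extra is applied to each model's own schema, so a model that is nested inside another carries $schema into its $defs entry, where the specification quoted at the top of the issue says it MUST NOT appear. drop_nested_dialect is a hypothetical post-processing step, not a Pydantic API.

```python
import copy

from pydantic import BaseModel, ConfigDict
from pydantic.json_schema import GenerateJsonSchema

DIALECT = GenerateJsonSchema.schema_dialect


class Inner(BaseModel):
    # The json_schema_extra workaround from the comment above.
    model_config = ConfigDict(json_schema_extra={"$schema": DIALECT})
    x: int


class Outer(BaseModel):
    model_config = ConfigDict(json_schema_extra={"$schema": DIALECT})
    inner: Inner


schema = Outer.model_json_schema()
# Inner's extra travels with it into $defs, so "$schema" shows up there too.
print(schema["$defs"]["Inner"].get("$schema"))


def drop_nested_dialect(json_schema: dict) -> dict:
    """Hypothetical clean-up: keep "$schema" on the root, strip it from $defs entries."""
    for definition in json_schema.get("$defs", {}).values():
        definition.pop("$schema", None)
    return json_schema


# Work on a copy so the original `schema` still shows the problem.
cleaned = drop_nested_dialect(copy.deepcopy(schema))
```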
