BaseModel documents and change to schema generation #337

Open · evalott100 opened this issue Nov 21, 2024 · 5 comments · May be fixed by #341

evalott100 commented Nov 21, 2024

There's been some demand for pydantic BaseModel versions of the documents.

I propose we change event-model document generation to allow for these, in a backwards-compatible way.

1: Converting the current jsonschema to pydantic models

Most TypedDict document definitions only add {"additionalProperties": False} to the outputted schema, which is implicit in pydantic models, so most pydantic documents will be identical to the current ones, just swapping out TypedDict for BaseModel. There are a few other places where we add more complex logic to the schema.

Run Stop

In run-stop we have the following extra schema:

RUN_STOP_EXTRA_SCHEMA = {
    "patternProperties": {"^([^./]+)$": {"$ref": "#/$defs/DataType"}},
    "additionalProperties": False,
}

Which we can represent in pydantic as:

from pydantic import BaseModel, ValidationError, root_validator

class RunStop(BaseModel):
    data_type: DataType
    # ... other non `DataType` fields

    class Config:
        extra = 'allow'

    @root_validator(pre=True)
    def validate_additional_fields(cls, values):
        # Any extra key without a "." must parse as a DataType.
        for key, value in values.items():
            if '.' not in key and key not in cls.__fields__:
                try:
                    values[key] = DataType.parse_obj(value)
                except ValidationError as err:
                    raise ValueError(f"Extra non-datatype {key} received.") from err
        return values

Event Descriptor

In event-descriptor we have the following extra schema:

EVENT_DESCRIPTOR_EXTRA_SCHEMA = {
    "patternProperties": {"^([^./]+)$": {"$ref": "#/$defs/DataType"}},
    "$defs": {
        "DataType": {
            "title": "DataType",
            "patternProperties": {"^([^./]+)$": {"$ref": "#/$defs/DataType"}},
            "additionalProperties": False,
        },
    },
    "additionalProperties": False,
}

Which we can represent in pydantic the same way as above.
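
For example, EventDescriptor can carry an identical validator (a sketch; the fields are abbreviated and the imports are as in the RunStop example above):

class EventDescriptor(BaseModel):
    # ... event-descriptor fields

    class Config:
        extra = 'allow'

    @root_validator(pre=True)
    def validate_additional_fields(cls, values):
        # Identical to the RunStop validator: extra keys without a "."
        # must parse as DataType.
        for key, value in values.items():
            if '.' not in key and key not in cls.__fields__:
                try:
                    values[key] = DataType.parse_obj(value)
                except ValidationError as err:
                    raise ValueError(f"Extra non-datatype {key} received.") from err
        return values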

Run Start

The run-start additional schema is substantially more complicated:

RUN_START_EXTRA_SCHEMA = {
    "$defs": {
        "DataType": {
            "patternProperties": {"^([^./]+)$": {"$ref": "#/$defs/DataType"}},
            "additionalProperties": False,
        },
        "Projection": {
            "allOf": [
                {
                    "if": {
                        "allOf": [
                            {"properties": {"location": {"enum": ["configuration"]}}},
                            {"properties": {"type": {"enum": ["linked"]}}},
                        ]
                    },
                    "then": {
                        "required": [
                            "type",
                            "location",
                            "config_index",
                            "config_device",
                            "field",
                            "stream",
                        ]
                    },
                },
                {
                    "if": {
                        "allOf": [
                            {"properties": {"location": {"enum": ["event"]}}},
                            {"properties": {"type": {"enum": ["linked"]}}},
                        ]
                    },
                    "then": {"required": ["type", "location", "field", "stream"]},
                },
                {
                    "if": {
                        "allOf": [
                            {"properties": {"location": {"enum": ["event"]}}},
                            {"properties": {"type": {"enum": ["calculated"]}}},
                        ]
                    },
                    "then": {"required": ["type", "field", "stream", "calculation"]},
                },
                {
                    "if": {"properties": {"type": {"enum": ["static"]}}},
                    "then": {"required": ["type", "value"]},
                },
            ],
        },
    },
    "properties": {
        "hints": {
            "additionalProperties": False,
            "patternProperties": {"^([^.]+)$": {"$ref": "#/$defs/DataType"}},
        },
    },
    "patternProperties": {"^([^./]+)$": {"$ref": "#/$defs/DataType"}},
    "additionalProperties": False,
}
  1. The DataType root_validator can be added to the Hints and RunStart as above.

  2. For Projections, the sanest way to adjust what we have currently would be to create a new model for each projection type and then add them as a union in RunStart. This would have the effect of defining a couple of different Projection types in the outputted schema, though it wouldn't be breaking; see the union sketch after the code below. Alternatively, there's the following method:

class Projection(BaseModel):
    type: Literal['linked', 'calculated', 'static']
    location: Optional[Literal['configuration', 'event']] = None
    config_index: Optional[int] = None
    config_device: Optional[str] = None
    field: Optional[str] = None
    stream: Optional[str] = None
    calculation: Optional[str] = None
    value: Optional[str] = None

    @root_validator(pre=True)
    def check_required_fields(cls, values):
        type_ = values.get('type')
        location = values.get('location')

        if type_ == 'linked' and location == 'configuration':
            required_fields = ['type', 'location', 'config_index', 'config_device', 'field', 'stream']
        elif type_ == 'linked' and location == 'event':
            required_fields = ['type', 'location', 'field', 'stream']
        elif type_ == 'calculated' and location == 'event':
            required_fields = ['type', 'field', 'stream', 'calculation']
        elif type_ == 'static':
            required_fields = ['type', 'value']
        else:
            required_fields = []

        for field in required_fields:
            if values.get(field) is None:
                raise ValueError(f'{field} is required for type {type_} and location {location}')

        return values
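
For comparison, a minimal sketch of the union approach (the per-type model names here are illustrative, not from event-model):

from typing import Literal, Union

from pydantic import BaseModel

class LinkedConfigurationProjection(BaseModel):
    type: Literal['linked']
    location: Literal['configuration']
    config_index: int
    config_device: str
    field: str
    stream: str

class LinkedEventProjection(BaseModel):
    type: Literal['linked']
    location: Literal['event']
    field: str
    stream: str

class CalculatedEventProjection(BaseModel):
    type: Literal['calculated']
    location: Literal['event']
    field: str
    stream: str
    calculation: str

class StaticProjection(BaseModel):
    type: Literal['static']
    value: str

# RunStart would then reference this union, e.g. List[Projection].
Projection = Union[
    LinkedConfigurationProjection,
    LinkedEventProjection,
    CalculatedEventProjection,
    StaticProjection,
]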

2: Updating the schema generation

Currently, we generate the jsonschema from the TypedDict definitions with pydantic, and add the EXTRA_SCHEMA dictionaries.

Instead, we'll define the pydantic models, package the schema representation of the root_validators within them and then generate both the jsonschema and the TypedDicts from the pydantic models (statically).
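
In pydantic v1 the extra schema can be attached directly to the model via Config.schema_extra, so the generated jsonschema carries the same constraints the root_validator enforces at runtime. A sketch, assuming we keep the EXTRA_SCHEMA dictionaries around:

class RunStop(BaseModel):
    # ... fields and root_validator as above

    class Config:
        extra = 'allow'
        # Merged into the output of RunStop.schema() by pydantic.
        schema_extra = RUN_STOP_EXTRA_SCHEMA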

3: Optional fields

Pydantic doesn't allow fields to be NotRequired; a field which is NotRequired in the TypedDict would have to default to None in the pydantic model. For this reason we will forbid Optional from having a different meaning to NotRequired.

Fields which are Optional with default None in the BaseModel will be NotRequired[Optional[...]] in the TypedDict.
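
To illustrate the mapping (a sketch with a made-up field name):

from typing import Optional

from pydantic import BaseModel
from typing_extensions import NotRequired, TypedDict

class ExampleModel(BaseModel):
    reason: Optional[str] = None

# ... would generate:

class ExampleDict(TypedDict):
    reason: NotRequired[Optional[str]]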

@evalott100 evalott100 self-assigned this Nov 21, 2024
evalott100 (Contributor, Author) commented:

@danielballan @coretl

jacopoabramo commented:

I'm stumbling on this issue by pure chance, and I just wanted to ask whether you would also consider using msgspec instead of, or together with, pydantic.

I'm mentioning it mostly for performance reasons: msgspec benchmarks quite strongly against pydantic, both in terms of speed and library size. I imagine documents are something that should be produced and consumed as quickly as possible, so I'm throwing out this extra possibility in case it's worth considering.

evalott100 (Contributor, Author) commented:

> if you would like to also consider using msgspec instead of/together with pydantic.

Thanks very much for the suggestion! The converter I'm using in the draft also supports jsonschema -> msgspec.Struct, so if we wanted to implement this it would be a fairly trivial change.
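
For a sense of the target, a generated run-stop Struct might look roughly like this (a hand-written sketch, not actual converter output; the field list is abbreviated):

from typing import Optional

import msgspec

class RunStop(msgspec.Struct):
    run_start: str
    time: float
    uid: str
    exit_status: str
    reason: Optional[str] = None
    # ... the extra DataType fields would need custom handling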

jacopoabramo commented:

@evalott100 once the mentioned PR is complete I can probably give it a crack - I don't want to mix them up. Are you using this tool by any chance?

evalott100 (Contributor, Author) commented:

@jacopoabramo

Yup, it would just mean making a new

from pathlib import Path

import datamodel_code_generator

def generate_typeddict(jsonschema_path: Path, documents_path=DOCUMENTS):
    output_path = documents_path / f"{jsonschema_path.stem}.py"
    datamodel_code_generator.generate(
        input_=jsonschema_path,
        input_file_type=datamodel_code_generator.InputFileType.JsonSchema,
        output=output_path,
        output_model_type=datamodel_code_generator.DataModelType.TypingTypedDict,
        use_schema_description=True,
        use_field_description=True,
        use_annotated=True,
        field_constraints=True,
        wrap_string_literal=True,
    )
    with output_path.open("r+") as f:
        content = f.read()
        f.seek(0, 0)
        f.write("# ruff: noqa\n" + content)

swapping the output file type and directory.
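
Concretely, the msgspec variant might look like this (a sketch; it assumes a datamodel-code-generator version that provides DataModelType.MsgspecStruct, and STRUCTS is a hypothetical output directory):

def generate_struct(jsonschema_path: Path, documents_path=STRUCTS):
    output_path = documents_path / f"{jsonschema_path.stem}.py"
    datamodel_code_generator.generate(
        input_=jsonschema_path,
        input_file_type=datamodel_code_generator.InputFileType.JsonSchema,
        output=output_path,
        output_model_type=datamodel_code_generator.DataModelType.MsgspecStruct,
        use_schema_description=True,
        use_field_description=True,
        use_annotated=True,
        field_constraints=True,
        wrap_string_literal=True,
    )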
