Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Generate jsonschema from pydantic v2 #159

Open
wants to merge 19 commits into
base: main
Choose a base branch
from

Conversation

joellabes
Copy link
Collaborator

Picks up where #150 and #157 left off. The purpose of this PR is to add Pydantic representations of dbt's YAML files, which can then be used to automatically generate JSON Schema files.

This was instigated because the Pydantic representations can be used as inputs for AI-powered codegen experiences, but in doing so we've also seen that it's easier/less confusing to write Pydantic models than straight JSON Schema because it's less nested and you can achieve inheritance more simply. (As an example, consider the representations of unit testing - a complex if then else block in JSON Schema, but three simpler classes and a union in Pydantic.)

As far as I can tell, the validation is as good as the original versions. Some autocomplete offerings are a bit worse for some reason, even though they get rendered down to basically the same files. I was sometimes seeing [Object object] being output in VS Code, which I think is a bug in the YAML extension.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did you mean to move this to src/generate.py?

These schemas are generated from [pydantic models](https://docs.pydantic.dev/latest/concepts/json_schema/). To make updates, the process is as follows:

1. Create a virtual environment and install the dependencies: `pip install -r requirements.txt`
2. Make changes to the corresponding pydantic models in `src/latest.py`

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you meant to reference a directory, not a file

Suggested change
2. Make changes to the corresponding pydantic models in `src/latest.py`
2. Make changes to the corresponding pydantic models in `src/latest`

@@ -0,0 +1 @@
datamodel-code-generator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This shouldn't be necessary, right? This package was only necessary to initially transform json-schema to Pydantic, right?

@DevonFulcher
Copy link

Some autocomplete offerings are a bit worse for some reason, even though they get rendered down to basically the same files. I was sometimes seeing [Object object] being output in VS Code, which I think is a bug in the YAML extension.

How concerning is this? Is this something we should dig deeper into?

schema_: Optional[str] = Field(None, alias="schema")


class SemanticModel(BaseModel):
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

agg_time_dimension: Optional[str] = Field(
None, pattern="(?!.*__).*^[a-z][a-z0-9_]*[a-z0-9]$"
)
create_metric: Optional[bool] = True
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

extra="forbid",
)
time_granularity: TimeGranularity
validity_params: Optional[ValidityParams] = ""
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This typing doesn't seem correct? Empty string is not None | ValidityParams

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah this is to prevent the autocomplete from outputting null. An alternative is to instead specify the default in the json_schema_extra of the Field, which would mean this default could stay as None. Is the difference semantically important to Instructor?

Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't know if there are semantic differences, but it causes typing errors:
Screenshot 2024-09-06 at 10 04 05 AM

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ahh got it, I don't have whatever extension that is so vs code didn't flag it to me. In that case if we keep this we'll have to put everything in the json_schema_extra space instead which is all good

type_params: DimensionTypeParams


class Dimension(RootModel[Union[CategoricalDimension, TimeDimension]]):
Copy link

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@joellabes
Copy link
Collaborator Author

Some autocomplete offerings are a bit worse for some reason, even though they get rendered down to basically the same files. I was sometimes seeing [Object object] being output in VS Code, which I think is a bug in the YAML extension.

How concerning is this? Is this something we should dig deeper into?

Yeah holding off on moving this forward, pending feedback on the loom i made for drew/marco

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants