Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix forward type reference in Pydantic schemas #1171

Closed

Conversation

eli-bl
Copy link
Collaborator

@eli-bl eli-bl commented Nov 27, 2024

This small change addresses an issue that, unfortunately, I don't have a good test case to demonstrate. I've only been able to reproduce it with one specific API spec file... and only in my fork. Ordinarily I wouldn't submit an upstream change due to something that's only ever happened in my fork. However, in this case:

  1. I never changed any code in openapi_schema_pydantic/ in this fork.
  2. The code path that is raising the error also has no changes. It's this line in schemas.py, where it's trying to construct a Parameter instance. The error message ("Parameter is not fully defined") indicates that the problem is not the arguments to the Parameter initializer—it's the Parameter class itself, which Pydantic considers to be invalid. The message also indicates that the "not fully defined" part is a forward reference to the Header class.
  3. The forward reference (in encoding.py) existed in order to avoid an import cycle between Parameter, Encoding, and Header. That should be fine, and was working fine till now with exactly the same code in these Pydantic models. However...
  4. Per the Pydantic docs, the logic for lazily resolving forward references is extremely complicated, and in some hard-to-define cases it can fail. In that case they advise doing what I've done here: calling model_rebuild() on the class that's getting the error, after the previously-undefined class has been defined.

Because of all that, my theory is that there is some extremely subtle interaction between my other changes in the fork which, under some very specific conditions related to the particular spec I was testing with, maybe led to modules being imported in a different order... or something like that. This apparently had no effect on anything else, but made a difference in Pydantic's lazy-resolve mechanism. I can't characterize it any better than that, and I don't know how to write a test for it (although this fix does solve my use case). But I do believe this workaround is valid according to the docs and doesn't cause problems under any other circumstances. And I suspect that the circumstances under which it was working were brittle enough that some other completely unrelated change could've eventually caused it to manifest in an equally confusing way, so you could see this as a fix-in-advance for that.

(The problem is not that header.py hadn't actually been imported before that line in schemas.py executed. I verified 100% for sure that it had been. It's something more subtle than that.)

@eli-bl eli-bl requested a review from dbanty November 27, 2024 01:36
@eli-bl eli-bl marked this pull request as ready for review November 27, 2024 01:39
@texhnolyze
Copy link

This error happened to me after upgrading to version 0.21.7 and still persisted with other 0.21.* versions I tested. Even with 0.21.5, which I was previously using, I got the issue.
I also tested with diff different JSON OpenAPI specs, to ensure this is not just an issue with my personal spec.
E.g. this also happens with the default pet store example from: https://editor.swagger.io/
for me.

This led me to the assumption that it must be some change/regression in pydantic itself.
Manually downgrading to pydantic==2.10.1 in the pipx venv fixes this problem for me.

I took a look at the pydantic diff: pydantic/pydantic@v2.10.1...v2.10.2, but without having a deeper understanding of the pydantic code base I can't really pinpoint what changes might lead to this issue.
There also seem to be a few fixes still outstanding: pydantic/pydantic#10910

Manually adding your changes also fixes the issue, but I guess it should be clarified if there should be a fix in pydantic instead and potentially the pinning of pydantic==2.10.1 in the meantime.

@dedehas
Copy link

dedehas commented Nov 27, 2024

Can confirm, was also struggling with this.
PydanticUserError: Parameter is not fully defined; you should
define Header, then call Parameter.model_rebuild().
I believe it happens when the parameters include references to components.
Changing to pydantic 2.10.1 fixed the issue

@eli-bl
Copy link
Collaborator Author

eli-bl commented Dec 2, 2024

It sounds likely that pydantic's behavior did change with that patch release, but I'm not sure that that's the whole story. The pydantic upgrade has been present for a couple of months; my fork includes that change, and I've been working on my fork a lot during that time, and testing with specs that do include a reference from parameters to components. So it may be that there's some other subtle factor.

But even if it's only pydantic >2.10.1 that has the problem in these cases, I have to assume, based on the docs I linked to, that this was already a known issue in some cases: they specifically tell you that if you have a circular reference and it fails to resolve automatically, the workaround is to call model_resolve(). There are online discussions about that going back several years. So I would consider this change to be an appropriate defensive measure regardless of the version.

@Viicos
Copy link
Contributor

Viicos commented Dec 4, 2024

Hi, Pydantic maintainer here.

In 2.10, we did a complete refactor of the forward annotations evaluation logic to be more correct: it supports more edge cases, but also removes support for invalid use cases. This created some churn in libraries having invalid use cases implemented, but fixing it will result in safer code and better consistency.

Regarding the openapi-python-client, there's a couple places where things can be fixed. Taking the simplest example:

if TYPE_CHECKING: # pragma: no cover
from .header import Header
else:
Header = "Header"
class Encoding(BaseModel):
"""A single encoding definition applied to a single schema property.
References:
- https://swagger.io/docs/specification/describing-request-body/
- https://github.com/OAI/OpenAPI-Specification/blob/main/versions/3.0.3.md#encodingObject
"""
contentType: Optional[str] = None
headers: Optional[dict[str, Union[Header, Reference]]] = None

Here, Header is imported in an if TYPE_CHECKING: block to avoid a circular import. At runtime, TYPE_CHECKING is False, so Pydantic has no way of knowing what it represents (and only sees the "Header" string, what we call a forward reference). To evaluate forward references, Pydantic provides what we call a namespace: a mapping of strings to objects. When Optional[dict[str, Union["Header", Reference]]] is encountered, we look into the namespace for "Header" to replace it with the actual type.

Before 2.10, this used to somehow work because our namespace management was poorly designed, and symbols that shouldn't resolve (like in the example above, where Header is not available at runtime in the module) were still present in the provided namespace (in most cases, this was when building Encoding while in the process of building another model, or when subclassing). In rare cases, this messy namespace management could be dangerous, as you could have different types of the same name in different modules and Pydantic could use the wrong one 1.

To fix the issue, you can either:

  • Fix the circular import. This often requires a complete refactor of the module structure and is sometimes even not possible to do (and unless you flatten everything by defining models in a single module, this seems to be your case).
  • Call model_rebuild() in the right place, once everything is imported. This is the path of least resistance but you have to think about adding these calls whenever you define a model using an unknown forward annotation.

I'm working on a PR to suggest changes following the second option. @eli-bl, do note that benchling#223 is not a complete fix. You might want to apply the same changes from my PR on the fork.

Footnotes

  1. For instance, see https://github.com/langchain-ai/langchain-google/issues/610#issuecomment-2493831993.

@eli-bl
Copy link
Collaborator Author

eli-bl commented Dec 7, 2024

@Viicos, thanks for all of that context, and for submitting the alternate PR! I'll close this one and I'll also update the fix on our fork as per your advice.

@eli-bl eli-bl closed this Dec 7, 2024
@eli-bl eli-bl deleted the pydantic-forward-ref-fix branch December 7, 2024 01:07
github-merge-queue bot pushed a commit that referenced this pull request Dec 24, 2024
This is the alternative approach I mentioned in
#1171 (comment).

Instead of trying to rebuild the models in their respective modules
(which requires weird patterns, such as unused imports or importing
after the model is defined), we set `defer_build` to `True` for every
model where we know a forward reference will fail to resolve (so that we
don't try to build a model if we know it will fail).

I added comments each time to justify the use of `defer_build`, but
unfortunately this isn't always straightforward (e.g. sometimes you
makes use of a model as annotation which itself has `defer_build` set;
in this case we also want to defer build. Another case is when making
use of the `Callback` type alias; it isn't directly visible but it uses
an unresolvable forward reference).

Ultimately, in the module's `__init__.py`, we call `model_rebuild` on
all the necessary models. I know this isn't ideal as well, as you need
to manually check for every exported model here if the build was
successful.

This library is a clear example that inter-dependent types across
different modules is challenging, and Pydantic does not make it easy. We
are trying to think about ways to simplify the process.

Note that on top of fixing things for Pydantic 2.10, this also ensures
every model is successfully built when the `openapi_schema_pydantic`
module is imported. Currently on `main` (with Pydantic 2.9.2), some
models such as `Components` are not built. While this can still work in
some cases, it is advised not to do so (when `Components` is going to be
instantiated, Pydantic will implicitly try to rebuild it if it wasn't
already. However, we use the namespace where the instantiation call
happened to rebuilt it, so depending on _where_ you first instantiate
the model, this can lead to a failed model rebuild and thus a runtime
exception).

---

A note on `model_rebuild`: you can either provide an explicit namespace:

```python
PathItem.model_rebuild(_types_namespace={Operation: "Operation", "Header": Header})
```

Or let `model_rebuild` use the namespace where it was called (in our
case, all the imports are available, so it works).

---------

Co-authored-by: Dylan Anthony <dbanty@users.noreply.github.com>
Co-authored-by: Dylan Anthony <43723790+dbanty@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants