Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add serialize_as_any runtime flag support #1194

Merged
merged 17 commits into from
Mar 21, 2024
Merged

Add serialize_as_any runtime flag support #1194

merged 17 commits into from
Mar 21, 2024

Conversation

sydney-runkle
Copy link
Member

@sydney-runkle sydney-runkle commented Feb 15, 2024

Contributing to pydantic/pydantic#6423

TODO:

  • Add support for dataclasses
  • Add support for TypedDict
  • Add much more comprehensive testing
  • Investigate recursion guard that Adrian changed initially, write relevant test, investigate performance consequences - planning to chat with @davidhewitt about this later this week :)
  • Ask follow up questions about field_name = None
  • Resolve questions about extra usage, why we use mut in model.rs

@sydney-runkle sydney-runkle changed the title Add serialize_as_any config flag + runtime support WIP: Add serialize_as_any config flag + runtime support Feb 15, 2024
Copy link

codspeed-hq bot commented Feb 15, 2024

CodSpeed Performance Report

Merging #1194 will improve performances by 43.79%

Comparing ser_any_support (ffbbd34) with main (71d54a2)

Summary

⚡ 6 improvements
✅ 142 untouched benchmarks

🆕 4 new benchmarks

Benchmarks breakdown

Benchmark main ser_any_support Change
test_complete_core_json 2.6 ms 2.1 ms +21.47%
test_core_json_fs 733.4 µs 538.9 µs +36.11%
test_dict_of_ints_core_json 13.3 ms 10.1 ms +32.41%
🆕 test_enum_int_core N/A 20.4 µs N/A
🆕 test_enum_int_python N/A 39.1 µs N/A
🆕 test_enum_str_core N/A 21.1 µs N/A
🆕 test_enum_str_python N/A 38.6 µs N/A
test_list_of_ints_core_json 5.1 ms 3.9 ms +31.22%
test_set_of_ints_core_json 5.9 ms 4.7 ms +23.71%
test_set_of_ints_core_json_duplicates 3.6 ms 2.5 ms +43.79%

Comment on lines +317 to +331

// If there is no model, use duck typing ser logic for TypedDict
// If there is a model, skip this step, as BaseModel and dataclass duck typing
// is handled in their respective serializers
if extra.model.is_none() {
let duck_typing_ser_mode = extra.duck_typing_ser_mode.next_mode();
let td_extra = Extra {
model,
duck_typing_ser_mode,
..*extra
};
if td_extra.duck_typing_ser_mode == DuckTypingSerMode::Inferred {
return infer_to_python(value, include, exclude, &td_extra);
}
}
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not super happy with having this logic here, but this seems to be a consequence of not having a separate TypedDictSerializer. I think we could revisit that in a different PR 👍

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What happens if we remove from basemodel and dataclass serializers and we instead always just apply the duck typing here ? e.g. remove the extra.model.is_none() check?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@sydney-runkle I'm also curious about this

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dmontagu I think we found that the model & dataclass serializers have some coupling here where they already transform the input into __dict__ and other subfields. Probably we could restructure both to improve performance and reduce coupling.

Copy link
Contributor

@davidhewitt davidhewitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel like there's a lot of complexity here which might get a bit messy.

e.g. what about list[int] - if serialize_as_any is set, can I pass a list[str] and expect it to serialize without warnings?

@davidhewitt
Copy link
Contributor

Investigate recursion guard that Adrian changed initially, write relevant test, investigate performance consequences - planning to chat with @davidhewitt about this later this week :)

Did you remove that from this PR? Sounds like it's probably orthogonal and might be irrelevant with the recent changes samuel and I made.

Ask follow up questions about field_name = None

? :)

Resolve questions about extra usage, why we use mut in model.rs

If you mean for the extra, IIRC the mut is a temporary fudge to make it possible to use that extra with the recursion guard. We should be refactoring the serialisation state.

sydney-runkle and others added 2 commits March 12, 2024 13:15
@sydney-runkle
Copy link
Member Author

Resolutions from my conversation with @davidhewitt:

  • For now, we do need this at the model, dataclass, and field level, but notably, the field case is only for TypedDict cases
  • The recursion guard logic from Adrian's PR has been removed, as we think it's no longer necessary with more recent recursion guard changes
  • DH and I think that it could be beneficial to use a more generic type base class for the various model, dataclass, TypedDict serializers in the future, as there's definitely some replication that could be removed
  • The field_name stuff hasn't been changed, though it could be revisited in the future
  • The mut questions I had about extra stuff have also been resolved with my changes

@sydney-runkle sydney-runkle changed the title WIP: Add serialize_as_any config flag + runtime support Add serialize_as_any config flag + runtime support Mar 12, 2024
Copy link
Collaborator

@dmontagu dmontagu left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't really love DuckTypingSerMode/duck_typing_ser_mode given we already have ser_mode and they are orthogonal. But I can't think of a great alternative so maybe it's fine. I was thinking something like SerSchemaOnly or similar, but I guess DuckTypingSerMode is more intuitive.

Otherwise this generally looks good to me, though I think DH had a good question

@dmontagu
Copy link
Collaborator

I see that maybe that was addressed in other discussion with DH. I have reviewed and don't have any objections, I'd feel better if @davidhewitt also approved though.

@davidhewitt
Copy link
Contributor

So I'm generally happy with what's going on here. I think there are still cases that I'd like to at least ask about. For example, list[str] vs list[int] serialisation. The following test fails with warnings for passing str where int expected:

def test_serialize_as_any_list_types() -> None:
    serializer = SchemaSerializer(core_schema.list_schema(core_schema.int_schema()))

    assert serializer.to_python(['a', 'b', 'c'], serialize_as_any=True) == ['a', 'b', 'c']

The annotated code which I'd argue is equivalent doesn't warn:

from pydantic import TypeAdapter, SerializeAsAny

TypeAdapter(SerializeAsAny[list[int]]).dump_python(["a", "b", "c"])

@sydney-runkle
Copy link
Member Author

So I'm generally happy with what's going on here. I think there are still cases that I'd like to at least ask about. For example, list[str] vs list[int] serialisation. The following test fails with warnings for passing str where int expected

I think it makes sense to still warn here in general. SerializeAsAny if I understand correctly is more about how we handle subclasses

The annotated code which I'd argue is equivalent doesn't warn

Hmm, well in this case, we're actually using an any schema, hence the lack of warning. I think we could address this separately in pydantic when we modify the schema for a type wrapped in SerializeAsAny...

@adriangb at one point suggested:

if we make SerializeAsAny generate it's schema as some sort of union of infer and use this static schema it should work for all cases. SerializeAsAny[SecretStr] would use the static schema unless you pass in a subclass of str that has a serializer and SerializeAsAny[SomeModel] would always use the instance's serializer.

@davidhewitt
Copy link
Contributor

I agree that SerializeAsAny is intended to be used for subclasses, but it does allow arbitrary type inference and my point above is that the annotation is still behaving differently to the runtime flag. I think users might find this subtle / surprising. It makes the behaviour something they have to discover empirically rather than easily reason about.

That said, I'm not entirely sure on the right solution here. I think the point is that I would like to close that gap in the long term. Perfect being the enemy of the good and all that, maybe the approach is to proceed with what we have here, and create a tracking issue with follow ups of where SerializeAsAny doesn't work like serialize_as_any=True and then we can improve inference later.

@adriangb
Copy link
Member

I came here to ask the same question @davidhewitt asked about list[int] and ['a'] (also worth checking ['1']).

Perhaps we can resolve the confusion by calling this flag / feature duck_type_serialization instead of serialize_as_any? I think logically then it follows that you "ask the thing how it wants to be serialized and if it doesn't know how to fall back to the schema". As opposed to "serialize as any" which means "use the same logic as if this were Any" which means "ask it how to serialize itself and if it doesn't know fall back to inferring how to serialize it from the type at runtime". The difference being that in the duck type case we fall back to the schema and in the serialize as any case we fall back to inferring and never use the schema.

Which does mean maybe we should have a DuckTypeSerialization[T] annotation.

@sydney-runkle
Copy link
Member Author

Perhaps we can resolve the confusion by calling this flag / feature duck_type_serialization instead of serialize_as_any? I think logically then it follows that you "ask the thing how it wants to be serialized and if it doesn't know how to fall back to the schema". As opposed to "serialize as any" which means "use the same logic as if this were Any" which means "ask it how to serialize itself and if it doesn't know fall back to inferring how to serialize it from the type at runtime". The difference being that in the duck type case we fall back to the schema and in the serialize as any case we fall back to inferring and never use the schema.

Which does mean maybe we should have a DuckTypeSerialization[T] annotation.

@adriangb I think that this was really well said and carefully thought out. @davidhewitt, are you ok with this change? I think it allows us to sidestep the surprising inconsistencies with SerializeAsAny, while also moving forward with these changes!

@davidhewitt
Copy link
Contributor

I can see the purity in this argument but I worry it adds complexity to users by having "duck typed serialization" and "serialize as any" as separate concepts. Especially when the intended use for both of them is to serialize subtypes in full, and neither of them as implemented behave like what I'd expect from "duck typing".

I would much prefer we just lived with this being called serialize_as_any and we worked to unify it with the annotation.

@sydney-runkle
Copy link
Member Author

@davidhewitt,

That makes sense, so perhaps we should proceed with this approach?

That said, I'm not entirely sure on the right solution here. I think the point is that I would like to close that gap in the long term. Perfect being the enemy of the good and all that, maybe the approach is to proceed with what we have here, and create a tracking issue with follow ups of where SerializeAsAny doesn't work like serialize_as_any=True and then we can improve inference later.

@adriangb
Copy link
Member

@sydney-runkle do you think we could make the annotation behave more like this?

@sydney-runkle
Copy link
Member Author

@adriangb,

I think we would probably have to go the other way, that is, being more lax on the serialize_as_any runtime flag side of things. I've created this issue to track the discrepancies in behavior, and ultimately nudge us towards resolving those differences: pydantic/pydantic#9049.

@sydney-runkle
Copy link
Member Author

Just waiting on #1236 to fix the bug mentioned in the most recent test that I wrote

Copy link
Contributor

@davidhewitt davidhewitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we're ready to ship this and see how it pans out. Definitely some edge cases yet to discuss but it's a good tool to expose for users 👍

@sydney-runkle sydney-runkle changed the title Add serialize_as_any config flag + runtime support Add serialize_as_any runtime flag support Mar 21, 2024
@sydney-runkle sydney-runkle merged commit da16140 into main Mar 21, 2024
27 of 28 checks passed
@sydney-runkle sydney-runkle deleted the ser_any_support branch March 21, 2024 14:34
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants