Add `serialize_as_any` runtime flag support #1194

sydney-runkle · 2024-02-15T18:17:48Z

Contributing to pydantic/pydantic#6423

TODO:

Add support for dataclasses
Add support for TypedDict
Add much more comprehensive testing
Investigate recursion guard that Adrian changed initially, write relevant test, investigate performance consequences - planning to chat with @davidhewitt about this later this week :)
Ask follow up questions about field_name = None
Resolve questions about extra usage, why we use mut in model.rs

…e levels

codspeed-hq · 2024-02-15T18:25:07Z

CodSpeed Performance Report

Merging #1194 will improve performances by 43.79%

_{Comparing ser_any_support (ffbbd34) with main (71d54a2)}

Summary

⚡ 6 improvements
✅ 142 untouched benchmarks

🆕 4 new benchmarks

Benchmarks breakdown

	Benchmark	`main`	`ser_any_support`	Change
⚡	`test_complete_core_json`	2.6 ms	2.1 ms	+21.47%
⚡	`test_core_json_fs`	733.4 µs	538.9 µs	+36.11%
⚡	`test_dict_of_ints_core_json`	13.3 ms	10.1 ms	+32.41%
🆕	`test_enum_int_core`	N/A	20.4 µs	N/A
🆕	`test_enum_int_python`	N/A	39.1 µs	N/A
🆕	`test_enum_str_core`	N/A	21.1 µs	N/A
🆕	`test_enum_str_python`	N/A	38.6 µs	N/A
⚡	`test_list_of_ints_core_json`	5.1 ms	3.9 ms	+31.22%
⚡	`test_set_of_ints_core_json`	5.9 ms	4.7 ms	+23.71%
⚡	`test_set_of_ints_core_json_duplicates`	3.6 ms	2.5 ms	+43.79%

sydney-runkle · 2024-02-23T13:51:15Z

src/serializers/fields.rs

+
+        // If there is no model, use duck typing ser logic for TypedDict
+        // If there is a model, skip this step, as BaseModel and dataclass duck typing
+        // is handled in their respective serializers
+        if extra.model.is_none() {
+            let duck_typing_ser_mode = extra.duck_typing_ser_mode.next_mode();
+            let td_extra = Extra {
+                model,
+                duck_typing_ser_mode,
+                ..*extra
+            };
+            if td_extra.duck_typing_ser_mode == DuckTypingSerMode::Inferred {
+                return infer_to_python(value, include, exclude, &td_extra);
+            }
+        }


I'm not super happy with having this logic here, but this seems to be a consequence of not having a separate TypedDictSerializer. I think we could revisit that in a different PR 👍

What happens if we remove from basemodel and dataclass serializers and we instead always just apply the duck typing here ? e.g. remove the extra.model.is_none() check?

@sydney-runkle I'm also curious about this

@dmontagu I think we found that the model & dataclass serializers have some coupling here where they already transform the input into __dict__ and other subfields. Probably we could restructure both to improve performance and reduce coupling.

davidhewitt

I feel like there's a lot of complexity here which might get a bit messy.

e.g. what about list[int] - if serialize_as_any is set, can I pass a list[str] and expect it to serialize without warnings?

davidhewitt · 2024-02-29T16:49:57Z

Investigate recursion guard that Adrian changed initially, write relevant test, investigate performance consequences - planning to chat with @davidhewitt about this later this week :)

Did you remove that from this PR? Sounds like it's probably orthogonal and might be irrelevant with the recent changes samuel and I made.

Ask follow up questions about field_name = None

? :)

Resolve questions about extra usage, why we use mut in model.rs

If you mean for the extra, IIRC the mut is a temporary fudge to make it possible to use that extra with the recursion guard. We should be refactoring the serialisation state.

temp stash temp stash finalizing messy merge

sydney-runkle · 2024-03-12T19:21:55Z

Resolutions from my conversation with @davidhewitt:

For now, we do need this at the model, dataclass, and field level, but notably, the field case is only for TypedDict cases
The recursion guard logic from Adrian's PR has been removed, as we think it's no longer necessary with more recent recursion guard changes
DH and I think that it could be beneficial to use a more generic type base class for the various model, dataclass, TypedDict serializers in the future, as there's definitely some replication that could be removed
The field_name stuff hasn't been changed, though it could be revisited in the future
The mut questions I had about extra stuff have also been resolved with my changes

dmontagu

I don't really love DuckTypingSerMode/duck_typing_ser_mode given we already have ser_mode and they are orthogonal. But I can't think of a great alternative so maybe it's fine. I was thinking something like SerSchemaOnly or similar, but I guess DuckTypingSerMode is more intuitive.

Otherwise this generally looks good to me, though I think DH had a good question

dmontagu · 2024-03-13T20:50:15Z

I see that maybe that was addressed in other discussion with DH. I have reviewed and don't have any objections, I'd feel better if @davidhewitt also approved though.

davidhewitt · 2024-03-14T11:18:05Z

So I'm generally happy with what's going on here. I think there are still cases that I'd like to at least ask about. For example, list[str] vs list[int] serialisation. The following test fails with warnings for passing str where int expected:

def test_serialize_as_any_list_types() -> None:
    serializer = SchemaSerializer(core_schema.list_schema(core_schema.int_schema()))

    assert serializer.to_python(['a', 'b', 'c'], serialize_as_any=True) == ['a', 'b', 'c']

The annotated code which I'd argue is equivalent doesn't warn:

from pydantic import TypeAdapter, SerializeAsAny

TypeAdapter(SerializeAsAny[list[int]]).dump_python(["a", "b", "c"])

sydney-runkle · 2024-03-14T12:45:54Z

So I'm generally happy with what's going on here. I think there are still cases that I'd like to at least ask about. For example, list[str] vs list[int] serialisation. The following test fails with warnings for passing str where int expected

I think it makes sense to still warn here in general. SerializeAsAny if I understand correctly is more about how we handle subclasses

The annotated code which I'd argue is equivalent doesn't warn

Hmm, well in this case, we're actually using an any schema, hence the lack of warning. I think we could address this separately in pydantic when we modify the schema for a type wrapped in SerializeAsAny...

@adriangb at one point suggested:

if we make SerializeAsAny generate it's schema as some sort of union of infer and use this static schema it should work for all cases. SerializeAsAny[SecretStr] would use the static schema unless you pass in a subclass of str that has a serializer and SerializeAsAny[SomeModel] would always use the instance's serializer.

davidhewitt · 2024-03-14T13:16:48Z

I agree that SerializeAsAny is intended to be used for subclasses, but it does allow arbitrary type inference and my point above is that the annotation is still behaving differently to the runtime flag. I think users might find this subtle / surprising. It makes the behaviour something they have to discover empirically rather than easily reason about.

That said, I'm not entirely sure on the right solution here. I think the point is that I would like to close that gap in the long term. Perfect being the enemy of the good and all that, maybe the approach is to proceed with what we have here, and create a tracking issue with follow ups of where SerializeAsAny doesn't work like serialize_as_any=True and then we can improve inference later.

adriangb · 2024-03-14T17:11:05Z

I came here to ask the same question @davidhewitt asked about list[int] and ['a'] (also worth checking ['1']).

Perhaps we can resolve the confusion by calling this flag / feature duck_type_serialization instead of serialize_as_any? I think logically then it follows that you "ask the thing how it wants to be serialized and if it doesn't know how to fall back to the schema". As opposed to "serialize as any" which means "use the same logic as if this were Any" which means "ask it how to serialize itself and if it doesn't know fall back to inferring how to serialize it from the type at runtime". The difference being that in the duck type case we fall back to the schema and in the serialize as any case we fall back to inferring and never use the schema.

Which does mean maybe we should have a DuckTypeSerialization[T] annotation.

tests/serializers/test_serialize_as_any.py

sydney-runkle · 2024-03-15T12:12:23Z

Perhaps we can resolve the confusion by calling this flag / feature duck_type_serialization instead of serialize_as_any? I think logically then it follows that you "ask the thing how it wants to be serialized and if it doesn't know how to fall back to the schema". As opposed to "serialize as any" which means "use the same logic as if this were Any" which means "ask it how to serialize itself and if it doesn't know fall back to inferring how to serialize it from the type at runtime". The difference being that in the duck type case we fall back to the schema and in the serialize as any case we fall back to inferring and never use the schema.

Which does mean maybe we should have a DuckTypeSerialization[T] annotation.

@adriangb I think that this was really well said and carefully thought out. @davidhewitt, are you ok with this change? I think it allows us to sidestep the surprising inconsistencies with SerializeAsAny, while also moving forward with these changes!

davidhewitt · 2024-03-15T15:58:15Z

I can see the purity in this argument but I worry it adds complexity to users by having "duck typed serialization" and "serialize as any" as separate concepts. Especially when the intended use for both of them is to serialize subtypes in full, and neither of them as implemented behave like what I'd expect from "duck typing".

I would much prefer we just lived with this being called serialize_as_any and we worked to unify it with the annotation.

sydney-runkle · 2024-03-17T01:38:00Z

@davidhewitt,

That makes sense, so perhaps we should proceed with this approach?

That said, I'm not entirely sure on the right solution here. I think the point is that I would like to close that gap in the long term. Perfect being the enemy of the good and all that, maybe the approach is to proceed with what we have here, and create a tracking issue with follow ups of where SerializeAsAny doesn't work like serialize_as_any=True and then we can improve inference later.

adriangb · 2024-03-17T02:50:00Z

@sydney-runkle do you think we could make the annotation behave more like this?

sydney-runkle · 2024-03-19T12:32:43Z

@adriangb,

I think we would probably have to go the other way, that is, being more lax on the serialize_as_any runtime flag side of things. I've created this issue to track the discrepancies in behavior, and ultimately nudge us towards resolving those differences: pydantic/pydantic#9049.

sydney-runkle · 2024-03-21T13:54:16Z

Just waiting on #1236 to fix the bug mentioned in the most recent test that I wrote

davidhewitt

I think we're ready to ship this and see how it pans out. Definitely some edge cases yet to discuss but it's a good tool to expose for users 👍

first pass at support for serialize_as_any logic at config and runtim…

95c8bf8

…e levels

sydney-runkle changed the title ~~Add serialize_as_any config flag + runtime support~~ WIP: Add serialize_as_any config flag + runtime support Feb 15, 2024

This was referenced Feb 15, 2024

Support for serialize_as_any runtime setting pydantic/pydantic#8830

Merged

Add runtime flag for duck typed serialization #740

Closed

sydney-runkle added 8 commits February 16, 2024 08:31

partway through some work

80c6138

removing logic supporting config setting

ff79028

fixing a few tests, still stumped by others

9d56e6c

somehow got tests passing??

adf48e0

test fixes for typed dict

dd75143

improving naming, consolidating logic, many pydantic tests now broken?

4bbe219

got all tests passing, for now

343f60f

adding tests

dc584ec

sydney-runkle commented Feb 23, 2024

View reviewed changes

davidhewitt reviewed Feb 29, 2024

View reviewed changes

sydney-runkle and others added 2 commits March 12, 2024 13:15

Merge branch 'main' into ser_any_support

a60019f

Cleaning up messy merge

1280bf4

temp stash temp stash finalizing messy merge

linting

00dd53a

sydney-runkle changed the title ~~WIP: Add serialize_as_any config flag + runtime support~~ Add serialize_as_any config flag + runtime support Mar 12, 2024

sydney-runkle and others added 3 commits March 12, 2024 16:46

Merge branch 'main' into ser_any_support

e2b632c

fixup merge conflicts

8204e8c

formatting

0146710

dmontagu reviewed Mar 13, 2024

View reviewed changes

adriangb reviewed Mar 14, 2024

View reviewed changes

tests/serializers/test_serialize_as_any.py Show resolved Hide resolved

sydney-runkle mentioned this pull request Mar 19, 2024

Reconcile behavioral differences between SerializeAsAny annotation and serialize_as_any runtime flag pydantic/pydantic#9049

Open

add test for unrelated models, document failure

6b67b8f

fix test

ffbbd34

davidhewitt approved these changes Mar 21, 2024

View reviewed changes

sydney-runkle changed the title ~~Add serialize_as_any config flag + runtime support~~ Add serialize_as_any runtime flag support Mar 21, 2024

sydney-runkle merged commit da16140 into main Mar 21, 2024
27 of 28 checks passed

sydney-runkle deleted the ser_any_support branch March 21, 2024 14:34

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add `serialize_as_any` runtime flag support #1194

Add `serialize_as_any` runtime flag support #1194

sydney-runkle commented Feb 15, 2024 •

edited

Loading

codspeed-hq bot commented Feb 15, 2024 •

edited

Loading

sydney-runkle Feb 23, 2024

davidhewitt Feb 29, 2024

dmontagu Mar 13, 2024

davidhewitt Mar 14, 2024

davidhewitt left a comment

davidhewitt commented Feb 29, 2024

sydney-runkle commented Mar 12, 2024

dmontagu left a comment

dmontagu commented Mar 13, 2024

davidhewitt commented Mar 14, 2024

sydney-runkle commented Mar 14, 2024

davidhewitt commented Mar 14, 2024

adriangb commented Mar 14, 2024

sydney-runkle commented Mar 15, 2024

davidhewitt commented Mar 15, 2024

sydney-runkle commented Mar 17, 2024

adriangb commented Mar 17, 2024

sydney-runkle commented Mar 19, 2024

sydney-runkle commented Mar 21, 2024

davidhewitt left a comment

Add serialize_as_any runtime flag support #1194

Add serialize_as_any runtime flag support #1194

Conversation

sydney-runkle commented Feb 15, 2024 • edited Loading

codspeed-hq bot commented Feb 15, 2024 • edited Loading

CodSpeed Performance Report

Merging #1194 will improve performances by 43.79%

Summary

Benchmarks breakdown

sydney-runkle Feb 23, 2024

Choose a reason for hiding this comment

davidhewitt Feb 29, 2024

Choose a reason for hiding this comment

dmontagu Mar 13, 2024

Choose a reason for hiding this comment

davidhewitt Mar 14, 2024

Choose a reason for hiding this comment

davidhewitt left a comment

Choose a reason for hiding this comment

davidhewitt commented Feb 29, 2024

sydney-runkle commented Mar 12, 2024

dmontagu left a comment

Choose a reason for hiding this comment

dmontagu commented Mar 13, 2024

davidhewitt commented Mar 14, 2024

sydney-runkle commented Mar 14, 2024

davidhewitt commented Mar 14, 2024

adriangb commented Mar 14, 2024

sydney-runkle commented Mar 15, 2024

davidhewitt commented Mar 15, 2024

sydney-runkle commented Mar 17, 2024

adriangb commented Mar 17, 2024

sydney-runkle commented Mar 19, 2024

sydney-runkle commented Mar 21, 2024

davidhewitt left a comment

Choose a reason for hiding this comment

Add `serialize_as_any` runtime flag support #1194

Add `serialize_as_any` runtime flag support #1194

sydney-runkle commented Feb 15, 2024 •

edited

Loading

codspeed-hq bot commented Feb 15, 2024 •

edited

Loading