Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

incorrect serialization of complex union type? #742

Closed
drawnwren opened this issue Sep 11, 2024 · 5 comments
Closed

incorrect serialization of complex union type? #742

drawnwren opened this issue Sep 11, 2024 · 5 comments

Comments

@drawnwren
Copy link

Describe the bug
The output of the following code serializes to the incorrect union type.

import typing
from datetime import datetime
from uuid import UUID

from dataclasses_avroschema.pydantic.main import AvroBaseModel


class Typing(AvroBaseModel):
    turn_id: int
    conversation_id: UUID
    authored_by: str
    timestamp: datetime


class BeginMessage(AvroBaseModel):
    references_id: int | None
    turn_id: int
    conversation_id: UUID
    authored_by: str
    timestamp: datetime


class MessageForClient(AvroBaseModel):
    inner: typing.Union[
        BeginMessage,
        Typing,
    ]


fake_message = MessageForClient(inner=Typing.fake())
serialized = fake_message.serialize(serialization_type="avro-json")
print(serialized) # prints: b'{"inner": {"BeginMessage": {"turn_id": 530, "conversation_id": "ea645be7-9f1f-4b37-8be4-6e30a5c9b10c", "authored_by": "tvaCmZvGtPJaLYLpDzyF", "references_id": null, "timestamp": 1231931709222}}}'

The results are the same without 'avro-json':

MessageForClient.deserialize(fake_message.serialize())
MessageForClient(inner=BeginMessage(references_id=None, turn_id=7344, conversation_id=UUID('07d91529-c3aa-4e93-91d0-5fe0413a3405'), authored_by='utYSVniqvsjIcjOANbMx', timestamp=datetime.datetime(1970, 8, 28, 13, 16, 25, 709000, tzinfo=datetime.timezone.utc)))

If I change the Typing class to:

class Typing(AvroBaseModel):
    property: str
    other_property: str

Everything works as expected and it outputs the Typing inner. So, it seems like it has something to do with the properties being too close to each other?

@marcosschroh
Copy link
Owner

You are experience the same issue as #584.

It is impossible to daticte, pydantic, any library or programing language to determine which instance should be created from a json payload when it matches all the classes, in this case Typing and BeginMessage.

In this case because you are using AvroBaseModel, pydandic is determining under the hood which instance should be created in this line. Because there is not a clear difference with the typing.Union[BeginMessage, Typing] then the first type is chosen.

@drawnwren
Copy link
Author

drawnwren commented Sep 22, 2024

I don't think the problem has anything to do with json. The problem occurs during the serialization step of the avro-dataclasses library.

self.asdict(standardize_factory=self.standardize_type),

You're naively calling .asdict, but in order to hint to fastavro the correct type for the union, I think you need to use either the tuple notation or the "-type" key outlined here: https://github.com/fastavro/fastavro/blob/dd026c6e584b3ed3e731002e815500f49dd554c2/docs/writer.rst#using-the-tuple-notation-to-specify-which-branch-of-a-union-to-take

@drawnwren
Copy link
Author

I put a sketch of a potential solution in #752

@marcosschroh
Copy link
Owner

Please read the outcome of the issue linked in this one and thanks fot the PR.

Unfortunately, the solution is not as simple as it seems to be. You need to make it work for dataclasses, faust and pydantic v1 and v2, all of them not only with serialization but also deserialization. I have already created a PR that solved dataclasses and potentially faust. Probably this week I wil have the proper fix.

@marcosschroh
Copy link
Owner

Closed by #751

With the latest version the output is the proper one:

b'{"inner": {"Typing": {"turn_id": 627, "conversation_id": "6f11762f-dc36-47b9-8895-ea7c1413f21b", "authored_by": "TaGxTdTDnDrRYZuNDXnJ", "timestamp": 544747194977}}}'

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants