Cloudpickle and data classes do not play well together #424

Open
b78 opened this issue Jun 2, 2021 · 3 comments
Comments

@b78

b78 commented Jun 2, 2021

After restoring a dataclass instance with cloudpickle, `dataclasses.fields()` returns an empty tuple, both for the restored value and for the original instance:

Example

import dataclasses
from dataclasses import dataclass


@dataclass
class Range:
    start: int
    end: int


v = Range(1, 2)
print("defined")
print(dataclasses.fields(v))

import cloudpickle

print("imported cloudpickle")
print(dataclasses.fields(v))

v_dumped = cloudpickle.dumps(v)

print("dumped with cloudpickle")
print(dataclasses.fields(v))

vv = cloudpickle.loads(v_dumped)
print("loaded with cloudpickle")
print(dataclasses.fields(v))

print("restored value")
print(dataclasses.fields(vv))

Leads to

defined
(Field(name='start',type=<class 'int'>,default=<dataclasses._MISSING_TYPE object at 0x109634ca0>,default_factory=<dataclasses._MISSING_TYPE object at 0x109634ca0>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),_field_type=_FIELD), Field(name='end',type=<class 'int'>,default=<dataclasses._MISSING_TYPE object at 0x109634ca0>,default_factory=<dataclasses._MISSING_TYPE object at 0x109634ca0>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),_field_type=_FIELD))
imported cloudpickle
(Field(name='start',type=<class 'int'>,default=<dataclasses._MISSING_TYPE object at 0x109634ca0>,default_factory=<dataclasses._MISSING_TYPE object at 0x109634ca0>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),_field_type=_FIELD), Field(name='end',type=<class 'int'>,default=<dataclasses._MISSING_TYPE object at 0x109634ca0>,default_factory=<dataclasses._MISSING_TYPE object at 0x109634ca0>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),_field_type=_FIELD))
dumped with cloudpickle
(Field(name='start',type=<class 'int'>,default=<dataclasses._MISSING_TYPE object at 0x109634ca0>,default_factory=<dataclasses._MISSING_TYPE object at 0x109634ca0>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),_field_type=_FIELD), Field(name='end',type=<class 'int'>,default=<dataclasses._MISSING_TYPE object at 0x109634ca0>,default_factory=<dataclasses._MISSING_TYPE object at 0x109634ca0>,init=True,repr=True,hash=None,compare=True,metadata=mappingproxy({}),_field_type=_FIELD))
loaded with cloudpickle
()
restored value
()

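That is, as soon as the value is restored, the dataclass loses its fields as far as `dataclasses.fields()` is concerned, and this affects the original instance `v` as well as the restored `vv`.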
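One possible explanation for the empty tuples, offered as an assumption rather than a confirmed diagnosis of cloudpickle: in CPython, `dataclasses.fields()` keeps only the fields whose private `_field_type` attribute is the very same object as the module-level sentinel `dataclasses._FIELD`. If a serializer copies that sentinel instead of referencing it, the identity check fails and every field is filtered out. The sketch below simulates such a copy using only the standard library. The helpers `visible_field_names` and `sentinel_is_intact` are hypothetical names introduced here, and the code relies on private CPython internals (`__dataclass_fields__`, `_field_type`, `_FIELD`) that may change between versions.

```python
import dataclasses
from dataclasses import dataclass


@dataclass
class Range:
    start: int
    end: int


def visible_field_names(cls):
    """Return the field names that dataclasses.fields() reports for cls."""
    return [f.name for f in dataclasses.fields(cls)]


def sentinel_is_intact(cls):
    """Return True if every field still carries the module's own _FIELD sentinel.

    Assumption: this pokes at private CPython names and is for diagnosis only.
    """
    return all(
        f._field_type is dataclasses._FIELD
        for f in cls.__dataclass_fields__.values()
    )


before = (visible_field_names(Range), sentinel_is_intact(Range))
print(before)  # (['start', 'end'], True)

# Simulate a sentinel that was copied rather than referenced:
# same type and same repr as dataclasses._FIELD, but a different object.
lookalike = type(dataclasses._FIELD)("_FIELD")
for f in Range.__dataclass_fields__.values():
    f._field_type = lookalike

after = (visible_field_names(Range), sentinel_is_intact(Range))
print(after)  # ([], False)
```

Running `sentinel_is_intact(type(v))` right after `cloudpickle.loads` in the example above would show whether this is what happens in a given environment.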

@b78
Author

b78 commented Jun 2, 2021

When I move the type definition into its own file, it works.

@drewm1980

I came here as a user with a different issue, but iirc pickles were never self-describing; they always needed the definitions of non-standard-library types to deserialize correctly. So if you dump the data from a script, the pickle doesn't record a real location the type came from, since the type was never imported, just defined in the interpreter. The deserializer then has nothing to import to reconstruct the type.
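The by-reference behaviour described here, and the workaround of moving the class into its own file, can be sketched with the standard library's `pickle` as a stand-in. The module name `range_types` is hypothetical and is written to a temporary directory only to keep the example self-contained. cloudpickle likewise pickles classes from importable modules by reference, while classes defined in `__main__` are pickled by value, which is the case the original report exercises.

```python
import dataclasses
import importlib
import pickle
import sys
import tempfile
from pathlib import Path

# Hypothetical module holding the dataclass, standing in for "its own file".
MODULE_SOURCE = '''
from dataclasses import dataclass


@dataclass
class Range:
    start: int
    end: int
'''

with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "range_types.py").write_text(MODULE_SOURCE)
    sys.path.insert(0, tmp)
    try:
        importlib.invalidate_caches()
        range_types = importlib.import_module("range_types")

        v = range_types.Range(1, 2)
        payload = pickle.dumps(v)
        vv = pickle.loads(payload)
    finally:
        sys.path.remove(tmp)

# The payload names the module and the class instead of embedding the class body.
print(b"range_types" in payload, b"Range" in payload)  # True True

# Loading resolves that reference back to the one imported class object,
# so the field metadata is untouched.
print(type(vv) is range_types.Range)  # True
print([f.name for f in dataclasses.fields(vv)])  # ['start', 'end']
```

Because the class is looked up by name at load time, the loading process must be able to import the same module; that is the trade-off of pickling by reference.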

@rsokl

rsokl commented May 5, 2022

Duplicate of #386
