Cloudpickle breaks local dataclass #386
Hi, thanks for the report. I see the problem. This may be worth a patch on
The problem is that it's probably useless for CPython, since if the dataclass is defined in an importable module, the above problem does not happen. I think we will have to deal with a cloudpickle fix that depends on private APIs and rely on tests to make sure our code tracks the internal changes of the CPython standard library.
@pierreglaser @ogrisel did you reach a resolution on a fix? We're using cloudpickle for a new project that relies pretty heavily on dataclasses for validation, and not being able to use locally defined dataclasses is causing hard constraints on the design. Any update would be much appreciated! Thank you.
+1 on this issue. Is there any known good workaround for this?
One simple workaround is to avoid `asdict` and write an equivalent method instead.
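A minimal sketch of such a method (the `Point` class and `to_dict` name here are illustrative, not from the thread): it reads field names from `__dataclass_fields__` directly, which survives a cloudpickle round-trip, instead of going through `dataclasses.asdict()`/`fields()`, whose private sentinel identity check is what breaks.

```python
from dataclasses import dataclass


@dataclass
class Point:
    x: int
    y: int

    def to_dict(self):
        # Read field names from __dataclass_fields__ directly rather than
        # calling dataclasses.asdict()/fields(), whose sentinel identity
        # check can break after a cloudpickle round-trip of a local class.
        return {name: getattr(self, name) for name in self.__dataclass_fields__}


print(Point(1, 2).to_dict())  # {'x': 1, 'y': 2}
```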
This is still a major issue. So I have some questions:
In my case this bug destroys the original dataclasses when a ray worker returns a dataclass result. |
I haven't tested it extensively, but in my case this workaround is working. As @jseppanen suggested, I've replaced `asdict` with an equivalent function:

```python
import copy
import dataclasses


def as_dict(obj, *, dict_factory=dict):
    if not (dataclasses.is_dataclass(obj) and not isinstance(obj, type)):
        raise TypeError("as_dict() should be called on dataclass instances")
    return as_dict_inner(obj, dict_factory)


def as_dict_inner(obj, dict_factory=dict):
    if dataclasses.is_dataclass(obj):
        # Iterate over the instance's attributes instead of using
        # dataclasses.fields(), which fails on round-tripped classes.
        result = []
        for f in obj.__dict__:
            value = as_dict_inner(getattr(obj, f), dict_factory)
            result.append((f, value))
        return dict_factory(result)
    elif isinstance(obj, tuple) and hasattr(obj, '_fields'):
        # Named tuples
        return type(obj)(*[as_dict_inner(v, dict_factory) for v in obj])
    elif isinstance(obj, (list, tuple)):
        return type(obj)(as_dict_inner(v, dict_factory) for v in obj)
    elif isinstance(obj, dict):
        return type(obj)((as_dict_inner(k, dict_factory),
                          as_dict_inner(v, dict_factory))
                         for k, v in obj.items())
    else:
        return copy.deepcopy(obj)
```
I am not 100% sure it's the same issue, but deserializing a dataclass right now breaks the existing dataclass if it's defined in the same process:

```python
import cloudpickle
import dataclasses
from dataclasses import dataclass


@dataclass
class Test:
    dim: int = 1


print(dataclasses.fields(Test))
_unused_deserialized_class = cloudpickle.loads(cloudpickle.dumps(Test))
print(dataclasses.fields(Test))
```
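The check that breaks is an identity comparison against the private `_FIELD` sentinel inside `dataclasses.fields()`. A small sketch of that check on a freshly defined class (note that `__dataclass_fields__`, `_field_type`, and `_FIELD` are CPython implementation details, not public API):

```python
import dataclasses


@dataclasses.dataclass
class C:
    a: int = 0


# dataclasses.fields() keeps only entries whose _field_type is the
# module-level _FIELD sentinel; the comparison is by object identity,
# which is what a cloudpickle round-trip of a local class can break.
f = C.__dataclass_fields__['a']
print(f._field_type is dataclasses._FIELD)  # True for a freshly defined class
```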
In the meantime, an alternative solution to calling
Here is another workaround that works exactly like `dataclasses.fields()`:

```python
import dataclasses


def dataclass_fields(class_or_instance):
    """This function is based on dataclasses.fields(), but contains a workaround
    for https://github.com/cloudpipe/cloudpickle/issues/386
    """
    try:
        fields = getattr(class_or_instance, dataclasses._FIELDS)
    except AttributeError:
        raise TypeError('must be called with a dataclass type or instance')
    # Compare the sentinel by name instead of identity, since a cloudpickle
    # round-trip duplicates the _FIELD sentinel object.
    return tuple(f for f in fields.values()
                 if f._field_type.name == dataclasses._FIELD.name)
```
I recently discovered and posted a connected issue here. Are there any workarounds that do not involve changing the way the dataclass is defined (I cannot change the source code) but can be used "after the fact"?
Here's a solution using a subclass of `cloudpickle.CloudPickler`:

```python
import io
import dataclasses
from dataclasses import fields, dataclass, _FIELD_BASE

import cloudpickle


def _get_dataclass_field_sentinel(name):
    """Return a sentinel object for a dataclass field."""
    return getattr(dataclasses, name)


class PatchedCloudPickler(cloudpickle.CloudPickler):
    def reducer_override(self, obj):
        """Reduce field sentinels by name so their identity is preserved."""
        if isinstance(obj, _FIELD_BASE):
            return _get_dataclass_field_sentinel, (obj.name,)
        return super().reducer_override(obj)


def dumps(value, protocol=None):
    with io.BytesIO() as file:
        PatchedCloudPickler(file, protocol).dump(value)
        return file.getvalue()


@dataclass
class InClass:
    a: int
    b: int


OutClass = cloudpickle.loads(dumps(InClass))
assert fields(OutClass)
```

If you need to, you can monkey-patch the pickler class used by cloudpickle:

```python
cloudpickle.cloudpickle_fast.CloudPickler = PatchedCloudPickler
```
I posted a potential fix here: #513. It would be great to get some feedback on it.
Consider the following test: the last assertEqual fails. It fails because the check in `dataclasses.fields`, `f._field_type is _FIELD`, fails. See https://github.com/python/cpython/blob/3.7/Lib/dataclasses.py#L1028. This is because `f._field_type` points to a different object than `dataclasses._FIELD`.