-
Notifications
You must be signed in to change notification settings - Fork 5.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[Serve] Make Ray Serve compatible with Pydantic 1.10.x #39864
Comments
Here's a debugging script that I'm using to experiment with the failing test: # import pickle
import ray.cloudpickle as pickle
from copy import deepcopy
from pydantic import BaseModel
from pydantic.dataclasses import is_builtin_dataclass
from dataclasses import dataclass
from typing import Any, Dict, Optional
@dataclass
class BackendMetadata:
is_blocking: bool = True
autoscaling_config: Optional[Dict[str, Any]] = None
bm = deepcopy(BackendMetadata)
# breakpoint()
class BackendConfig(BaseModel):
internal_metadata: BackendMetadata = BackendMetadata()
my = BackendConfig()
def ser():
print(pickle.dumps(BackendMetadata))
def cmp():
print(bm is BackendMetadata)
def pd_dc():
print(is_builtin_dataclass(BackendMetadata))
breakpoint()
culprit = type(my.__dict__["internal_metadata"]).__dict__["__init__"]
print("Finished")
# Hypothesis: adding BackendMetadata to the BaseModel makes pydantic switch
# the dataclass's __init__ method with a cyfunction __init__ method that
# can't be serialized by cloudpickle (but can be serialized with pickle?). |
And here's a repro script: # File name: repro.py
import ray
# import pickle
import ray.cloudpickle as pickle
from pydantic import BaseModel
from dataclasses import dataclass
from typing import Any, Dict, List, Optional
# START_RAY = True
START_RAY = False
if START_RAY:
ray.init(ignore_reinit_error=True)
@dataclass
class BackendMetadata:
is_blocking: bool = True
autoscaling_config: Optional[Dict[str, Any]] = None
class BackendConfig(BaseModel):
internal_metadata: BackendMetadata = BackendMetadata()
if START_RAY:
ray.get(ray.put(BackendConfig()))
@ray.remote
def consume(f):
pass
ray.get(consume.remote(BackendConfig()))
else:
serialized_object = pickle.dumps(BackendConfig())
deserialized_object = pickle.loads(serialized_object) |
This turns out to be harder than expected. The issue is that Pydantic 1.10.x converts vanilla dataclasses into Pydantic dataclasses when they're used as type hints in a Pydantic model. During this process, Pydantic also replaces plain Python methods with cythonized methods, which can't be serialized with % python repro.py
Traceback (most recent call last):
File "repro.py", line 32, in <module>
serialized_object = pickle.dumps(BackendConfig())
File "/Users/shrekris/Desktop/ray/python/ray/cloudpickle/cloudpickle_fast.py", line 88, in dumps
cp.dump(obj)
File "/Users/shrekris/Desktop/ray/python/ray/cloudpickle/cloudpickle_fast.py", line 733, in dump
return Pickler.dump(self, obj)
AttributeError: Can't pickle local object '__create_fn__.<locals>.__init__' The |
Ideally, Pydantic either (1) wouldn't mutate the original dataclass or (2) would keep a copy of the original dataclass that we could use to create a custom serializer. Unfortunately, it looks like Pydantic (1) mutates the original dataclass and (2) doesn't keep a copy of the original dataclass anywhere. One possible, hacky workaround: the Pydantic dataclass still has the same fields as the original dataclass. Serve could reconstruct a copy of the original dataclass by creating a new dataclass with the same fields upon deserialization. However, if the user defines a custom constructor, that would be lost. |
ref #40451 |
We've merged the changes to make Ray compatible with Pydantic 2.5+. You can start using Pydantic 2.5+ with Ray 2.9, which should be out at the end of December. These changes should also be in the Ray nightly, so feel free to try them out! |
What happened + What you expected to happen
Currently, Ray's cloudpickle cannot serialize Pydantic dataclasses in Pydantic 1.10.x. However, it can serialize them in Pydantic 1.9.2.
Ray's cloudpickle logic should be updated to successfully serialize Pydantic dataclasses. This test should succeed.
Versions / Dependencies
Ray on the latest
master
. Pydantic 1.10.x.Reproduction script
See test_serialize_serve_dataclass.
Issue Severity
High: It blocks me from completing my task.
The text was updated successfully, but these errors were encountered: