-
Notifications
You must be signed in to change notification settings - Fork 7k
[Serve] Keep replica context object serializable #58510
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Conversation
Signed-off-by: abrar <abrar@anyscale.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request makes the ReplicaContext object serializable by correctly handling the non-serializable _handle_registration_callback during pickling. The implementation uses __getstate__ and __setstate__ which is a standard and effective approach. The addition of a new test file with comprehensive unit tests for serialization is excellent and covers various scenarios, ensuring the change is robust. The code is well-written and the changes are solid. I have one minor suggestion to enhance test completeness.
| assert deserialized.replica_id == ctx.replica_id | ||
| assert deserialized.rank == ctx.rank | ||
| assert deserialized.world_size == ctx.world_size |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
For better test completeness and consistency with test_pickle_without_callback, it would be good to also assert that servable_object and _deployment_config are preserved after deserialization. While servable_object is None in this test, explicitly checking it and _deployment_config makes the test more robust against future regressions.
| assert deserialized.replica_id == ctx.replica_id | |
| assert deserialized.rank == ctx.rank | |
| assert deserialized.world_size == ctx.world_size | |
| assert deserialized.replica_id == ctx.replica_id | |
| assert deserialized.servable_object == ctx.servable_object | |
| assert deserialized._deployment_config == ctx._deployment_config | |
| assert deserialized.rank == ctx.rank | |
| assert deserialized.world_size == ctx.world_size |
eicherseiji
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🙏
nrghosh
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This makes sense in theory, but from what I see __getstate__ isn't being respected during Ray's serialization process
testing this PR with same test (from doc test failures where we discovered this) - python doc/source/llm/doc_code/serve/multi_gpu/dp_basic_example.py
steps
- check out PR
- symlink (setup-dev) serve
- run doc test
results in same failures:
!!! FAIL serialization: no default __reduce__ due to non-trivial __cinit__
(ServeController pid=158116) Serializing '_handle_registration_callback' <function ReplicaBase._set_internal_replica_context.<locals>.register_handle_callback at 0x7588682f2840>...
(ServeController pid=158116) !!! FAIL serialization: no default __reduce__ due to non-trivial __cinit__
(ServeController pid=158116) Detected 1 nonlocal variables. Checking serializability...
(ServeController pid=158116) Serializing 'self' <ray.serve._private.replica.Replica object at 0x7589b5d83550>...
(ServeController pid=158116) !!! FAIL serialization: no default __reduce__ due to non-trivial __cinit__
(ServeController pid=158116) Serializing '_abc_impl' <_abc._abc_data object at 0x7589b5d4b340>...
(ServeController pid=158116) !!! FAIL serialization: cannot pickle '_abc._abc_data' object
(ServeController pid=158116) Serializing '_annotated' ReplicaContext...
(ServeController pid=158116) ================================================================================
|
This pull request has been automatically marked as stale because it has not had You can always ask for help on our discussion forum or Ray's public slack channel. If you'd like to keep this open, just leave any comment, and the stale label will be removed. |
follow up from #58504