@abrarsheikh
Contributor

@abrarsheikh abrarsheikh commented Oct 31, 2025

Summary

Adds a new method to expose all downstream deployments that a replica calls into, enabling dependency graph construction.

Motivation

Deployments call downstream deployments via handles in two ways:

  1. Stored handles: Passed to __init__() and stored as attributes → self.model.func.remote()
  2. Dynamic handles: Obtained at runtime via serve.get_deployment_handle() → model.func.remote()

Previously, there was no way to programmatically discover these dependencies from a running replica.

Implementation

Core Changes

  • ReplicaActor.list_outbound_deployments(): Returns List[DeploymentID] of all downstream deployments

    • Recursively inspects user callable attributes to find stored handles (including nested in dicts/lists)
    • Tracks dynamic handles created via get_deployment_handle() at runtime using a callback mechanism
  • Runtime tracking: Modified get_deployment_handle() to register handles when called from within a replica via ReplicaContext._handle_registration_callback
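
The two discovery paths above can be sketched roughly as follows. This is a minimal illustration, not the PR's implementation: `DeploymentID`, `FakeHandle`, `FakeReplica`, `find_handles`, and `register_dynamic_handle` are hypothetical stand-ins defined here, and the real code relies on Serve's own scanner and `ReplicaContext` callback.

```python
from dataclasses import dataclass
from typing import Any, List, Optional, Set


@dataclass(frozen=True)
class DeploymentID:
    """Hypothetical stand-in for Serve's DeploymentID."""
    name: str
    app_name: str = "default"


class FakeHandle:
    """Hypothetical stand-in for a DeploymentHandle."""

    def __init__(self, deployment_id: DeploymentID):
        self.deployment_id = deployment_id


def find_handles(obj: Any, found: Set[DeploymentID],
                 _seen: Optional[Set[int]] = None) -> None:
    """Recursively collect deployment IDs from handles nested in
    attributes, dicts, lists, tuples, and sets."""
    _seen = set() if _seen is None else _seen
    if id(obj) in _seen:  # guard against reference cycles
        return
    _seen.add(id(obj))
    if isinstance(obj, FakeHandle):
        found.add(obj.deployment_id)
    elif isinstance(obj, dict):
        for value in obj.values():
            find_handles(value, found, _seen)
    elif isinstance(obj, (list, tuple, set)):
        for value in obj:
            find_handles(value, found, _seen)
    elif hasattr(obj, "__dict__"):
        find_handles(vars(obj), found, _seen)


class FakeReplica:
    """Hypothetical replica that merges stored and dynamic handles."""

    def __init__(self, user_callable: Any):
        self._user_callable = user_callable
        self._dynamic: Set[DeploymentID] = set()

    def register_dynamic_handle(self, deployment_id: DeploymentID) -> None:
        # Plays the role of the registration callback invoked at runtime.
        self._dynamic.add(deployment_id)

    def list_outbound_deployments(self) -> List[DeploymentID]:
        found = set(self._dynamic)
        find_handles(self._user_callable, found)
        return sorted(found, key=lambda d: (d.app_name, d.name))


class Ingress:
    def __init__(self, model: FakeHandle, extras: dict):
        self.model = model      # stored handle
        self.extras = extras    # handles nested in a dict of lists


ingress = Ingress(
    FakeHandle(DeploymentID("model")),
    {"rerankers": [FakeHandle(DeploymentID("reranker"))]},
)
replica = FakeReplica(ingress)
replica.register_dynamic_handle(DeploymentID("cache"))
outbound = replica.list_outbound_deployments()
```

Using a set keeps the result free of duplicates when the same deployment is reachable through both a stored and a dynamic handle.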

Next PR: #58350

Signed-off-by: abrar <abrar@anyscale.com>
@abrarsheikh abrarsheikh added the go add ONLY when ready to merge, run all tests label Oct 31, 2025
@abrarsheikh abrarsheikh changed the title from "expose outbound deployment ids from replica actor" to "[1/n] expose outbound deployment ids from replica actor" Oct 31, 2025
@abrarsheikh abrarsheikh marked this pull request as ready for review October 31, 2025 22:11
@abrarsheikh abrarsheikh requested a review from a team as a code owner October 31, 2025 22:11
_deployment_config: DeploymentConfig,
rank: int,
world_size: int,
handle_registration_callback: Optional[Callable[[str, str], None]] = None,

Bug: Type mismatch in callback for replica context

The handle_registration_callback parameter in _set_internal_replica_context has a type annotation mismatch. It's currently Callable[[str, str], None], but the ReplicaContext field and its actual invocation expect Callable[[DeploymentID], None]. This difference could lead to a runtime type error.
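
For illustration, a consistent annotation could look like the sketch below. `DeploymentID`, `ReplicaContextSketch`, and `set_context` are hypothetical stand-ins, not Serve's actual definitions. Note that Python does not enforce annotations at runtime, so a mismatch like the one described is mainly caught by static type checkers.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class DeploymentID:
    """Hypothetical stand-in for Serve's DeploymentID."""
    name: str
    app_name: str = "default"


@dataclass
class ReplicaContextSketch:
    """Hypothetical context holding the registration callback."""
    _handle_registration_callback: Optional[Callable[[DeploymentID], None]] = None


def set_context(
    handle_registration_callback: Optional[Callable[[DeploymentID], None]] = None,
) -> ReplicaContextSketch:
    # The parameter annotation matches the field annotation above.
    return ReplicaContextSketch(
        _handle_registration_callback=handle_registration_callback
    )


registered: List[DeploymentID] = []
ctx = set_context(registered.append)
if ctx._handle_registration_callback is not None:
    ctx._handle_registration_callback(DeploymentID("model"))
```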


@ray-gardener ray-gardener bot added serve Ray Serve Related Issue observability Issues related to the Ray Dashboard, Logging, Metrics, Tracing, and/or Profiling labels Nov 1, 2025
@akyang-anyscale
Contributor

in theory, could the recorded dynamic handles be different per replica process based on business logic? how would we compile deployment dag based on this?

@abrarsheikh
Contributor Author

> in theory, could the recorded dynamic handles be different per replica process based on business logic? how would we compile deployment dag based on this?

This is possible, but I would assume it's rare.

Since we enrich the DAG over a period of time, I expect the DAG to be a good representation of reality. But ultimately, this is best effort.

@abrarsheikh
Contributor Author

Generally speaking, you are right: if a deployment has two replicas and we keep switching between them, the DAG will change over time. Not the best experience, but probably okay?

Another design choice is to require that the DAG be the same across all replicas and, if they differ, not construct the DAG at all. The only downside is that collecting information from all replicas can be expensive.

@akyang-anyscale
Contributor

could it be unified at some higher level (like deployment level instead of replicas)?

@abrarsheikh
Contributor Author

Yeah, we can do that here: instead of poking one replica, we can query all of them and union the results.
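
A rough sketch of the union idea, assuming each replica reports its outbound deployments as a collection of `(app_name, deployment_name)` pairs. The `DeploymentKey` alias and `union_outbound` helper are hypothetical and only illustrate the merge step, not how replicas are actually queried.

```python
from typing import Iterable, List, Set, Tuple

# Hypothetical stand-in for DeploymentID: (app_name, deployment_name).
DeploymentKey = Tuple[str, str]


def union_outbound(
    per_replica: Iterable[Iterable[DeploymentKey]],
) -> List[DeploymentKey]:
    """Merge per-replica reports into one deployment-level view."""
    merged: Set[DeploymentKey] = set()
    for outbound in per_replica:
        merged.update(outbound)
    return sorted(merged)


# Replicas may disagree, e.g. one has not yet created a dynamic handle.
replica_reports = [
    [("app", "model")],
    [("app", "model"), ("app", "cache")],
    [],
]
edges = union_outbound(replica_reports)
```

With a union, a dependency seen by any replica appears in the deployment-level result, so the graph no longer depends on which replica happened to be queried.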

Signed-off-by: abrar <abrar@anyscale.com>
Comment on lines +1249 to +1257
scanner = _PyObjScanner(source_type=DeploymentHandle)
try:
    handles = scanner.find_nodes((init_args, init_kwargs))

    for handle in handles:
        deployment_id = handle.deployment_id
        seen_deployment_ids.add(deployment_id)
finally:
    scanner.clear()
Contributor Author

this can be cached, but not super important because list_outbound_deployments will be called infrequently.

Contributor

what's the frequency?

Contributor Author

Exponential backoff, starting at 1s and capped at 10 min.
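
As a sketch of what such a polling schedule might look like: the 1s start and 10-minute cap come from the comment above, while the doubling factor and the `backoff_delays` helper are assumptions made for illustration.

```python
import itertools
from typing import Iterator, List


def backoff_delays(
    initial_s: float = 1.0,
    max_s: float = 600.0,
    factor: float = 2.0,  # assumed multiplier; not stated in the thread
) -> Iterator[float]:
    """Yield delays that grow geometrically until they reach the cap."""
    delay = initial_s
    while True:
        yield min(delay, max_s)
        delay = min(delay * factor, max_s)


# First twelve delays, in seconds, under the assumed doubling factor.
first_delays: List[float] = list(itertools.islice(backoff_delays(), 12))
```

Under these assumptions the delay reaches the 600s cap on the eleventh poll and stays there.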

@abrarsheikh abrarsheikh merged commit b9ee3fe into master Nov 6, 2025
6 checks passed
@abrarsheikh abrarsheikh deleted the dag-of-deployments branch November 6, 2025 21:33
YoussefEssDS pushed a commit to YoussefEssDS/ray that referenced this pull request Nov 8, 2025
nrghosh added a commit to nrghosh/ray that referenced this pull request Nov 10, 2025
Fixes ray-project#58475 and ray-project#58474

Root cause - addition of `_handle_registration_callback` in ray-project#58345

Fix: Remove unused replica_ctx parameter from DPRankAssigner.register() to
fix serialization error when passing ReplicaContext across actor
boundaries.

Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>
nrghosh added a commit to nrghosh/ray that referenced this pull request Nov 10, 2025
Remove unused replica_ctx parameter from DPRankAssigner.register() to
fix serialization error when passing ReplicaContext across actor
boundaries. The ReplicaContext contains non-serializable callback
closures that cause Ray serialization to fail.

Root cause: The addition of _handle_registration_callback field to
ReplicaContext in PR ray-project#58345 introduced non-serializable callback
closures that cannot be passed across actor boundaries.

Fixes ray-project#58474
Fixes ray-project#58475

Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>
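
The failure mode described in this commit message can be illustrated with a small sketch. It uses the standard library's `pickle` on hypothetical stand-ins (`ReplicaSketch`, a plain dict as the context); Ray serializes with cloudpickle, so the exact error in Ray may differ, but the underlying issue is the same: a callback bound to an object that holds unpicklable state drags that state into the payload.

```python
import pickle
import threading
from typing import Any, List


class ReplicaSketch:
    """Hypothetical replica holding state that cannot be pickled."""

    def __init__(self) -> None:
        self._lock = threading.Lock()  # locks are not picklable
        self.registered: List[str] = []

    def register(self, deployment_name: str) -> None:
        with self._lock:
            self.registered.append(deployment_name)


def try_pickle(obj: Any) -> bool:
    """Return True if obj can be serialized with stdlib pickle."""
    try:
        pickle.dumps(obj)
        return True
    except (TypeError, AttributeError, pickle.PicklingError):
        return False


# A context (modeled as a dict) without a callback serializes fine.
plain_ctx = {"replica_tag": "replica-1", "callback": None}
plain_ok = try_pickle(plain_ctx)

# The same context with a bound-method callback pulls in the replica
# object, and with it the lock, so serialization fails.
ctx_with_callback = {
    "replica_tag": "replica-1",
    "callback": ReplicaSketch().register,
}
with_callback_ok = try_pickle(ctx_with_callback)
```

This is why dropping the unused context argument avoids the error: the callback never has to cross the actor boundary.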
nrghosh added a commit to nrghosh/ray that referenced this pull request Nov 10, 2025
Remove unused replica_ctx parameter from DPRankAssigner.register() to
fix serialization error when passing ReplicaContext across actor
boundaries.

Root cause: The addition of _handle_registration_callback field to
ReplicaContext in PR ray-project#58345

Fixes ray-project#58474 and ray-project#58475

Signed-off-by: Nikhil Ghosh <nikhil@anyscale.com>
landscapepainter pushed a commit to landscapepainter/ray that referenced this pull request Nov 17, 2025
Aydin-ab pushed a commit to Aydin-ab/ray-aydin that referenced this pull request Nov 19, 2025
Signed-off-by: Aydin Abiar <aydin@anyscale.com>
ykdojo pushed a commit to ykdojo/ray that referenced this pull request Nov 27, 2025
Signed-off-by: YK <1811651+ykdojo@users.noreply.github.com>
SheldonTsen pushed a commit to SheldonTsen/ray that referenced this pull request Dec 1, 2025