@alanwguo alanwguo commented Aug 27, 2025

cherrypick #56009

Why are these changes needed?

Bugs introduced in #52102

Two bugs:

  • proc is a TypedDict so it needs to be fetched via proc["pid"] instead of proc.pid.
  • Changing processes_pid is a backwards-incompatible change that ends up changing the dashboard APIs that power the Ray Dashboard. This PR keeps it backwards-compatible.
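
The first bug can be illustrated with a minimal sketch. The `ProcInfo` class, its `gpu_memory_usage` field, the `group_by_pid` helper, and the sample data are hypothetical stand-ins rather than the actual Ray definitions; only the loop body mirrors the fix in this PR.

```python
from collections import defaultdict
from typing import Dict, List, TypedDict


class ProcInfo(TypedDict):
    # Hypothetical per-process GPU record; field names are assumptions.
    pid: int
    gpu_memory_usage: int


def group_by_pid(processes: Dict[int, ProcInfo]) -> Dict[int, List[ProcInfo]]:
    """Group process records by pid using key access, as in the fix."""
    gpu_pid_mapping = defaultdict(list)
    if processes:
        for proc in processes.values():
            # proc is a plain dict at runtime, so proc.pid would raise
            # AttributeError; proc["pid"] is the working form.
            gpu_pid_mapping[proc["pid"]].append(proc)
    return dict(gpu_pid_mapping)


# Hypothetical sample: two records share pid 101.
processes = {
    0: ProcInfo(pid=101, gpu_memory_usage=512),
    1: ProcInfo(pid=101, gpu_memory_usage=256),
    2: ProcInfo(pid=202, gpu_memory_usage=128),
}
grouped = group_by_pid(processes)
```

Here `grouped` maps each pid to the list of records that reported it, which is the shape the metrics code in the diff appears to rely on.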

Verified fix:
Metrics work again:
Screenshot 2025-08-27 at 12 22 40 PM

Ray Dashboard works again:
Screenshot 2025-08-27 at 12 21 51 PM

Related issue number

Checks

  • I've signed off every commit(by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
    • I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

Signed-off-by: Alan Guo <aguo@anyscale.com>
@alanwguo alanwguo changed the base branch from master to releases/2.49.0 August 27, 2025 18:32
@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request primarily addresses a bug in GPU metrics reporting by correcting how process IDs are accessed. It also includes a significant refactoring of ray.train.v2 to simplify dataset handling by removing the DatasetManager actor, which improves maintainability. Additionally, it updates the project version to 2.49.0 across various files, including Python, Java, and C++ components, and adds protobuf compatibility helpers for MessageToJson. The tests have been updated to reflect these changes, including a refactoring of aggregator agent tests for better structure. My feedback is minor, pointing out a small typo in a comment.

@ray-gardener ray-gardener bot added core Issues that should be addressed in Ray Core observability Issues related to the Ray Dashboard, Logging, Metrics, Tracing, and/or Profiling labels Aug 27, 2025
Signed-off-by: Alan Guo <aguo@anyscale.com>
Signed-off-by: Alan Guo <aguo@anyscale.com>
  if processes:
      for proc in processes.values():
-         gpu_pid_mapping[proc.pid].append(proc)
+         gpu_pid_mapping[proc["pid"]].append(proc)
Contributor

dang this type of bug should have been caught with a type checker (if we can typify this and other classes)

Contributor Author

I think TypedDict actually breaks the typechecker.

Autocomplete suggests you can use .field but in reality you cannot.
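
The runtime side of this exchange can be checked with a small sketch. `ProcInfo` is a hypothetical class, not the Ray type; the example only shows that a TypedDict instance is an ordinary dict at runtime, so attribute access fails whatever an editor suggests. Whether a given type checker or editor flags `proc.pid` statically depends on the tool and its configuration.

```python
from typing import TypedDict


class ProcInfo(TypedDict):
    # Hypothetical record type used only for illustration.
    pid: int


proc = ProcInfo(pid=101)

# Calling a TypedDict class builds a plain dict; no wrapper object exists.
is_plain_dict = type(proc) is dict

# Attribute access therefore fails at runtime.
try:
    proc.pid
    attribute_access_failed = False
except AttributeError:
    attribute_access_failed = True

# Key access is the supported form.
pid_value = proc["pid"]
```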

@aslonnie aslonnie changed the base branch from releases/2.49.0 to releases/2.49.1 August 27, 2025 20:00
Signed-off-by: Alan Guo <aguo@anyscale.com>
@alanwguo alanwguo added the go add ONLY when ready to merge, run all tests label Aug 27, 2025
Signed-off-by: Alan Guo <aguo@anyscale.com>
Signed-off-by: Alan Guo <aguo@anyscale.com>
@aslonnie aslonnie merged commit 3fe06a9 into ray-project:releases/2.49.1 Aug 28, 2025
5 checks passed