@iamjustinhsu iamjustinhsu commented Aug 11, 2025

Screen.Recording.2025-08-11.at.4.22.17.PM.mov

Why are these changes needed?

  • The Ray Data dashboard is ugly: every panel sits in one flat list.
  • This PR groups the panels into pending inputs, inputs, outputs, overview, pending outputs, scheduling loop, resource usage/budget, and iteration.
  • It also changes the metric from the external outqueue of op1 to the external inqueue of op2 (so that the internal + external inqueues can be combined easily).

Related issue number

Checks

  • [ ] I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
  • [ ] I've run scripts/format.sh to lint the changes in this PR.
  • [ ] I've included any doc changes needed for https://docs.ray.io/en/master/.
    • [ ] I've added any new APIs to the API Reference. For example, if I added a
      method in Tune, I've added it in doc/source/tune/api/ under the
      corresponding .rst file.
  • [ ] I've made sure the tests are passing. Note that there might be a few flaky tests; see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • [ ] Unit tests
    • [ ] Release tests
    • [ ] This PR is not tested :(

Signed-off-by: iamjustinhsu <jhsu@anyscale.com>

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request significantly improves the structure and maintainability of the Ray Data dashboard panels by refactoring them from a flat list into logically grouped, collapsible rows. The introduction of a PanelIdGenerator is a great addition to prevent ID collisions automatically. The changes also include renaming metrics like num_output_queue_* to num_external_inqueue_* for better clarity, which is reflected across the codebase. Overall, these are excellent changes. I have one minor suggestion to correct a metric description that was likely overlooked during the refactoring.
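
The grouping described in this review can be pictured roughly as follows. This is a minimal sketch, not the code from this PR: the `Panel` and `Row` dataclasses, their fields, and `group_into_rows` are hypothetical names chosen for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Panel:
    # Hypothetical stand-in for a dashboard panel config.
    title: str
    group: str  # e.g. "Inputs", "Outputs", "Overview"


@dataclass
class Row:
    # Hypothetical stand-in for a collapsible row that owns panels.
    title: str
    panels: List[Panel] = field(default_factory=list)
    collapsed: bool = True


def group_into_rows(panels: List[Panel]) -> List[Row]:
    """Turn a flat panel list into one row per group, in first-seen order."""
    rows: Dict[str, Row] = {}
    for panel in panels:
        # dicts preserve insertion order, so rows come out in first-seen order.
        rows.setdefault(panel.group, Row(title=panel.group)).panels.append(panel)
    return list(rows.values())
```

Keeping the group name on each panel means a new panel only has to declare its group; the row layout falls out of the grouping step.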

@iamjustinhsu iamjustinhsu marked this pull request as ready for review August 11, 2025 23:21
@iamjustinhsu iamjustinhsu requested a review from a team as a code owner August 11, 2025 23:21
@iamjustinhsu iamjustinhsu changed the title from "Jhsu/data dashboard grouping" to "[data] data metrics grouping" Aug 11, 2025
@iamjustinhsu iamjustinhsu requested a review from a team as a code owner August 12, 2025 17:27
Comment on lines -327 to -334
- num_output_queue_blocks: int = metric_field(
+ num_external_inqueue_blocks: int = metric_field(
      default=0,
-     description="Number of blocks in the output queue",
+     description="Number of blocks in the external inqueue",
      metrics_group=MetricsGroup.OUTPUTS,
  )
- num_output_queue_bytes: int = metric_field(
+ num_external_inqueue_bytes: int = metric_field(
      default=0,
-     description="Byte size of blocks in the output queue",
@iamjustinhsu iamjustinhsu Aug 13, 2025

I changed these from the external OUTPUT queue to the external INPUT queue because that makes it easier to aggregate the sum of the internal input queue + external input queue per operator. Otherwise, I'm not sure how to sum the total queued inputs.
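
To illustrate the point: once both queues are reported against the consuming operator, the total queued input is a plain per-operator sum. The sketch below uses made-up operator names and numbers; `OpQueueSample` and its fields are hypothetical and are not Ray Data internals.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Iterable


@dataclass(frozen=True)
class OpQueueSample:
    # Hypothetical per-operator sample; not a Ray Data class.
    operator: str
    internal_inqueue_blocks: int
    external_inqueue_blocks: int


def total_queued_inputs(samples: Iterable[OpQueueSample]) -> Dict[str, int]:
    """Sum internal + external input-queue blocks for each operator."""
    totals: Dict[str, int] = defaultdict(int)
    for s in samples:
        totals[s.operator] += s.internal_inqueue_blocks + s.external_inqueue_blocks
    return dict(totals)
```

If the external queue were still attributed to the upstream operator (as an output queue), the two numbers would carry different operator labels and could not be added this directly.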

@omatthew98 omatthew98 left a comment

LGTM from the data side. We should check with the observability team to make sure they aren't using any of the metrics we're renaming, though.

      metrics_group=MetricsGroup.OUTPUTS,
  )
- num_output_queue_blocks: int = metric_field(
+ num_external_inqueue_blocks: int = metric_field(
@coqian FYI, if we rename some of these will it be a problem? Is the data dashboard using some of these metrics?

# Overview Row
Row(
    title="Overview",
    id=99,
Do the row ids collide with panel ids? Could we start this at, say, 0 or 1? I don't think it really matters either way.

Yeah, that must be unique.
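
One way to guarantee that row ids and panel ids never collide is to draw both from a single counter. The sketch below is in the spirit of the PanelIdGenerator mentioned in the review, but `SharedIdCounter` is a hypothetical illustration and not the PR's implementation.

```python
import itertools


class SharedIdCounter:
    """Hypothetical helper: hands out ids from one shared sequence."""

    def __init__(self, start: int = 1) -> None:
        self._counter = itertools.count(start)

    def next_id(self) -> int:
        # Each call returns the next unused integer.
        return next(self._counter)


ids = SharedIdCounter()
overview_row_id = ids.next_id()  # rows and panels share the same sequence
panel_ids = [ids.next_id() for _ in range(3)]
```

Because every id comes from the same sequence, a hand-picked value such as `id=99` is no longer needed to stay clear of the panel ids.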

@ray-gardener ray-gardener bot added the data (Ray Data-related issues) and observability (Issues related to the Ray Dashboard, Logging, Metrics, Tracing, and/or Profiling) labels Aug 15, 2025
@iamjustinhsu iamjustinhsu added the go (add ONLY when ready to merge, run all tests) label Aug 15, 2025
@can-anyscale can-anyscale left a comment

data team can decide

@bveeramani bveeramani merged commit 2647c9b into ray-project:master Aug 19, 2025
5 checks passed
@iamjustinhsu iamjustinhsu deleted the jhsu/data-dashboard-grouping branch August 19, 2025 19:06
dioptre pushed a commit to sourcetable/ray that referenced this pull request Aug 20, 2025
Signed-off-by: Andrew Grosser <dioptre@gmail.com>
edoakes pushed a commit that referenced this pull request Sep 2, 2025
Since we introduced panel groups to Default
(#55620) & Data
(#55495) dashboards, applications
consuming Grafana dashboards can comfortably embed the full dashboard on
any UI now (and the other dashboards are pretty usable even without
them).

Added a `"supportsFullGrafanaView"` tag to the `rayMeta` list in Default
Dashboard to indicate to consumers that we support full Grafana
dashboard embedding from now on.

---------

Signed-off-by: anmol <anmol@anyscale.com>
Co-authored-by: anmol <anmol@anyscale.com>
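
A consumer could check for the tag along these lines. This is a minimal sketch under assumptions: it treats `rayMeta` as a top-level list of string tags in the dashboard JSON, and the helper name and example JSON are made up for illustration.

```python
import json
from typing import Any, Dict

FULL_VIEW_TAG = "supportsFullGrafanaView"


def supports_full_grafana_view(dashboard: Dict[str, Any]) -> bool:
    """Return True if the dashboard JSON advertises full-dashboard embedding."""
    # Assumption: `rayMeta` is a top-level list of string tags.
    return FULL_VIEW_TAG in dashboard.get("rayMeta", [])


# Made-up dashboard JSON, for illustration only.
example = json.loads(
    '{"title": "Default Dashboard", "rayMeta": ["supportsFullGrafanaView"]}'
)
```

Dashboards without the tag (or without `rayMeta` at all) are treated as not supporting the full view.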
sampan-s-nayak pushed a commit to sampan-s-nayak/ray that referenced this pull request Sep 8, 2025
Signed-off-by: sampan <sampan@anyscale.com>
jugalshah291 pushed a commit to jugalshah291/ray_fork that referenced this pull request Sep 11, 2025
Signed-off-by: jugalshah291 <shah.jugal291@gmail.com>
jugalshah291 pushed a commit to jugalshah291/ray_fork that referenced this pull request Sep 11, 2025
Signed-off-by: jugalshah291 <shah.jugal291@gmail.com>
wyhong3103 pushed a commit to wyhong3103/ray that referenced this pull request Sep 12, 2025
Signed-off-by: yenhong.wong <yenhong.wong@grabtaxi.com>
dstrodtman pushed a commit that referenced this pull request Oct 6, 2025
Signed-off-by: Douglas Strodtman <douglas@anyscale.com>
dstrodtman pushed a commit to dstrodtman/ray that referenced this pull request Oct 6, 2025
Signed-off-by: Douglas Strodtman <douglas@anyscale.com>
bveeramani pushed a commit that referenced this pull request Oct 27, 2025
…pported operator filter (#57970)

## Description
The per-node metrics on the OSS Ray Data dashboard are not displayed as
expected.
Code change #55495 added an `operator` filter to the following three
metrics. Per-node metrics do [not
support](https://github.com/ray-project/ray/blob/e51f8039bc6992d37834bcff109a3d340e78fcde/python/ray/data/_internal/stats.py#L448)
that filter, so the queries return empty results:
- ray_data_num_tasks_finished_per_node
- ray_data_bytes_outputs_of_finished_tasks_per_node
- ray_data_blocks_outputs_of_finished_tasks_per_node

Signed-off-by: cong.qian <cong.qian@anyscale.com>
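
The failure mode above can be avoided by only emitting matchers for labels a metric actually carries. The sketch below is illustrative only: `build_selector` is a hypothetical helper, and the label names `dataset` and `node_ip` are assumptions rather than the confirmed label set of the per-node metrics.

```python
from typing import Dict, Iterable


def build_selector(
    metric: str, filters: Dict[str, str], supported_labels: Iterable[str]
) -> str:
    """Build a PromQL selector, skipping filters on labels the metric lacks."""
    supported = set(supported_labels)
    matchers = [
        f'{label}=~"{value}"'
        for label, value in filters.items()
        if label in supported  # drop e.g. `operator` for per-node metrics
    ]
    return metric + "{" + ",".join(matchers) + "}"
```

For example, passing both a `dataset` and an `operator` filter with `supported_labels=["dataset", "node_ip"]` yields a selector that keeps only the `dataset` matcher.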
landscapepainter pushed a commit to landscapepainter/ray that referenced this pull request Nov 17, 2025
landscapepainter pushed a commit to landscapepainter/ray that referenced this pull request Nov 17, 2025
landscapepainter pushed a commit to landscapepainter/ray that referenced this pull request Nov 17, 2025
Aydin-ab pushed a commit to Aydin-ab/ray-aydin that referenced this pull request Nov 19, 2025
Signed-off-by: Aydin Abiar <aydin@anyscale.com>
Future-Outlier pushed a commit to Future-Outlier/ray that referenced this pull request Dec 7, 2025
Signed-off-by: Future-Outlier <eric901201@gmail.com>