-
Notifications
You must be signed in to change notification settings - Fork 7k
[data] data metrics grouping #55495
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[data] data metrics grouping #55495
Conversation
Signed-off-by: iamjustinhsu <jhsu@anyscale.com>
Signed-off-by: iamjustinhsu <jhsu@anyscale.com>
Signed-off-by: iamjustinhsu <jhsu@anyscale.com>
Signed-off-by: iamjustinhsu <jhsu@anyscale.com>
Signed-off-by: iamjustinhsu <jhsu@anyscale.com>
Signed-off-by: iamjustinhsu <jhsu@anyscale.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request significantly improves the structure and maintainability of the Ray Data dashboard panels by refactoring them from a flat list into logically grouped, collapsible rows. The introduction of a PanelIdGenerator is a great addition to prevent ID collisions automatically. The changes also include renaming metrics like num_output_queue_* to num_external_inqueue_* for better clarity, which is reflected across the codebase. Overall, these are excellent changes. I have one minor suggestion to correct a metric description that was likely overlooked during the refactoring.
python/ray/data/_internal/execution/interfaces/op_runtime_metrics.py
Outdated
Show resolved
Hide resolved
Signed-off-by: iamjustinhsu <jhsu@anyscale.com>
| num_output_queue_blocks: int = metric_field( | ||
| num_external_inqueue_blocks: int = metric_field( | ||
| default=0, | ||
| description="Number of blocks in the output queue", | ||
| description="Number of blocks in the external inqueue", | ||
| metrics_group=MetricsGroup.OUTPUTS, | ||
| ) | ||
| num_output_queue_bytes: int = metric_field( | ||
| num_external_inqueue_bytes: int = metric_field( | ||
| default=0, | ||
| description="Byte size of blocks in the output queue", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i changed these to be external INPUT queue, over external OUTPUT queue because it was easier to aggregate the sum of internal input queue + external queue per operator. Otherwise, I'm not sure how to sum the total queued inputs.
omatthew98
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Lgtm from the data side, we should check with the observability team to make sure they aren't using any of the metrics we are changing the names of though.
| metrics_group=MetricsGroup.OUTPUTS, | ||
| ) | ||
| num_output_queue_blocks: int = metric_field( | ||
| num_external_inqueue_blocks: int = metric_field( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@coqian FYI, if we rename some of these will it be a problem? Is the data dashboard using some of these metrics?
| # Overview Row | ||
| Row( | ||
| title="Overview", | ||
| id=99, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do the row ids collide with panel ids? Could we say start this at 0/1? Don't think it really matters either way.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ya that must be unique
…/data-dashboard-grouping
can-anyscale
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
data team can decide
<!-- Thank you for your contribution! Please review https://github.com/ray-project/ray/blob/master/CONTRIBUTING.rst before opening a pull request. --> <!-- Please add a reviewer to the assignee section when you create a PR. If you don't have the access to it, we will shortly find a reviewer and assign them to your PR. --> https://github.com/user-attachments/assets/8caa7448-35b1-4945-9e41-82fd9efca4f3 ## Why are these changes needed? - ray data dashboard is ugly - grouping them into pending inputs, inputs, outputs, overview, pending outputs, scheduling loop, resource usage/budget, and iteration - also changed the metric from external outqueue of op1 to external inqueue of op2 (so that i can combine the internal + external inqueues easily) <!-- Please give a short summary of the change and the problem this solves. --> ## Related issue number <!-- For example: "Closes ray-project#1234" --> ## Checks - [ ] I've signed off every commit(by using the -s flag, i.e., `git commit -s`) in this PR. - [ ] I've run `scripts/format.sh` to lint the changes in this PR. - [ ] I've included any doc changes needed for https://docs.ray.io/en/master/. - [ ] I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file. - [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/ - Testing Strategy - [ ] Unit tests - [ ] Release tests - [ ] This PR is not tested :( --------- Signed-off-by: iamjustinhsu <jhsu@anyscale.com> Signed-off-by: Andrew Grosser <dioptre@gmail.com>
Since we introduced panel groups to Default (#55620) & Data (#55495) dashboards, applications consuming Grafana dashboards can comfortably embed the full dashboard on any UI now (and the other dashboards are pretty usable even without them). Added a `"supportsFullGrafanaView"` tag to the `rayMeta` list in Default Dashboard to indicate to consumers that we support full Grafana dashboard embedding from now on. --------- Signed-off-by: anmol <anmol@anyscale.com> Co-authored-by: anmol <anmol@anyscale.com>
…roject#56077) Since we introduced panel groups to Default (ray-project#55620) & Data (ray-project#55495) dashboards, applications consuming Grafana dashboards can comfortably embed the full dashboard on any UI now (and the other dashboards are pretty usable even without them). Added a `"supportsFullGrafanaView"` tag to the `rayMeta` list in Default Dashboard to indicate to consumers that we support full Grafana dashboard embedding from now on. --------- Signed-off-by: anmol <anmol@anyscale.com> Co-authored-by: anmol <anmol@anyscale.com> Signed-off-by: sampan <sampan@anyscale.com>
<!-- Thank you for your contribution! Please review https://github.com/ray-project/ray/blob/master/CONTRIBUTING.rst before opening a pull request. --> <!-- Please add a reviewer to the assignee section when you create a PR. If you don't have the access to it, we will shortly find a reviewer and assign them to your PR. --> https://github.com/user-attachments/assets/8caa7448-35b1-4945-9e41-82fd9efca4f3 ## Why are these changes needed? - ray data dashboard is ugly - grouping them into pending inputs, inputs, outputs, overview, pending outputs, scheduling loop, resource usage/budget, and iteration - also changed the metric from external outqueue of op1 to external inqueue of op2 (so that i can combine the internal + external inqueues easily) <!-- Please give a short summary of the change and the problem this solves. --> ## Related issue number <!-- For example: "Closes ray-project#1234" --> ## Checks - [ ] I've signed off every commit(by using the -s flag, i.e., `git commit -s`) in this PR. - [ ] I've run `scripts/format.sh` to lint the changes in this PR. - [ ] I've included any doc changes needed for https://docs.ray.io/en/master/. - [ ] I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file. - [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/ - Testing Strategy - [ ] Unit tests - [ ] Release tests - [ ] This PR is not tested :( --------- Signed-off-by: iamjustinhsu <jhsu@anyscale.com> Signed-off-by: jugalshah291 <shah.jugal291@gmail.com>
…roject#56077) Since we introduced panel groups to Default (ray-project#55620) & Data (ray-project#55495) dashboards, applications consuming Grafana dashboards can comfortably embed the full dashboard on any UI now (and the other dashboards are pretty usable even without them). Added a `"supportsFullGrafanaView"` tag to the `rayMeta` list in Default Dashboard to indicate to consumers that we support full Grafana dashboard embedding from now on. --------- Signed-off-by: anmol <anmol@anyscale.com> Co-authored-by: anmol <anmol@anyscale.com> Signed-off-by: jugalshah291 <shah.jugal291@gmail.com>
…roject#56077) Since we introduced panel groups to Default (ray-project#55620) & Data (ray-project#55495) dashboards, applications consuming Grafana dashboards can comfortably embed the full dashboard on any UI now (and the other dashboards are pretty usable even without them). Added a `"supportsFullGrafanaView"` tag to the `rayMeta` list in Default Dashboard to indicate to consumers that we support full Grafana dashboard embedding from now on. --------- Signed-off-by: anmol <anmol@anyscale.com> Co-authored-by: anmol <anmol@anyscale.com> Signed-off-by: yenhong.wong <yenhong.wong@grabtaxi.com>
<!-- Thank you for your contribution! Please review https://github.com/ray-project/ray/blob/master/CONTRIBUTING.rst before opening a pull request. --> <!-- Please add a reviewer to the assignee section when you create a PR. If you don't have the access to it, we will shortly find a reviewer and assign them to your PR. --> https://github.com/user-attachments/assets/8caa7448-35b1-4945-9e41-82fd9efca4f3 ## Why are these changes needed? - ray data dashboard is ugly - grouping them into pending inputs, inputs, outputs, overview, pending outputs, scheduling loop, resource usage/budget, and iteration - also changed the metric from external outqueue of op1 to external inqueue of op2 (so that i can combine the internal + external inqueues easily) <!-- Please give a short summary of the change and the problem this solves. --> ## Related issue number <!-- For example: "Closes #1234" --> ## Checks - [ ] I've signed off every commit(by using the -s flag, i.e., `git commit -s`) in this PR. - [ ] I've run `scripts/format.sh` to lint the changes in this PR. - [ ] I've included any doc changes needed for https://docs.ray.io/en/master/. - [ ] I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file. - [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/ - Testing Strategy - [ ] Unit tests - [ ] Release tests - [ ] This PR is not tested :( --------- Signed-off-by: iamjustinhsu <jhsu@anyscale.com> Signed-off-by: Douglas Strodtman <douglas@anyscale.com>
…roject#56077) Since we introduced panel groups to Default (ray-project#55620) & Data (ray-project#55495) dashboards, applications consuming Grafana dashboards can comfortably embed the full dashboard on any UI now (and the other dashboards are pretty usable even without them). Added a `"supportsFullGrafanaView"` tag to the `rayMeta` list in Default Dashboard to indicate to consumers that we support full Grafana dashboard embedding from now on. --------- Signed-off-by: anmol <anmol@anyscale.com> Co-authored-by: anmol <anmol@anyscale.com> Signed-off-by: Douglas Strodtman <douglas@anyscale.com>
…pported operator filter (#57970) ## Description The per node metrics at OSS Ray Data dashboard are not displayed as expected. Because of this code change #55495, the following three metrics were added a filter for `operator`, which is [not supported](https://github.com/ray-project/ray/blob/e51f8039bc6992d37834bcff109a3d340e78fcde/python/ray/data/_internal/stats.py#L448) by per node metrics, and causes empty result. ray_data_num_tasks_finished_per_node ray_data_bytes_outputs_of_finished_tasks_per_node ray_data_blocks_outputs_of_finished_tasks_per_node Signed-off-by: cong.qian <cong.qian@anyscale.com>
<!-- Thank you for your contribution! Please review https://github.com/ray-project/ray/blob/master/CONTRIBUTING.rst before opening a pull request. --> <!-- Please add a reviewer to the assignee section when you create a PR. If you don't have the access to it, we will shortly find a reviewer and assign them to your PR. --> https://github.com/user-attachments/assets/8caa7448-35b1-4945-9e41-82fd9efca4f3 ## Why are these changes needed? - ray data dashboard is ugly - grouping them into pending inputs, inputs, outputs, overview, pending outputs, scheduling loop, resource usage/budget, and iteration - also changed the metric from external outqueue of op1 to external inqueue of op2 (so that i can combine the internal + external inqueues easily) <!-- Please give a short summary of the change and the problem this solves. --> ## Related issue number <!-- For example: "Closes ray-project#1234" --> ## Checks - [ ] I've signed off every commit(by using the -s flag, i.e., `git commit -s`) in this PR. - [ ] I've run `scripts/format.sh` to lint the changes in this PR. - [ ] I've included any doc changes needed for https://docs.ray.io/en/master/. - [ ] I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file. - [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/ - Testing Strategy - [ ] Unit tests - [ ] Release tests - [ ] This PR is not tested :( --------- Signed-off-by: iamjustinhsu <jhsu@anyscale.com>
…roject#56077) Since we introduced panel groups to Default (ray-project#55620) & Data (ray-project#55495) dashboards, applications consuming Grafana dashboards can comfortably embed the full dashboard on any UI now (and the other dashboards are pretty usable even without them). Added a `"supportsFullGrafanaView"` tag to the `rayMeta` list in Default Dashboard to indicate to consumers that we support full Grafana dashboard embedding from now on. --------- Signed-off-by: anmol <anmol@anyscale.com> Co-authored-by: anmol <anmol@anyscale.com>
…pported operator filter (ray-project#57970) ## Description The per node metrics at OSS Ray Data dashboard are not displayed as expected. Because of this code change ray-project#55495, the following three metrics were added a filter for `operator`, which is [not supported](https://github.com/ray-project/ray/blob/e51f8039bc6992d37834bcff109a3d340e78fcde/python/ray/data/_internal/stats.py#L448) by per node metrics, and causes empty result. ray_data_num_tasks_finished_per_node ray_data_bytes_outputs_of_finished_tasks_per_node ray_data_blocks_outputs_of_finished_tasks_per_node Signed-off-by: cong.qian <cong.qian@anyscale.com>
…pported operator filter (ray-project#57970) ## Description The per node metrics at OSS Ray Data dashboard are not displayed as expected. Because of this code change ray-project#55495, the following three metrics were added a filter for `operator`, which is [not supported](https://github.com/ray-project/ray/blob/e51f8039bc6992d37834bcff109a3d340e78fcde/python/ray/data/_internal/stats.py#L448) by per node metrics, and causes empty result. ray_data_num_tasks_finished_per_node ray_data_bytes_outputs_of_finished_tasks_per_node ray_data_blocks_outputs_of_finished_tasks_per_node Signed-off-by: cong.qian <cong.qian@anyscale.com> Signed-off-by: Aydin Abiar <aydin@anyscale.com>
…pported operator filter (ray-project#57970) ## Description The per node metrics at OSS Ray Data dashboard are not displayed as expected. Because of this code change ray-project#55495, the following three metrics were added a filter for `operator`, which is [not supported](https://github.com/ray-project/ray/blob/e51f8039bc6992d37834bcff109a3d340e78fcde/python/ray/data/_internal/stats.py#L448) by per node metrics, and causes empty result. ray_data_num_tasks_finished_per_node ray_data_bytes_outputs_of_finished_tasks_per_node ray_data_blocks_outputs_of_finished_tasks_per_node Signed-off-by: cong.qian <cong.qian@anyscale.com> Signed-off-by: Future-Outlier <eric901201@gmail.com>
Screen.Recording.2025-08-11.at.4.22.17.PM.mov
Why are these changes needed?
Related issue number
Checks
git commit -s) in this PR.scripts/format.shto lint the changes in this PR.method in Tune, I've added it in
doc/source/tune/api/under thecorresponding
.rstfile.