Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] CPU/memory usage per func class #31234

Merged
merged 5 commits into from
Dec 23, 2022

Conversation

rkooo567
Copy link
Contributor

@rkooo567 rkooo567 commented Dec 20, 2022

Why are these changes needed?

This PR replaces the "Component" label for memory / CPU usage to the proc name of worker processes. E.g., instead of "workers", we use "ray::IDLE" or "ray::Actor".

This PR also adds the per component CPU usage. I found the original code has a bug, so I fixed it (when cpu_percent is first called from psutil.Process, it is always 0, and then whenever we calculate the CPU usage, we create a new psutil.Process. So it is always 0.)

Note that most of code change is from the tests (the actual code for the feature is around 100~150 lines).

Related issue number

Checks

  • I've signed off every commit(by using the -s flag, i.e., git commit -s) in this PR.
  • I've run scripts/format.sh to lint the changes in this PR.
  • I've included any doc changes needed for https://docs.ray.io/en/master/.
  • I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
  • Testing Strategy
    • Unit tests
    • Release tests
    • This PR is not tested :(

Signed-off-by: SangBin Cho <rkooo567@gmail.com>
Signed-off-by: SangBin Cho <rkooo567@gmail.com>
curr_proc = psutil.Process()
# Here, parent is always raylet because the
# dashboard agent is a child of the raylet process.
self._raylet_proc = curr_proc.parent()
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is necessary because psutil cpu_percent will return the CPU percent from the last time the method is called. The first invocation is always is 0. So we should cache the Process(). Note that the raylet pid won't change unless it is dead (which is unrecoverable).

for proc_name, stats in proc_name_to_stats.items():
records.extend(self._generate_system_stats_record(stats, proc_name))

# Reset worker metrics that are from finished processes.
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this is unnecessary otherwise the finished tasks' metrics are not going to be reset

Signed-off-by: SangBin Cho <rkooo567@gmail.com>
@ericl ericl added the @author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer. label Dec 23, 2022
@rkooo567 rkooo567 merged commit aeb8629 into ray-project:master Dec 23, 2022
krfricke added a commit that referenced this pull request Dec 27, 2022
architkulkarni pushed a commit that referenced this pull request Dec 27, 2022
Reverts #31234

Fix `test_runtime_env_2` and `test_metrics_agent` Windows CI failures
rkooo567 added a commit to rkooo567/ray that referenced this pull request Jan 5, 2023
AmeerHajAli pushed a commit that referenced this pull request Jan 12, 2023
Signed-off-by: SangBin Cho <rkooo567@gmail.com>

This PR replaces the "Component" label for memory / CPU usage to the proc name of worker processes. E.g., instead of "workers", we use "ray::IDLE" or "ray::Actor".

This PR also adds the per component CPU usage. I found the original code has a bug, so I fixed it (when cpu_percent is first called from psutil.Process, it is always 0, and then whenever we calculate the CPU usage, we create a new psutil.Process. So it is always 0.)

Note that most of code change is from the tests (the actual code for the feature is around 100~150 lines).
AmeerHajAli pushed a commit that referenced this pull request Jan 12, 2023
Reverts #31234

Fix `test_runtime_env_2` and `test_metrics_agent` Windows CI failures
rkooo567 added a commit that referenced this pull request Jan 16, 2023
#31454)

…28)"

This reverts commit a0c894f.

<!-- Thank you for your contribution! Please review https://github.com/ray-project/ray/blob/master/CONTRIBUTING.rst before opening a pull request. -->

<!-- Please add a reviewer to the assignee section when you create a PR. If you don't have the access to it, we will shortly find a reviewer and assign them to your PR. -->

## Why are these changes needed?

<!-- Please give a short summary of the change and the problem this solves. -->

## Related issue number

<!-- For example: "Closes #1234" -->

## Checks

- [ ] I've signed off every commit(by using the -s flag, i.e., `git commit -s`) in this PR.
- [ ] I've run `scripts/format.sh` to lint the changes in this PR.
- [ ] I've included any doc changes needed for https://docs.ray.io/en/master/.
- [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
   - [ ] Unit tests
   - [ ] Release tests
   - [ ] This PR is not tested :(
andreapiso pushed a commit to andreapiso/ray that referenced this pull request Jan 22, 2023
)" (ray-project#313… (ray-project#31454)

…28)"

This reverts commit a0c894f.

<!-- Thank you for your contribution! Please review https://github.com/ray-project/ray/blob/master/CONTRIBUTING.rst before opening a pull request. -->

<!-- Please add a reviewer to the assignee section when you create a PR. If you don't have the access to it, we will shortly find a reviewer and assign them to your PR. -->

## Why are these changes needed?

<!-- Please give a short summary of the change and the problem this solves. -->

## Related issue number

<!-- For example: "Closes ray-project#1234" -->

## Checks

- [ ] I've signed off every commit(by using the -s flag, i.e., `git commit -s`) in this PR.
- [ ] I've run `scripts/format.sh` to lint the changes in this PR.
- [ ] I've included any doc changes needed for https://docs.ray.io/en/master/.
- [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
   - [ ] Unit tests
   - [ ] Release tests
   - [ ] This PR is not tested :(

Signed-off-by: Andrea Pisoni <andreapiso@gmail.com>
tamohannes pushed a commit to ju2ez/ray that referenced this pull request Jan 25, 2023
Signed-off-by: SangBin Cho <rkooo567@gmail.com>

This PR replaces the "Component" label for memory / CPU usage to the proc name of worker processes. E.g., instead of "workers", we use "ray::IDLE" or "ray::Actor".

This PR also adds the per component CPU usage. I found the original code has a bug, so I fixed it (when cpu_percent is first called from psutil.Process, it is always 0, and then whenever we calculate the CPU usage, we create a new psutil.Process. So it is always 0.)

Note that most of code change is from the tests (the actual code for the feature is around 100~150 lines).

Signed-off-by: tmynn <hovhannes.tamoyan@gmail.com>
tamohannes pushed a commit to ju2ez/ray that referenced this pull request Jan 25, 2023
…ay-project#31328)

Reverts ray-project#31234

Fix `test_runtime_env_2` and `test_metrics_agent` Windows CI failures

Signed-off-by: tmynn <hovhannes.tamoyan@gmail.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
@author-action-required The PR author is responsible for the next step. Remove tag to send back to the reviewer.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants