-
Notifications
You must be signed in to change notification settings - Fork 5.8k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[WIP] CPU/memory usage per func class #31234
[WIP] CPU/memory usage per func class #31234
Conversation
Signed-off-by: SangBin Cho <rkooo567@gmail.com>
Signed-off-by: SangBin Cho <rkooo567@gmail.com>
curr_proc = psutil.Process() | ||
# Here, parent is always raylet because the | ||
# dashboard agent is a child of the raylet process. | ||
self._raylet_proc = curr_proc.parent() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is necessary because psutil cpu_percent will return the CPU percent from the last time the method is called. The first invocation is always is 0. So we should cache the Process(). Note that the raylet pid won't change unless it is dead (which is unrecoverable).
for proc_name, stats in proc_name_to_stats.items(): | ||
records.extend(self._generate_system_stats_record(stats, proc_name)) | ||
|
||
# Reset worker metrics that are from finished processes. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this is unnecessary otherwise the finished tasks' metrics are not going to be reset
Signed-off-by: SangBin Cho <rkooo567@gmail.com>
)" (ray-project#31328)" This reverts commit a0c894f.
Signed-off-by: SangBin Cho <rkooo567@gmail.com> This PR replaces the "Component" label for memory / CPU usage to the proc name of worker processes. E.g., instead of "workers", we use "ray::IDLE" or "ray::Actor". This PR also adds the per component CPU usage. I found the original code has a bug, so I fixed it (when cpu_percent is first called from psutil.Process, it is always 0, and then whenever we calculate the CPU usage, we create a new psutil.Process. So it is always 0.) Note that most of code change is from the tests (the actual code for the feature is around 100~150 lines).
#31454) …28)" This reverts commit a0c894f. <!-- Thank you for your contribution! Please review https://github.com/ray-project/ray/blob/master/CONTRIBUTING.rst before opening a pull request. --> <!-- Please add a reviewer to the assignee section when you create a PR. If you don't have the access to it, we will shortly find a reviewer and assign them to your PR. --> ## Why are these changes needed? <!-- Please give a short summary of the change and the problem this solves. --> ## Related issue number <!-- For example: "Closes #1234" --> ## Checks - [ ] I've signed off every commit(by using the -s flag, i.e., `git commit -s`) in this PR. - [ ] I've run `scripts/format.sh` to lint the changes in this PR. - [ ] I've included any doc changes needed for https://docs.ray.io/en/master/. - [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/ - Testing Strategy - [ ] Unit tests - [ ] Release tests - [ ] This PR is not tested :(
)" (ray-project#313… (ray-project#31454) …28)" This reverts commit a0c894f. <!-- Thank you for your contribution! Please review https://github.com/ray-project/ray/blob/master/CONTRIBUTING.rst before opening a pull request. --> <!-- Please add a reviewer to the assignee section when you create a PR. If you don't have the access to it, we will shortly find a reviewer and assign them to your PR. --> ## Why are these changes needed? <!-- Please give a short summary of the change and the problem this solves. --> ## Related issue number <!-- For example: "Closes ray-project#1234" --> ## Checks - [ ] I've signed off every commit(by using the -s flag, i.e., `git commit -s`) in this PR. - [ ] I've run `scripts/format.sh` to lint the changes in this PR. - [ ] I've included any doc changes needed for https://docs.ray.io/en/master/. - [ ] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/ - Testing Strategy - [ ] Unit tests - [ ] Release tests - [ ] This PR is not tested :( Signed-off-by: Andrea Pisoni <andreapiso@gmail.com>
Signed-off-by: SangBin Cho <rkooo567@gmail.com> This PR replaces the "Component" label for memory / CPU usage to the proc name of worker processes. E.g., instead of "workers", we use "ray::IDLE" or "ray::Actor". This PR also adds the per component CPU usage. I found the original code has a bug, so I fixed it (when cpu_percent is first called from psutil.Process, it is always 0, and then whenever we calculate the CPU usage, we create a new psutil.Process. So it is always 0.) Note that most of code change is from the tests (the actual code for the feature is around 100~150 lines). Signed-off-by: tmynn <hovhannes.tamoyan@gmail.com>
…ay-project#31328) Reverts ray-project#31234 Fix `test_runtime_env_2` and `test_metrics_agent` Windows CI failures Signed-off-by: tmynn <hovhannes.tamoyan@gmail.com>
Why are these changes needed?
This PR replaces the "Component" label for memory / CPU usage to the proc name of worker processes. E.g., instead of "workers", we use "ray::IDLE" or "ray::Actor".
This PR also adds the per component CPU usage. I found the original code has a bug, so I fixed it (when cpu_percent is first called from psutil.Process, it is always 0, and then whenever we calculate the CPU usage, we create a new psutil.Process. So it is always 0.)
Note that most of code change is from the tests (the actual code for the feature is around 100~150 lines).
Related issue number
Checks
git commit -s
) in this PR.scripts/format.sh
to lint the changes in this PR.