[RLlib] [fix] [metrics] avoid biasing the throughput #57215
Signed-off-by: simonsays1980 <simon.zehnder@gmail.com>
kamil-kaczmarek left a comment
Great catch! Some small fixes are needed.
Signed-off-by: simonsays1980 <simon.zehnder@gmail.com>
… there. Signed-off-by: simonsays1980 <simon.zehnder@gmail.com>
kamil-kaczmarek left a comment
Hey! I added a few more comments to make sure that we report metrics consistently for both `learner` and `differentiable_learner`.
Thanks for being patient with this PR!
…tialLearner' and fixed some linting. Signed-off-by: simonsays1980 <simon.zehnder@gmail.com>
7c42aaa to ddcaeab
Just one last edit and we're good to go 🚀
…bleLearner'. Signed-off-by: simonsays1980 <simon.zehnder@gmail.com>
## Why are these changes needed?

The computed `num_module_steps_trained_(lifetime)_throughput` metrics are biased because of how we record throughput times in a loop over module batches. This PR fixes this bias.

## Related issue number

<!-- For example: "Closes ray-project#1234" -->

## Checks

- [x] I've signed off every commit (by using the -s flag, i.e., `git commit -s`) in this PR.
- [x] I've run pre-commit jobs to lint the changes in this PR. ([pre-commit setup](https://docs.ray.io/en/latest/ray-contribute/getting-involved.html#lint-and-formatting))
- [x] I've included any doc changes needed for https://docs.ray.io/en/master/.
- [ ] I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file.
- [x] I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
- Testing Strategy
   - [x] Unit tests
   - [ ] Release tests
   - [ ] This PR is not tested :(

---------

Signed-off-by: simonsays1980 <simon.zehnder@gmail.com>
Co-authored-by: Kamil Kaczmarek <kaczmarek.poczta@gmail.com>
Signed-off-by: xgui <xgui@anyscale.com>
Signed-off-by: elliot-barn <elliot.barnwell@anyscale.com>
Signed-off-by: Aydin Abiar <aydin@anyscale.com>
Signed-off-by: Daniel Sperber <github.blurry@9ox.net>
## Why are these changes needed?

The computed `num_module_steps_trained_(lifetime)_throughput` metrics are biased because of how we record throughput times in a loop over module batches. This PR fixes this bias.

## Related issue number

## Checks

- [x] I've signed off every commit (by using the -s flag, i.e., `git commit -s`) in this PR.
- [ ] I've added any new APIs to the API Reference. For example, if I added a method in Tune, I've added it in `doc/source/tune/api/` under the corresponding `.rst` file.
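The PR description does not spell out the exact mechanism, so what follows is only a minimal sketch, under stated assumptions, of one way a loop over module batches can bias per-module throughput. It does not use RLlib's `MetricsLogger` API; the functions `biased_throughputs` and `unbiased_throughputs`, the module IDs, and the timestamps are all hypothetical. The assumption illustrated: if each log call measures time from the *previous* log call, only the first module's window covers the actual update, while later modules are divided by the short gap between log calls and look much faster than they are.

```python
from typing import Dict, List, Tuple


def biased_throughputs(
    start_time: float,
    logs: List[Tuple[str, int, float]],
) -> Dict[str, float]:
    """Hypothetical per-module throughput where each window starts at the previous log call.

    `logs` holds (module_id, num_steps, log_timestamp) tuples in the order a loop
    over module batches recorded them; timestamps must be strictly increasing.
    Only the first module's window spans the actual update, so later modules are
    divided by the short gap between log calls and appear much faster.
    """
    results: Dict[str, float] = {}
    window_start = start_time
    for module_id, num_steps, timestamp in logs:
        results[module_id] = num_steps / (timestamp - window_start)
        # The next module's window starts here, not at the start of the update.
        window_start = timestamp
    return results


def unbiased_throughputs(
    start_time: float,
    end_time: float,
    module_steps: Dict[str, int],
) -> Dict[str, float]:
    """Hypothetical per-module throughput that shares one measured update duration."""
    duration = end_time - start_time  # assumes end_time > start_time
    return {module_id: steps / duration for module_id, steps in module_steps.items()}
```

For example, with two modules that each train 100 steps in an update running from t=0.0s to t=2.0s, and an (exaggerated, for readable numbers) 0.5s gap between the two log calls, `biased_throughputs(0.0, [("policy_a", 100, 2.0), ("policy_b", 100, 2.5)])` returns `{"policy_a": 50.0, "policy_b": 200.0}`, whereas `unbiased_throughputs(0.0, 2.0, {"policy_a": 100, "policy_b": 100})` returns 50.0 for both. Timing the update once and sharing that duration across all modules removes the dependence on the order in which the loop logs them.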