[Train/CI] Fix flaky `test_reserved_cpu_warnings` #31415

Yard1 · 2023-01-03T21:37:51Z

Signed-off-by: Antoni Baum <antoni.baum@protonmail.com>

Why are these changes needed?

The flakiness seems to have been caused by Ray tasks / actors sometimes being kept alive between fit calls until garbage collection kicks in and kills them. As a result, the ray.available_resources() call in TunerInternal._maybe_warn_resource_contention returned fewer available CPUs than the test expected. This has been fixed by explicitly calling gc.collect() between fit calls.
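The mechanism can be illustrated without Ray. The sketch below is only a model under stated assumptions: `FakeActor`, `fake_fit`, `available_cpus`, and `TOTAL_CPUS` are hypothetical stand-ins, not Ray APIs, and it assumes the lingering objects are kept alive by a reference cycle, which only Python's cyclic garbage collector can reclaim. It shows why the reported CPU count can stay low after a fit call returns and recover once gc.collect() runs.

```python
import gc

TOTAL_CPUS = 4   # hypothetical cluster size for the demo
_in_use = 0      # CPUs currently held by live FakeActor objects


class FakeActor:
    """Hypothetical stand-in for an actor that holds one CPU while alive."""

    def __init__(self):
        global _in_use
        _in_use += 1
        # Reference cycle: reference counting alone can never free this
        # object, so it lingers until the cyclic garbage collector runs.
        self.self_ref = self

    def __del__(self):
        global _in_use
        _in_use -= 1  # the CPU is released only when the object is finalized


def available_cpus():
    """Model of a resource query: total CPUs minus those still held."""
    return TOTAL_CPUS - _in_use


def fake_fit(num_workers):
    """Model of a fit call: creates workers and drops them on return."""
    workers = [FakeActor() for _ in range(num_workers)]
    del workers  # the list goes away, but the cycles keep the actors alive


gc.disable()               # make the collection timing deterministic
fake_fit(2)
before = available_cpus()  # the workers are still alive at this point
gc.collect()               # explicit collection, as done between fit calls
after = available_cpus()   # the workers have been finalized
gc.enable()

print(before, after)
```

In the real test the resource query is ray.available_resources() and the workers are Ray tasks / actors; the sketch only mirrors the ordering problem, not Ray's own lifetime management.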

Related issue number

Closes #31334

Checks

I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
I've run scripts/format.sh to lint the changes in this PR.
I've included any doc changes needed for https://docs.ray.io/en/master/.
I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
Testing Strategy
- Unit tests
- Release tests
- This PR is not tested :(

Signed-off-by: Antoni Baum <antoni.baum@protonmail.com>
Signed-off-by: tmynn <hovhannes.tamoyan@gmail.com>

[Train] Fix flaky test_reserved_cpu_warnings

4303db2

Signed-off-by: Antoni Baum <antoni.baum@protonmail.com>

Yard1 requested review from amogkam and bveeramani January 3, 2023 21:37

Yard1 assigned amogkam and bveeramani Jan 3, 2023

amogkam approved these changes Jan 3, 2023

amogkam merged commit 23ad58c into ray-project:master Jan 4, 2023

Yard1 deleted the fix_flaky_test_reserved_cpus_warnings branch January 4, 2023 18:29
