[tune][release] Upgrade tune_torch_benchmark to v2 #56804

liulehui · 2025-09-22T22:35:52Z

Why are these changes needed?

In Train V2, Tune runs trials of a train_driver_fn that is passed to the Tuner, rather than a Trainer instance as in V1; see the latest docs.
Pass the TuneReportCallback to the trainer used in Tune so that results reported during training are forwarded to Tune.
Reduce the number of runs / trials so that the test runs faster.
Example run: https://buildkite.com/ray-project/release/builds/59578#019973ab-dee6-407c-bc0e-702ee9247ced
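
For readers unfamiliar with the V2 pattern described above, here is a rough sketch of how a driver function and TuneReportCallback might fit together. It is not the benchmark's actual code: the body of train_loop, the build_tuner helper, the search space, and the worker count are all made up for illustration, and it assumes Train V2 is enabled through the RAY_TRAIN_V2_ENABLED environment variable. Ray imports are deferred into the functions so the module can be loaded without a cluster.

```python
import os

# Assumption: Train V2 is opted into via this environment variable, and it
# needs to be set before ray.train is imported.
os.environ.setdefault("RAY_TRAIN_V2_ENABLED", "1")


def train_loop(config):
    """Per-worker training loop (placeholder body, not the real benchmark)."""
    import ray.train

    for epoch in range(config["epochs"]):
        # Hypothetical metric; a real loop would train a torch model here.
        ray.train.report({"epoch": epoch, "loss": 1.0 / (epoch + 1)})


def train_driver_fn(config):
    """Runs once per Tune trial and launches a Train run from inside it."""
    import ray.train
    from ray.train.torch import TorchTrainer
    from ray.tune.integration.ray_train import TuneReportCallback

    trainer = TorchTrainer(
        train_loop,
        train_loop_config=config["train_loop_config"],
        scaling_config=ray.train.ScalingConfig(num_workers=config["num_workers"]),
        # The callback forwards metrics reported by the workers to Tune.
        run_config=ray.train.RunConfig(callbacks=[TuneReportCallback()]),
    )
    trainer.fit()


def build_tuner(num_samples=2):
    """Hypothetical helper: wraps the driver function in a Tuner."""
    import ray.tune

    return ray.tune.Tuner(
        train_driver_fn,
        param_space={
            "num_workers": 2,
            "train_loop_config": {"epochs": ray.tune.choice([1, 2])},
        },
        tune_config=ray.tune.TuneConfig(num_samples=num_samples),
    )
```

On a running cluster, something like `build_tuner().fit()` would start the trials; a small num_samples keeps the run short, in the spirit of the reduced trial count mentioned above.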

Related issue number

Checks

I've signed off every commit (by using the -s flag, i.e., git commit -s) in this PR.
I've run scripts/format.sh to lint the changes in this PR.
I've included any doc changes needed for https://docs.ray.io/en/master/.
- I've added any new APIs to the API Reference. For example, if I added a
  method in Tune, I've added it in doc/source/tune/api/ under the
  corresponding .rst file.
I've made sure the tests are passing. Note that there might be a few flaky tests, see the recent failures at https://flakey-tests.ray.io/
Testing Strategy
- Unit tests
- Release tests
- This PR is not tested :(

Signed-off-by: Lehui Liu <lehui@anyscale.com>

gemini-code-assist

Code Review

This pull request successfully upgrades the tune_torch_benchmark to use the Ray Train V2 API. The changes correctly adapt the benchmark to the new V2 patterns, such as using a train_driver_fn for tuning and leveraging TuneReportCallback. The refactoring to move train_loop to the module level improves code clarity. Additionally, the test configuration in release_tests.yaml has been updated to run faster and enable the V2 API, which is a sensible adjustment for release testing.

I have one minor suggestion to improve the robustness of the tune_torch function against a potential TypeError if it's called with a None config, as allowed by its signature.
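
The kind of guard this suggestion points at could look roughly like the following. This is only an illustration: resolve_config and DEFAULT_CONFIG are hypothetical names, and the actual signature and defaults of tune_torch are not shown in this conversation.

```python
from typing import Any, Dict, Optional

# Hypothetical defaults, for illustration only.
DEFAULT_CONFIG: Dict[str, Any] = {"lr": 0.1, "batch_size": 32}


def resolve_config(config: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Merge an optional user config over the defaults.

    Treating None as an empty dict avoids the TypeError that unpacking or
    updating with None would otherwise raise.
    """
    merged = dict(DEFAULT_CONFIG)
    merged.update(config or {})
    return merged
```

With this shape, `resolve_config()` and `resolve_config(None)` both fall back to the defaults, while `resolve_config({"lr": 0.5})` overrides only the learning rate.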

release/air_tests/air_benchmarks/workloads/tune_torch_benchmark.py

justinvyu

Thanks!

1. Update to the Train V2 Train+Tune integration API.
2. Pass the TuneReportCallback to the trainer used in Tune so that reported results are forwarded to Tune.
3. Reduce the number of runs / trials so that the test runs faster.

---------

Signed-off-by: Lehui Liu <lehui@anyscale.com>
Signed-off-by: elliot-barn <elliot.barnwell@anyscale.com>

liulehui added 2 commits September 22, 2025 15:29

upgrade tune_torch_benchmark to v2

5812e87

Signed-off-by: Lehui Liu <lehui@anyscale.com>

add the v2 env var

83ae59c

Signed-off-by: Lehui Liu <lehui@anyscale.com>

gemini-code-assist bot reviewed Sep 22, 2025

release/air_tests/air_benchmarks/workloads/tune_torch_benchmark.py (outdated)

resolve typing error

a2964e0

Signed-off-by: Lehui Liu <lehui@anyscale.com>

update wait for node to 4

146daf6

Signed-off-by: Lehui Liu <lehui@anyscale.com>

ray-gardener bot added the tune (Tune-related issues) and release-test (release test) labels Sep 23, 2025

liulehui requested a review from a team September 23, 2025 16:34

justinvyu approved these changes Sep 23, 2025

release/air_tests/air_benchmarks/workloads/tune_torch_benchmark.py (outdated)

resolve comment

6ed09fe

Signed-off-by: Lehui Liu <lehui@anyscale.com>

liulehui added the go label (add ONLY when ready to merge, run all tests) Sep 23, 2025

cursor bot reviewed Sep 23, 2025

release/air_tests/air_benchmarks/workloads/tune_torch_benchmark.py

justinvyu merged commit ba9fbf9 into ray-project:master Sep 24, 2025
7 checks passed
