[Minor] Fix runnability of RLHF example in examples/rlhf #1753

ianbarber · 2023-12-19T07:56:14Z

Description

Small amount of bit rot on the RLHF example.

- Correct passing of configuration in train transformer and reward steps.
- Fix typo on prompt logger.
- Update log_scalar to match class spec.

Motivation and Context

There was one larger issue which I didn't address, which was that tensordictmodule wasn't playing well with dynamo, due to the optimizedmodule not forwarding getattr calls to the underlying module. I didn't get round to verifying whether that is fixed in the nightlies (though I think it might be). I did think about updating the train/train_reward configs to turn compile off by default, as it will fail in the stable release version with TypeError: _forward_unimplemented() got an unexpected keyword argument 'input_ids' or similar. It's easy enough to disable compile in the config or on the command line, but it might be worth disabling it in the base config as well.
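To make the getattr point concrete, here is a toy sketch in plain Python. It is not torch's actual wrapper and not the fix applied upstream: the classes `Inner`, `OpaqueWrapper` and `ForwardingWrapper`, and the `_wrapped` attribute, are hypothetical names chosen for illustration. It only shows why a wrapper that does not forward attribute lookups hides attributes of the module it wraps, and how a `__getattr__` fallback restores them.

```python
class Inner:
    """Hypothetical stand-in for a module exposing extra attributes."""

    in_keys = ["input_ids"]

    def __call__(self, **kwargs):
        # Return the keyword names it was called with.
        return sorted(kwargs)


class OpaqueWrapper:
    """Wraps a module and forwards calls, but not attribute lookups."""

    def __init__(self, module):
        self._wrapped = module  # hypothetical attribute name

    def __call__(self, *args, **kwargs):
        return self._wrapped(*args, **kwargs)


class ForwardingWrapper(OpaqueWrapper):
    """Same wrapper, falling back to the wrapped module for unknown attributes."""

    def __getattr__(self, name):
        # __getattr__ only runs when normal lookup fails; guard against
        # recursion if _wrapped itself has not been set yet.
        if name == "_wrapped":
            raise AttributeError(name)
        return getattr(self._wrapped, name)


inner = Inner()
opaque = OpaqueWrapper(inner)
forwarding = ForwardingWrapper(inner)

print(hasattr(opaque, "in_keys"))   # the opaque wrapper hides in_keys
print(forwarding.in_keys)           # the forwarding wrapper exposes it
print(forwarding(input_ids=1))      # calls still reach the inner module
```

Code that inspects attributes on the wrapped object (rather than only calling it) works with the second wrapper but not the first, which is the kind of mismatch described above.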

Types of changes

What types of changes does your code introduce? Remove all that do not apply:

[x] Bug fix (non-breaking change which fixes an issue)
[ ] New feature (non-breaking change which adds core functionality)
[ ] Breaking change (fix or feature that would cause existing functionality to change)
[ ] Documentation (update in the documentation)
[ ] Example (update in the folder of examples)

Checklist

Go over all the following points, and put an x in all the boxes that apply.
If you are unsure about any of these, don't hesitate to ask. We are here to help!

[x] I have read the CONTRIBUTION guide (required)
[ ] My change requires a change to the documentation.
[ ] I have updated the tests accordingly (required for a bug fix or a new feature).
[ ] I have updated the documentation accordingly.

pytorch-bot · 2023-12-19T07:56:18Z

🔗 Helpful Links

🧪 See artifacts and rendered test results at hud.pytorch.org/pr/pytorch/rl/1753

📄 Preview Python docs built from this PR

Note: Links to docs will display an error until the docs builds have been completed.

✅ You can merge normally! (10 Unrelated Failures)

As of commit 097db8a with merge base 0e02132:

FLAKY - The following jobs failed but were likely due to flakiness present on trunk:

Build M1 Wheels / pytorch/rl / wheel-py3_8-cpu (gh)
ImportError: cannot import name 'MemoryMappedTensor' from 'tensordict' (/Users/ec2-user/runner/_work/_temp/conda_environment_7258916164/lib/python3.8/site-packages/tensordict/__init__.py)
Continuous Benchmark (PR) / CPU Pytest benchmark (gh)
Workflow failed! Resource not accessible by integration
Continuous Benchmark (PR) / GPU Pytest benchmark (gh)
Workflow failed! Resource not accessible by integration
Examples Tests on Linux / tests (3.9, 12.1) / linux-job (gh)
RuntimeError: Command docker exec -t d1d4b7d98f53bfa7e4a4e53b130ee1f049f7861482fe60351d8bd82b1e8342f4 /exec failed with exit code 4
Lint / python-source-and-configs / linux-job (gh)
examples/rlhf/train_reward.py:65:5: F841 local variable 'dtype' is assigned to but never used
Unit-tests on Linux GPU, latest stable release / tests (3.8, 11.8) / linux-job (gh)
test/test_libs.py::TestGym::test_vecenvs_env[CartPole-v1]
Unit-tests on Windows CPU / unittests / windows-job (gh)
The process 'C:\Program Files\Git\cmd\git.exe' failed with exit code 128

BROKEN TRUNK - The following jobs failed but were present on the merge base:

👉 Rebase onto the `viable/strict` branch to avoid these failures

Habitat Tests on Linux / tests (3.9, 11.6) / linux-job (gh)
test/test_libs.py::TestHabitat::test_habitat_render[False-HabitatPick-v0]
RLHF Tests on Linux / unittests (3.9, 12.1) / linux-job (gh)
test/test_rlhf.py::TestRollout::test_rollout_from_data
Unit-tests on Windows GPU / unittests / windows-job (gh)
Process completed with exit code 1.

This comment was automatically generated by Dr. CI and updates every 15 minutes.

vmoens · 2023-12-19T08:03:13Z

which was that tensordictmodule wasn't playing well with dynamo, due to the optimizedmodule not forwarding getattr calls to the underlying module

Yep! Compatibility of tensordict (and related modules) with dynamo and the rest of the compile stack is a big TODO. Happy to chat offline about what the roadmap for this looks like.

vmoens

LGTM
Our examples CI should have caught these...

Minor fixes for runnability

097db8a

- Correct passing of configuration in train transformer and reward steps - Fix typo on prompt logger - Update log_scalar to match class spec.

facebook-github-bot added the CLA Signed label Dec 19, 2023

vmoens approved these changes Dec 19, 2023

vmoens added the bug label Dec 19, 2023

vmoens merged commit 2e1d60c into pytorch:main Dec 19, 2023
53 of 63 checks passed
