Our current contribution guide mainly covers the process of contributing new algorithms. However, it is unclear what the process looks like for contributing changes to existing algorithms, which requires a different set of procedures.
Problem
DRL is brittle and has a series of reproducibility issues; even bug fixes can sometimes introduce performance regressions (e.g., see how a bug fix of contact forces in MuJoCo results in worse performance for PPO). Therefore, it is essential to understand how proposed changes impact the performance of the algorithms. Broadly, we wish to distinguish two types of contributions: 1) non-performance-impacting changes and 2) performance-impacting changes.
Performance-impacting changes include, for example, properly handling the `gamma` parameter in PPO's reward normalization (added gamma to reward normalization wrappers #209), properly handling action bounds in DDPG (Td3 ddpg action bound fix #211), and fixing bugs (TD3: fixed dimension of clipped_noise for target actions, added noise … #281).

Importantly, no matter how slight a performance-impacting change is, we need to re-run the benchmark to ensure there is no regression. This post proposes a way for us to re-run the benchmarks and check for regressions seamlessly.
Proposal
We should add a tag to every benchmark run to distinguish the version of CleanRL used to run the experiments. This can be done by attaching a version string, such as the output of `git describe --tags` (which yields identifiers like `v1.0.0b2-7-g4bb6766`), to each tracked run.
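A minimal sketch, assuming the version string is generated from the git checkout and passed to `wandb.init` (the project name is illustrative, not the actual tracking code):

```python
# Sketch: tag each tracked run with the current CleanRL version string.
# Assumes the benchmark script is launched from inside a git checkout of CleanRL.
import subprocess

import wandb

# e.g. "v1.0.0b2-7-g4bb6766": latest tag, number of commits since it, short commit hash
version_tag = subprocess.check_output(["git", "describe", "--tags"], text=True).strip()

# illustrative project name; the real tracking code would simply forward the same
# `tags` argument wherever it calls wandb.init
wandb.init(project="cleanrl", tags=[version_tag])
```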
This gives us a tag on each tracked experiment.
Then we can design APIs to compare results from different tags / versions of the algorithm. Something like
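the following hypothetical sketch, where `compare` is not an existing CleanRL API but pulls runs by version tag through the public wandb Api and averages each run's final episodic return (the entity/project names, metric key, and filter fields are assumptions):

```python
# Hypothetical comparison helper: fetch runs by version tag and environment,
# then average the final value of a metric across seeds for each tag.
import wandb


def compare(entity, project, env_id, tags, metric="charts/episodic_return"):
    """Return {version_tag: mean final metric} for one environment."""
    api = wandb.Api()
    results = {}
    for tag in tags:
        runs = api.runs(
            f"{entity}/{project}",
            filters={"tags": {"$in": [tag]}, "config.env_id": env_id},
        )
        finals = [run.summary.get(metric) for run in runs]
        finals = [x for x in finals if x is not None]
        results[tag] = sum(finals) / len(finals) if finals else None
    return results


# e.g. compare("openrlbenchmark", "cleanrl", "HalfCheetah-v2",
#              tags=["v1.0.0b2-7-gxfd3d3", "v1.0.0b2-7-g4bb6766"])
```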
which could then be used to generate wandb reports with comparison figures and corresponding tables.
If the newer tag version `v1.0.0b2-7-g4bb6766` works without causing a major regression, we can then label it as `latest` (and correspondingly remove the `latest` tag from `v1.0.0b2-7-gxfd3d3`). In the future, this will also allow us to compare two completely different versions, like `v1.0.0b2-7-g4bb6766` vs `v1.5.0`.
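Relabeling could likewise be scripted. A sketch, assuming the public wandb Api is used to move the `latest` tag between runs (the entity/project path is illustrative):

```python
# Sketch: move the `latest` tag from runs of the previous version to runs of the
# newly validated version.
import wandb

api = wandb.Api()
old_version, new_version = "v1.0.0b2-7-gxfd3d3", "v1.0.0b2-7-g4bb6766"

# drop `latest` from the old version's runs
for run in api.runs("openrlbenchmark/cleanrl", filters={"tags": {"$in": [old_version]}}):
    if "latest" in run.tags:
        run.tags.remove("latest")
        run.update()  # persist the tag change

# add `latest` to the new version's runs
for run in api.runs("openrlbenchmark/cleanrl", filters={"tags": {"$in": [new_version]}}):
    if "latest" not in run.tags:
        run.tags.append("latest")
        run.update()
```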
CC @dosssman @yooceii @dipamc @kinalmehta @joaogui1 @araffin @bragajj @cool-RR @jkterry1 for thoughts