Clips actions to large limits before applying them to the environment #984
base: main
Conversation
Slightly related issue: #673
Resolved (outdated) review threads on:
source/extensions/omni.isaac.lab/omni/isaac/lab/envs/direct_rl_env_cfg.py
source/extensions/omni.isaac.lab/omni/isaac/lab/envs/manager_based_env_cfg.py
Should we also change the gym.spaces range to these bounds?

In my opinion, RL libraries should take care of this, not Isaac Lab.
@Toni-SM I agree. We should move this to the environment wrappers (similar to what we do for RL-Games). As for the action/obs space design for the environments, I think it is better to handle that as its own separate effort. The current fix in this MR is at least critical for continuous learning tasks, as users otherwise get NaNs from the simulation due to the policy feedback loop (a large action enters the observations, which then leads to even larger action predictions, which eventually cause the sim to go unstable). So I'd prefer that we don't block this fix itself.
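For context, wrapper-side clipping could look roughly like the sketch below. This is illustrative only, using gymnasium's generic `ActionWrapper`; the class name, bounds, and task id are placeholders, not the actual Isaac Lab / RL-Games wrapper code:

```python
import gymnasium as gym
import numpy as np


class ClipActionWrapper(gym.ActionWrapper):
    """Clamps policy actions to fixed bounds before they reach the environment."""

    def __init__(self, env: gym.Env, low: float = -100.0, high: float = 100.0):
        super().__init__(env)
        self._low = low
        self._high = high

    def action(self, action):
        # Extreme samples from the policy are saturated instead of hitting the sim.
        return np.clip(action, self._low, self._high)


# Usage (hypothetical task id):
# env = ClipActionWrapper(gym.make("Isaac-Cartpole-v0"), low=-100.0, high=100.0)
```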
@@ -127,3 +127,9 @@ class DirectRLEnvCfg:

    Please refer to the :class:`omni.isaac.lab.utils.noise.NoiseModel` class for more details.
    """

    action_bounds: list[float] = [-100, 100]
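For illustration only, a config field like this could be consumed inside the environment roughly as follows. This is a hypothetical helper; the actual application code of this PR is not shown in the hunk above:

```python
import torch


def clip_to_action_bounds(actions: torch.Tensor, bounds: list[float]) -> torch.Tensor:
    """Clamp raw policy actions to the configured hard limits (hypothetical helper)."""
    return torch.clamp(actions, min=bounds[0], max=bounds[1])


# e.g. inside the environment, before the actions are applied or cached as observations:
# actions = clip_to_action_bounds(actions, self.cfg.action_bounds)
```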
Just curious, where does [-100, 100] come from? I wonder if it's best to leave this user-specified?
The ±100 limits come from our internal codebase, carried over from legged gym.
I was considering making the default None or Inf, but then users would need to consciously set this value, and I think most people who have training stability issues will probably not think of that.
Could we set it to None and add an FAQ entry to the docs?
I think we can set it to (-inf, inf).
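As a quick sanity check (toy snippet, not project code), a (-inf, inf) default keeps the clamp a no-op, so the clipping would stay strictly opt-in:

```python
import torch

actions = torch.tensor([1e6, -3.5, 250.0])

# Infinite default bounds: clamping changes nothing.
assert torch.equal(torch.clamp(actions, min=-float("inf"), max=float("inf")), actions)

# User-specified finite bounds: clipping takes effect.
print(torch.clamp(actions, min=-100.0, max=100.0))  # tensor([100.0000, -3.5000, 100.0000])
```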
…_env_cfg.py Co-authored-by: Mayank Mittal <12863862+Mayankm96@users.noreply.github.com> Signed-off-by: renezurbruegg <zrene@ethz.ch>
…ased_env_cfg.py Co-authored-by: Mayank Mittal <12863862+Mayankm96@users.noreply.github.com> Signed-off-by: renezurbruegg <zrene@ethz.ch>
@renezurbruegg Would you be able to help move the changes to the wrappers?
This will introduce "arbitrary" bounds of [-100, 100] for any new user who merges this PR, which could lead to unexpected behaviour. How should this be addressed? In my opinion there are three options:

I personally prefer option (3).
Please note that the current implementation conflicts with #1117 for the direct workflow.
Can these changes then be integrated directly into #1117?
@renezurbruegg, as I commented previously, in my opinion the RL libraries should take care of this, not Isaac Lab. For example, using skrl you can set a model parameter for this. However, if the target library is not able to take care of it, then option 3 that you mentioned (which will not prevent the training from eventually throwing an exception), or clipping the action directly in the task implementation for critical cases, could be a solution.
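For the task-side alternative mentioned above, clipping directly in a direct-workflow task could look roughly like this. This is a sketch only, assuming the task overrides DirectRLEnv._pre_physics_step and stores the clipped actions itself; the bounds and attribute name are placeholders:

```python
import torch

from omni.isaac.lab.envs import DirectRLEnv


class MyTaskEnv(DirectRLEnv):
    def _pre_physics_step(self, actions: torch.Tensor):
        # Clip raw policy outputs before they are cached and applied to the simulation.
        self._actions = torch.clamp(actions, min=-100.0, max=100.0)
```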
Description
Currently, the actions from the policy are applied directly to the environment and are also often fed back to the policy as a last-action observation.
This can lead to instability during training, since a single very large action can set up a destabilizing feedback loop.
More specifically, a very large action produces a large last_action observation, which often results in a large critic error, which in turn can lead to even larger actions being sampled in the future.
This PR aims to fix this by clipping the actions to (large) hard limits before applying them to the environment. This prevents the actions from growing without bound and - in my case - greatly improves training stability.
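To make the feedback loop concrete, here is a toy illustration (not project code): a stand-in "policy" that amplifies its fed-back last action diverges without clipping but saturates at the hard limit with it.

```python
import torch


def rollout(steps: int = 20, gain: float = 1.5, clip: float | None = None) -> torch.Tensor:
    """Toy stand-in for the loop: large last-action obs -> even larger predicted action."""
    action = torch.tensor(1.0)
    for _ in range(steps):
        action = gain * action  # policy amplifies the fed-back action
        if clip is not None:
            action = torch.clamp(action, -clip, clip)  # hard limit before "applying" it
    return action


print(rollout())            # diverges: ~3325.3
print(rollout(clip=100.0))  # saturates at the hard limit: 100.0
```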
Type of change
TODO
Checklist
- I have run the pre-commit checks with ./isaaclab.sh --format
- I have updated the extension's config/extension.toml file
- I have added my name to CONTRIBUTORS.md, or my name already exists there