
Clips actions to large limits before applying them to the environment #984

Open · wants to merge 8 commits into main
Conversation

@renezurbruegg (Collaborator) commented Sep 13, 2024

Description

Currently, the actions from the policy are applied directly to the environment and are often also fed back to the policy as a last-action observation.

This can lead to instability during training, since applying a large action can introduce a destabilizing feedback loop.
More specifically, applying a very large action produces a large last_action observation, which often results in a large critic error, which in turn can lead to even larger actions being sampled in the future.

This PR fixes this by clipping the actions to (large) hard limits before applying them to the environment. This prevents the actions from growing without bound and, in my case, greatly improves training stability.
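For illustration, a minimal sketch of the clipping idea (hypothetical helper and constant names, not the exact PR diff; assumes actions are torch tensors):

```python
import torch

# Hypothetical hard limits; in the PR they would come from a config field
# such as `action_bounds` on the environment configuration.
ACTION_BOUNDS = (-100.0, 100.0)


def clip_actions(actions: torch.Tensor) -> torch.Tensor:
    """Clip policy actions to large hard limits before they are applied to the
    environment and fed back as the last-action observation."""
    return torch.clamp(actions, min=ACTION_BOUNDS[0], max=ACTION_BOUNDS[1])
```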

Type of change

  • New feature (non-breaking change which adds functionality)

TODO

  • Support multi-dimensional action bounds.
  • Add tests.

Checklist

  • I have run the pre-commit checks with ./isaaclab.sh --format
  • I have made corresponding changes to the documentation
  • My changes generate no new warnings
  • I have added tests that prove my fix is effective or that my feature works
  • I have updated the changelog and the corresponding version in the extension's config/extension.toml file
  • I have added my name to the CONTRIBUTORS.md or my name already exists there

@renezurbruegg (Collaborator, Author) commented Sep 13, 2024

Slightly related issue: #673

@Mayankm96 (Contributor) left a comment:

We should also change the gym.spaces range to these bounds?

@Toni-SM (Contributor) commented Sep 15, 2024

In my opinion, the RL libraries should take care of this, not Isaac Lab.
Also, adding this config prevents defining other spaces, since it is specifically tied to Box.
Instead, I propose the following: #864 (comment), with the understanding that the RL libraries are in charge of generating valid actions for the task.

@Mayankm96 (Contributor) commented Sep 16, 2024

@Toni-SM I agree. We should move this to the environment wrappers (similar to what we do for RL-Games):

https://github.com/isaac-sim/IsaacLab/blob/main/source/extensions/omni.isaac.lab_tasks/omni/isaac/lab_tasks/utils/wrappers/rl_games.py#L82-L94

Regarding the action/obs space design for the environments, I think it is better to handle that as its own separate change. The current fix in this PR is at least critical for continuous learning tasks, as users otherwise get NaNs from the simulation due to the policy feedback loop (large actions enter the observations, which then lead to even larger action predictions and eventually make the sim unstable). So I'd prefer if we don't block this fix itself.
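As a rough illustration of the wrapper-level approach (a generic gymnasium wrapper sketch, not the actual RL-Games wrapper code linked above):

```python
import gymnasium as gym
import torch


class ClipActionWrapper(gym.Wrapper):
    """Clip actions to fixed limits before forwarding them to the wrapped env.

    Hypothetical sketch; the real IsaacLab wrappers live under
    omni.isaac.lab_tasks.utils.wrappers and follow their own conventions.
    """

    def __init__(self, env: gym.Env, low: float = -100.0, high: float = 100.0):
        super().__init__(env)
        self._low = low
        self._high = high

    def step(self, action):
        # IsaacLab environments typically pass actions as torch tensors.
        action = torch.clamp(action, min=self._low, max=self._high)
        return self.env.step(action)
```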

@@ -127,3 +127,9 @@ class DirectRLEnvCfg:

Please refer to the :class:`omni.isaac.lab.utils.noise.NoiseModel` class for more details.
"""

action_bounds: list[float] = [-100, 100]
Collaborator

Just curious, where is [-100, 100] from? I wonder if it's best to leave this user-specified?

Collaborator Author

The 100 limit comes from our internal codebase, originally from legged gym.

I was considering having it None or inf by default, but then users would need to consciously set this value, and I think most people with training stability issues will probably not think of that.

Could we set it to None and add a FAQ entry to the docs?

Collaborator

I think we can set it to (-inf, inf).
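Assuming standard torch.clamp semantics, an (-inf, inf) default would make the clip a no-op, so existing users would see no behavioural change; a quick sanity check:

```python
import torch

actions = torch.randn(4, 6) * 1e6  # deliberately huge but finite actions
bounds = (float("-inf"), float("inf"))

# Clamping against infinite bounds leaves finite values untouched,
# so an (-inf, inf) default is effectively "no clipping".
assert torch.equal(torch.clamp(actions, *bounds), actions)
```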

renezurbruegg and others added 2 commits September 19, 2024 08:17
…_env_cfg.py

Co-authored-by: Mayank Mittal <12863862+Mayankm96@users.noreply.github.com>
Signed-off-by: renezurbruegg <zrene@ethz.ch>
…ased_env_cfg.py

Co-authored-by: Mayank Mittal <12863862+Mayankm96@users.noreply.github.com>
Signed-off-by: renezurbruegg <zrene@ethz.ch>
@Mayankm96 changed the title from "Clip actions to large hard limits before applying them to the environment" to "Clips actions to large limits before applying them to the environment" on Sep 19, 2024
@Mayankm96 (Contributor)

@renezurbruegg Would you be able to help move the changes to the wrappers?

@renezurbruegg (Collaborator, Author)

This will introduce "arbitrary" bounds of [-100, 100] for any user that merges this PR, which could lead to unexpected behaviour.

How should this be addressed?

In my opinion there are three options:

  1. Don't do anything, since bounds of ±100 are very large and policies should not produce such actions anyway.
  2. Change the default to (-inf, inf), essentially keeping the current behaviour, and add a FAQ entry to the documentation referring to this issue.
  3. Add a check whether the limits were hit in any environment and print a warning to the terminal (sketched below).

I personally prefer option (3).
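A possible shape for option (3), sketched with hypothetical names (not part of this PR):

```python
import logging

import torch

logger = logging.getLogger(__name__)


def clip_actions_with_warning(actions: torch.Tensor, low: float, high: float) -> torch.Tensor:
    """Clip actions to the configured bounds and warn if any environment hit them."""
    clipped = torch.clamp(actions, min=low, max=high)
    if not torch.equal(clipped, actions):
        logger.warning(
            "Actions exceeded the configured bounds [%s, %s] and were clipped; "
            "training may be unstable.",
            low,
            high,
        )
    return clipped
```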

@renezurbruegg self-assigned this on Oct 3, 2024
@Toni-SM (Contributor) commented Oct 3, 2024

Hi @renezurbruegg,

Please note that the current implementation conflicts with #1117 for the direct workflow.

@renezurbruegg (Collaborator, Author)

Can these changes be integrated directly into #1117 then?

@Toni-SM (Contributor) commented Oct 3, 2024

@renezurbruegg, as I commented previously, in my opinion the RL libraries should take care of this, not Isaac Lab.

For example, using skrl you can set the model parameter clip_actions: True or define the model output as output: 100 * tanh(ACTIONS) in the agent config file skrl_ppo_cfg.yaml.

However, if the target library is not able to take care of that, then option 3 that you mentioned (which will not prevent training from throwing an exception after some time), or clipping the action directly in the task implementation for critical cases, could be a solution.
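For reference, the tanh-bounding idea can also be expressed directly in model code; a generic sketch (not skrl's actual model definition, which is driven by the YAML config):

```python
import torch
import torch.nn as nn


class BoundedPolicyHead(nn.Module):
    """Map unbounded network features into [-limit, limit] via a scaled tanh,
    mirroring the `output: 100 * tanh(ACTIONS)` idea from the skrl config."""

    def __init__(self, in_features: int, num_actions: int, limit: float = 100.0):
        super().__init__()
        self.linear = nn.Linear(in_features, num_actions)
        self.limit = limit

    def forward(self, features: torch.Tensor) -> torch.Tensor:
        return self.limit * torch.tanh(self.linear(features))
```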
