
Train High-Level Policies in Hierarchical Approaches #1053

Merged — 55 commits merged into main on Feb 10, 2023
Conversation

@ASzot (Contributor) commented Dec 28, 2022

Motivation and Context

  • Adds trainable HL policies.
  • Refactors the tests for HRL related approaches.
  • Registry for storage and updaters.
  • Added noop skill.

How Has This Been Tested

New HRL tests.

To run fixed HL and oracle LL skills in evaluation mode:

python habitat-baselines/habitat_baselines/run.py --exp-config habitat-baselines/habitat_baselines/config/rearrange/rl_hierarchical_oracle_nav.yaml --run-type eval habitat_baselines/rl/policy=hl_fixed habitat_baselines/rl/policy/hierarchical_policy/defined_skills=oracle_skills

To run learned HL and oracle LL skills in train mode:

python habitat-baselines/habitat_baselines/run.py --exp-config habitat-baselines/habitat_baselines/config/rearrange/rl_hierarchical_oracle_nav.yaml --run-type train habitat_baselines/rl/policy=hl_neural habitat_baselines/rl/policy/hierarchical_policy/defined_skills=oracle_skills

With oracle skills, you should run in kinematic simulation mode. See the note at the top of habitat_baselines/config/habitat_baselines/rl/policy/hierarchical_policy/defined_skills/oracle_skills.yaml for how to enable kinematic mode.

To run only over the minival dataset, add habitat_baselines.eval.split=minival habitat.dataset.split=minival.

Types of changes

  • Docs change / refactoring / dependency upgrade
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist

  • My code follows the code style of this project.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I have read the CONTRIBUTING document.
  • I have completed my CLA (see CONTRIBUTING)
  • I have added tests to cover my changes.
  • All new and existing tests passed.

@facebook-github-bot facebook-github-bot added the CLA Signed Do not delete this pull request or issue due to inactivity. label Dec 28, 2022
@@ -136,5 +136,21 @@ def register_auxiliary_loss(
def get_auxiliary_loss(cls, name: str):
return cls._get_impl("aux_loss", name)

@classmethod
Contributor:
storage and updater are not very descriptive. Can you add a little text here to clarify what the base classes being registered are?

Contributor:
Maybe even add a

assert isinstance(to_register, RolloutStorage)

Contributor (Author):
Done.

Contributor:
Where is the assert?
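For context, here is a minimal sketch of what a storage registry with the suggested type check could look like. The class and method names are illustrative, not the actual habitat-baselines API; also, since classes (not instances) are registered, issubclass is the check that matches the reviewer's isinstance suggestion:

```python
from typing import Dict, Optional, Type


class RolloutStorage:
    """Stand-in for the rollout storage base class."""


class BaselineRegistry:
    _storages: Dict[str, Type[RolloutStorage]] = {}

    @classmethod
    def register_storage(cls, to_register, name: Optional[str] = None):
        # Validate at registration time so a bad registration fails early,
        # not deep inside training.
        assert issubclass(to_register, RolloutStorage), (
            f"{to_register} must inherit from RolloutStorage"
        )
        cls._storages[name or to_register.__name__] = to_register
        return to_register

    @classmethod
    def get_storage(cls, name: str) -> Type[RolloutStorage]:
        return cls._storages[name]


@BaselineRegistry.register_storage
class HrlRolloutStorage(RolloutStorage):
    """A registered storage variant (hypothetical)."""
```

Registering anything that is not a RolloutStorage subclass then trips the assertion immediately.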

test/test_baseline_training.py (outdated; resolved)
akshararai and others added 3 commits January 6, 2023 15:40
* separate config policy

* update config

* renamed hierarchical_srl_learned_nav.yaml to hierarchical_srl.yaml since that already implies a learned nav

* renamed yaml files to follow convention of _tp_ for task planner, _srl_ for learned skills, _onav_ for oracle nav

* order of parameters in config to avoid overwrite

* added documentation on how to run the HRL policy, removed auto_name, not needed in this config

* remove empty file

* fixed the yaml linked in the readme

* updated the description of the config file

* Delete tp_noop_onav.yaml

don't need this file anymore, added readme instructions to do this by changing hydra parameters.

* setting the agent back to depth_head_agent as _vis slows down the training. Apologies

* removed one config file, since it seemed like mostly a repeat of the tp config

* cleaned up config files by removing redundant parameters

* renamed monolithic_pointnav to be monolithic

* fixed the comment

* added localization_sensor to all task lab_sensors

Co-authored-by: Xavier Puig <xavierpuig@fb.com>
Fixed link for PDDL
xavierpuigf and others added 5 commits January 13, 2023 09:03
* make should_terminate consistently on cpu

* Corrected config files for skills using noop + bug in pddl_apply_action

* bug fixes in apply postcond

* Update hierarchical_policy.py
Comment on lines 305 to 307
elif self._skills[skill_id].num_recurrent_layers != 0:
raise ValueError(
f"The code does not currently support neural LL and neural HL skills. Skill={self._skills[skill_id]}, HL={self._high_level_policy}"
Contributor:
Hard to understand why the skill having a non-zero number of recurrent layers means the user is trying to use a neural HL and a neural LL. Add a comment, or give skills a property with a descriptive name that checks whether the LL is neural.
Also, aren't we trying to have frozen NN LL skills while training an NN HL policy? This error seems to indicate that is not supported.

Contributor (Author):
Correct, that combination is currently not supported. I refactored with a more descriptive property.
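A minimal sketch of the kind of descriptive property the refactor describes — the names Skill, has_hidden_state, and check_compatible are hypothetical, not necessarily what the PR uses:

```python
class Skill:
    """Minimal stand-in for a low-level skill policy."""

    def __init__(self, num_recurrent_layers: int = 0):
        self.num_recurrent_layers = num_recurrent_layers

    @property
    def has_hidden_state(self) -> bool:
        # Non-zero recurrent layers means the skill carries an RNN hidden
        # state, i.e. it is a neural LL skill rather than a fixed/oracle one.
        return self.num_recurrent_layers != 0


def check_compatible(skill: Skill, hl_is_neural: bool) -> None:
    # Neural LL under a neural HL is the unsupported combination.
    if hl_is_neural and skill.has_hidden_state:
        raise ValueError(
            "Neural LL skills under a neural HL policy are not supported."
        )
```

The check then reads as intent (skill.has_hidden_state) rather than as an opaque comparison against num_recurrent_layers.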

@vincentpierre (Contributor) left a comment:
Remove : habitat-baselines/tb/events.out.tfevents.1674193555.s-164-67-210-79.resnet.ucla.edu.58058.0

habitat-baselines/habitat_baselines/README.md (outdated; resolved)
habitat-baselines/habitat_baselines/README.md (outdated; resolved)
@vincentpierre (Contributor):
It seems running

 python habitat-baselines/habitat_baselines/run.py --exp-config habitat-baselines/habitat_baselines/config/rearrange/rl_rearrange.yaml --run-type train habitat_baselines.num_environments=4

does not work on this branch, but does on main.

obs_skill_inputs: ["obj_start_sensor"]
max_skill_steps: 1
apply_postconds: True
force_end_on_timeout: False
Contributor:
@akshararai check if all these parameters are needed, and add a sample documentation for them somewhere.

name: "NeuralHighLevelPolicy"
allow_other_place: False
hidden_dim: 512
use_rnn: True
Contributor:
@akshararai check which of these fields are needed

EPS_PPO = 1e-5


@baseline_registry.register_storage
Contributor:
Hm, I prefer Vince's style, as in theory it allows for less nested inheritance. But I think this can be part of a future quality-of-life PR.
Imagine a new type of rollout storage, NewRolloutStorage. Should it inherit from HrlRolloutStorage? Then the inheritance chain becomes NewRolloutStorage -> HrlRolloutStorage -> RolloutStorage. If we instead have a BaseRolloutStorage, all the classes can inherit from the base, avoiding this multi-layer nesting.
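The flatter hierarchy the comment argues for can be sketched as follows — BaseRolloutStorage and the insert behavior are hypothetical illustrations, not the actual habitat-baselines classes:

```python
import abc


class BaseRolloutStorage(abc.ABC):
    """Hypothetical shared interface that every storage implements."""

    @abc.abstractmethod
    def insert(self, step_data: dict) -> None:
        ...


class RolloutStorage(BaseRolloutStorage):
    """Flat PPO storage: keeps every environment step."""

    def __init__(self):
        self._steps = []

    def insert(self, step_data: dict) -> None:
        self._steps.append(step_data)


class HrlRolloutStorage(BaseRolloutStorage):
    """HRL variant: a sibling of RolloutStorage, not a child of it."""

    def __init__(self):
        self._steps = []

    def insert(self, step_data: dict) -> None:
        # Hypothetical HRL behavior: only keep steps where the HL policy acted.
        if step_data.get("hl_acted", False):
            self._steps.append(step_data)


class NewRolloutStorage(BaseRolloutStorage):
    """A future storage also inherits directly from the base."""

    def insert(self, step_data: dict) -> None:
        pass
```

Every variant depends only on the abstract base, so adding NewRolloutStorage never deepens the chain.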

habitat-baselines/habitat_baselines/rl/hrl/skills/reset.py (outdated; resolved)
trainer_name: "ddppo"
torch_gpu_id: 0
video_fps: 30
eval_ckpt_path_dir: ""
Contributor:
@akshararai check for redundant variables


defaults:
- rl_hierarchical
- /habitat/task/actions:
Contributor:
What is the oracle navigation action, and why is it different from the skill?

Contributor (Author):
The oracle navigation action is a Habitat-Lab action that accesses the underlying simulator state to compute the oracle path. The oracle navigation skill does not have access to the simulator because it is on the Habitat-Baselines side.
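This split can be sketched as follows — all class names here are illustrative stand-ins, not the actual Habitat classes; the point is which side holds a simulator handle:

```python
class FakeSim:
    """Stand-in for the simulator; only Habitat-Lab code holds a handle to it."""

    def shortest_path(self, start, goal):
        # Privileged access to underlying simulator state (e.g. the navmesh).
        return [start, goal]


class OracleNavAction:
    """Habitat-Lab side: an action with simulator access that computes
    the oracle path."""

    def __init__(self, sim: FakeSim):
        self._sim = sim

    def step(self, start, goal):
        return self._sim.shortest_path(start, goal)


class OracleNavSkill:
    """Habitat-Baselines side: no simulator handle. It only decides to
    invoke the lab-side action and forwards the navigation target."""

    action_name = "oracle_nav_action"

    def get_action(self, goal):
        return {"action": self.action_name, "target": goal}
```

The skill stays simulator-agnostic; all privileged state access lives behind the lab-side action.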

habitat-baselines/habitat_baselines/rl/hrl/skills/skill.py (outdated; resolved)
@@ -191,6 +196,8 @@ def get_next_skill(
if should_plan != 1.0:
continue
use_ac = self._all_actions[skill_sel[batch_idx]]
if baselines_logger.level >= logging.DEBUG:
Contributor:
Do you need this? I thought .debug already checks if the level is correct.

Contributor (Author):
Yes, this is needed because I want to avoid any cost of formatting the use_ac value.
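The point being made is that the f-string is built before logger.debug ever sees it, so an explicit level guard is what skips the formatting cost. A small stdlib sketch (the PR compares the custom baselines logger's level directly; the equivalent stdlib idiom is isEnabledFor):

```python
import logging

logger = logging.getLogger("baselines_example")
logger.setLevel(logging.INFO)  # DEBUG messages are disabled


class CostlyValue:
    """Tracks whether its (expensive) repr was ever computed."""

    formatted = False

    def __repr__(self):
        CostlyValue.formatted = True
        return "<costly value>"


use_ac = CostlyValue()

# Guarding up front skips building the message entirely, so __repr__
# never runs when DEBUG is off. An unguarded logger.debug(f"use_ac={use_ac!r}")
# would pay the formatting cost before the record is discarded.
if logger.isEnabledFor(logging.DEBUG):
    logger.debug(f"use_ac={use_ac!r}")
```

Lazy %-style arguments (logger.debug("use_ac=%r", use_ac)) achieve the same deferral without an explicit guard.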

habitat-baselines/habitat_baselines/rl/hrl/skills/skill.py (outdated; resolved)
Comment on lines 382 to +384
trainer_name: str = "ppo"
updater_name: str = "PPO"
distrib_updater_name: str = "DDPPO"
Contributor:
This looks super redundant.
Some of these are capitalized.
Neither distrib_updater_name nor updater_name is ever changed in any of the configurations.
Can distrib_updater_name and updater_name be properties of the trainer?

Contributor (Author):
No, I think they are best here because they are needed to instantiate the updater. But this was a mistake; rl_hierarchical was supposed to change these properties.

Contributor:
Can't these just be inferred from other configurations?

Comment on lines +10 to +13
__all__ = [
"HRLPPO",
"HrlRolloutStorage",
]
Contributor:
Did you make sure these appear on the Pages website after building the documentation locally?

Contributor:
It's fine. We will deal with that later.

@akshararai akshararai merged commit 7c842d9 into main Feb 10, 2023
dannymcy pushed a commit to dannymcy/habitat-lab that referenced this pull request Jul 8, 2024
…h#1053)

* Trainable HL policy

* Working on HRL trainer

* Fixed config setup

* Train hl modif (facebookresearch#1057)

* separate config policy

* update config

* renamed hierarchical_srl_learned_nav.yaml to hierarchical_srl.yaml since that already implies a learned nav

* renamed yaml files to follow convention of _tp_ for task planner, _srl_ for learned skills, _onav_ for oracle nav

* order of parameters in config to avoid overwrite

* added documentation on how to run the HRL policy, removed auto_name, not needed in this config

* remove empty file

* fixed the yaml linked in the readme

* updated the description of the config file

* Delete tp_noop_onav.yaml

don't need this file anymore, added readme instructions to do this by changing hydra parameters.

* setting the agent back to depth_head_agent as _vis slows down the training. Apologies

* removed one config file, since it seemed like mostly a repeat of the tp config

* cleaned up config files by removing redundant parameters

* renamed monolithic_pointnav to be monolithic

* fixed the comment

* added localization_sensor to all task lab_sensors

Co-authored-by: Xavier Puig <xavierpuig@fb.com>

* Update README.md

Fixed link for PDDL

* Match tensor device when checking if the skill is done

* Train hl modif2 (facebookresearch#1076)

* make should_terminate consistently on cpu

* Corrected config files for skills using noop + bug in pddl_apply_action

* bug fixes in apply postcond

* Update hierarchical_policy.py

* Fixed RNN problem

* Fixed tests

* Fixed formatting

* Fixed device issues. Cleaned up configs.

* More config cleanup

* Addressing PR comments

* Updated circular reference

* Addressing PR comments

* Addressing PR comments

* Update habitat-baselines/habitat_baselines/rl/hrl/skills/skill.py

Co-authored-by: Vincent-Pierre BERGES <28320361+vincentpierre@users.noreply.github.com>

* Addressing PR comments

* Resolved storage problem

* Update oracle_nav.py

* Fix for agent rotation

* Missing key

* More docs

* Update habitat-baselines/habitat_baselines/rl/hrl/hrl_rollout_storage.py

Co-authored-by: Vincent-Pierre BERGES <28320361+vincentpierre@users.noreply.github.com>

* Update habitat-baselines/habitat_baselines/rl/hrl/utils.py

Co-authored-by: Vincent-Pierre BERGES <28320361+vincentpierre@users.noreply.github.com>

* Updated name

* fixes for training

* Fixed env issue

* Fixed deprecated configs

* Speed fix

* Updated configs

* Pddl action fixes

* Removed speed opts. Fixed some bugs

* Fixed rendering text to the frame

* Addressing Vince's PR comments

* Refactored navigation to be much clearer

* Fixed some of the tests

* Addressed PR comments

* Fixed rotation issue

* Fixed black

* Addressed PR comments

* Addressed PR comments

* Fixed config

* Fixed typo

* Fixed another typo

* CI

* Updated to work with older pytorch version

* renaming --exp-config to --config-name again

---------

Co-authored-by: akshararai <akshara.rai@gmail.com>
Co-authored-by: Xavier Puig <xavierpuig@fb.com>
Co-authored-by: Vincent-Pierre BERGES <28320361+vincentpierre@users.noreply.github.com>
Co-authored-by: vincentpierre <vincentpierre@users.noreply.github.com>
HHYHRHY pushed a commit to SgtVincent/EMOS that referenced this pull request Aug 31, 2024
…h#1053)
