-
Notifications
You must be signed in to change notification settings - Fork 505
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Train High-Level Policies in Hierarchical Approaches #1053
Conversation
habitat-baselines/habitat_baselines/config/default_structured_configs.py
Show resolved
Hide resolved
habitat-baselines/habitat_baselines/config/rearrange/rl_hl_srl_onav.yaml
Outdated
Show resolved
Hide resolved
habitat-baselines/habitat_baselines/config/rearrange/rl_hl_srl_onav.yaml
Outdated
Show resolved
Hide resolved
habitat-baselines/habitat_baselines/config/rearrange/rl_hl_srl_onav.yaml
Outdated
Show resolved
Hide resolved
habitat-baselines/habitat_baselines/config/rearrange/rl_hl_srl_onav.yaml
Outdated
Show resolved
Hide resolved
habitat-baselines/habitat_baselines/config/rearrange/tp_srl.yaml
Outdated
Show resolved
Hide resolved
habitat-baselines/habitat_baselines/config/rearrange/tp_srl_oracle_nav.yaml
Outdated
Show resolved
Hide resolved
@@ -136,5 +136,21 @@ def register_auxiliary_loss( | |||
def get_auxiliary_loss(cls, name: str): | |||
return cls._get_impl("aux_loss", name) | |||
|
|||
@classmethod |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
storage and updater are not very descriptive. Can you add a little text here to clarify what is the base classes that are being registered?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Maybe even add a
assert isinstance(to_register, RolloutStorage)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Done.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Where is the assert ?
habitat-baselines/habitat_baselines/config/default_structured_configs.py
Outdated
Show resolved
Hide resolved
habitat-baselines/habitat_baselines/config/default_structured_configs.py
Outdated
Show resolved
Hide resolved
habitat-baselines/habitat_baselines/config/rearrange/rl_hl_srl_onav.yaml
Outdated
Show resolved
Hide resolved
habitat-baselines/habitat_baselines/config/default_structured_configs.py
Show resolved
Hide resolved
habitat-baselines/habitat_baselines/config/default_structured_configs.py
Show resolved
Hide resolved
* separate config policy * udpate config * renamed hierarchical_srl_learned_nav.yaml to hierarchical_srl.yaml since that already implies a learned nav * renamed yaml files to follow convention of _tp_ for task planner, _srl_ for learned skills, _onav_ for oracle nav * order of parameters in config to avoid overwite * added documentation on how to run the HRL policy, removed auto_name, not needed in this config * remove empty file * fixed the yaml linked in the readme * updated the description of the config fike * Delete tp_noop_onav.yaml don't need this file anymore, added readme instructions to do this by changing hydra parameters. * setting the agent back to depth_head_agent as _vis slows down the training. Apologies * removed one config file, since it seemed like mostly a repeat of the tp config * cleaned up config files by removing reduntant parameters * renamed monolithic_pointnav to be monolithic * fixed the comment * added localization_sensor to all task lab_sensors Co-authored-by: Xavier Puig <xavierpuig@fb.com>
Fixed link for PDDL
habitat-baselines/habitat_baselines/config/default_structured_configs.py
Outdated
Show resolved
Hide resolved
habitat-baselines/habitat_baselines/config/rearrange/tp_srl_oracle_nav.yaml
Outdated
Show resolved
Hide resolved
* make should_terminate consistently on cpu * Corrected config files for skills using noop + bug in pddl_apply_action * bug fixes in apply postcond * Update hierarchical_policy.py
habitat-baselines/habitat_baselines/config/rearrange/rl_hierarchical.yaml
Show resolved
Hide resolved
habitat-baselines/habitat_baselines/config/habitat_baselines/rl/policy/hl_neural.yaml
Outdated
Show resolved
Hide resolved
habitat-baselines/habitat_baselines/config/habitat_baselines/rl/policy/hl_neural.yaml
Show resolved
Hide resolved
elif self._skills[skill_id].num_recurrent_layers != 0: | ||
raise ValueError( | ||
f"The code does not currently support neural LL and neural HL skills. Skill={self._skills[skill_id]}, HL={self._high_level_policy}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hard to understand why the skill having non zero number of recurrent layers means that the user is trying to use neural HL and neural LL. Add a comment of give an property to skills with a descriptive name that checks if the LL is neural.
Also, aren't we trying to have frozen NN LL skills and training NN HL policy ? This error seems to indicate this is not supported.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Correct, that combination is currently not supported. I refactored with a more descriptive property.
Co-authored-by: Vincent-Pierre BERGES <28320361+vincentpierre@users.noreply.github.com>
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove : habitat-baselines/tb/events.out.tfevents.1674193555.s-164-67-210-79.resnet.ucla.edu.58058.0
It seems running
Does not work on this branch, but does on main. |
obs_skill_inputs: ["obj_start_sensor"] | ||
max_skill_steps: 1 | ||
apply_postconds: True | ||
force_end_on_timeout: False |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@akshararai check if all these parameters are needed, and add a sample documentation for them somewhere.
name: "NeuralHighLevelPolicy" | ||
allow_other_place: False | ||
hidden_dim: 512 | ||
use_rnn: True |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@akshararai check which of these fields are needed
EPS_PPO = 1e-5 | ||
|
||
|
||
@baseline_registry.register_storage |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Hm I prefer Vince's style better, as in theory, it allows for less nested inheritance. But I think this can be part of a future Quality-of-life PR.
Imagine a new type of rollout storage NewRolloutStorage. Should that inherit from HrlRolloutStorage? Then the nested inheritance becomes newRolloutStorage -> HrlRolloutStorage -> RolloutStorage. Instead if we have a BaseRolloutStorage, then all the classes can inherit from the base, avoiding this multi-layer nested inheritance.
...nes/config/habitat_baselines/rl/policy/hierarchical_policy/defined_skills/oracle_skills.yaml
Show resolved
Hide resolved
habitat-baselines/habitat_baselines/config/rearrange/rl_hierarchical.yaml
Show resolved
Hide resolved
trainer_name: "ddppo" | ||
torch_gpu_id: 0 | ||
video_fps: 30 | ||
eval_ckpt_path_dir: "" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@akshararai check for redundant variables
|
||
defaults: | ||
- rl_hierarchical | ||
- /habitat/task/actions: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the oracle navigation action, and why is it different from the skill?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The oracle navigation action is a Habitat-Lab action that accesses the underlying simulator state to compute the oracle path. The oracle navigation skill does not have access to the simulator because it is on the Habitat-Baselines side.
@@ -191,6 +196,8 @@ def get_next_skill( | |||
if should_plan != 1.0: | |||
continue | |||
use_ac = self._all_actions[skill_sel[batch_idx]] | |||
if baselines_logger.level >= logging.DEBUG: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you need this? I thought .debug already checks if the level is correct.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, this is needed because I want to avoid any cost of formatting the use_ac
value.
trainer_name: str = "ppo" | ||
updater_name: str = "PPO" | ||
distrib_updater_name: str = "DDPPO" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This looks super redundant.
Some of these are capitalized.
Neither distrib_updater_name nor updater_name are ever changed in any of the configurations.
Can distrib_updater_name and updater_name be properties of the trainer ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, I think they are best here because they are needed to instantiate the updater. But this was a mistake, rl_hierarchical
was supposed to change these properties.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can't these just be inferred from other configurations ?
habitat-baselines/habitat_baselines/config/default_structured_configs.py
Outdated
Show resolved
Hide resolved
__all__ = [ | ||
"HRLPPO", | ||
"HrlRolloutStorage", | ||
] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
did you make sure these appear on the Pages website after you build the documentation locally ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It's fine. We will deal with that later.
habitat-baselines/habitat_baselines/rl/hrl/hl/high_level_policy.py
Outdated
Show resolved
Hide resolved
…h#1053) * Trainable HL policy * Working on HRL trainer * Fixed config setup * Train hl modif (facebookresearch#1057) * separate config policy * udpate config * renamed hierarchical_srl_learned_nav.yaml to hierarchical_srl.yaml since that already implies a learned nav * renamed yaml files to follow convention of _tp_ for task planner, _srl_ for learned skills, _onav_ for oracle nav * order of parameters in config to avoid overwite * added documentation on how to run the HRL policy, removed auto_name, not needed in this config * remove empty file * fixed the yaml linked in the readme * updated the description of the config fike * Delete tp_noop_onav.yaml don't need this file anymore, added readme instructions to do this by changing hydra parameters. * setting the agent back to depth_head_agent as _vis slows down the training. Apologies * removed one config file, since it seemed like mostly a repeat of the tp config * cleaned up config files by removing reduntant parameters * renamed monolithic_pointnav to be monolithic * fixed the comment * added localization_sensor to all task lab_sensors Co-authored-by: Xavier Puig <xavierpuig@fb.com> * Update README.md Fixed link for PDDL * Match tensor device when checking if the skills is done * Train hl modif2 (facebookresearch#1076) * make should_terminate consistently on cpu * Corrected config files for skills using noop + bug in pddl_apply_action * bug fixes in apply postcond * Update hierarchical_policy.py * Fixed RNN problem * Fixed tests * Fixed formatting * Fixed device issues. Cleaned up configs. * More config cleanup * Addressing PR comments * Updated circular reference * Addressing PR comments * Addressing PR comments * Update habitat-baselines/habitat_baselines/rl/hrl/skills/skill.py Co-authored-by: Vincent-Pierre BERGES <28320361+vincentpierre@users.noreply.github.com> * Addressing PR comments * Resolved storage problem * Update oracle_nav.py * Fix for agent rotation * Missing key * More docs * Update habitat-baselines/habitat_baselines/rl/hrl/hrl_rollout_storage.py Co-authored-by: Vincent-Pierre BERGES <28320361+vincentpierre@users.noreply.github.com> * Update habitat-baselines/habitat_baselines/rl/hrl/utils.py Co-authored-by: Vincent-Pierre BERGES <28320361+vincentpierre@users.noreply.github.com> * Updated name * fixes for training * Fixed env issue * Fixed deprecated configs * Speed fix * Updated configs * Pddl action fixes * Removed speed opts. Fixed some bugs * Fixed rendering text to the frame * Addressing Vince's PR comments * Refactored navigation to be much clearer * Fixed some of the tests * Adddressed PR comments * Fixed rotation issue * Fixed black * Addressed PR comments * Addressed PR comments * Fixed config * Fixed typo * Fixed another typo * CI * Updated to work with older pytorch version * renaming --exp-config to --config-name again --------- Co-authored-by: akshararai <akshara.rai@gmail.com> Co-authored-by: Xavier Puig <xavierpuig@fb.com> Co-authored-by: Vincent-Pierre BERGES <28320361+vincentpierre@users.noreply.github.com> Co-authored-by: vincentpierre <vincentpierre@users.noreply.github.com>
…h#1053) * Trainable HL policy * Working on HRL trainer * Fixed config setup * Train hl modif (facebookresearch#1057) * separate config policy * udpate config * renamed hierarchical_srl_learned_nav.yaml to hierarchical_srl.yaml since that already implies a learned nav * renamed yaml files to follow convention of _tp_ for task planner, _srl_ for learned skills, _onav_ for oracle nav * order of parameters in config to avoid overwite * added documentation on how to run the HRL policy, removed auto_name, not needed in this config * remove empty file * fixed the yaml linked in the readme * updated the description of the config fike * Delete tp_noop_onav.yaml don't need this file anymore, added readme instructions to do this by changing hydra parameters. * setting the agent back to depth_head_agent as _vis slows down the training. Apologies * removed one config file, since it seemed like mostly a repeat of the tp config * cleaned up config files by removing reduntant parameters * renamed monolithic_pointnav to be monolithic * fixed the comment * added localization_sensor to all task lab_sensors Co-authored-by: Xavier Puig <xavierpuig@fb.com> * Update README.md Fixed link for PDDL * Match tensor device when checking if the skills is done * Train hl modif2 (facebookresearch#1076) * make should_terminate consistently on cpu * Corrected config files for skills using noop + bug in pddl_apply_action * bug fixes in apply postcond * Update hierarchical_policy.py * Fixed RNN problem * Fixed tests * Fixed formatting * Fixed device issues. Cleaned up configs. * More config cleanup * Addressing PR comments * Updated circular reference * Addressing PR comments * Addressing PR comments * Update habitat-baselines/habitat_baselines/rl/hrl/skills/skill.py Co-authored-by: Vincent-Pierre BERGES <28320361+vincentpierre@users.noreply.github.com> * Addressing PR comments * Resolved storage problem * Update oracle_nav.py * Fix for agent rotation * Missing key * More docs * Update habitat-baselines/habitat_baselines/rl/hrl/hrl_rollout_storage.py Co-authored-by: Vincent-Pierre BERGES <28320361+vincentpierre@users.noreply.github.com> * Update habitat-baselines/habitat_baselines/rl/hrl/utils.py Co-authored-by: Vincent-Pierre BERGES <28320361+vincentpierre@users.noreply.github.com> * Updated name * fixes for training * Fixed env issue * Fixed deprecated configs * Speed fix * Updated configs * Pddl action fixes * Removed speed opts. Fixed some bugs * Fixed rendering text to the frame * Addressing Vince's PR comments * Refactored navigation to be much clearer * Fixed some of the tests * Adddressed PR comments * Fixed rotation issue * Fixed black * Addressed PR comments * Addressed PR comments * Fixed config * Fixed typo * Fixed another typo * CI * Updated to work with older pytorch version * renaming --exp-config to --config-name again --------- Co-authored-by: akshararai <akshara.rai@gmail.com> Co-authored-by: Xavier Puig <xavierpuig@fb.com> Co-authored-by: Vincent-Pierre BERGES <28320361+vincentpierre@users.noreply.github.com> Co-authored-by: vincentpierre <vincentpierre@users.noreply.github.com>
Motivation and Context
How Has This Been Tested
New HRL tests.
To run fixed HL and oracle LL skills in evaluation mode:
python habitat-baselines/habitat_baselines/run.py --exp-config habitat-baselines/habitat_baselines/config/rearrange/rl_hierarchical_oracle_nav.yaml --run-type eval habitat_baselines/rl/policy=hl_fixed habitat_baselines/rl/policy/hierarchical_policy/defined_skills=oracle_skills
To run learned HL and oracle LL skills in train mode:
python habitat-baselines/habitat_baselines/run.py --exp-config habitat-baselines/habitat_baselines/config/rearrange/rl_hierarchical_oracle_nav.yaml --run-type train habitat_baselines/rl/policy=hl_neural habitat_baselines/rl/policy/hierarchical_policy/defined_skills=oracle_skills
With oracle skills, you should run in kinematic simulation mode. See the note at the top of
habitat_baselines/config/habitat_baselines/rl/policy/hierarchical_policy/defined_skills/oracle_skills.yaml
for how to add kinematic mode.To run only over the minival dataset. Add
habitat_baselines.eval.split=minival habitat.dataset.split=minival
.Types of changes
Checklist