
Train High-Level Policies in Hierarchical Approaches #1053

Merged — 55 commits merged into main on Feb 10, 2023
Conversation

@ASzot (Contributor) commented Dec 28, 2022

Motivation and Context

  • Adds trainable HL policies.
  • Refactors the tests for HRL related approaches.
  • Registry for storage and updaters.
  • Added noop skill.

How Has This Been Tested

New HRL tests.

To run fixed HL and oracle LL skills in evaluation mode:

python habitat-baselines/habitat_baselines/run.py --exp-config habitat-baselines/habitat_baselines/config/rearrange/rl_hierarchical_oracle_nav.yaml --run-type eval habitat_baselines/rl/policy=hl_fixed habitat_baselines/rl/policy/hierarchical_policy/defined_skills=oracle_skills

To run learned HL and oracle LL skills in train mode:

python habitat-baselines/habitat_baselines/run.py --exp-config habitat-baselines/habitat_baselines/config/rearrange/rl_hierarchical_oracle_nav.yaml --run-type train habitat_baselines/rl/policy=hl_neural habitat_baselines/rl/policy/hierarchical_policy/defined_skills=oracle_skills

With oracle skills, you should run in kinematic simulation mode. See the note at the top of habitat_baselines/config/habitat_baselines/rl/policy/hierarchical_policy/defined_skills/oracle_skills.yaml for how to enable kinematic mode.

To run only over the minival dataset, add habitat_baselines.eval.split=minival habitat.dataset.split=minival.

Types of changes

  • Docs change / refactoring / dependency upgrade
  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)

Checklist

  • My code follows the code style of this project.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • I have read the CONTRIBUTING document.
  • I have completed my CLA (see CONTRIBUTING)
  • I have added tests to cover my changes.
  • All new and existing tests passed.

@facebook-github-bot facebook-github-bot added the CLA Signed Do not delete this pull request or issue due to inactivity. label Dec 28, 2022
@@ -136,5 +136,21 @@ def register_auxiliary_loss(
def get_auxiliary_loss(cls, name: str):
return cls._get_impl("aux_loss", name)

@classmethod
Contributor:
storage and updater are not very descriptive. Can you add a little text here to clarify what the base classes being registered are?

Contributor:
Maybe even add a

assert isinstance(to_register, RolloutStorage)

Contributor (Author):
Done.

Contributor:
Where is the assert?
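For context, here is a minimal sketch of what a storage registry with the suggested type check could look like. The class and method names are illustrative, not the actual habitat-baselines API; also, since classes (not instances) are registered, issubclass is the check that matches the reviewer's isinstance suggestion:

```python
from typing import Dict, Optional, Type


class RolloutStorage:
    """Stand-in for the rollout storage base class."""


class BaselineRegistry:
    _storages: Dict[str, Type[RolloutStorage]] = {}

    @classmethod
    def register_storage(cls, to_register, name: Optional[str] = None):
        # Validate at registration time so a bad registration fails early,
        # not deep inside training.
        assert issubclass(to_register, RolloutStorage), (
            f"{to_register} must inherit from RolloutStorage"
        )
        cls._storages[name or to_register.__name__] = to_register
        return to_register

    @classmethod
    def get_storage(cls, name: str) -> Type[RolloutStorage]:
        return cls._storages[name]


@BaselineRegistry.register_storage
class HrlRolloutStorage(RolloutStorage):
    """A registered storage variant (hypothetical)."""
```

Registering anything that is not a RolloutStorage subclass then trips the assertion immediately.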

test/test_baseline_training.py (outdated; resolved)
akshararai and others added 3 commits January 6, 2023 15:40
* separate config policy

* update config

* renamed hierarchical_srl_learned_nav.yaml to hierarchical_srl.yaml since that already implies a learned nav

* renamed yaml files to follow convention of _tp_ for task planner, _srl_ for learned skills, _onav_ for oracle nav

* order of parameters in config to avoid overwrite

* added documentation on how to run the HRL policy, removed auto_name, not needed in this config

* remove empty file

* fixed the yaml linked in the readme

* updated the description of the config file

* Delete tp_noop_onav.yaml

don't need this file anymore, added readme instructions to do this by changing hydra parameters.

* setting the agent back to depth_head_agent as _vis slows down the training. Apologies

* removed one config file, since it seemed like mostly a repeat of the tp config

* cleaned up config files by removing redundant parameters

* renamed monolithic_pointnav to be monolithic

* fixed the comment

* added localization_sensor to all task lab_sensors

Co-authored-by: Xavier Puig <xavierpuig@fb.com>
Fixed link for PDDL
xavierpuigf and others added 5 commits January 13, 2023 09:03
* make should_terminate consistently on cpu

* Corrected config files for skills using noop + bug in pddl_apply_action

* bug fixes in apply postcond

* Update hierarchical_policy.py
Comment on lines 305 to 307
elif self._skills[skill_id].num_recurrent_layers != 0:
raise ValueError(
f"The code does not currently support neural LL and neural HL skills. Skill={self._skills[skill_id]}, HL={self._high_level_policy}"
Contributor:
Hard to understand why the skill having a non-zero number of recurrent layers means the user is trying to use a neural HL and a neural LL. Add a comment, or give skills a property with a descriptive name that checks whether the LL is neural.
Also, aren't we trying to have frozen NN LL skills while training an NN HL policy? This error seems to indicate that is not supported.

Contributor (Author):
Correct, that combination is currently not supported. I refactored with a more descriptive property.
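A minimal sketch of the kind of descriptive property the refactor describes — the names Skill, has_hidden_state, and check_compatible are hypothetical, not necessarily what the PR uses:

```python
class Skill:
    """Minimal stand-in for a low-level skill policy."""

    def __init__(self, num_recurrent_layers: int = 0):
        self.num_recurrent_layers = num_recurrent_layers

    @property
    def has_hidden_state(self) -> bool:
        # Non-zero recurrent layers means the skill carries an RNN hidden
        # state, i.e. it is a neural LL skill rather than a fixed/oracle one.
        return self.num_recurrent_layers != 0


def check_compatible(skill: Skill, hl_is_neural: bool) -> None:
    # Neural LL under a neural HL is the unsupported combination.
    if hl_is_neural and skill.has_hidden_state:
        raise ValueError(
            "Neural LL skills under a neural HL policy are not supported."
        )
```

The check then reads as intent (skill.has_hidden_state) rather than as an opaque comparison against num_recurrent_layers.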

@vincentpierre (Contributor) left a comment:
Remove : habitat-baselines/tb/events.out.tfevents.1674193555.s-164-67-210-79.resnet.ucla.edu.58058.0

habitat-baselines/habitat_baselines/README.md (outdated; resolved)
habitat-baselines/habitat_baselines/README.md (outdated; resolved)
@vincentpierre (Contributor):
It seems running

 python habitat-baselines/habitat_baselines/run.py --exp-config habitat-baselines/habitat_baselines/config/rearrange/rl_rearrange.yaml --run-type train habitat_baselines.num_environments=4

does not work on this branch, but does on main.

obs_skill_inputs: ["obj_start_sensor"]
max_skill_steps: 1
apply_postconds: True
force_end_on_timeout: False
Contributor:
@akshararai check if all these parameters are needed, and add a sample documentation for them somewhere.

name: "NeuralHighLevelPolicy"
allow_other_place: False
hidden_dim: 512
use_rnn: True
Contributor:
@akshararai check which of these fields are needed

EPS_PPO = 1e-5


@baseline_registry.register_storage
Contributor:
Hm, I prefer Vince's style, as in theory it allows for less nested inheritance. But I think this can be part of a future quality-of-life PR.
Imagine a new type of rollout storage, NewRolloutStorage. Should it inherit from HrlRolloutStorage? Then the inheritance chain becomes NewRolloutStorage -> HrlRolloutStorage -> RolloutStorage. If we instead have a BaseRolloutStorage, all the classes can inherit from the base, avoiding this multi-layer nesting.
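The flatter hierarchy the comment argues for can be sketched as follows — BaseRolloutStorage and the insert behavior are hypothetical illustrations, not the actual habitat-baselines classes:

```python
import abc


class BaseRolloutStorage(abc.ABC):
    """Hypothetical shared interface that every storage implements."""

    @abc.abstractmethod
    def insert(self, step_data: dict) -> None:
        ...


class RolloutStorage(BaseRolloutStorage):
    """Flat PPO storage: keeps every environment step."""

    def __init__(self):
        self._steps = []

    def insert(self, step_data: dict) -> None:
        self._steps.append(step_data)


class HrlRolloutStorage(BaseRolloutStorage):
    """HRL variant: a sibling of RolloutStorage, not a child of it."""

    def __init__(self):
        self._steps = []

    def insert(self, step_data: dict) -> None:
        # Hypothetical HRL behavior: only keep steps where the HL policy acted.
        if step_data.get("hl_acted", False):
            self._steps.append(step_data)


class NewRolloutStorage(BaseRolloutStorage):
    """A future storage also inherits directly from the base."""

    def insert(self, step_data: dict) -> None:
        pass
```

Every variant depends only on the abstract base, so adding NewRolloutStorage never deepens the chain.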

habitat-baselines/habitat_baselines/rl/hrl/skills/reset.py (outdated; resolved)
trainer_name: "ddppo"
torch_gpu_id: 0
video_fps: 30
eval_ckpt_path_dir: ""
Contributor:
@akshararai check for redundant variables


defaults:
- rl_hierarchical
- /habitat/task/actions:
Contributor:
What is the oracle navigation action, and why is it different from the skill?

Contributor (Author):
The oracle navigation action is a Habitat-Lab action that accesses the underlying simulator state to compute the oracle path. The oracle navigation skill does not have access to the simulator because it is on the Habitat-Baselines side.
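This split can be sketched as follows — all class names here are illustrative stand-ins, not the actual Habitat classes; the point is which side holds a simulator handle:

```python
class FakeSim:
    """Stand-in for the simulator; only Habitat-Lab code holds a handle to it."""

    def shortest_path(self, start, goal):
        # Privileged access to underlying simulator state (e.g. the navmesh).
        return [start, goal]


class OracleNavAction:
    """Habitat-Lab side: an action with simulator access that computes
    the oracle path."""

    def __init__(self, sim: FakeSim):
        self._sim = sim

    def step(self, start, goal):
        return self._sim.shortest_path(start, goal)


class OracleNavSkill:
    """Habitat-Baselines side: no simulator handle. It only decides to
    invoke the lab-side action and forwards the navigation target."""

    action_name = "oracle_nav_action"

    def get_action(self, goal):
        return {"action": self.action_name, "target": goal}
```

The skill stays simulator-agnostic; all privileged state access lives behind the lab-side action.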

habitat-baselines/habitat_baselines/rl/hrl/skills/skill.py (outdated; resolved)
@@ -191,6 +196,8 @@ def get_next_skill(
if should_plan != 1.0:
continue
use_ac = self._all_actions[skill_sel[batch_idx]]
if baselines_logger.level >= logging.DEBUG:
Contributor:
Do you need this? I thought .debug already checks if the level is correct.

Contributor (Author):
Yes, this is needed because I want to avoid any cost of formatting the use_ac value.
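The point being made is that the f-string is built before logger.debug ever sees it, so an explicit level guard is what skips the formatting cost. A small stdlib sketch (the PR compares the custom baselines logger's level directly; the equivalent stdlib idiom is isEnabledFor):

```python
import logging

logger = logging.getLogger("baselines_example")
logger.setLevel(logging.INFO)  # DEBUG messages are disabled


class CostlyValue:
    """Tracks whether its (expensive) repr was ever computed."""

    formatted = False

    def __repr__(self):
        CostlyValue.formatted = True
        return "<costly value>"


use_ac = CostlyValue()

# Guarding up front skips building the message entirely, so __repr__
# never runs when DEBUG is off. An unguarded logger.debug(f"use_ac={use_ac!r}")
# would pay the formatting cost before the record is discarded.
if logger.isEnabledFor(logging.DEBUG):
    logger.debug(f"use_ac={use_ac!r}")
```

Lazy %-style arguments (logger.debug("use_ac=%r", use_ac)) achieve the same deferral without an explicit guard.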

habitat-baselines/habitat_baselines/rl/hrl/skills/skill.py (outdated; resolved)
Comment on lines 382 to +384
trainer_name: str = "ppo"
updater_name: str = "PPO"
distrib_updater_name: str = "DDPPO"
Contributor:
This looks super redundant.
Some of these are capitalized.
Neither distrib_updater_name nor updater_name is ever changed in any of the configurations.
Can distrib_updater_name and updater_name be properties of the trainer?

Contributor (Author):
No, I think they are best here because they are needed to instantiate the updater. But this was a mistake; rl_hierarchical was supposed to change these properties.

Contributor:
Can't these just be inferred from other configurations?

Comment on lines +10 to +13
__all__ = [
"HRLPPO",
"HrlRolloutStorage",
]
Contributor:
Did you make sure these appear on the Pages website after building the documentation locally?

Contributor:
It's fine. We will deal with that later.

@akshararai akshararai merged commit 7c842d9 into main Feb 10, 2023
dannymcy pushed a commit to dannymcy/habitat-lab that referenced this pull request Jul 8, 2024
…h#1053)

* Trainable HL policy

* Working on HRL trainer

* Fixed config setup

* Train hl modif (facebookresearch#1057)

* separate config policy

* update config

* renamed hierarchical_srl_learned_nav.yaml to hierarchical_srl.yaml since that already implies a learned nav

* renamed yaml files to follow convention of _tp_ for task planner, _srl_ for learned skills, _onav_ for oracle nav

* order of parameters in config to avoid overwrite

* added documentation on how to run the HRL policy, removed auto_name, not needed in this config

* remove empty file

* fixed the yaml linked in the readme

* updated the description of the config file

* Delete tp_noop_onav.yaml

don't need this file anymore, added readme instructions to do this by changing hydra parameters.

* setting the agent back to depth_head_agent as _vis slows down the training. Apologies

* removed one config file, since it seemed like mostly a repeat of the tp config

* cleaned up config files by removing redundant parameters

* renamed monolithic_pointnav to be monolithic

* fixed the comment

* added localization_sensor to all task lab_sensors

Co-authored-by: Xavier Puig <xavierpuig@fb.com>

* Update README.md

Fixed link for PDDL

* Match tensor device when checking if the skill is done

* Train hl modif2 (facebookresearch#1076)

* make should_terminate consistently on cpu

* Corrected config files for skills using noop + bug in pddl_apply_action

* bug fixes in apply postcond

* Update hierarchical_policy.py

* Fixed RNN problem

* Fixed tests

* Fixed formatting

* Fixed device issues. Cleaned up configs.

* More config cleanup

* Addressing PR comments

* Updated circular reference

* Addressing PR comments

* Addressing PR comments

* Update habitat-baselines/habitat_baselines/rl/hrl/skills/skill.py

Co-authored-by: Vincent-Pierre BERGES <28320361+vincentpierre@users.noreply.github.com>

* Addressing PR comments

* Resolved storage problem

* Update oracle_nav.py

* Fix for agent rotation

* Missing key

* More docs

* Update habitat-baselines/habitat_baselines/rl/hrl/hrl_rollout_storage.py

Co-authored-by: Vincent-Pierre BERGES <28320361+vincentpierre@users.noreply.github.com>

* Update habitat-baselines/habitat_baselines/rl/hrl/utils.py

Co-authored-by: Vincent-Pierre BERGES <28320361+vincentpierre@users.noreply.github.com>

* Updated name

* fixes for training

* Fixed env issue

* Fixed deprecated configs

* Speed fix

* Updated configs

* Pddl action fixes

* Removed speed opts. Fixed some bugs

* Fixed rendering text to the frame

* Addressing Vince's PR comments

* Refactored navigation to be much clearer

* Fixed some of the tests

* Addressed PR comments

* Fixed rotation issue

* Fixed black

* Addressed PR comments

* Addressed PR comments

* Fixed config

* Fixed typo

* Fixed another typo

* CI

* Updated to work with older pytorch version

* renaming --exp-config to --config-name again

---------

Co-authored-by: akshararai <akshara.rai@gmail.com>
Co-authored-by: Xavier Puig <xavierpuig@fb.com>
Co-authored-by: Vincent-Pierre BERGES <28320361+vincentpierre@users.noreply.github.com>
Co-authored-by: vincentpierre <vincentpierre@users.noreply.github.com>
HHYHRHY pushed a commit to SgtVincent/EMOS that referenced this pull request Aug 31, 2024
…h#1053)
