Train High-Level Policies in Hierarchical Approaches #1053

Merged · 55 commits · Feb 10, 2023

Commits (the file diffs below show changes from 4 of them):
- `a787e34` Trainable HL policy (ASzot, Dec 24, 2022)
- `f6d1f11` Working on HRL trainer (ASzot, Dec 26, 2022)
- `daa84db` Fixed config setup (ASzot, Dec 27, 2022)
- `7ae2c6b` Train hl modif (#1057) (akshararai, Jan 6, 2023)
- `bb2e4da` Update README.md (xavierpuigf, Jan 9, 2023)
- `628063d` Match tensor device when checking if the skills is done (xavierpuigf, Jan 9, 2023)
- `d179ecf` Train hl modif2 (#1076) (xavierpuigf, Jan 13, 2023)
- `515b3cf` Merged with main (ASzot, Jan 19, 2023)
- `8eddcd1` Fixed RNN problem (ASzot, Jan 20, 2023)
- `fa94bfc` Fixed tests (ASzot, Jan 20, 2023)
- `11659b4` Fixed formatting (ASzot, Jan 20, 2023)
- `633ebf3` Fixed device issues. Cleaned up configs. (ASzot, Jan 27, 2023)
- `1c82390` More config cleanup (ASzot, Jan 27, 2023)
- `2e973f5` Addressing PR comments (ASzot, Jan 27, 2023)
- `3a805ea` Updated circular reference (ASzot, Jan 27, 2023)
- `7d92072` Addressing PR comments (ASzot, Jan 27, 2023)
- `4696551` Addressing PR comments (ASzot, Jan 27, 2023)
- `c11601f` Update habitat-baselines/habitat_baselines/rl/hrl/skills/skill.py (ASzot, Jan 27, 2023)
- `90155e1` Addressing PR comments (ASzot, Jan 27, 2023)
- `755acb4` Resolved storage problem (ASzot, Jan 28, 2023)
- `8d07335` merged (ASzot, Jan 28, 2023)
- `4754fd7` Update oracle_nav.py (xavierpuigf, Jan 30, 2023)
- `587672e` Fix for agent rotation (ASzot, Jan 30, 2023)
- `f16bc14` Missing key (ASzot, Jan 30, 2023)
- `22cb83f` More docs (ASzot, Jan 31, 2023)
- `df8b7a6` Update habitat-baselines/habitat_baselines/rl/hrl/hrl_rollout_storage.py (ASzot, Jan 31, 2023)
- `422c036` Update habitat-baselines/habitat_baselines/rl/hrl/utils.py (ASzot, Jan 31, 2023)
- `5643b0e` Updated name (ASzot, Jan 31, 2023)
- `ebe877e` fixes for training (ASzot, Feb 1, 2023)
- `e1a8727` Fixed env issue (ASzot, Feb 2, 2023)
- `2673f4f` Fixed deprecated configs (ASzot, Feb 2, 2023)
- `03faec7` Merge branch 'main' into train_hl (ASzot, Feb 2, 2023)
- `dd01d50` Speed fix (ASzot, Feb 3, 2023)
- `b20fcb1` Updated configs (ASzot, Feb 3, 2023)
- `7982f40` Pddl action fixes (ASzot, Feb 3, 2023)
- `902bfa1` Removed speed opts. Fixed some bugs (ASzot, Feb 4, 2023)
- `74278bd` Fixed rendering text to the frame (ASzot, Feb 4, 2023)
- `63f610c` Merged with main (ASzot, Feb 4, 2023)
- `49a71a4` Addressing Vince's PR comments (ASzot, Feb 4, 2023)
- `b41133a` Refactored navigation to be much clearer (ASzot, Feb 4, 2023)
- `e7a877b` Fixed some of the tests (ASzot, Feb 5, 2023)
- `5c213e3` Adddressed PR comments (ASzot, Feb 6, 2023)
- `d9721f1` Fixed rotation issue (ASzot, Feb 6, 2023)
- `f8387de` Fixed black (ASzot, Feb 6, 2023)
- `1c8f54c` Addressed PR comments (ASzot, Feb 8, 2023)
- `f2c6731` Addressed PR comments (ASzot, Feb 8, 2023)
- `4b65b3f` Merge branch 'main' into train_hl (ASzot, Feb 8, 2023)
- `11e77c3` Fixed config (ASzot, Feb 8, 2023)
- `6f5ea76` Fixed typo (ASzot, Feb 8, 2023)
- `cb6ce62` Fixed another typo (ASzot, Feb 8, 2023)
- `6d4b968` CI (ASzot, Feb 8, 2023)
- `d6c957e` Merge branch 'main' into train_hl (vincentpierre, Feb 8, 2023)
- `9d2c2f5` Updated to work with older pytorch version (ASzot, Feb 9, 2023)
- `a17fbfa` Merge branch 'main' into train_hl (ASzot, Feb 9, 2023)
- `48142fb` renaming --exp-config to --config-name again (vincentpierre, Feb 9, 2023)
habitat-baselines/habitat_baselines/README.md (2 changes: 1 addition & 1 deletion)
@@ -57,7 +57,7 @@ Change the `/benchmark/nav/pointnav: pointnav_gibson` in `habitat_baselines/conf

We provide a two-layer hierarchical policy class, consisting of a low-level skill that moves the robot and a high-level policy that decides which low-level skill to use in the current state. This is especially useful in long-horizon mobile manipulation tasks, like those introduced in [Habitat 2.0](https://arxiv.org/abs/2106.14405). Both the low and high levels can be either learned or an oracle. For the oracle high level we use [PDDL](https://planning.wiki/guide/whatis/pddl), and for the oracle low level we use instantaneous transitions that set the environment directly to the skill's final desired state. Additionally, for navigation, we provide an oracle navigation skill that uses A* search on the environment's map to move the robot to its goal.
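To make the two-level control flow concrete, here is a minimal, self-contained Python sketch. The high level is only queried when the active skill reports that it is done; until then, the active skill keeps emitting primitive actions. All names (`ToySkill`, `PlanHighLevelPolicy`, `run_hierarchical`) are hypothetical and are not the habitat-baselines API; skill termination is faked with a fixed step budget, and the high level simply replays a fixed plan, roughly in the spirit of an oracle planner.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ToySkill:
    """Hypothetical low-level skill: emits primitive actions until done.

    Termination is faked with a fixed step budget; a real skill would
    decide from observations (e.g. "object grasped").
    """

    name: str
    num_steps: int
    steps_taken: int = 0

    def reset(self) -> None:
        self.steps_taken = 0

    def act(self, obs: Dict[str, float]) -> str:
        self.steps_taken += 1
        return f"{self.name}_action"

    def is_done(self) -> bool:
        return self.steps_taken >= self.num_steps


class PlanHighLevelPolicy:
    """Hypothetical high level that returns skill names from a fixed plan.

    A learned high-level policy would instead choose the next skill from obs.
    """

    def __init__(self, plan: List[str]) -> None:
        self._plan = list(plan)
        self._idx = 0

    def next_skill(self, obs: Dict[str, float]) -> Optional[str]:
        if self._idx >= len(self._plan):
            return None  # plan exhausted
        name = self._plan[self._idx]
        self._idx += 1
        return name


def run_hierarchical(
    high_level: PlanHighLevelPolicy,
    skills: Dict[str, ToySkill],
    obs: Dict[str, float],
    max_steps: int = 100,
) -> List[str]:
    """Step the active skill; ask the high level for a new one only when it ends."""
    actions: List[str] = []
    current: Optional[ToySkill] = None
    while len(actions) < max_steps:
        if current is None or current.is_done():
            name = high_level.next_skill(obs)
            if name is None:
                break
            current = skills[name]
            current.reset()
        actions.append(current.act(obs))
    return actions


if __name__ == "__main__":
    toy_skills = {"nav": ToySkill("nav", 2), "pick": ToySkill("pick", 1)}
    print(run_hierarchical(PlanHighLevelPolicy(["nav", "pick"]), toy_skills, {}))
```

With a two-step `nav` skill followed by a one-step `pick` skill, the loop produces two navigation actions and then one pick action before the plan runs out.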

To run the following examples, you need the [ReplicaCAD dataset](https://github.com/facebookresearch/habitat-sim/blob/main/DATASETS.md#replicacad).
To run the following examples, you need the [ReplicaCAD dataset](https://github.com/facebookresearch/habitat-sim/blob/main/DATASETS.md#replicacad).

To train a high-level policy while using pre-learned low-level skills (the SRL baseline from [Habitat 2.0](https://arxiv.org/abs/2106.14405)), you can run:

habitat-baselines/habitat_baselines/common/rollout_storage.py (22 changes: 19 additions & 3 deletions)
@@ -5,7 +5,7 @@
# LICENSE file in the root directory of this source tree.

import warnings
from typing import Any, Dict, Iterator, Optional, Tuple
from typing import Any, Dict, Iterator, Optional

import numpy as np
import torch
@@ -16,6 +16,10 @@
    build_pack_info_from_dones,
    build_rnn_build_seq_info,
)
from habitat_baselines.utils.common import (
    get_num_actions,
    is_continuous_action_space,
)


@baseline_registry.register_storage
@@ -30,10 +34,22 @@ def __init__(
        action_space,
        recurrent_hidden_state_size,
        num_recurrent_layers=1,
        action_shape: Optional[Tuple[int]] = None,
        is_double_buffered: bool = False,
        discrete_actions: bool = True,
    ):

        if is_continuous_action_space(action_space):
            # Assume ALL actions are NOT discrete
            action_shape = (
                get_num_actions(
                    action_space,
                ),
            )
            discrete_actions = False
        else:
            # For discrete pointnav
            action_shape = (1,)
            discrete_actions = True

        self.buffers = TensorDict()
        self.buffers["observations"] = TensorDict()
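The diff moves the choice of `action_shape` and `discrete_actions` into the constructor, deriving both from the action space instead of taking them as arguments. The sketch below restates that branch in isolation. `Discrete` and `Box` are hypothetical stand-ins for the gym space classes (so the example has no dependencies), and `storage_action_spec` is an invented name; flattening a multi-dimensional `Box` into one action vector is an assumption of this sketch, and the real `get_num_actions` helper may handle such cases differently.

```python
from dataclasses import dataclass
from typing import Tuple, Union


@dataclass
class Discrete:
    """Stand-in for gym.spaces.Discrete (hypothetical, for illustration)."""

    n: int


@dataclass
class Box:
    """Stand-in for gym.spaces.Box; only the shape matters here."""

    shape: Tuple[int, ...]


def storage_action_spec(space: Union[Discrete, Box]) -> Tuple[Tuple[int, ...], bool]:
    """Return (action_shape, discrete_actions) for a rollout buffer.

    Continuous spaces store one value per action dimension; discrete spaces
    store a single integer index per environment step.
    """
    if isinstance(space, Box):
        # Continuous: count every element of the Box as one action dimension.
        num_actions = 1
        for dim in space.shape:
            num_actions *= dim
        return (num_actions,), False
    if isinstance(space, Discrete):
        # Discrete: one action index per step, regardless of n.
        return (1,), True
    raise NotImplementedError(f"unsupported action space: {space!r}")


if __name__ == "__main__":
    print(storage_action_spec(Box((7,))))
    print(storage_action_spec(Discrete(4)))
```

Deriving both values from one source keeps callers from passing a shape that disagrees with the action space, which is presumably why the explicit `action_shape` argument could be dropped.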

@@ -0,0 +1,43 @@
open_cab:
  skill_name: "ArtObjSkillPolicy"
  load_ckpt_file: "data/models/open_cab.pth"

open_fridge:
  skill_name: "ArtObjSkillPolicy"
  load_ckpt_file: "data/models/open_fridge.pth"

close_cab:
  skill_name: "ArtObjSkillPolicy"
  load_ckpt_file: "data/models/close_cab.pth"

close_fridge:
  skill_name: "ArtObjSkillPolicy"
  load_ckpt_file: "data/models/close_fridge.pth"

pick:
  skill_name: "PickSkillPolicy"
  obs_skill_inputs: ["obj_start_sensor"]
  load_ckpt_file: "data/models/pick.pth"

place:
  skill_name: "PlaceSkillPolicy"
  obs_skill_inputs: ["obj_goal_sensor"]
  load_ckpt_file: "data/models/place.pth"

wait_skill:
  skill_name: "WaitSkillPolicy"
  max_skill_steps: -1
  force_end_on_timeout: False

nav_to_obj:
  skill_name: "NavSkillPolicy"
  obs_skill_inputs: ["goal_to_agent_gps_compass"]
  load_ckpt_file: "data/models/nav.pth"
  max_skill_steps: 300
  obs_skill_input_dim: 2

reset_arm_skill:
  skill_name: "ResetArmSkill"
  max_skill_steps: 50
  reset_joint_state: [-4.50e-01, -1.08e00, 9.95e-02, 9.38e-01, -7.88e-04, 1.57e00, 4.62e-03]
  force_end_on_timeout: False
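Several learned skills above declare `obs_skill_inputs`, and `nav_to_obj` also sets `obs_skill_input_dim: 2`. Our reading is that each listed sensor stacks values for several candidate entities, `obs_skill_input_dim` numbers per entity, and the skill keeps only the slice for its current target (for example, the object named in the high-level action). The sketch below illustrates that interpretation with plain lists; `select_skill_obs` and its `target_index` argument are hypothetical, and the actual selection logic lives in the skill classes (see `habitat-baselines/habitat_baselines/rl/hrl/skills/skill.py`).

```python
from typing import Dict, List, Sequence


def select_skill_obs(
    obs: Dict[str, List[float]],
    obs_skill_inputs: Sequence[str],
    obs_skill_input_dim: int,
    target_index: int,
) -> Dict[str, List[float]]:
    """Hypothetical helper: keep only the target entity's slice of each sensor
    named in obs_skill_inputs; all other observation keys pass through."""
    out = dict(obs)  # shallow copy so the caller's dict is not modified
    for key in obs_skill_inputs:
        if key not in obs:
            raise KeyError(f"skill expects observation {key!r}")
        flat = obs[key]
        if len(flat) % obs_skill_input_dim != 0:
            raise ValueError(
                f"{key!r} has length {len(flat)}, not a multiple of {obs_skill_input_dim}"
            )
        num_entities = len(flat) // obs_skill_input_dim
        if not 0 <= target_index < num_entities:
            raise IndexError(f"target {target_index} out of range for {num_entities} entities")
        start = target_index * obs_skill_input_dim
        out[key] = flat[start : start + obs_skill_input_dim]
    return out


if __name__ == "__main__":
    obs = {"obj_start_sensor": [0.0, 0.0, 0.0, 1.0, 2.0, 3.0], "joint": [0.1, 0.2]}
    print(select_skill_obs(obs, ["obj_start_sensor"], 3, 1))
```

Here the pick-style sensor holds two 3-D positions, and selecting entity 1 leaves only `[1.0, 2.0, 3.0]` for the skill while `joint` is untouched.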
@@ -0,0 +1,52 @@
open_cab:
  skill_name: "NoopSkillPolicy"
  max_skill_steps: 1
  apply_postconds: True

open_fridge:
  skill_name: "NoopSkillPolicy"
  max_skill_steps: 1
  apply_postconds: True

close_cab:
  skill_name: "NoopSkillPolicy"
  obs_skill_inputs: ["obj_start_sensor"]
  max_skill_steps: 1

close_fridge:
  skill_name: "NoopSkillPolicy"
  obs_skill_inputs: ["obj_start_sensor"]
  max_skill_steps: 1
  apply_postconds: True

pick:
  skill_name: "NoopSkillPolicy"
  obs_skill_inputs: ["obj_start_sensor"]
  max_skill_steps: 1
  apply_postconds: True
  force_end_on_timeout: False

place:
  skill_name: "NoopSkillPolicy"
  obs_skill_inputs: ["obj_goal_sensor"]
  max_skill_steps: 1
  apply_postconds: True
  force_end_on_timeout: False

wait_skill:
  skill_name: "WaitSkillPolicy"
  max_skill_steps: -1

nav_to_obj:
  skill_name: "NoopSkillPolicy"
  obs_skill_inputs: ["goal_to_agent_gps_compass"]
  max_skill_steps: 1
  apply_postconds: True
  force_end_on_timeout: False
  obs_skill_input_dim: 2

reset_arm_skill:
  skill_name: "ResetArmSkill"
  max_skill_steps: 50
  reset_joint_state: [-4.50e-01, -1.07e00, 9.95e-02, 9.38e-01, -7.88e-04, 1.57e00, 4.62e-03]
  force_end_on_timeout: False
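The README describes oracle low-level skills as instantaneous transitions that put the environment directly into the skill's final state; in this config that shows up as `NoopSkillPolicy` with `max_skill_steps: 1` and `apply_postconds: True`. The sketch below is a hypothetical, purely symbolic illustration of the idea: the world is a set of predicate strings, and a one-step skill jumps to the state given by the action's effects instead of simulating motion. `OracleAction` and `apply_noop_skill` are invented names, and the precondition check is a design choice of this sketch, not something the config confirms.

```python
from dataclasses import dataclass
from typing import FrozenSet, Set


@dataclass(frozen=True)
class OracleAction:
    """Hypothetical symbolic action: the predicates it needs and changes."""

    name: str
    preconds: FrozenSet[str]
    add_effects: FrozenSet[str]
    del_effects: FrozenSet[str] = frozenset()


def apply_noop_skill(
    state: Set[str], action: OracleAction, apply_postconds: bool = True
) -> Set[str]:
    """Run a one-step oracle skill on a symbolic world state.

    If apply_postconds is True, the returned state is the action's end state;
    otherwise the state is returned unchanged (the skill does nothing).
    """
    missing = action.preconds - state
    if missing:
        raise ValueError(f"{action.name}: unmet preconditions {sorted(missing)}")
    new_state = set(state)  # copy so the caller's state is not mutated
    if apply_postconds:
        new_state -= action.del_effects
        new_state |= action.add_effects
    return new_state


if __name__ == "__main__":
    pick = OracleAction(
        "pick(obj0)",
        preconds=frozenset({"robot_at(obj0)", "hand_empty"}),
        add_effects=frozenset({"holding(obj0)"}),
        del_effects=frozenset({"hand_empty"}),
    )
    print(sorted(apply_noop_skill({"robot_at(obj0)", "hand_empty"}, pick)))
```

Because the transition takes a single step, an oracle low level like this isolates the high-level policy's decisions from low-level execution failures.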

This file was deleted.

This file was deleted.

This file was deleted.

@@ -0,0 +1,21 @@
name: "HierarchicalPolicy"
obs_transforms:
  add_virtual_keys:
    virtual_keys:
      "goal_to_agent_gps_compass": 2
hierarchical_policy:
  high_level_policy:
    name: "FixedHighLevelPolicy"
    add_arm_rest: True
  use_skills:
    open_cab: "open_cab"
    open_fridge: "open_fridge"
    close_cab: "close_cab"
    close_fridge: "close_fridge"
    pick: "pick"
    place: "place"
    nav: "nav_to_obj"
    nav_to_receptacle: "nav_to_obj"
    wait: "wait_skill"
    reset_arm: "reset_arm_skill"
  defined_skills: {}
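`use_skills` maps the names of high-level (PDDL) actions to entries of `defined_skills` (empty here, presumably populated from one of the skill config files above), so several actions can share one skill; for example, both `nav` and `nav_to_receptacle` run `nav_to_obj`. The sketch below resolves a plan through such a mapping. It also assumes that `add_arm_rest: True` makes `FixedHighLevelPolicy` insert a `reset_arm` step between consecutive plan actions but not after the last one; that reading, the function name `expand_plan`, and the `(action name, arguments)` plan format are assumptions for illustration.

```python
from typing import Dict, List, Tuple

# Hypothetical plan step format: (high-level action name, argument names).
PlanStep = Tuple[str, List[str]]


def expand_plan(
    plan: List[PlanStep], use_skills: Dict[str, str], add_arm_rest: bool
) -> List[Tuple[str, PlanStep]]:
    """Pair each high-level action with the skill config that executes it,
    optionally inserting an arm reset between consecutive actions."""
    steps: List[Tuple[str, PlanStep]] = []
    for i, (name, args) in enumerate(plan):
        if name not in use_skills:
            raise KeyError(f"no skill registered for high-level action {name!r}")
        steps.append((use_skills[name], (name, args)))
        # Assumed behaviour: rest the arm between actions, not after the last one.
        if add_arm_rest and i < len(plan) - 1:
            steps.append((use_skills["reset_arm"], ("reset_arm", [])))
    return steps


if __name__ == "__main__":
    use_skills = {
        "nav": "nav_to_obj",
        "nav_to_receptacle": "nav_to_obj",
        "pick": "pick",
        "place": "place",
        "reset_arm": "reset_arm_skill",
    }
    plan = [("nav", ["obj0"]), ("pick", ["obj0"])]
    for skill, step in expand_plan(plan, use_skills, add_arm_rest=True):
        print(skill, step)
```

Keeping the action-to-skill mapping in config means a new action name can reuse an existing trained skill without loading a second checkpoint.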