Simplify configs #550

Draft
wants to merge 33 commits into
base: main
3d509aa
Add draccus, create MainConfig
aliberts Dec 5, 2024
82f197b
WIP refactor train.py and ACT
aliberts Dec 18, 2024
bed1ec3
Add policies training presets
aliberts Dec 23, 2024
0ab28eb
Update diffusion policy
aliberts Dec 23, 2024
a82e004
Add pusht and xarm env configs
aliberts Dec 23, 2024
d2ca27a
Update tdmpc
aliberts Dec 23, 2024
250e380
Update vqbet
aliberts Dec 23, 2024
d8ad763
Fix poetry relax
aliberts Dec 23, 2024
928a417
Add feature types to envs
aliberts Dec 27, 2024
b5f3287
Add EvalPipelineConfig, parse features from envs
aliberts Dec 27, 2024
72e84f2
Add custom parser
aliberts Jan 6, 2025
f6443d9
Update pretrained loading mechanisms
aliberts Jan 6, 2025
06b604b
Add dependency fixes & lock update
aliberts Jan 6, 2025
4a4ef9b
Fix pretrained_path
aliberts Jan 6, 2025
68463a3
Refactor envs, remove RealEnv
aliberts Jan 7, 2025
2bdf1d2
Fix typo
aliberts Jan 7, 2025
9c6edc2
Enable end-to-end tests
aliberts Jan 7, 2025
a29a1f1
Fix Makefile
aliberts Jan 7, 2025
d83a94c
Log eval config
aliberts Jan 8, 2025
26eef6e
Fix end-to-end tests
aliberts Jan 8, 2025
e2508f7
Merge remote-tracking branch 'origin/main' into user/aliberts/2024_11…
aliberts Jan 8, 2025
b799e02
Remove amp & add resume test
aliberts Jan 8, 2025
6c5667a
Speed-up tests
aliberts Jan 8, 2025
af96b04
Fix poetry relax
aliberts Jan 8, 2025
4261c5a
Remove config yaml for robot devices (#594)
Cadene Jan 9, 2025
6f62154
Merge remote-tracking branch 'origin/main' into user/aliberts/2024_11…
aliberts Jan 9, 2025
02b996a
Fix logger
aliberts Jan 9, 2025
a69b425
Remove hydra-core
aliberts Jan 9, 2025
5871fe8
Remove NoneSchedulerConfig
aliberts Jan 9, 2025
3c5e8a5
Add push_pretrained
aliberts Jan 9, 2025
1eb8527
Remove eval.episode_length
aliberts Jan 9, 2025
abaf654
Fix wandb_video
aliberts Jan 9, 2025
6bd9e12
Fix typo
aliberts Jan 10, 2025
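
The commits above replace Hydra-style overrides (`policy=act`, `training.offline_steps=2`) with dotted `--key=value` flags (`--policy.type=act`, `--offline.steps=4`), as the Makefile changes in this PR show. As a rough illustration of how such dotted flags can map onto a nested config, here is a minimal stdlib-only sketch. It is not the draccus parser or the custom parser added in this PR: `parse_dotted_args` is a hypothetical helper, and coercing values through JSON is an assumption made for the example.

```python
import json


def parse_dotted_args(argv):
    """Turn ['--a.b=1', '--c=x'] into {'a': {'b': 1}, 'c': 'x'} (hypothetical helper)."""
    config = {}
    for arg in argv:
        if not arg.startswith("--") or "=" not in arg:
            raise ValueError(f"expected --key=value, got {arg!r}")
        dotted_key, raw_value = arg[2:].split("=", 1)
        # Assumption: values are coerced via JSON where possible (false, 4, [0])
        # and kept as plain strings otherwise (act, lerobot/pusht).
        try:
            value = json.loads(raw_value)
        except json.JSONDecodeError:
            value = raw_value
        # Walk (and create) the intermediate dicts named by the dotted prefix.
        node = config
        *parents, leaf = dotted_key.split(".")
        for name in parents:
            node = node.setdefault(name, {})
        node[leaf] = value
    return config


example = parse_dotted_args([
    "--policy.type=act",
    "--policy.dim_model=64",
    "--dataset.episodes=[0]",
    "--offline.steps=4",
    "--wandb.enable=false",
])
```

A typed dataclass parser would additionally validate field names and types; this sketch only shows the dotted-key nesting.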
71 changes: 35 additions & 36 deletions .github/workflows/test.yml
@@ -101,39 +101,38 @@ jobs:
             -W ignore::UserWarning:gymnasium.utils.env_checker:247 \
             && rm -rf tests/outputs outputs
 
-  # TODO(aliberts, rcadene): redesign after v2 migration / removing hydra
-  # end-to-end:
-  #   name: End-to-end
-  #   runs-on: ubuntu-latest
-  #   env:
-  #     MUJOCO_GL: egl
-  #   steps:
-  #     - uses: actions/checkout@v4
-  #       with:
-  #         lfs: true # Ensure LFS files are pulled
-
-  #     - name: Install apt dependencies
-  #       # portaudio19-dev is needed to install pyaudio
-  #       run: |
-  #         sudo apt-get update && \
-  #         sudo apt-get install -y libegl1-mesa-dev portaudio19-dev
-
-  #     - name: Install poetry
-  #       run: |
-  #         pipx install poetry && poetry config virtualenvs.in-project true
-  #         echo "${{ github.workspace }}/.venv/bin" >> $GITHUB_PATH
-
-  #     - name: Set up Python 3.10
-  #       uses: actions/setup-python@v5
-  #       with:
-  #         python-version: "3.10"
-  #         cache: "poetry"
-
-  #     - name: Install poetry dependencies
-  #       run: |
-  #         poetry install --all-extras
-
-  #     - name: Test end-to-end
-  #       run: |
-  #         make test-end-to-end \
-  #           && rm -rf outputs
+  end-to-end:
+    name: End-to-end
+    runs-on: ubuntu-latest
+    env:
+      MUJOCO_GL: egl
+    steps:
+      - uses: actions/checkout@v4
+        with:
+          lfs: true # Ensure LFS files are pulled
+
+      - name: Install apt dependencies
+        # portaudio19-dev is needed to install pyaudio
+        run: |
+          sudo apt-get update && \
+          sudo apt-get install -y libegl1-mesa-dev portaudio19-dev
+
+      - name: Install poetry
+        run: |
+          pipx install poetry && poetry config virtualenvs.in-project true
+          echo "${{ github.workspace }}/.venv/bin" >> $GITHUB_PATH
+
+      - name: Set up Python 3.10
+        uses: actions/setup-python@v5
+        with:
+          python-version: "3.10"
+          cache: "poetry"
+
+      - name: Install poetry dependencies
+        run: |
+          poetry install --all-extras
+
+      - name: Test end-to-end
+        run: |
+          make test-end-to-end \
+            && rm -rf outputs
278 changes: 133 additions & 145 deletions Makefile
@@ -20,171 +20,159 @@ build-gpu:
 
 test-end-to-end:
 	${MAKE} DEVICE=$(DEVICE) test-act-ete-train
+	${MAKE} DEVICE=$(DEVICE) test-act-ete-train-resume
 	${MAKE} DEVICE=$(DEVICE) test-act-ete-eval
-	${MAKE} DEVICE=$(DEVICE) test-act-ete-train-amp
-	${MAKE} DEVICE=$(DEVICE) test-act-ete-eval-amp
 	${MAKE} DEVICE=$(DEVICE) test-diffusion-ete-train
 	${MAKE} DEVICE=$(DEVICE) test-diffusion-ete-eval
 	${MAKE} DEVICE=$(DEVICE) test-tdmpc-ete-train
-	${MAKE} DEVICE=$(DEVICE) test-tdmpc-ete-train-with-online
 	${MAKE} DEVICE=$(DEVICE) test-tdmpc-ete-eval
-	${MAKE} DEVICE=$(DEVICE) test-default-ete-eval
-	${MAKE} DEVICE=$(DEVICE) test-act-pusht-tutorial
 
+# ${MAKE} DEVICE=$(DEVICE) test-default-ete-eval
+# ${MAKE} DEVICE=$(DEVICE) test-tdmpc-ete-train-with-online
+# ${MAKE} DEVICE=$(DEVICE) test-act-pusht-tutorial
+
 test-act-ete-train:
 	python lerobot/scripts/train.py \
-		policy=act \
-		policy.dim_model=64 \
-		env=aloha \
-		wandb.enable=False \
-		training.offline_steps=2 \
-		training.online_steps=0 \
-		eval.n_episodes=1 \
-		eval.batch_size=1 \
-		device=$(DEVICE) \
-		training.save_checkpoint=true \
-		training.save_freq=2 \
-		policy.n_action_steps=20 \
-		policy.chunk_size=20 \
-		training.batch_size=2 \
-		training.image_transforms.enable=true \
-		hydra.run.dir=tests/outputs/act/
+		--policy.type=act \
+		--policy.dim_model=64 \
+		--policy.n_action_steps=20 \
+		--policy.chunk_size=20 \
+		--env.type=aloha \
+		--env.episode_length=5 \
+		--dataset.repo_id=lerobot/aloha_sim_transfer_cube_human \
+		--dataset.image_transforms.enable=true \
+		--dataset.episodes="[0]" \
+		--batch_size=2 \
+		--offline.steps=4 \
+		--online.steps=0 \
+		--eval.n_episodes=1 \
+		--eval.batch_size=1 \
+		--save_freq=2 \
+		--save_checkpoint=true \
+		--log_freq=1 \
+		--wandb.enable=false \
+		--device=$(DEVICE) \
+		--output_dir=tests/outputs/act/
 
+test-act-ete-train-resume:
+	python lerobot/scripts/train.py \
+		--config_path=tests/outputs/act/checkpoints/000002/pretrained_model/train_config.json \
+		--resume=true
+
 test-act-ete-eval:
 	python lerobot/scripts/eval.py \
-		-p tests/outputs/act/checkpoints/000002/pretrained_model \
-		eval.n_episodes=1 \
-		eval.batch_size=1 \
-		env.episode_length=8 \
-		device=$(DEVICE) \
-
-test-act-ete-train-amp:
-	python lerobot/scripts/train.py \
-		policy=act \
-		policy.dim_model=64 \
-		env=aloha \
-		wandb.enable=False \
-		training.offline_steps=2 \
-		training.online_steps=0 \
-		eval.n_episodes=1 \
-		eval.batch_size=1 \
-		device=$(DEVICE) \
-		training.save_checkpoint=true \
-		training.save_freq=2 \
-		policy.n_action_steps=20 \
-		policy.chunk_size=20 \
-		training.batch_size=2 \
-		hydra.run.dir=tests/outputs/act_amp/ \
-		training.image_transforms.enable=true \
-		use_amp=true
-
-test-act-ete-eval-amp:
-	python lerobot/scripts/eval.py \
-		-p tests/outputs/act_amp/checkpoints/000002/pretrained_model \
-		eval.n_episodes=1 \
-		eval.batch_size=1 \
-		env.episode_length=8 \
-		device=$(DEVICE) \
-		use_amp=true
+		--policy.path=tests/outputs/act/checkpoints/000004/pretrained_model \
+		--env.type=aloha \
+		--env.episode_length=5 \
+		--eval.n_episodes=1 \
+		--eval.batch_size=1 \
+		--device=$(DEVICE)
 
 test-diffusion-ete-train:
 	python lerobot/scripts/train.py \
-		policy=diffusion \
-		policy.down_dims=\[64,128,256\] \
-		policy.diffusion_step_embed_dim=32 \
-		policy.num_inference_steps=10 \
-		env=pusht \
-		wandb.enable=False \
-		training.offline_steps=2 \
-		training.online_steps=0 \
-		eval.n_episodes=1 \
-		eval.batch_size=1 \
-		device=$(DEVICE) \
-		training.save_checkpoint=true \
-		training.save_freq=2 \
-		training.batch_size=2 \
-		training.image_transforms.enable=true \
-		hydra.run.dir=tests/outputs/diffusion/
+		--policy.type=diffusion \
+		--policy.down_dims='[64,128,256]' \
+		--policy.diffusion_step_embed_dim=32 \
+		--policy.num_inference_steps=10 \
+		--env.type=pusht \
+		--env.episode_length=5 \
+		--dataset.repo_id=lerobot/pusht \
+		--dataset.image_transforms.enable=true \
+		--dataset.episodes="[0]" \
+		--batch_size=2 \
+		--offline.steps=2 \
+		--online.steps=0 \
+		--eval.n_episodes=1 \
+		--eval.batch_size=1 \
+		--save_checkpoint=true \
+		--save_freq=2 \
+		--log_freq=1 \
+		--wandb.enable=false \
+		--device=$(DEVICE) \
+		--output_dir=tests/outputs/diffusion/
 
 test-diffusion-ete-eval:
 	python lerobot/scripts/eval.py \
-		-p tests/outputs/diffusion/checkpoints/000002/pretrained_model \
-		eval.n_episodes=1 \
-		eval.batch_size=1 \
-		env.episode_length=8 \
-		device=$(DEVICE) \
+		--policy.path=tests/outputs/diffusion/checkpoints/000002/pretrained_model \
+		--env.type=pusht \
+		--env.episode_length=5 \
+		--eval.n_episodes=1 \
+		--eval.batch_size=1 \
+		--device=$(DEVICE)
 
 test-tdmpc-ete-train:
 	python lerobot/scripts/train.py \
-		policy=tdmpc \
-		env=xarm \
-		env.task=XarmLift-v0 \
-		dataset_repo_id=lerobot/xarm_lift_medium \
-		wandb.enable=False \
-		training.offline_steps=2 \
-		training.online_steps=0 \
-		eval.n_episodes=1 \
-		eval.batch_size=1 \
-		env.episode_length=2 \
-		device=$(DEVICE) \
-		training.save_checkpoint=true \
-		training.save_freq=2 \
-		training.batch_size=2 \
-		training.image_transforms.enable=true \
-		hydra.run.dir=tests/outputs/tdmpc/
-
-test-tdmpc-ete-train-with-online:
-	python lerobot/scripts/train.py \
-		env=pusht \
-		env.gym.obs_type=environment_state_agent_pos \
-		policy=tdmpc_pusht_keypoints \
-		eval.n_episodes=1 \
-		eval.batch_size=1 \
-		env.episode_length=10 \
-		device=$(DEVICE) \
-		training.offline_steps=2 \
-		training.online_steps=20 \
-		training.save_checkpoint=false \
-		training.save_freq=10 \
-		training.batch_size=2 \
-		training.online_rollout_n_episodes=2 \
-		training.online_rollout_batch_size=2 \
-		training.online_steps_between_rollouts=10 \
-		training.online_buffer_capacity=15 \
-		eval.use_async_envs=true \
-		hydra.run.dir=tests/outputs/tdmpc_online/
-
+		--policy.type=tdmpc \
+		--env.type=xarm \
+		--env.task=XarmLift-v0 \
+		--env.episode_length=5 \
+		--dataset.repo_id=lerobot/xarm_lift_medium \
+		--dataset.image_transforms.enable=true \
+		--dataset.episodes='[0]' \
+		--batch_size=2 \
+		--offline.steps=2 \
+		--online.steps=0 \
+		--eval.n_episodes=1 \
+		--eval.batch_size=1 \
+		--save_checkpoint=true \
+		--save_freq=2 \
+		--log_freq=1 \
+		--wandb.enable=false \
+		--device=$(DEVICE) \
+		--output_dir=tests/outputs/tdmpc/
 
 test-tdmpc-ete-eval:
 	python lerobot/scripts/eval.py \
-		-p tests/outputs/tdmpc/checkpoints/000002/pretrained_model \
-		eval.n_episodes=1 \
-		eval.batch_size=1 \
-		env.episode_length=8 \
-		device=$(DEVICE) \
-
-test-default-ete-eval:
-	python lerobot/scripts/eval.py \
-		--config lerobot/configs/default.yaml \
-		eval.n_episodes=1 \
-		eval.batch_size=1 \
-		env.episode_length=8 \
-		device=$(DEVICE) \
-
-test-act-pusht-tutorial:
-	cp examples/advanced/1_train_act_pusht/act_pusht.yaml lerobot/configs/policy/created_by_Makefile.yaml
-	python lerobot/scripts/train.py \
-		policy=created_by_Makefile.yaml \
-		env=pusht \
-		wandb.enable=False \
-		training.offline_steps=2 \
-		eval.n_episodes=1 \
-		eval.batch_size=1 \
-		env.episode_length=2 \
-		device=$(DEVICE) \
-		training.save_model=true \
-		training.save_freq=2 \
-		training.batch_size=2 \
-		training.image_transforms.enable=true \
-		hydra.run.dir=tests/outputs/act_pusht/
-	rm lerobot/configs/policy/created_by_Makefile.yaml
+		--policy.path=tests/outputs/tdmpc/checkpoints/000002/pretrained_model \
+		--env.type=xarm \
+		--env.episode_length=5 \
+		--env.task=XarmLift-v0 \
+		--eval.n_episodes=1 \
+		--eval.batch_size=1 \
+		--device=$(DEVICE)
+
+# FIXME: currently broken
+# test-tdmpc-ete-train-with-online:
+# python lerobot/scripts/train.py \
+		--policy.type=tdmpc \
+		--env.type=pusht \
+		--env.obs_type=environment_state_agent_pos \
+		--env.episode_length=10 \
+		--dataset.repo_id=lerobot/pusht_keypoints \
+		--dataset.image_transforms.enable=true \
+		--dataset.episodes='[0]' \
+		--batch_size=2 \
+		--offline.steps=2 \
+		--online.steps=20 \
+		--online.rollout_n_episodes=2 \
+		--online.rollout_batch_size=2 \
+		--online.steps_between_rollouts=10 \
+		--online.buffer_capacity=15 \
+		--online.env_seed=10000 \
+		--save_checkpoint=false \
+		--save_freq=10 \
+		--log_freq=1 \
+		--eval.use_async_envs=true \
+		--eval.n_episodes=1 \
+		--eval.batch_size=1 \
+		--device=$(DEVICE) \
+		--output_dir=tests/outputs/tdmpc_online/
+
+# TODO: do we keep this one?
+# test-act-pusht-tutorial:
+# cp examples/advanced/1_train_act_pusht/act_pusht.yaml lerobot/configs/policy/created_by_Makefile.yaml
+# python lerobot/scripts/train.py \
+# policy=created_by_Makefile.yaml \
+# env=pusht \
+# wandb.enable=False \
+# training.offline_steps=2 \
+# eval.n_episodes=1 \
+# eval.batch_size=1 \
+# env.episode_length=2 \
+# device=$(DEVICE) \
+# training.save_model=true \
+# training.save_freq=2 \
+# training.batch_size=2 \
+# training.image_transforms.enable=true \
+# hydra.run.dir=tests/outputs/act_pusht/
+# rm lerobot/configs/policy/created_by_Makefile.yaml
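
The test targets above hard-code checkpoint directories such as `checkpoints/000002` and `checkpoints/000004`. The sketch below shows one way those names could follow from `--offline.steps` and `--save_freq`, assuming a checkpoint is written every `save_freq` steps and the directory name is the step number zero-padded to six digits. That saving rule is an assumption inferred from the paths in the Makefile, not taken from `train.py`, and `expected_checkpoints` is a hypothetical helper.

```python
from pathlib import Path


def expected_checkpoints(output_dir, offline_steps, save_freq):
    """List the pretrained_model dirs a run would leave behind (hypothetical helper)."""
    # Assumption: one checkpoint every `save_freq` steps, named by zero-padded step.
    steps = range(save_freq, offline_steps + 1, save_freq)
    return [
        Path(output_dir) / "checkpoints" / f"{step:06d}" / "pretrained_model"
        for step in steps
    ]


# Mirrors test-act-ete-train: --offline.steps=4 --save_freq=2
act_dirs = expected_checkpoints("tests/outputs/act", offline_steps=4, save_freq=2)
for path in act_dirs:
    print(path.as_posix())
```

Under these assumptions, the ACT run leaves both `000002` (the checkpoint `test-act-ete-train-resume` resumes from) and `000004` (the one `test-act-ete-eval` loads), while the diffusion and TD-MPC runs with `--offline.steps=2` leave only `000002`.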