
Feature/vla 2 #583

Open

mshukor wants to merge 31 commits into user/rcadene/2024_10_07_vla
Conversation


@mshukor mshukor commented Dec 16, 2024

What this does

  • Fix the reward 0 bug (due to not normalizing the targets; see the sketch after this list)
  • Support encoder and decoder for ACT
  • Support robot states as input to the action decoder
  • Some features related to loading hf models
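
For context, a minimal sketch of the targets-normalization fix (illustrative only; the real change is the single normalize_targets call shown in the diff further down, and the stats layout and function names here are simplified/hypothetical):

import torch
import torch.nn.functional as F


def normalize_targets(batch: dict[str, torch.Tensor], stats: dict[str, torch.Tensor]) -> dict[str, torch.Tensor]:
    # Mean/std-normalize the ground-truth action chunk with the dataset statistics,
    # so the targets live in the same space as the policy's (normalized) predictions.
    batch = dict(batch)
    batch["action"] = (batch["action"] - stats["mean"]) / (stats["std"] + 1e-8)
    return batch


def compute_loss(model: torch.nn.Module, batch: dict[str, torch.Tensor], stats: dict[str, torch.Tensor]) -> torch.Tensor:
    # Without this call the targets stay in raw joint space while the model predicts
    # normalized actions, so the loss is meaningless and the eval reward stays at 0.
    batch = normalize_targets(batch, stats)
    actions_hat = model(batch["observation.state"])
    return F.l1_loss(actions_hat, batch["action"])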

How it was tested


ENV=aloha
ENV_TASK=AlohaTransferCube-v0
dataset_repo_id=lerobot/aloha_sim_transfer_cube_human


policy=vla
LR=3e-5 #1e-5
LR_SCHEDULER=
USE_AMP=true
PRECISION=fp16

ASYNC_ENV=false

FEAT_SELECT=all_generated


VLM=google/paligemma2-3b-pt-224
VLM_NAME=paligemma2_3b
VLM_DIM=2304
NUM_IMG_TOKENS=598

USE_PROMPT_TEMPLATE=false

ACTION_DECODER=act_decoder

DIM_MODEL=512
LORA_R=4

PEFT_METHOD=lora


USE_ACTION_CONNECTOR=true

TASK_NAME=lerobot_${ENV}_transfer_cube_${policy}_${ACTION_DECODER}_${VLM_NAME}_${PEFT_METHOD}_feat_select_${FEAT_SELECT}

GPUS=1
EVAL_FREQ=5000 #51000 #10000 51000
OFFLINE_STEPS=100000 #25000 17000 12500 50000
TRAIN_BATCH_SIZE=8
EVAL_BATCH_SIZE=8

SAVE_FREQ=5000


MUJOCO_GL=egl python lerobot/scripts/train.py \
 hydra.job.name=base_distributed_aloha_transfer_cube \
 hydra.run.dir=$WORK/logs/lerobot/${TASK_NAME} \
 dataset_repo_id=$dataset_repo_id \
 policy=$policy \
 env=$ENV env.task=$ENV_TASK \
 training.offline_steps=$OFFLINE_STEPS training.batch_size=$TRAIN_BATCH_SIZE training.save_freq=$SAVE_FREQ \
 training.eval_freq=$EVAL_FREQ eval.n_episodes=50 eval.use_async_envs=$ASYNC_ENV eval.batch_size=$EVAL_BATCH_SIZE \
 training.lr=$LR training.lr_backbone=$LR \
 wandb.enable=false use_amp=$USE_AMP precision=$PRECISION \
 policy.vlm_backbone.feature_selection=$FEAT_SELECT policy.vlm_backbone.name=$VLM policy.action_decoder.dim_model=$DIM_MODEL \
 policy.use_prompt_template=$USE_PROMPT_TEMPLATE policy.num_img_tokens=$NUM_IMG_TOKENS policy.peft_config.r=$LORA_R policy.peft_method=$PEFT_METHOD \
 policy.use_action_connector=$USE_ACTION_CONNECTOR policy.vlm_backbone.hidden_size=$VLM_DIM  policy.action_decoder.name=$ACTION_DECODER 



How to checkout & try? (for the reviewer)

@danaaubakirova danaaubakirova self-requested a review December 17, 2024 18:08
Collaborator
@danaaubakirova left a comment

Thanks for this PR! Good catch on the bug and nicely done with adding observation.states as an input! If you could explain some design choices and the code works after testing/inference, this will be approved 👍

@@ -513,6 +514,49 @@ def forward(self, batch: dict[str, Tensor]) -> tuple[Tensor, tuple[Tensor, Tenso
return actions, (mu, log_sigma_x2)


class ACTEncoderDecoder(nn.Module):
Collaborator

Could you please share the motivation behind introducing the ACTEncoderDecoder module?

Author

At first I wanted to check whether the 0 reward was coming from the action decoder being too small. More generally, I think we should compare different design choices for the action decoder, and this module makes that a bit easier for ACT (ACT decoder only vs. ACT encoder-decoder).

self.action_decoder = ACTDecoder(action_decoder_config)
self.decoder_pos_embed = nn.Embedding(config.chunk_size, config.action_decoder["dim_model"])
self.use_robot_state = "observation.state" in config.input_shapes
if "act" in self.action_decoder_name:
Collaborator

Is this related to the new ACTEncoderDecoder?

To me, this if/else block related to act looks a bit confusing; we could either rename it completely or structure it like this:

if "act" in self.action_decoder_name:
    action_decoder_config = OmegaConf.create(config.action_decoder)
    if self.action_decoder_name == "act_decoder":
        # Use standalone ACTDecoder
        self.action_decoder = ACTDecoder(action_decoder_config)
        self.decoder_pos_embed = nn.Embedding(config.chunk_size, config.action_decoder["dim_model"])
    else:
        # Use ACTEncoderDecoder, decide whether to include the encoder
        use_encoder = "decoder" not in self.action_decoder_name
        self.action_decoder = ACTEncoderDecoder(action_decoder_config, use_encoder=use_encoder)

Or even better: if ACTDecoder is equivalent to ACTEncoderDecoder with use_encoder=False, we can drop ACTDecoder entirely and keep only the use_encoder option for ACT, to avoid confusion and redundancy.
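
For reference, a rough sketch of what that unified module could look like (indicative only; ACTEncoder and ACTDecoder are the existing modules from modeling_act, and their call signatures and the config fields are simplified here):

import torch
from torch import nn


class ACTEncoderDecoder(nn.Module):
    # Single module covering both variants: decoder-only (use_encoder=False)
    # and full encoder-decoder (use_encoder=True).
    def __init__(self, config, use_encoder: bool = True):
        super().__init__()
        self.use_encoder = use_encoder
        if use_encoder:
            # Optional transformer encoder over the VLM features / robot state tokens.
            self.encoder = ACTEncoder(config)
        self.decoder = ACTDecoder(config)
        self.decoder_pos_embed = nn.Embedding(config.chunk_size, config.dim_model)

    def forward(self, decoder_in: torch.Tensor, encoder_out: torch.Tensor) -> torch.Tensor:
        if self.use_encoder:
            encoder_out = self.encoder(encoder_out)
        return self.decoder(decoder_in + self.decoder_pos_embed.weight.unsqueeze(1), encoder_out)

With that, the caller would only need use_encoder = "decoder" not in self.action_decoder_name, as in the snippet above.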

Author

Yes, ACTEncoderDecoder is more general, but I kept ACTDecoder to be able to load your old checkpoints without changing their keys. If that is no longer needed (e.g. once we have trained a good VLA with this branch), we should keep only ACTEncoderDecoder.

@@ -1,14 +1,6 @@
# @package _global_
Collaborator

Adding this line helped avoid explicitly using the name attribute (e.g., policy.name = vla or env.name = aloha), which was causing various issues with override_dataset_stats. The exact reason for these issues is still unclear to me. If you have any idea why this happens, I’d be happy to learn more.

Author

Not sure, but when you add this line the config group's parameters are promoted to the global (top-level) namespace, so you can access them directly, e.g. policy.n_obs_steps instead of policy.vla.policy.n_obs_steps.
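
A small illustration of the effect (hypothetical config paths and keys; assumes the standard Hydra compose API):

from hydra import compose, initialize

# With `# @package _global_` at the top of policy/vla.yaml, its keys are merged at the
# root of the composed config; without it they stay nested under the config group.
with initialize(version_base=None, config_path="../lerobot/configs"):
    cfg = compose(config_name="default", overrides=["policy=vla"])
    print(cfg.policy.n_obs_steps)  # resolves directly thanks to the directive
    # Without it, the same value would live under the group path instead,
    # e.g. cfg.policy.vla.policy.n_obs_steps, which is what breaks override_dataset_stats.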

@@ -91,6 +92,7 @@ def forward(self, batch: dict[str, torch.Tensor]) -> dict[str, torch.Tensor]:
if len(self.expected_image_keys) > 0:
batch = dict(batch)
batch["observation.images"] = [img for k in self.expected_image_keys for img in batch[k]]
batch = self.normalize_targets(batch)
Collaborator

I am still testing the code to verify that the reward 0 was indeed related to not normalizing targets.
