Failed to eval finetuned model on aloha-sim-cube gym environment #34

Open
nicehiro opened this issue Jan 15, 2024 · 5 comments

Comments

@nicehiro

Hi, thanks for your great work!

I finetuned the model using examples/02_finetune_new_observation_action.py, and I'm now running examples/03_eval_finetuned.py to evaluate the finetuned model.

I followed the instructions:

Finally modify the sys.path.append statement below to add the ACT repo to your path and start a virtual display:
Xvfb :1 -screen 0 1024x768x16 &
export DISPLAY=:1

and added sys.path.append("/path/to/act"), but gym.make("aloha-sim-cube-v0") still fails.
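
For context, here is a minimal sketch of what I'm running (the ACT path is a placeholder for my local clone, and I'm assuming the env registration from the example code happens on import):

# Minimal sketch of my setup; "/path/to/act" is a placeholder for my ACT clone.
import sys

sys.path.append("/path/to/act")  # make the ACT repo (sim_env etc.) importable

import gym

# Assumes the example code has registered the env on import.
env = gym.make("aloha-sim-cube-v0")  # this is the call that fails for me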

Another problem is that I cannot load the finetuned model. Here's the traceback:

Traceback (most recent call last):
  File "/code/octo/examples/03_eval_finetuned.py", line 101, in <module>
    app.run(main)
  File "/opt/conda/envs/octo/lib/python3.10/site-packages/absl/app.py", line 308, in run
    _run_main(main, args)
  File "/opt/conda/envs/octo/lib/python3.10/site-packages/absl/app.py", line 254, in _run_main
    sys.exit(main(argv))
  File "/code/octo/examples/03_eval_finetuned.py", line 35, in main
    model = OctoModel.load_pretrained(FLAGS.finetuned_path)
  File "/code/octo/octo/model/octo_model.py", line 274, in load_pretrained
    params = checkpointer.restore(step, params_shape)
  File "/opt/conda/envs/octo/lib/python3.10/site-packages/orbax/checkpoint/checkpoint_manager.py", line 550, in restore
    restored_items = self._restore_impl(
  File "/opt/conda/envs/octo/lib/python3.10/site-packages/orbax/checkpoint/checkpoint_manager.py", line 582, in _restore_impl
    restored[item_name] = self._checkpointers[item_name].restore(
  File "/opt/conda/envs/octo/lib/python3.10/site-packages/orbax/checkpoint/checkpointer.py", line 165, in restore
    restored = self._restore_with_args(directory, *args, **kwargs)
  File "/opt/conda/envs/octo/lib/python3.10/site-packages/orbax/checkpoint/checkpointer.py", line 103, in _restore_with_args
    restored = self._handler.restore(directory, args=ckpt_args)
  File "/opt/conda/envs/octo/lib/python3.10/site-packages/orbax/checkpoint/pytree_checkpoint_handler.py", line 1063, in restore
    restored_item = _transform_checkpoint(
  File "/opt/conda/envs/octo/lib/python3.10/site-packages/orbax/checkpoint/pytree_checkpoint_handler.py", line 601, in _transform_checkpoint
    item = utils.deserialize_tree(restored, item)
  File "/opt/conda/envs/octo/lib/python3.10/site-packages/orbax/checkpoint/utils.py", line 281, in deserialize_tree
    return jax.tree_util.tree_map_with_path(
  File "/opt/conda/envs/octo/lib/python3.10/site-packages/jax/_src/tree_util.py", line 857, in tree_map_with_path
    return treedef.unflatten(f(*xs) for xs in zip(*all_keypath_leaves))
  File "/opt/conda/envs/octo/lib/python3.10/site-packages/jax/_src/tree_util.py", line 857, in <genexpr>
    return treedef.unflatten(f(*xs) for xs in zip(*all_keypath_leaves))
  File "/opt/conda/envs/octo/lib/python3.10/site-packages/orbax/checkpoint/utils.py", line 278, in _reconstruct_from_keypath
    result = result[key_name]
KeyError: 'diffusion_model'

It looks like the diffusion model was not saved during training. Did I miss something in the configuration?

Thanks.

@kpertsch
Collaborator

Thanks for giving the model a try!
Sorry about the issues with the eval_finetuned example -- it seems some lines got deleted in our cleanup. This should hopefully be fixed in #40.
Once it's merged, can you try gym.make-ing the environment again?

For the model loading: it's surprising that it tries to load a key "diffusion_model", since the 02_finetune_new_observation_action.py example replaces the diffusion head with an L1 head, so there should be no diffusion components left in the model. Can you inspect the config saved alongside the finetuned checkpoint and check whether the diffusion head was correctly replaced with the L1 head, or whether some other diffusion head is still in there? Just to make sure: you set the finetuned_path argument to where the finetuning checkpoint from example (2) was saved, correct?
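
Something along these lines should show it (a rough sketch; it assumes the finetuning run wrote a config.json into the checkpoint directory, and the exact key nesting may differ):

# Rough sketch: inspect the config saved with the finetuned checkpoint.
# Assumes a config.json was written to the checkpoint directory; the
# path and the key nesting are placeholders.
import json

with open("/path/to/finetuned_checkpoint/config.json") as f:
    config = json.load(f)

# The action head spec should reference L1ActionHead, not a diffusion head.
print(json.dumps(config["model"]["heads"], indent=2))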

@nicehiro
Author

Once it's merged, can you try again to gym.make the environment?

Yes, I'd be happy to.

Just to make sure: you set the finetuned_path argument to where the finetuning checkpoint from example (2) was saved, correct?

Yes. I'm using the following command, where /output/finetuned_model is the directory containing the saved finetuned model.

python examples/03_eval_finetuned.py --finetuned_path="/output/finetuned_model"

The action_head in config.json is:

[screenshot of the action_head section of config.json omitted]

@safsin

safsin commented Mar 5, 2024

I'm able to import sim_env, but the 03_eval_finetuned example throws KeyError: 'proprio' at line 328 of gym_wrappers.py.

After changing line 72 in 03_eval_finetuned.py to ...model.dataset_statistics['bridge_dataset']..., it instead throws ValueError: operands could not be broadcast together with shapes (1,14) (8,). I get the same error when trying the other datasets. Please help with running this example code.
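
For reference, this is roughly how I'm looking at the shapes (a sketch; I'm assuming the statistics of a finetuned checkpoint are keyed by dataset name like the pre-trained ones, which may not hold, and the checkpoint path is a placeholder):

# Sketch: compare the action statistics' shape with the env's action dim.
# Assumes dataset_statistics is keyed by dataset name, as for the
# pre-trained models; a finetuned checkpoint may store it differently.
from octo.model.octo_model import OctoModel

model = OctoModel.load_pretrained("/path/to/finetuned_checkpoint")
print(model.dataset_statistics.keys())  # list the available dataset keys

stats = model.dataset_statistics["bridge_dataset"]["action"]
print(stats["mean"].shape)  # (8,) here, while the ALOHA env emits 14-dim actions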

@BUAAZhangHaonan

BUAAZhangHaonan commented Apr 9, 2024

I'm able to import sim_env, but the 03_eval_finetuned example throws KeyError: 'proprio' at line 328 of gym_wrappers.py.

After changing line 72 in 03_eval_finetuned.py to ...model.dataset_statistics['bridge_dataset']..., it instead throws ValueError: operands could not be broadcast together with shapes (1,14) (8,). I get the same error when trying the other datasets. Please help with running this example code.

I encountered the same problem. My device did not have enough GPU memory to fine-tune in the ALOHA environment, so I don't have finetuned results and can't rule out that the error comes from running inference without a finetuned checkpoint. But I checked the dataset_statistics.json file and found that proprio has 8 dimensions for every dataset, so I assumed it would also be 8-dimensional after fine-tuning. However, the post-fine-tuning config shown in issue #42 (comment) has action_dim 14, not 8.

@kpertsch
Collaborator

kpertsch commented Apr 9, 2024

Yes, ALOHA is a bimanual setup, so its action space is 14-dimensional, while our pre-training data is all single-arm data with an 8-dimensional action space.
You can therefore only evaluate the Octo model on the ALOHA setup after fine-tuning, since a new action head with the correct action dimensionality needs to be trained.
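
For reference, the head replacement in the finetuning example looks roughly like this (a sketch in the spirit of 02_finetune_new_observation_action.py; exact argument names, e.g. pred_horizon vs. action_horizon, may differ between versions):

# Sketch: swap the pre-trained (8-dim) action head for a fresh 14-dim L1
# head, roughly as done in examples/02_finetune_new_observation_action.py.
# Exact kwargs may differ between octo versions.
from octo.model.octo_model import OctoModel
from octo.model.components.action_heads import L1ActionHead
from octo.utils.spec import ModuleSpec

pretrained_model = OctoModel.load_pretrained("hf://rail-berkeley/octo-small")
config = pretrained_model.config

config["model"]["heads"]["action"] = ModuleSpec.create(
    L1ActionHead,
    action_dim=14,                 # bimanual ALOHA: 2 arms x 7 DoF each
    pred_horizon=50,               # action chunk length used for ALOHA
    readout_key="readout_action",
)
# Fine-tuning then re-initializes this head and trains it on 14-dim ALOHA data.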
