TDMPC Error #67

Open · ErinTUDelft opened this issue Apr 9, 2024 · 4 comments

ErinTUDelft commented Apr 9, 2024

Thank you for creating this library; it really is amazing! However, when running train.py (algo=tdmpc, task=Hover) from /scripts, I get the following error:

Traceback (most recent call last):
  File "/home/erin/Thesis/OmniDrones/scripts/train.py", line 104, in main
    policy = ALGOS[cfg.algo.name.lower()](
TypeError: TDMPCPolicy.__init__() got multiple values for argument 'device'

When running the same command from /scripts_paper I get this error:

 File "/home/erin/anaconda3/envs/OmniDrones310/lib/python3.10/copy.py", line 161, in deepcopy
  File "/home/erin/Thesis/OmniDrones/scripts_paper/train.py", line 172, in main
    collector = SyncDataCollector(
  File "/home/erin/Thesis/Utils/rl/torchrl/collectors/collectors.py", line 539, in __init__
    (self.policy, self.get_weights_fn,) = self._get_policy_and_device(
  File "/home/erin/Thesis/Utils/rl/torchrl/collectors/collectors.py", line 178, in _get_policy_and_device
    policy = deepcopy(policy)
----------
  File "/home/erin/anaconda3/envs/OmniDrones310/lib/python3.10/copy.py", line 161, in deepcopy
    rv = reductor(4)
TypeError: cannot pickle 'omni.kit.app._app.IApp' object
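
(For context: as the traceback shows, TorchRL's SyncDataCollector deep-copies the policy, and deepcopy falls back to pickling via __reduce_ex__. That fails as soon as the policy transitively holds an unpicklable native handle such as the simulator's IApp. A minimal sketch of that failure mode, with a hypothetical SimHandle standing in for the native object:)

    import copy

    class SimHandle:
        # Stand-in for a native handle like omni.kit.app's IApp.
        def __reduce_ex__(self, protocol):
            raise TypeError(f"cannot pickle {type(self).__name__!r} object")

    class Policy:
        def __init__(self):
            self.app = SimHandle()  # the policy (transitively) references the sim app

    copy.deepcopy(Policy())  # raises the same TypeError as the traceback above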

Which of the two training files would you recommend using? And how can I resolve this issue?
Kind regards,
Erin

btx0424 (Owner) commented Apr 10, 2024

Really sorry for the delayed reply. I have been busy with another project lately.

Unfortunately, we found that TDMPC (our implementation) performs poorly on tasks beyond hovering and simple tracking. The critical issue is that planning is too time-consuming to be practical in a massively parallelized setting. Therefore, the algorithm did not appear in the paper and is no longer maintained. I would suggest not using it unless you have a specific interest in model-based methods.

ErinTUDelft (Author) commented

Oh, that is a shame to hear; I was quite excited to try this algorithm! In their original paper, though, they do mention the following concerning planning:

"For example, we found in Figure 8 that we can reduce the planning horizon of TD-MPC on the Quadruped Run task from 5 to 1 with no significant reduction in performance, which reduces inference time to approximately 12ms per step"

Is using a horizon length of 1 instead of 5 something you have tried? Their new TDMPC2 also claims to be faster, and since I'm doing a side project on model-based RL, I would still like to try to implement it! Have there been many backward-compatibility-breaking changes in the meantime, or would it still be possible to run the tdmpc algorithm?

btx0424 (Owner) commented Apr 10, 2024

I think the main issue is having to perform CEM planning for all the parallel environments at once. For example, you need to sample 4096 * num_samples * planning_horizon actions and roll out the learned model at each step, as in the rough sketch below.
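
As a rough illustration of that cost (a minimal sketch with made-up numbers, not OmniDrones' actual configuration):

    import torch

    num_envs, num_samples, horizon, action_dim = 4096, 512, 5, 4  # illustrative

    # Each CEM iteration samples an action sequence per candidate, per env:
    mean = torch.zeros(num_envs, 1, horizon, action_dim)
    std = torch.ones(num_envs, 1, horizon, action_dim)
    actions = mean + std * torch.randn(num_envs, num_samples, horizon, action_dim)
    # -> 4096 * 512 * 5 * 4 ≈ 42 million sampled action values, plus
    #    num_envs * num_samples rollouts of the learned model over the horizon,
    #    all repeated for several CEM iterations at every environment step.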

However, I do think the capability of collecting a large amount of data is favorable for model-based RL. It's just that some modifications are needed.

The algorithm as implemented is no longer compatible. But it should not be too hard to make it run again, since the data collection interface remains the same; possibly you just need to change some keys of the tensordicts (see the sketch below).
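
For instance, something along these lines (hypothetical key names; assuming a recent tensordict, whose TensorDict.rename_key_ renames keys in place):

    import torch
    from tensordict import TensorDict

    # Suppose the collector now emits a nested ("agents", "observation") key
    # where the old TDMPC code expected a flat "obs" key:
    td = TensorDict(
        {("agents", "observation"): torch.zeros(4096, 42)},
        batch_size=[4096],
    )
    td.rename_key_(("agents", "observation"), "obs")
    assert "obs" in td.keys()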

Sgmltd commented Jul 16, 2024

TypeError: TDMPCPolicy.__init__() got multiple values for argument 'device'

@ErinTUDelft Hello, have you solved this bug in the tdmpc algorithm yet? I've run other algorithms and I'm getting the same error as yours:
TypeError: XXXXPolicy.__init__() got multiple values for argument 'device'

How did you get this algorithm to work? Looking forward to your answer. Thanks!
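
(For anyone else hitting this: the message is a generic Python error, not specific to TDMPC. It usually means the constructor receives device both positionally and as a keyword, e.g. because the call site passes one extra positional argument. A minimal sketch with a hypothetical signature:)

    class SomePolicy:  # hypothetical signature, for illustration only
        def __init__(self, cfg, agent_spec, device="cuda"):
            pass

    cfg, agent_spec = object(), object()
    # The third positional argument already binds `device`, so the keyword
    # duplicates it and Python raises the error quoted above:
    SomePolicy(cfg, agent_spec, "cuda:0", device="cuda:0")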
