
Got an unexpected keyword argument 'use_sde' when passing behavioural cloning policy to PPO from SB3 #781

JkAcktuator opened this issue Sep 10, 2023 · 5 comments
Labels
bug Something isn't working

Comments

@JkAcktuator

Bug description

Hello,

I want to pass a policy learned via behavioural cloning in the imitation library to PPO. I thought this would work, since both are built on the ActorCriticPolicy class; however, it doesn't work as I expected.

Steps to reproduce

from stable_baselines3 import PPO
from imitation.algorithms import bc

bc_trainer = bc.BC(
    observation_space=env.observation_space,
    action_space=env.action_space,
    device='cuda',
    policy=bc.reconstruct_policy(policy_path, device='cuda'),
    rng=rng,
)
model = PPO(policy=bc_trainer.policy, env=env, verbose=1, device='cuda')

The error is:

Traceback (most recent call last):
  File "agent/main.py", line 142, in <module>
    model = PPO(policy=bc_trainer.policy, env=env, verbose=1, device='cuda')
  File "/home/repos/stable-baselines3/stable_baselines3/ppo/ppo.py", line 164, in __init__
    self._setup_model()
  File "/home/repos/stable-baselines3/stable_baselines3/ppo/ppo.py", line 167, in _setup_model
    super()._setup_model()
  File "/home/repos/stable-baselines3/stable_baselines3/common/on_policy_algorithm.py", line 120, in _setup_model
    self.policy = self.policy_class(  # pytype:disable=not-instantiable
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
TypeError: forward() got an unexpected keyword argument 'use_sde'

Environment

  • Operating system and version: Ubuntu 20.04.6 LTS
  • Python version: 3.8.0
  • PyTorch version: 1.13.0
  • imitation version: 0.4.0
  • Stable-Baselines3 version: 1.8.0
  • Gym version: 0.21.0
@JkAcktuator added the bug label on Sep 10, 2023
@yojul

yojul commented Sep 27, 2023

Hi, I had the same error when trying to retrain a policy with PPO after behavioural cloning. The problem is this line:

model = PPO(policy=bc_trainer.policy, env=env, verbose=1, device='cuda')

The policy argument expects a class (or, alternatively, a string), not a policy instance. The error message is unclear because it is raised by PyTorch.

So when instantiating PPO with SB3, you should pass the policy class you want to use (which should inherit from ActorCriticPolicy). For example:

model = PPO(policy=ActorCriticPolicy, env=env, verbose=1, device='cuda')

This should work for instantiating PPO. However, I am not sure how you should load the pre-trained policy; I could not find the right way to do it in stable-baselines3 (I tried model.policy = bc_trainer.policy, but I am not sure it works properly).

Hope it helps somehow. Let me know if you find the right way to load a pre-trained policy with the PPO algorithm 👍
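A minimal sketch of one way to close that gap, assuming the BC policy and the freshly constructed PPO policy share the same network architecture: build PPO with a policy class as above, then copy the learned weights across with PyTorch's state_dict machinery instead of replacing the policy object.

from stable_baselines3 import PPO
from stable_baselines3.common.policies import ActorCriticPolicy

# Build PPO with a policy *class*, then transplant the BC-learned weights.
# Assumption: bc_trainer.policy has the same architecture as the policy PPO
# just constructed; otherwise load_state_dict raises a shape-mismatch error.
model = PPO(policy=ActorCriticPolicy, env=env, verbose=1, device='cuda')
model.policy.load_state_dict(bc_trainer.policy.state_dict())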

@AlexGisi

For those who stumble across this issue, the load_from_vector method seems to work:

from stable_baselines3 import PPO
from stable_baselines3.common.policies import ActorCriticPolicy

pretrained_policy = ActorCriticPolicy.load("/path/")
model = PPO(ActorCriticPolicy, env)
model.policy.load_from_vector(pretrained_policy.parameters_to_vector())
model.learn(total_timesteps=100_000, reset_num_timesteps=False)
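(A note on why this works: load_from_vector copies raw parameters positionally; in SB3 it is a thin wrapper around torch.nn.utils.vector_to_parameters. It therefore only behaves correctly when the pretrained policy and the PPO-constructed policy have exactly the same architecture.)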

@saeed349

saeed349 commented Mar 26, 2024

I had issues with saving and loading BC models, and the following worked for me:

from imitation.algorithms import bc
from stable_baselines3.common import policies

# Saving
bc_model = bc.BC(
    observation_space=venv.observation_space,
    action_space=venv.action_space,
    demonstrations=transitions_custom,
    rng=rng,
)
bc_model.policy.save('models/test/model.zip')

# Loading
pretrained_policy = policies.ActorCriticPolicy.load('models/test/model.zip')
bc_model_reloaded = bc.BC(
    observation_space=venv.observation_space,
    action_space=venv.action_space,
    demonstrations=transitions_custom,
    rng=rng,
    policy=pretrained_policy,
)
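To then continue training the reloaded policy with PPO, here is a sketch under the assumption that the BC run used imitation's default FeedForward32Policy (an ActorCriticPolicy with net_arch=[32, 32]): construct PPO with a matching architecture and copy the weights in.

from stable_baselines3 import PPO

# Sketch: make PPO's policy match the saved BC policy's architecture
# (assumption: imitation's default FeedForward32Policy, i.e. hidden
# layers of 32 units), then copy the pretrained weights across.
model = PPO(
    policies.ActorCriticPolicy,
    venv,
    policy_kwargs=dict(net_arch=[32, 32]),  # assumption: default BC architecture
)
model.policy.load_state_dict(pretrained_policy.state_dict())
model.learn(total_timesteps=100_000)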

@saeed349

saeed349 commented Mar 27, 2024

I followed the method mentioned above by @yojul to load a BC model in SB3.
But when I retrain the model in SB3, save it, and then try to reload it using PPO.load as below, I get a shape-mismatch error when the weights are copied.
I am guessing this is due to the difference between imitation.policies.base.FeedForward32Policy and the stable_baselines3 ActorCriticPolicy.

Can @AlexGisi, @yojul or @JkAcktuator share how you overcame this issue?

pretrained_policy = policies.ActorCriticPolicy.load("imitation_bc_model.zip")
model = PPO(policy=policies.ActorCriticPolicy, env=env)
model.policy = pretrained_policy

model.learn(total_timesteps=100_000)
model.save("sb3_model.zip")

del model

model = PPO.load("sb3_model.zip")  # this throws an error
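A plausible explanation, offered as an assumption rather than a verified diagnosis: bc.BC defaults to FeedForward32Policy, which is an ActorCriticPolicy with net_arch=[32, 32], while PPO's default ActorCriticPolicy uses 64-unit layers. model.save records the policy_kwargs from the PPO constructor, not from the policy instance you swapped in, so PPO.load rebuilds a 64-unit policy and then fails to copy the 32-unit weights into it. Passing a matching net_arch when constructing PPO should keep save and load consistent; a sketch:

pretrained_policy = policies.ActorCriticPolicy.load("imitation_bc_model.zip")

# Sketch of a possible fix: make PPO's recorded policy_kwargs match the
# weights being swapped in (assumption: the BC policy used imitation's
# default net_arch=[32, 32]).
model = PPO(
    policy=policies.ActorCriticPolicy,
    env=env,
    policy_kwargs=dict(net_arch=[32, 32]),
)
model.policy.load_state_dict(pretrained_policy.state_dict())

model.learn(total_timesteps=100_000)
model.save("sb3_model.zip")

model = PPO.load("sb3_model.zip")  # should now rebuild with matching shapes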

@CAI23sbP

Hi @saeed349! Here is my code.


dense_rollouts = rollout.rollout(
    dense_expert,
    DummyVecEnv([lambda: RolloutInfoWrapper(dense_env)]),
    rollout.make_sample_until(min_timesteps=None, min_episodes=250),
    rng=dense_rng,
)
dense_transitions = rollout.flatten_trajectories(dense_rollouts)

dense_bc = CustomBC(
    observation_space=dense_env.observation_space,
    action_space=dense_env.action_space,
    policy=dense_expert.policy,
    demonstrations=dense_transitions,
    rng=dense_rng,
    device='cuda',
    tensorboard_log=f'/home/cai/Desktop/PILRnav/runs/dense_bc',
)
dense_bc.train(n_epochs=10)
dense_bc.policy.save("/home/cai/Desktop/PILRnav/weight/dense_bc")

dense_ppo = PPO(
    policy='MlpPolicy',
    env=dense_env,
    policy_kwargs=policy_kwargs,
    use_sde=False,
    batch_size=64,
    n_epochs=7,
    learning_rate=0.0004,
    tensorboard_log=f'/home/cai/Desktop/PILRnav/runs/dense_ppo',
    verbose=1,
)

dense_ppo.policy = dense_ppo.policy.load("/home/cai/Desktop/PILRnav/weight/dense_bc")
dense_ppo.learn(MAX_ITER)
dense_ppo.policy.save("/home/cai/Desktop/PILRnav/weight/dense_ppo")

I hope this is helpful.
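One caveat worth noting on this pattern: policy.load is a classmethod that builds a brand-new policy from the saved file, so the settings passed to the PPO constructor (policy_kwargs, use_sde, etc.) only take effect if they are consistent with what was saved alongside the BC policy. Copying weights into the PPO-built policy with dense_ppo.policy.load_state_dict(...) instead would keep PPO's own configuration authoritative.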
