
Got an unexpected keyword argument 'use_sde' when passing behavioural cloning policy to PPO from SB3 #781

JkAcktuator opened this issue Sep 10, 2023 · 5 comments
Labels
bug Something isn't working

Comments

@JkAcktuator

Bug description

Hello,

I want to pass a policy learned via behavioural cloning in the imitation library to PPO. I thought this would work, since both are built on the ActorCriticPolicy class; however, it doesn't work as I expected.

Steps to reproduce

from stable_baselines3 import PPO
from imitation.algorithms import bc

bc_trainer = bc.BC(
    observation_space=env.observation_space,
    action_space=env.action_space,
    device='cuda',
    policy=bc.reconstruct_policy(policy_path, device='cuda'),
    rng=rng,
)
model = PPO(policy=bc_trainer.policy, env=env, verbose=1, device='cuda')

The error is:

Traceback (most recent call last):
  File "agent/main.py", line 142, in <module>
    model = PPO(policy=bc_trainer.policy, env=env, verbose=1, device='cuda')
  File "/home/repos/stable-baselines3/stable_baselines3/ppo/ppo.py", line 164, in __init__
    self._setup_model()
  File "/home/repos/stable-baselines3/stable_baselines3/ppo/ppo.py", line 167, in _setup_model
    super()._setup_model()
  File "/home/repos/stable-baselines3/stable_baselines3/common/on_policy_algorithm.py", line 120, in _setup_model
    self.policy = self.policy_class(  # pytype:disable=not-instantiable
  File "/usr/local/lib/python3.8/dist-packages/torch/nn/modules/module.py", line 1190, in _call_impl
    return forward_call(*input, **kwargs)
TypeError: forward() got an unexpected keyword argument 'use_sde'

Environment

  • Operating system and version: Ubuntu 20.04.6 LTS
  • Python version: 3.8.0
  • PyTorch version: 1.13.0
  • imitation version: 0.4.0
  • Stable-Baselines3 version: 1.8.0
  • Gym version: 0.21.0
@JkAcktuator added the bug label on Sep 10, 2023
@yojul

yojul commented Sep 27, 2023

Hi, I had the same error when trying to retrain a policy with PPO after behavioural cloning. The problem is this line:

model = PPO(policy=bc_trainer.policy, env=env, verbose=1, device='cuda')

The policy argument expects a class (or, alternatively, a string), not a policy instance. The error message is unclear because it is raised by PyTorch.

So when instantiating PPO with SB3, you should pass the policy class you want to use (which should inherit from ActorCriticPolicy). For example:

model = PPO(policy=ActorCriticPolicy, env=env, verbose=1, device='cuda')

This should work for instantiating PPO. However, I am not sure how you should load the pre-trained policy; I could not find the right way to do it in stable-baselines3 (I tried model.policy = bc_trainer.policy, but I am not sure it works properly).

Hope it helps somehow. Let me know if you find the right way to load a pre-trained policy with the PPO algorithm 👍
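A minimal sketch of one way to close that gap, assuming the BC policy and the freshly constructed PPO policy share the same network architecture: build PPO with a policy class as above, then copy the learned weights across with PyTorch's state_dict machinery instead of replacing the policy object.

from stable_baselines3 import PPO
from stable_baselines3.common.policies import ActorCriticPolicy

# Build PPO with a policy *class*, then transplant the BC-learned weights.
# Assumption: bc_trainer.policy has the same architecture as the policy PPO
# just constructed; otherwise load_state_dict raises a shape-mismatch error.
model = PPO(policy=ActorCriticPolicy, env=env, verbose=1, device='cuda')
model.policy.load_state_dict(bc_trainer.policy.state_dict())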

@AlexGisi

For those who stumble across this issue, the load_from_vector method seems to work:

from stable_baselines3 import PPO
from stable_baselines3.common.policies import ActorCriticPolicy

pretrained_policy = ActorCriticPolicy.load("/path/")
model = PPO(ActorCriticPolicy, env)
model.policy.load_from_vector(pretrained_policy.parameters_to_vector())
model.learn(total_timesteps=100_000, reset_num_timesteps=False)
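(A note on why this works: load_from_vector copies raw parameters positionally; in SB3 it is a thin wrapper around torch.nn.utils.vector_to_parameters. It therefore only behaves correctly when the pretrained policy and the PPO-constructed policy have exactly the same architecture.)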

@saeed349

saeed349 commented Mar 26, 2024

I had issues with saving and loading BC models, and the following worked for me:

from imitation.algorithms import bc
from stable_baselines3.common import policies

# Saving
bc_model = bc.BC(
    observation_space=venv.observation_space,
    action_space=venv.action_space,
    demonstrations=transitions_custom,
    rng=rng,
)
bc_model.policy.save('models/test/model.zip')

# Loading
pretrained_policy = policies.ActorCriticPolicy.load('models/test/model.zip')
bc_model_reloaded = bc.BC(
    observation_space=venv.observation_space,
    action_space=venv.action_space,
    demonstrations=transitions_custom,
    rng=rng,
    policy=pretrained_policy,
)
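To then continue training the reloaded policy with PPO, here is a sketch under the assumption that the BC run used imitation's default FeedForward32Policy (an ActorCriticPolicy with net_arch=[32, 32]): construct PPO with a matching architecture and copy the weights in.

from stable_baselines3 import PPO

# Sketch: make PPO's policy match the saved BC policy's architecture
# (assumption: imitation's default FeedForward32Policy, i.e. hidden
# layers of 32 units), then copy the pretrained weights across.
model = PPO(
    policies.ActorCriticPolicy,
    venv,
    policy_kwargs=dict(net_arch=[32, 32]),  # assumption: default BC architecture
)
model.policy.load_state_dict(pretrained_policy.state_dict())
model.learn(total_timesteps=100_000)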

@saeed349

saeed349 commented Mar 27, 2024

I followed the method mentioned above by @yojul to load a BC model in SB3.
But when I retrain the model in SB3, save it, and then try to reload it using PPO.load as below, I get a shape-mismatch error when the weights are copied.
I am guessing this is due to the difference between imitation.policies.base.FeedForward32Policy and the stable_baselines3 ActorCriticPolicy.

Can @AlexGisi, @yojul or @JkAcktuator share how you overcame this issue?

pretrained_policy = policies.ActorCriticPolicy.load("imitation_bc_model.zip")
model = PPO(policy=policies.ActorCriticPolicy, env=env)
model.policy = pretrained_policy

model.learn(total_timesteps=100_000)
model.save("sb3_model.zip")

del model

model = PPO.load("sb3_model.zip")  # this throws an error
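A plausible explanation, offered as an assumption rather than a verified diagnosis: bc.BC defaults to FeedForward32Policy, which is an ActorCriticPolicy with net_arch=[32, 32], while PPO's default ActorCriticPolicy uses 64-unit layers. model.save records the policy_kwargs from the PPO constructor, not from the policy instance you swapped in, so PPO.load rebuilds a 64-unit policy and then fails to copy the 32-unit weights into it. Passing a matching net_arch when constructing PPO should keep save and load consistent; a sketch:

pretrained_policy = policies.ActorCriticPolicy.load("imitation_bc_model.zip")

# Sketch of a possible fix: make PPO's recorded policy_kwargs match the
# weights being swapped in (assumption: the BC policy used imitation's
# default net_arch=[32, 32]).
model = PPO(
    policy=policies.ActorCriticPolicy,
    env=env,
    policy_kwargs=dict(net_arch=[32, 32]),
)
model.policy.load_state_dict(pretrained_policy.state_dict())

model.learn(total_timesteps=100_000)
model.save("sb3_model.zip")

model = PPO.load("sb3_model.zip")  # should now rebuild with matching shapes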

@CAI23sbP

Hi @saeed349! Here is my code.


dense_rollouts = rollout.rollout(
    dense_expert,
    DummyVecEnv([lambda: RolloutInfoWrapper(dense_env)]),
    rollout.make_sample_until(min_timesteps=None, min_episodes=250),
    rng=dense_rng,
)
dense_transitions = rollout.flatten_trajectories(dense_rollouts)

dense_bc = CustomBC(
    observation_space=dense_env.observation_space,
    action_space=dense_env.action_space,
    policy=dense_expert.policy,
    demonstrations=dense_transitions,
    rng=dense_rng,
    device='cuda',
    tensorboard_log=f'/home/cai/Desktop/PILRnav/runs/dense_bc',
)
dense_bc.train(n_epochs=10)
dense_bc.policy.save("/home/cai/Desktop/PILRnav/weight/dense_bc")

dense_ppo = PPO(
    policy='MlpPolicy',
    env=dense_env,
    policy_kwargs=policy_kwargs,
    use_sde=False,
    batch_size=64,
    n_epochs=7,
    learning_rate=0.0004,
    tensorboard_log=f'/home/cai/Desktop/PILRnav/runs/dense_ppo',
    verbose=1,
)

dense_ppo.policy = dense_ppo.policy.load("/home/cai/Desktop/PILRnav/weight/dense_bc")
dense_ppo.learn(MAX_ITER)
dense_ppo.policy.save("/home/cai/Desktop/PILRnav/weight/dense_ppo")

I hope this is helpful.
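One caveat worth noting on this pattern: policy.load is a classmethod that builds a brand-new policy from the saved file, so the settings passed to the PPO constructor (policy_kwargs, use_sde, etc.) only take effect if they are consistent with what was saved alongside the BC policy. Copying weights into the PPO-built policy with dense_ppo.policy.load_state_dict(...) instead would keep PPO's own configuration authoritative.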
