Critics and optimizers are wrongly saved - SAC #570

Closed · jacekplocharczyk opened this issue Mar 15, 2022 · 7 comments · Fixed by #575
Labels: question (Further information is requested)

jacekplocharczyk commented Mar 15, 2022

  • I have marked all applicable categories:
    • exception-raising bug
    • RL algorithm bug
    • documentation request (i.e. "X is missing from the documentation.")
    • new feature request
  • I have visited the source website
  • I have searched through the issue tracker for duplicates
  • I have mentioned version numbers, operating system and environment, where applicable:
    import tianshou, torch, numpy, sys
    print(tianshou.__version__, torch.__version__, numpy.__version__, sys.version, sys.platform)
    0.4.5 1.7.1.post2 1.19.5 3.8.10 | packaged by conda-forge | (default, May 11 2021, 07:01:05) 
    [GCC 9.3.0] linux

Hey guys,
I found a bug where the networks and optimizers for the SAC critics differ between the saved and the in-memory (unsaved) objects.
The issue doesn't occur with the actor, the actor optimizer, or alpha.

Gist with code to replicate the problem:
https://gist.github.com/jacekplocharczyk/a85964c5ad227b88af00dfb1a4dfd769
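In short, the gist does something like the following (a minimal sketch; the variable names are illustrative, not taken from the gist):

import torch

# Save the trained policy's state dict, then load it into an identically
# constructed (but untrained) policy.
torch.save(policy.state_dict(), policy_path)
reloaded_policy.load_state_dict(torch.load(policy_path, map_location=args.device))

# Compare the first weight tensor of critic1 between the live and the reloaded policy.
original = next(policy.critic1.parameters())
reloaded = next(reloaded_policy.critic1.parameters())
print(torch.equal(original, reloaded))  # expected True, but for the critics this comes out False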

Trinkle23897 (Collaborator) commented Mar 16, 2022

     reloaded_policy.load_state_dict(torch.load(policy_path, map_location=args.device))
+    print((policy.critic1_optim.param_groups[0]["params"][0][0] == reloaded_policy.critic1_optim.param_groups[0]["params"][0][0]).all())
+    print(policy.critic1_optim.param_groups[0]["params"][0][0], reloaded_policy.critic1_optim.param_groups[0]["params"][0][0])
Final reward: -1290.1767704030294, length: 200.0
tensor(True, device='cuda:0')
tensor([ 0.1090, -0.1627, -0.0721, -0.0690], device='cuda:0',
       grad_fn=<SelectBackward0>) tensor([ 0.1090, -0.1627, -0.0721, -0.0690], device='cuda:0',
       grad_fn=<SelectBackward0>)

Did you use the same version of the code to generate this model?

jacekplocharczyk (Author) commented Mar 16, 2022

Yes, it's the same version of the code.
My output is below (I added the same prints as you did).

Observations shape: (3,)
Actions shape: (1,)
Action range: -2.0 2.0
/root/miniconda3/envs/rl/lib/python3.8/site-packages/tianshou/data/collector.py:61: UserWarning: Single environment detected, wrap to DummyVectorEnv.
  warnings.warn("Single environment detected, wrap to DummyVectorEnv.")
Epoch #1: 2it [00:00, 45.47it/s, env_step=1, len=0, loss/actor=0.600, loss/critic1=51.337, loss/critic2=56.436, n/ep=0, n/st=1, rew=0.00]                           
Epoch #1: test_reward: -1344.290911 ± 46.330156, best_reward: -1322.881265 ± 114.243298 in #0
Epoch #2: 2it [00:00, 92.99it/s, env_step=2, len=0, loss/actor=0.879, loss/critic1=44.331, loss/critic2=49.411, n/ep=0, n/st=1, rew=0.00]                           
Epoch #2: test_reward: -1420.681549 ± 97.445743, best_reward: -1322.881265 ± 114.243298 in #0
Epoch #3: 2it [00:00, 73.36it/s, env_step=3, len=0, loss/actor=1.139, loss/critic1=40.423, loss/critic2=45.485, n/ep=0, n/st=1, rew=0.00]                           
Epoch #3: test_reward: -1363.087723 ± 95.064194, best_reward: -1322.881265 ± 114.243298 in #0
Epoch #4: 2it [00:00, 91.21it/s, env_step=4, len=0, loss/actor=1.430, loss/critic1=37.392, loss/critic2=42.584, n/ep=0, n/st=1, rew=0.00]                           
Epoch #4: test_reward: -1346.489505 ± 286.613257, best_reward: -1322.881265 ± 114.243298 in #0
Epoch #5: 2it [00:00, 96.18it/s, env_step=5, len=0, loss/actor=1.690, loss/critic1=34.437, loss/critic2=39.559, n/ep=0, n/st=1, rew=0.00]                           
Epoch #5: test_reward: -1416.217160 ± 266.236718, best_reward: -1322.881265 ± 114.243298 in #0
{'best_result': '-1322.88 ± 114.24',
 'best_reward': -1322.8812650934738,
 'duration': '4.10s',
 'test_episode': 60,
 'test_speed': '3128.53 step/s',
 'test_step': 12000,
 'test_time': '3.84s',
 'train_episode': 0,
 'train_speed': '19.07 step/s',
 'train_step': 5,
 'train_time/collector': '0.01s',
 'train_time/model': '0.25s'}
Final reward: -1331.6370587230358, length: 200.0
tensor(False)
tensor([ 0.1098, -0.1629, -0.0713, -0.0658], grad_fn=<SelectBackward>) tensor([ 0.1052, -0.1665, -0.0758, -0.0703], grad_fn=<SelectBackward>)

Could you check whether not using the GPU changes the behaviour?

On version 0.4.6.post1 the behaviour is the same as on 0.4.5.

Trinkle23897 (Collaborator) commented Mar 16, 2022

[Screenshot: 2022-03-16 10-47-08]

In [1]: import tianshou, gym, torch, numpy, sys
   ...: print(tianshou.__version__, gym.__version__, torch.__version__, numpy.__version__, sys.version, sys.platform)
0.4.6.post1 0.23.1 1.10.2+cu102 1.21.5 3.8.10 (default, Nov 26 2021, 20:14:08) 
[GCC 9.3.0] linux

jacekplocharczyk (Author) commented Mar 16, 2022

You can replicate this using the following Dockerfile:

FROM pytorch/pytorch:1.9.0-cuda10.2-cudnn7-runtime

RUN pip install --upgrade tianshou gym pygame

COPY tianshou_loading_example.py .

ENTRYPOINT [ "python", "tianshou_loading_example.py"]

Just put your test.py file in the same dir as the Dockerfile and run

docker build -t critictest .
docker run critictest

You also need to change Pendulum-v0 to Pendulum-v1. I've also updated the gist, so you can just grab the fresh one.

Trinkle23897 (Collaborator) commented Mar 16, 2022

Interesting. torch==1.9.0 gives False and torch==1.9.1 gives True.

  • False:
  • True:
  • random: 1.4.0, 1.5.0, 1.5.1, 1.6.0, 1.7.0, 1.7.1, 1.8.0, 1.8.1, 1.9.0, 1.9.1, 1.10.0, 1.10.1, 1.10.2, 1.11.0

I'll check it later. (probably this weekend)

Trinkle23897 added the "bug (Something isn't working)" label on Mar 16, 2022
Trinkle23897 (Collaborator) commented Mar 19, 2022

It's not saved incorrectly. There are two arguments: save_fn and save_checkpoint_fn (I know the names are a bit confusing).

  • save_fn only saves the policy when it reaches the highest average reward so far;
  • save_checkpoint_fn saves the policy at the end of each epoch.

So with save_fn, a run may look like this:

epoch1: reward = -1500 (save)
epoch2: reward = -1000 (save)
epoch3: reward = -800 (save)

and the final policy is the best policy.

However, it may also look like this:

epoch1: reward = -1000 (save)
epoch2: reward = -800 (save)
epoch3: reward = -1200 (do nothing)

and when you load the policy from disk, it is actually the one from epoch2, not epoch3. If you want to use epoch3's result when loading a policy, you should use save_checkpoint_fn instead of save_fn.

save_checkpoint_fn example (from #350):

import os
import pickle

import torch

# policy, optim, train_collector, log_path and args come from the surrounding training script.

def save_checkpoint_fn(epoch, env_step, gradient_step):
    # see also: https://pytorch.org/tutorials/beginner/saving_loading_models.html
    torch.save(
        {
            'model': policy.state_dict(),
            'optim': optim.state_dict(),
        }, os.path.join(log_path, 'checkpoint.pth')
    )
    pickle.dump(
        train_collector.buffer,
        open(os.path.join(log_path, 'train_buffer.pkl'), "wb")
    )

if args.resume:
    # load from existing checkpoint
    print(f"Loading agent under {log_path}")
    ckpt_path = os.path.join(log_path, 'checkpoint.pth')
    if os.path.exists(ckpt_path):
        checkpoint = torch.load(ckpt_path, map_location=args.device)
        policy.load_state_dict(checkpoint['model'])
        policy.optim.load_state_dict(checkpoint['optim'])
        print("Successfully restored policy and optim.")
    else:
        print("Failed to restore policy and optim.")
    buffer_path = os.path.join(log_path, 'train_buffer.pkl')
    if os.path.exists(buffer_path):
        train_collector.buffer = pickle.load(open(buffer_path, "rb"))
        print("Successfully restored buffer.")
    else:
        print("Failed to restore buffer.")

Trinkle23897 added the "question (Further information is requested)" label and removed the "bug (Something isn't working)" label on Mar 19, 2022
jacekplocharczyk (Author) commented:

That should be it. Thanks a lot!
