Critics and optimizers are wrongly saved - SAC #570
```diff
 reloaded_policy.load_state_dict(torch.load(policy_path, map_location=args.device))
+print((policy.critic1_optim.param_groups[0]["params"][0][0] == reloaded_policy.critic1_optim.param_groups[0]["params"][0][0]).all())
+print(policy.critic1_optim.param_groups[0]["params"][0][0], reloaded_policy.critic1_optim.param_groups[0]["params"][0][0])
```
Did you use the same version of the code to generate this model?
Yes, it's the same version of the code.
Could you check whether running without a GPU changes the behaviour? On version 0.4.6.post1 the behaviour is the same as on
You can replicate this using the following Dockerfile
Just put your
You also need to change Pendulum-v0 to Pendulum-v1. I've also updated the gist, so you can just take a fresh copy.
Interesting.
I'll check it later (probably this weekend).
It's not saved wrongly. There are two arguments:
So for
and the final policy is the best policy. However, it may also be the following case:
and when you load the policy from disk, it is actually from epoch 2 instead of epoch 3. If you want to use epoch 3's result when loading a policy, you should use
tianshou/test/discrete/test_c51.py, lines 141 to 170 (commit 10d9190)
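A minimal, dependency-free sketch of the two cases described above may help. It assumes a simplified training loop in which one hook (the hypothetical `save_best`) fires only when the test reward beats the best seen so far, and another (the hypothetical `save_checkpoint`) fires at the end of every epoch. These names and the reward values are illustrative only and are not Tianshou's trainer API, whose hook names differ between versions. Instead of writing weights, each hook just records which epoch's policy would end up on disk.

```python
from typing import Dict, List


def run_training(test_rewards: List[float]) -> Dict[str, int]:
    """Simulate a training loop with two hypothetical saving hooks.

    Returns which epoch's policy each hook would have left on disk.
    This is an illustration, not Tianshou's trainer implementation.
    """
    disk: Dict[str, int] = {}
    best_reward = float("-inf")

    def save_best(epoch: int) -> None:
        # Hypothetical "save the best policy" hook: only called on improvement.
        disk["best_policy"] = epoch

    def save_checkpoint(epoch: int) -> None:
        # Hypothetical per-epoch checkpoint hook: always called.
        disk["checkpoint"] = epoch

    for epoch, reward in enumerate(test_rewards, start=1):
        # One epoch of training would update the in-memory policy here.
        if reward > best_reward:
            best_reward = reward
            save_best(epoch)
        save_checkpoint(epoch)
    return disk


# Epoch 3 tests worse than epoch 2, so the "best" file still holds epoch 2,
# while the in-memory policy and the per-epoch checkpoint come from epoch 3.
print(run_training([-900.0, -300.0, -450.0]))
# {'best_policy': 2, 'checkpoint': 3}
```

With steadily improving rewards, such as `[-900.0, -300.0, -150.0]`, both entries point to epoch 3, which is the case where the final policy is also the best one. Comparing the reloaded "best" policy against the in-memory policy therefore only shows matching weights in that case.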
That should be it. Thanks a lot!
Hey guys,
I found a bug where the networks and optimizers of the SAC critics differ between the saved objects and the unsaved (in-memory) ones.
The issue doesn't occur with the actor, the actor optimizer, or alpha.
Gist with code to replicate the problem:
https://gist.github.com/jacekplocharczyk/a85964c5ad227b88af00dfb1a4dfd769