Load individual elements if state dict load fails #5213

Merged: 22 commits, Apr 6, 2021

@andrewcoh andrewcoh commented Apr 1, 2021

Proposed change(s)

This addresses the inability to resume training with GAIL, and with reward providers in general. If loading a state dict fails, the saver now copies the matching elements individually and emits many logger warnings, such as:

2021-04-01 12:29:22 WARNING [torch_model_saver.py:102] Did not expect these keys ['action_model._continuous_distribution.log_sigma', 'action_model._continuous_distribution.mu.weight', 'action_model._continuous_distribution.mu.bias'] in checkpoint. Initializing
2021-04-01 12:29:22 WARNING [torch_model_saver.py:106] Failed to load for module Optimizer:value_optimizer. Initializing
2021-04-01 12:29:22 WARNING [torch_model_saver.py:98] Did not find these keys ['value_heads.value_heads.curiosity.weight', 'value_heads.value_heads.curiosity.bias', 'value_heads.value_heads.gail.weight', 'value_heads.value_heads.gail.bias'] in checkpoint. Initializing
2021-04-01 12:29:22 WARNING [torch_model_saver.py:102] Did not expect these keys ['value_heads.value_heads.extrinsic.weight', 'value_heads.value_heads.extrinsic.bias', 'value_heads.value_heads.rnd.weight', 'value_heads.value_heads.rnd.bias'] in checkpoint. Initializing
2021-04-01 12:29:22 WARNING [torch_model_saver.py:106] Failed to load for module Module:Curiosity. Initializing
2021-04-01 12:29:22 WARNING [torch_model_saver.py:106] Failed to load for module Module:GAIL. Initializing
2021-04-01 12:29:22 INFO [torch_model_saver.py:116] Resuming training from step 42896.
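
The fallback described above can be sketched roughly as follows. This is an illustration only, not the code in torch_model_saver.py: plain dictionaries of lists stand in for PyTorch state dicts so the sketch runs without dependencies, and the function name `load_matching_elements` and the logger name are hypothetical.

```python
import logging

logger = logging.getLogger("partial_load_sketch")  # hypothetical logger name


def load_matching_elements(current, saved):
    """Copy entries from `saved` into `current` where key and length agree.

    Both arguments are plain dicts mapping parameter names to lists of
    floats; they stand in for PyTorch state dicts in this sketch.
    Returns (missing, unexpected): keys absent from the checkpoint, and
    keys the checkpoint holds that the current model does not use.
    """
    missing = [key for key in current if key not in saved]
    unexpected = [key for key in saved if key not in current]
    for key, value in saved.items():
        if key in current and len(current[key]) == len(value):
            current[key] = list(value)  # copy so the checkpoint is not aliased
    if missing:
        logger.warning("Did not find these keys %s in checkpoint. Initializing", missing)
    if unexpected:
        logger.warning("Did not expect these keys %s in checkpoint. Initializing", unexpected)
    return missing, unexpected


# The current model gained a GAIL value head; the checkpoint had an RND head.
model = {"body.weight": [0.0, 0.0], "value_heads.gail.weight": [0.0]}
checkpoint = {"body.weight": [1.0, 2.0], "value_heads.rnd.weight": [3.0]}
missing, unexpected = load_matching_elements(model, checkpoint)
```

In this toy case the shared `body.weight` entry is restored from the checkpoint, while the new GAIL head keeps its initial value and is reported as missing.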

TODO:

  • tests

Useful links (Github issues, JIRA tickets, ML-Agents forum threads etc.)

Types of change(s)

  • Bug fix
  • New feature
  • Code refactor
  • Breaking change
  • Documentation update
  • Other (please describe)

Checklist

  • Added tests that prove my fix is effective or that my feature works
  • Updated the changelog (if applicable)
  • Updated the documentation (if applicable)
  • Updated the migration guide (if applicable)

Other comments

ervteng commented Apr 1, 2021

I think this warrants a doc change. In Training-ML-Agents under "Loading an existing model", we can say something like "If the network architecture changes, you may still load an existing model, and ML-Agents will only load the parts of the model that haven't changed. For instance, if you add a new reward signal, the existing model will load but the new reward signal will be initialized from scratch. If you have a model with a visual encoder (CNN) but change the hidden_units, the CNN will be loaded but the body of the network will not be."
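
For readers who want to see the behaviour described in this comment in PyTorch terms, here is a minimal sketch. It assumes a toy `nn.Sequential` network in place of an ML-Agents policy; `load_unchanged_parts` and `make_network` are hypothetical names for illustration and are not part of the ML-Agents API. The sketch filters the saved tensors by name and shape before calling `load_state_dict(..., strict=False)`, because `strict=False` alone still raises an error on a shape mismatch.

```python
import torch
from torch import nn


def load_unchanged_parts(model: nn.Module, saved_state: dict) -> list:
    """Load only the saved tensors whose name and shape still match `model`.

    Returns the names of entries that keep their fresh initialization.
    """
    current = model.state_dict()
    compatible = {
        name: tensor
        for name, tensor in saved_state.items()
        if name in current and current[name].shape == tensor.shape
    }
    # strict=False tolerates the entries filtered out above.
    model.load_state_dict(compatible, strict=False)
    return [name for name in current if name not in compatible]


def make_network(hidden_units: int) -> nn.Sequential:
    # Hypothetical stand-in: index 0 plays the encoder, index 2 the body.
    return nn.Sequential(nn.Linear(8, 16), nn.ReLU(), nn.Linear(16, hidden_units))


old = make_network(hidden_units=32)
new = make_network(hidden_units=64)  # the "body" changed size
skipped = load_unchanged_parts(new, old.state_dict())
encoder_restored = torch.equal(new[0].weight, old[0].weight)
```

Here the first layer is restored from the old network, while the resized last layer is skipped and stays freshly initialized.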

andrewcoh and others added 5 commits April 5, 2021 18:24
Co-authored-by: Vincent-Pierre BERGES <vincentpierre@unity3d.com>
@ervteng ervteng left a comment

Add this to changelog.md. I'd say this qualifies as a "Major change" for the Python package.

@andrewcoh andrewcoh merged commit ac4f43c into main Apr 6, 2021
@delete-merged-branch delete-merged-branch bot deleted the fix-resume-imi branch April 6, 2021 17:13
ervteng pushed a commit that referenced this pull request Apr 8, 2021
Co-authored-by: Vincent-Pierre BERGES <vincentpierre@unity3d.com>
Co-authored-by: Ervin T. <ervin@unity3d.com>
(cherry picked from commit ac4f43c)
@andrewcoh andrewcoh restored the fix-resume-imi branch June 14, 2021 22:06
@github-actions github-actions bot locked as resolved and limited conversation to collaborators Jun 15, 2022