
🐛 [BUG] Cannot use training loss as metrics key #213

Closed
peastman opened this issue May 23, 2022 · 2 comments
Labels: bug

@peastman (Contributor)
Describe the bug
The example config file includes this line:

metrics_key: validation_loss                                                       # metrics used for scheduling and saving best model. Options: `set`_`quantity`, set can be either "train" or "validation, "quantity" can be loss or anything that appears in the validation batch step header, such as f_mae, f_rmse, e_mae, e_rmse

Following those instructions, I set the value to train_loss. When I do, it fails with the exception

RuntimeError: metrics_key should start with either validation or training

Apparently it actually wants the value to be training_loss instead of train_loss. But when I change it to that, it fails with a different exception:

KeyError: 'training_loss'

It seems that some parts of the code expect one form and other parts expect the other, so neither works.
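
For reference, the two variants I tried were roughly the following (only this line of the example config changed):

metrics_key: train_loss       # RuntimeError: metrics_key should start with either validation or training
metrics_key: training_loss    # KeyError: 'training_loss'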

Environment:

  • OS: Ubuntu 18.04
  • python version: 3.9
  • python environment:
    • nequip version: 0.5.4
    • e3nn version: 0.4.4
    • pytorch version: 1.10.0
peastman added the bug label May 23, 2022
@Linux-cpp-lisp (Collaborator) commented May 24, 2022

Hi @peastman,

Thanks for the report; this is a bug arising from an unsupported combination of options. (Using a training key for metrics_key errors out because the first evaluation of the untrained model on the validation set, before training starts, counts as an "epoch" and tries to access the nonexistent training_loss to determine whether it is the best_model so far.)

Set:

report_init_validation: False

in your config to disable this first "epoch".
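
With that, a combination along these lines should work (a sketch, assuming the rest of your config is unchanged):

metrics_key: training_loss        # schedule and save the best model based on training loss
report_init_validation: False     # skip the initial evaluation "epoch" before training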

I've added an error to the code so that this fails more informatively.

This resolves the issue in my repro; let me know if you have further issues.

@peastman (Contributor, Author)

Thanks! That works.

Linux-cpp-lisp mentioned this issue Jun 16, 2022