Skip to content
This repository has been archived by the owner on Nov 3, 2023. It is now read-only.

Minor FP16 fixes when converting old checkpoints #3514

Merged
merged 4 commits into from
Mar 22, 2021
Merged

Conversation

stephenroller
Copy link
Contributor

Patch description

  • Don't throw an error if we load a checkpoint that had an APEX scaler
  • Avoid saving an optimizer with BART in case the user decides to initialize with -opt sgd. Lowers disk usage slightly

Testing steps
Manual testing

@stephenroller stephenroller requested a review from spencerp March 12, 2021 14:47
@stephenroller stephenroller changed the title Minor FP16 fixes Minor FP16 fixes when converting old checkpoints Mar 12, 2021
Copy link
Contributor

@spencerp spencerp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Commented on something that feels a little strange to me, but I'm not very familiar with this code path so feel free to ignore if this is in fact the most direct way of not saving the optimizer.

@@ -168,6 +168,7 @@ def get_parlai_opt(self) -> Opt:
# 3. Map other options
transformer_common_config.update(
{
'datatype': 'valid', # don't save an optimizer
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is a little weird as it doesn't seem obvious that saving datatype as valid will always exclude only the optimizer. And it's not obvious that there won't be any other effects.

Is it possible to just explicitly remove the optimizer in load_fairseq_checkpoint?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll do a post vacuum

@stephenroller stephenroller merged commit 35b3156 into master Mar 22, 2021
@stephenroller stephenroller deleted the fp16_fix branch March 22, 2021 22:26
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants