[feat] PL mvp0: training #748
Conversation
Force-pushed from a77ae82 to ba476a2.
Apologies for the delay getting Lightning-AI/pytorch-lightning#4369 merged; a test was added, so it should be merged today and included in the release! EDIT: @ytsheng this has been merged! It's been included in our latest Lightning release (1.1.6), so just need to
Thanks so much @SeanNaren, I've updated the PR to reflect the new pytorch lightning version. Many thanks!
Force-pushed from f225bb4 to 5cb6c59.
I gave the PR a read-over and it looks clean, nice work! I saw some custom builders in a few places for optimizers/schedulers; have you thought of using the Hydra instantiation methods? https://hydra.cc/docs/next/patterns/instantiate_objects/overview I know Hydra instantiation is not to everyone's taste :) Regardless, the integration looks great!
Hydra is definitely in our pipeline of things to do! We have it planned for H1 of 2021. Stay tuned. |
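For context, the Hydra pattern linked above replaces hand-written builders with configs that name a target class via a `_target_` key. Below is a minimal, hand-rolled sketch of that pattern, not Hydra's actual implementation; a real optimizer config would point at e.g. `torch.optim.SGD`, but a stdlib class keeps the sketch self-contained:

```python
import importlib


def instantiate(config: dict):
    """Minimal sketch of the Hydra `_target_` pattern: the config names a
    class by dotted path and supplies its constructor kwargs."""
    module_path, _, class_name = config["_target_"].rpartition(".")
    cls = getattr(importlib.import_module(module_path), class_name)
    kwargs = {k: v for k, v in config.items() if k != "_target_"}
    return cls(**kwargs)


# An optimizer config would look analogous, e.g.
# {"_target_": "torch.optim.SGD", "lr": 0.01}; here we instantiate a
# stdlib class so the example runs on its own.
frac = instantiate({"_target_": "fractions.Fraction",
                    "numerator": 3, "denominator": 4})
print(frac)  # 3/4
```

The appeal is that swapping the optimizer becomes a config change rather than a code change; the downside, as noted above, is that the class path lives in a string.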
  loss = report.losses["loss"].detach().cpu().item()
- self.assertAlmostEqual(loss, 2.6852, 4)
+ self.assertAlmostEqual(loss, 4.4688, 4)
this was necessary because I changed the loss calculation to force gradients to be big to test grad clipping.
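The grad-clipping test mentioned here checks that gradients stay within the threshold after clipping. For tensors this is what `torch.nn.utils.clip_grad_norm_` does; the following is an illustrative pure-Python sketch of norm-based clipping, not MMF's actual code:

```python
import math


def clip_grad_norm(grads, max_norm):
    """Scale gradients so their global L2 norm is at most max_norm,
    returning the clipped gradients and the pre-clip norm."""
    total_norm = math.sqrt(sum(g * g for g in grads))
    if total_norm > max_norm:
        scale = max_norm / total_norm
        grads = [g * scale for g in grads]
    return grads, total_norm

# Large gradients (as forced by the modified loss) get scaled down:
clipped, norm_before = clip_grad_norm([3.0, 4.0], max_norm=1.0)
print(norm_before)                      # 5.0
print([round(g, 6) for g in clipped])   # [0.6, 0.8]
```

Forcing the gradients to be large is what makes this test meaningful: with small gradients the clipping branch would never trigger and the assertion would pass vacuously.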
Super! This is going great. This needs to be imported internally, and TARGETS need to be added in fbcode before landing. I have left some general comments on design.
Force-pushed from 92d668f to 9347a80.
@ytsheng has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Looking good! Almost there.
Looks good to me overall. Some comments to address before landing.
class LightningLoopCallback(Callback):
    def __init__(self, lightning_trainer):
Missing type annotations.
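For illustration, the annotated signature might look like the sketch below; the `Callback` base here is a stand-in for `pytorch_lightning.Callback` so the snippet runs on its own, and `Any` stands in for the PR's trainer type, which we don't assume here:

```python
from typing import Any


class Callback:
    """Stand-in for pytorch_lightning.Callback, so this sketch is
    self-contained."""


class LightningLoopCallback(Callback):
    def __init__(self, lightning_trainer: Any) -> None:
        # `Any` is a placeholder for the actual trainer class used in
        # the PR; the real annotation would name that type.
        self.lightning_trainer = lightning_trainer
```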
@ytsheng has updated the pull request. You must reimport the pull request before landing.
@ytsheng has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
Looks good to me! Thanks for addressing all the comments.
@ytsheng has updated the pull request. You must reimport the pull request before landing.
@ytsheng has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.
@ytsheng merged this pull request in 0ee1127.
Summary:

* pytorch lightning stub, mostly involving training
* Tests for the lightning trainer included
* Built on top of the mmf grad accumulation fix: facebookresearch#747

- [x] MVP 0. Training: Goal - train a model from scratch and reach similar accuracy as using mmf_trainer
  - [x] Setup the training pipeline: done
  - [x] Training on the right device: done
  - [x] Clip gradients: done
  - [x] Optimizer: done
  - [x] FP16 support: done
  - [x] LR scheduler (incl. warmup etc.): done
  - [x] Test case: train visual_bert on vqa from scratch for 10 iterations, compare the value: done
- [x] Tests included in this PR (tests only cover the pytorch lightning integration):
  - [x] Vanilla training w/o grad accumulation: make sure the loss for 5 iters is the same between mmf and PL
  - [x] Optimizer working as intended as part of this PR
  - [x] `max_updates` and `max_epochs` calculation
  - [x] Training with grad accumulation
  - [x] Training with LR schedule achieves a different value compared to without LR schedule
  - [x] Training with LR schedule for PL is the same as training with LR schedule for `mmf_trainer`
  - [x] Training with gradient clipping: make sure all grads are within the `grad_clipping` threshold
  - [x] Training with gradient clipping is the same as training with gradient clipping for `mmf_trainer`

Pull Request resolved: facebookresearch#748

Reviewed By: apsdehal, simran2905

Differential Revision: D26192869

Pulled By: ytsheng

fbshipit-source-id: 203a91e893d6b878bbed80ed84960dd059cfc90c