
[feat] PL mvp0: training #748

Closed

Conversation

@hackgoofer (Contributor) commented Jan 23, 2021

  • MVP 0. Training: Goal - Train a model from scratch and reach similar accuracy as with mmf_trainer (a hedged sketch of how these features map onto Lightning Trainer flags follows this list)
    • Set up the training pipeline: done
    • Training on the right device: done
    • Clip gradients: done
    • Optimizer: done
    • FP16 support: done
    • LR scheduler (incl. warmup, etc.): done
    • Test case: train visual_bert on vqa from scratch for 10 iterations and compare the value: done
  • Tests included in this PR (tests cover only the pytorch lightning integration):
    • Vanilla training without gradient accumulation; make sure the loss for 5 iterations is the same between mmf and pl
      • Verifies the optimizer is working as intended as part of this PR.
    • max_updates and max_epochs calculation
    • Training with gradient accumulation
    • Training with an LR schedule achieves a different value compared to training without an LR schedule
    • Training with an LR schedule for PL is the same as training with an LR schedule for mmf_trainer
    • Training with gradient clipping: make sure all grads are within the grad_clipping threshold
    • Training with gradient clipping is the same as training with gradient clipping for mmf_trainer
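For readers unfamiliar with the flags involved, here is a hedged sketch (not the code in this PR) of how the features above map onto pytorch-lightning Trainer options circa PL 1.1.x; the model and hyperparameter values are illustrative assumptions:

```python
# Hedged sketch only: illustrates the Lightning features listed above,
# not mmf's actual LightningTrainer implementation.
import torch
import pytorch_lightning as pl


class ToyModel(pl.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(16, 2)

    def training_step(self, batch, batch_idx):
        x, y = batch
        return torch.nn.functional.cross_entropy(self.layer(x), y)

    def configure_optimizers(self):
        # Optimizer plus a per-step LR scheduler (a warmup schedule would go
        # here in the real integration; StepLR is just a placeholder).
        optimizer = torch.optim.Adam(self.parameters(), lr=1e-4)
        scheduler = torch.optim.lr_scheduler.StepLR(optimizer, step_size=1000)
        return [optimizer], [{"scheduler": scheduler, "interval": "step"}]


trainer = pl.Trainer(
    gpus=1,                     # training on the right device
    precision=16,               # FP16 support
    gradient_clip_val=1.0,      # clip gradients
    accumulate_grad_batches=2,  # gradient accumulation
    max_steps=10,               # e.g. the 10-iteration visual_bert test case
)
```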

@facebook-github-bot added the CLA Signed label ("Do not delete this pull request or issue due to inactivity.") on Jan 23, 2021
@SeanNaren commented Jan 26, 2021

Apologies for the delay in getting Lightning-AI/pytorch-lightning#4369 merged; a test was added, so it should be merged today and included in the release!

EDIT: @ytsheng this has been merged! It's included in our latest Lightning release (1.1.6), so you just need to `pip install pytorch-lightning -U`.

@hackgoofer (Contributor, Author) commented Jan 26, 2021

Thanks so much @SeanNaren, updated the PR to reflect the new version from pytorch lightning. Many thanks!

@SeanNaren

I gave the PR a read-over and it looks clean, nice work! I saw some custom builders in a few places for optimizers/schedulers; have you thought of using the hydra instantiation methods: https://hydra.cc/docs/next/patterns/instantiate_objects/overview

I know hydra instantiation is not to everyone's taste :) Regardless, the integration looks great!
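For reference, a hedged sketch of the hydra instantiation pattern being suggested (this is not mmf's actual config schema; the config keys and values below are illustrative assumptions):

```python
# Hedged sketch: config-driven optimizer construction via hydra.utils.instantiate,
# as an alternative to hand-written optimizer/scheduler builders.
import torch
from hydra.utils import instantiate
from omegaconf import OmegaConf

cfg = OmegaConf.create(
    {
        "optimizer": {
            "_target_": "torch.optim.Adam",  # import path of the class to build
            "lr": 1e-4,
        }
    }
)

model = torch.nn.Linear(16, 2)
# Hydra builds the object named by `_target_`; extra kwargs (here the model
# parameters) are supplied at call time.
optimizer = instantiate(cfg.optimizer, params=model.parameters())
```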

@hackgoofer (Contributor, Author)

> I gave the PR a read-over and it looks clean, nice work! I saw some custom builders in a few places for optimizers/schedulers; have you thought of using the hydra instantiation methods: https://hydra.cc/docs/next/patterns/instantiate_objects/overview
>
> I know hydra instantiation is not to everyone's taste :) Regardless, the integration looks great!

Hydra is definitely in our pipeline of things to do! We have it planned for H1 of 2021. Stay tuned.

@hackgoofer changed the title from "[feat] pytorch lightning integration - training" to "[feat] PL mvp0 - training" on Jan 29, 2021
@hackgoofer changed the title from "[feat] PL mvp0 - training" to "[feat] PL mvp0: training" on Jan 29, 2021
@hackgoofer changed the base branch from pl_3 to master on January 30, 2021 03:15
  loss = report.losses["loss"].detach().cpu().item()
- self.assertAlmostEqual(loss, 2.6852, 4)
+ self.assertAlmostEqual(loss, 4.4688, 4)
@hackgoofer (Contributor, Author) commented Jan 30, 2021

this was necessary because I changed the loss calculation to force gradients to be big to test grad clipping.
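For context, a hedged sketch (not the actual test in this PR) of the kind of gradient-clipping check described above; the model, loss scaling, and threshold are illustrative assumptions:

```python
# Hedged sketch: after clipping, the total gradient norm should be within
# the configured threshold.
import torch

model = torch.nn.Linear(16, 2)
loss = model(torch.randn(4, 16)).pow(2).sum() * 100.0  # inflate gradients on purpose
loss.backward()

max_norm = 1.0
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm)

total_norm = torch.norm(
    torch.stack([p.grad.norm(2) for p in model.parameters() if p.grad is not None])
)
assert total_norm <= max_norm + 1e-6  # small numerical tolerance
```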

@apsdehal (Contributor) left a comment

Super! This is going great. This needs to be imported internally, and then TARGETS need to be added in fbcode before landing. I have left some general comments on design.

Review threads (resolved):
• mmf/configs/defaults.yaml
• mmf/datasets/base_dataset.py (outdated)
• mmf/datasets/lightning_datamodule.py
• mmf/models/base_model.py (2 outdated threads)
• mmf/trainers/lightning_trainer.py (4 outdated threads)
• tests/lightning/test_lightning_trainer.py (outdated)
@hackgoofer force-pushed the pl_mvp0 branch 4 times, most recently from 92d668f to 9347a80, on February 2, 2021 03:24
@facebook-github-bot (Contributor): @ytsheng has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@apsdehal (Contributor) left a comment

Looking good! Almost there.

Review threads (resolved):
• mmf/datasets/lightning_datamodule.py (2 outdated threads)
• mmf/trainers/lightning_core/loop_callback.py (outdated)
• mmf/trainers/lightning_trainer.py (outdated)
@vedanuj (Contributor) left a comment

Looks good to me overall. Some comments to address before landing.



class LightningLoopCallback(Callback):
    def __init__(self, lightning_trainer):
Missing typings.
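A hedged sketch of one way to address this (the `LightningTrainer` type name and its import path are assumptions; the PR's actual signature may differ):

```python
# Hedged sketch: annotate the constructor argument and return type.
from pytorch_lightning.callbacks import Callback


class LightningLoopCallback(Callback):
    def __init__(self, lightning_trainer: "LightningTrainer") -> None:
        super().__init__()
        # Keep a handle to mmf's lightning trainer for use in callback hooks.
        self.lightning_trainer = lightning_trainer
```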

Review threads (resolved):
• mmf/trainers/lightning_core/loop_callback.py (2 threads, 1 outdated)
• mmf/trainers/lightning_trainer.py (3 outdated threads)
• tests/lightning/__init__.py (outdated)
@facebook-github-bot (Contributor): @ytsheng has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot (Contributor): @ytsheng has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@vedanuj (Contributor) left a comment

Looks good to me! Thanks for addressing all the comments.

@facebook-github-bot (Contributor): @ytsheng has updated the pull request. You must reimport the pull request before landing.

@facebook-github-bot (Contributor): @ytsheng has imported this pull request. If you are a Facebook employee, you can view this diff on Phabricator.

@facebook-github-bot (Contributor): @ytsheng merged this pull request in 0ee1127.

@hackgoofer deleted the pl_mvp0 branch on February 8, 2021 20:41
ultrons pushed a commit to ultrons/mmf that referenced this pull request Feb 8, 2021
Summary:
* pytorch lightning stub, mostly involving training
* Tests for the lightning trainer included
* Built on top of the mmf grad accumulation fix: facebookresearch#747

- [x] MVP 0. Training: Goal - Train a model from scratch and reach similar accuracy as with mmf_trainer
   - [x] Set up the training pipeline: done
   - [x] Training on the right device: done
   - [x] Clip gradients: done
   - [x] Optimizer: done
   - [x] FP16 support: done
   - [x] LR scheduler (incl. warmup, etc.): done
   - [x] Test case: train visual_bert on vqa from scratch for 10 iterations and compare the value: done
- [x] Tests included in this PR (tests cover only the pytorch lightning integration):
   - [x] Vanilla training without gradient accumulation; make sure the loss for 5 iterations is the same between mmf and pl
      - [x] Verifies the optimizer is working as intended as part of this PR.
   - [x] `max_updates` and `max_epochs` calculation
   - [x] Training with gradient accumulation
   - [x] Training with an LR schedule achieves a different value compared to training without an LR schedule
   - [x] Training with an LR schedule for PL is the same as training with an LR schedule for `mmf_trainer`
   - [x] Training with gradient clipping: make sure all grads are within the `grad_clipping` threshold
   - [x] Training with gradient clipping is the same as training with gradient clipping for `mmf_trainer`

Pull Request resolved: facebookresearch#748

Reviewed By: apsdehal, simran2905

Differential Revision: D26192869

Pulled By: ytsheng

fbshipit-source-id: 203a91e893d6b878bbed80ed84960dd059cfc90c