
FP16 x gradient clipping #1381

Closed · ZhaofengWu opened this issue Apr 4, 2020 · 19 comments
Labels: bug (Something isn't working) · help wanted (Open to be worked on) · priority: 0 (High priority task)

@ZhaofengWu (Contributor) commented Apr 4, 2020

According to the apex docs, gradient clipping under fp16 needs to be handled as a separate case (clipping the fp32 master gradients rather than the model parameters), but pytorch-lightning does not appear to do this. Is that correct? https://github.com/PyTorchLightning/pytorch-lightning/blob/732eaee4d735dafd1c90728e5583341d75ff72b5/pytorch_lightning/trainer/training_tricks.py#L26
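For reference, the pattern the apex docs describe looks roughly like this (a minimal sketch based on the apex docs, not Lightning's code; the model/optimizer setup is just illustrative):

```python
import torch
from apex import amp

model = torch.nn.Linear(10, 2).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
model, optimizer = amp.initialize(model, optimizer, opt_level="O2")

loss = model(torch.randn(4, 10).cuda()).sum()

# backward on the scaled loss, then clip the fp32 *master* gradients
# that amp owns -- not model.parameters(), whose grads may be fp16
with amp.scale_loss(loss, optimizer) as scaled_loss:
    scaled_loss.backward()
torch.nn.utils.clip_grad_norm_(amp.master_params(optimizer), max_norm=1.0)
optimizer.step()
```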

ZhaofengWu added the bug (Something isn't working) and help wanted (Open to be worked on) labels Apr 4, 2020
@github-actions (bot) commented Apr 4, 2020

Hi! Thanks for your contribution, great first issue!

@Borda (Member) commented Apr 5, 2020

@neggert @tullie pls ^^

@tullie (Contributor) commented Apr 5, 2020

Should we just fix this when we integrate pytorch mixed precision training? See #1337 (comment).

@Borda (Member) commented Apr 5, 2020

depends on what will be faster lol

@williamFalcon (Contributor)

yeah, let's fix it with pytorch mixed precision. We need to remove nvidia apex.

@mcarilli commented Apr 6, 2020

Borda added this to the 0.7.3 milestone Apr 8, 2020
Borda modified the milestones: 0.7.4, 0.7.5 Apr 24, 2020
@samgd commented May 1, 2020

This is still an issue in 0.7.5, and it's quite critical for training certain recurrent models, where mixed precision can give a ~10x reduction in training time.

Can anyone confirm that this is going to be in the 0.7.6 release? We've been hoping it would make 0.7.3, 0.7.4, 0.7.5...

@williamFalcon (Contributor)

@samgd we already added this! you need to install the latest pytorch from nightly or a specific commit and lightning will use native amp.

it’s not us you’re waiting for haha, it’s pytorch 1.6. We’re already compatible with native amp. @mcarilli

@samgd commented May 1, 2020

Okay, thanks.

Just to confirm: this won't work with 0.7.5 and PyTorch 1.5? And what does PyTorch 1.6 add over 1.5? I thought 1.5 already had native amp.

@williamFalcon (Contributor) commented May 1, 2020

1.6 does not use nvidia apex; it uses its own internal version. 1.5 does not have native amp.
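Roughly, the native version looks like this (a sketch of the torch.cuda.amp API that ships with 1.6; the model/optimizer setup is illustrative):

```python
import torch
from torch.cuda.amp import GradScaler, autocast

model = torch.nn.Linear(10, 2).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scaler = GradScaler()  # replaces apex's loss scaling; no amp.initialize

with autocast():                   # fp16 autocasting for the forward pass
    loss = model(torch.randn(4, 10).cuda()).sum()

scaler.scale(loss).backward()      # backward on the scaled loss
scaler.step(optimizer)             # unscales grads; skips step on inf/nan
scaler.update()
```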

@samgd commented May 1, 2020

Awesome, looking forward to it!

@Borda (Member) commented May 1, 2020

> @samgd we already added this! you need to install the latest pytorch from nightly or a specific commit and lightning will use native amp.
>
> it’s not us you’re waiting for haha, it’s pytorch 1.6. We’re already compatible with native amp. @mcarilli

maybe we should consider fixing it for lower versions too, as we still keep backward compatibility...

@mcarilli commented May 1, 2020

`unscale_` before clipping was added in the native amp PR, good. But it might need to be more complicated if there are multiple models and/or optimizers. Does the call to `clip_gradients()` clip just the gradients owned by the optimizer referred to by `optimizer` at that point, the gradients owned by a particular model, or all the gradients in the network?
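For a single optimizer, the intended pattern is roughly the following (a sketch; with multiple optimizers you'd `unscale_` and clip each optimizer's own params separately, right before its step):

```python
import torch
from torch.cuda.amp import GradScaler, autocast

model = torch.nn.Linear(10, 2).cuda()
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
scaler = GradScaler()

with autocast():
    loss = model(torch.randn(4, 10).cuda()).sum()
scaler.scale(loss).backward()

# unscale_ must precede clipping so the threshold applies to the true
# (unscaled) gradients; it only touches this optimizer's gradients
scaler.unscale_(optimizer)
torch.nn.utils.clip_grad_norm_(
    (p for group in optimizer.param_groups for p in group["params"]),
    max_norm=1.0,
)

scaler.step(optimizer)  # knows grads are already unscaled; skips on inf/nan
scaler.update()
```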

Borda modified the milestones: 0.7.6, 0.8.0, 0.7.7 May 12, 2020
Borda modified the milestones: 0.7.7, 0.8.0 May 26, 2020
Borda modified the milestones: 0.8.0, 0.8.x Jun 17, 2020
edenlightning added the priority: 0 (High priority task) label Jun 18, 2020
edenlightning modified the milestones: 0.8.x, 0.9.0 Jun 18, 2020
@edenlightning (Contributor)

@mcarilli mind checking if the issue is still present on master?

edenlightning modified the milestones: 0.9.0, 0.9.x Aug 18, 2020
@edenlightning (Contributor)

@Borda still relevant?

@teddykoker (Contributor)

Verified clipping works with native fp16

@williamFalcon (Contributor)

@teddykoker do we have a test to prevent regressions?

@teddykoker (Contributor)

Don't believe so, I can add one
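A rough sketch of what such a test could look like (a standalone check of the clip-under-native-amp behaviour, not the final Lightning test; the tolerance and setup are illustrative):

```python
import pytest
import torch


@pytest.mark.skipif(not torch.cuda.is_available(), reason="requires GPU amp")
def test_grad_clipping_with_native_amp():
    model = torch.nn.Linear(10, 2).cuda()
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    scaler = torch.cuda.amp.GradScaler()

    with torch.cuda.amp.autocast():
        # inflate the loss so the raw grad norm is well above the clip value
        loss = (model(torch.randn(64, 10).cuda()) * 1e4).sum()
    scaler.scale(loss).backward()

    # behaviour under test: unscale first, then clip
    scaler.unscale_(optimizer)
    torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)

    total_norm = torch.norm(
        torch.stack([p.grad.norm(2) for p in model.parameters()]), 2
    )
    assert total_norm <= 1.0 + 1e-4

    scaler.step(optimizer)
    scaler.update()
```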

@williamFalcon (Contributor)

if it's quick.
