Create rule S6986: "optimizer.zero_grad()" should be used in conjunction with "optimizer.step()" and "loss.backward()" #3977
base: master
Conversation
Force-pushed from 9e9e5e2 to 8ab466d
…ion with "optimizer.step()" and "loss.backward()"
Force-pushed from 8ab466d to 1eb2c28
The rule looks good. I have a few comments about the classification of the rule and some assumptions about what users might do in the training loop.
rules/S6986/python/metadata.json
Outdated
@@ -0,0 +1,25 @@
{
  "title": "\"optimizer.zero_grad()\" should be used in conjunction with \"optimizer.step()\" and \"loss.backward()\"",
  "type": "CODE_SMELL",
I would probably go with BUG. If the gradients are never zeroed, they will eventually explode or just get too big and wreck the weights
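For illustration, a minimal sketch (hypothetical toy example, not part of the rule) of how gradients keep growing when `zero_grad()` is never called — each `backward()` adds to `param.grad` instead of replacing it:

```python
import torch

# Single trainable parameter and an optimizer over it
w = torch.ones(1, requires_grad=True)
optimizer = torch.optim.SGD([w], lr=0.1)

for i in range(3):
    loss = (2 * w).sum()   # d(loss)/dw == 2 on every iteration
    loss.backward()        # accumulates into w.grad because it is never zeroed
    print(w.grad)          # tensor([2.]), tensor([4.]), tensor([6.]) -> keeps inflating
    optimizer.step()       # update uses the accumulated (stale) gradient
```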
rules/S6986/python/metadata.json
Outdated
"impacts": { | ||
"RELIABILITY": "HIGH" | ||
}, | ||
"attribute": "CONVENTIONAL" |
I think it is more COMPLETE or LOGICAL than CONVENTIONAL.
rules/S6986/python/rule.adoc
Outdated
This rule raises an issue when PyTorch `optimizer.step()` and `loss.backward()` is used without `optimizer.zero_grad()`.

== Why is this an issue?

In PyTorch the training loop of a neural network is comprised of a several steps:
We should probably indicate that the steps are not necessarily in this order. You can call zero_grad before the whole loop, call zero_grad many times in one epoch (minibatches), or call it only every few minibatches (gradient accumulation, which allows a bigger effective batch size without taking up more (V)RAM).
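A rough sketch of that gradient-accumulation case (assuming `model`, `dataloader`, `loss_fn` and `optimizer` already exist — these names are placeholders, not from the rule), where zero_grad is deliberately not called on every iteration:

```python
accumulation_steps = 4  # effective batch size = 4 * the DataLoader batch size

optimizer.zero_grad()
for step, (inputs, targets) in enumerate(dataloader):
    loss = loss_fn(model(inputs), targets) / accumulation_steps
    loss.backward()                        # gradients accumulate across minibatches
    if (step + 1) % accumulation_steps == 0:
        optimizer.step()                   # update once every few minibatches
        optimizer.zero_grad()              # reset only here, not every iteration
```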
rules/S6986/python/rule.adoc
Outdated
== How to fix it

To fix the issue call the `optimizer.zero_grad()` method.
Add a comma
rules/S6986/python/rule.adoc
Outdated
import torch
from my_data import data, labels

loss_fn = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

for epoch in range(100):
    for i in range(len(data)):
        output = model(data[i])
        loss = loss_fn(output, labels[i])
        loss.backward()
        optimizer.step()  # Noncompliant: optimizer.zero_grad() was not called in the training loop
----
==== Compliant solution

[source,python,diff-id=1,diff-type=compliant]
----
import torch
from my_data import data, labels

loss_fn = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

for epoch in range(100):
    for i in range(len(data)):
        optimizer.zero_grad()
        output = model(data[i])
        loss = loss_fn(output, labels[i])
        loss.backward()
        optimizer.step()  # Compliant
The examples should probably be using Dataset/DataLoaders, to highlight the right way to do things.
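One possible DataLoader-based version of the compliant example (a sketch only: `model` is assumed to exist as in the original snippet, and wrapping the imported tensors in a `TensorDataset` is an assumption about what `my_data` provides):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from my_data import data, labels

loss_fn = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# Batch and shuffle the data instead of indexing it manually
dataloader = DataLoader(TensorDataset(data, labels), batch_size=32, shuffle=True)

for epoch in range(100):
    for inputs, targets in dataloader:
        optimizer.zero_grad()            # reset gradients before each update
        output = model(inputs)
        loss = loss_fn(output, targets)
        loss.backward()
        optimizer.step()
```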
rules/S6986/python/rule.adoc
Outdated
This rule raises an issue when PyTorch `optimizer.step()` and `loss.backward()` is used without `optimizer.zero_grad()`.

== Why is this an issue?

In PyTorch the training loop of a neural network is comprised of a several steps:
of several steps
rules/S6986/python/rule.adoc
Outdated
In PyTorch the training loop of a neural network is comprised of a several steps:

* Forward pass, to pass the data through the model and output predictions
* Loss computation, to compute the loss based and the predictions and the actual data
based on the predictions and the ground truth
rules/S6986/python/rule.adoc
Outdated
* Gradients zeroed out, to prevent the gradients to accumulate with the `optimizer.zero_grad()` method

When training a model it is important to reset gradients for each training loop. Failing to do so will skew the
results as the update of the model's parameters will be done with the accumulated gradients from the previous iterations.
Just accumulating gradients is not a problem, as long as they are reset at least once in a while (we can't really detect whether the reset is correctly implemented, just whether it happens at all).
Quality Gate passed for 'rspec-frontend'
Quality Gate passed for 'rspec-tools'
LGTM!
You can preview this rule here (updated a few minutes after each push).
Review
A dedicated reviewer checked the rule description successfully for: