
Create rule S6986: "optimizer.zero_grad()" should be used in conjunction with "optimizer.step()" and "loss.backward()" #3977

Draft · wants to merge 3 commits into master from rule/add-RSPEC-S6986
Conversation

@github-actions github-actions bot (Contributor) commented Jun 5, 2024

You can preview this rule here (updated a few minutes after each push).

Review

A dedicated reviewer checked the rule description successfully for:

  • logical errors and incorrect information
  • information gaps and missing content
  • text style and tone
  • PR summary and labels follow the guidelines

@joke1196 joke1196 force-pushed the rule/add-RSPEC-S6986 branch from 8ab466d to 1eb2c28 on June 6, 2024 13:57
@ghislainpiot ghislainpiot (Contributor) left a comment


The rule looks good. I have a few comments about the classification of the rule and some assumptions about what users might do in the training loop.

@@ -0,0 +1,25 @@
{
"title": "\"optimizer.zero_grad()\" should be used in conjunction with \"optimizer.step()\" and \"loss.backward()\"",
"type": "CODE_SMELL",

I would probably go with BUG. If the gradients are never zeroed, they will eventually explode or simply grow too large and wreck the weights.
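For context, a minimal sketch (with a made-up toy layer, not taken from the rule) of the accumulation this comment refers to: every `backward()` call adds into `.grad`, so the gradient magnitude keeps growing when it is never reset.

```python
import torch

# Toy layer and input, purely for illustration.
layer = torch.nn.Linear(4, 1)
x = torch.randn(8, 4)

for step in range(3):
    loss = layer(x).sum()
    loss.backward()
    # .grad keeps growing because it is never zeroed between iterations.
    print(step, layer.weight.grad.abs().mean().item())
```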

"impacts": {
"RELIABILITY": "HIGH"
},
"attribute": "CONVENTIONAL"

I think COMPLETE or LOGICAL would fit better here than CONVENTIONAL.

This rule raises an issue when PyTorch `optimizer.step()` and `loss.backward()` is used without `optimizer.zero_grad()`.
== Why is this an issue?

In PyTorch the training loop of a neural network is comprised of a several steps:

We should probably indicate that the steps are not necessarily in that order. You can call zero_grad before the whole loop, or many times per epoch (minibatches), or only every few minibatches (gradient accumulation, which allows a bigger batch size without using more (V)RAM).
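To make the gradient-accumulation variant concrete, a hedged sketch (the `model`, `loss_fn`, `optimizer` and `data_loader` names are assumed, and the accumulation factor of 4 is arbitrary); `zero_grad()` runs only every few minibatches, yet the loop is still correct:

```python
accumulation_steps = 4  # arbitrary accumulation factor

optimizer.zero_grad()  # zeroing before the loop is also valid
for i, (inputs, targets) in enumerate(data_loader):
    # Scale the loss so the accumulated gradient matches a larger batch.
    loss = loss_fn(model(inputs), targets) / accumulation_steps
    loss.backward()  # gradients accumulate on purpose
    if (i + 1) % accumulation_steps == 0:
        optimizer.step()
        optimizer.zero_grad()  # reset only every few minibatches
```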


== How to fix it

To fix the issue call the `optimizer.zero_grad()` method.

Add a comma: "To fix the issue, call the `optimizer.zero_grad()` method."

Comment on lines 25 to 55
import torch
from my_data import data, labels

loss_fn = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

for epoch in range(100):
for i in range(len(data)):
output = model(data[i])
loss = loss_fn(output, labels[i])
loss.backward()
optimizer.step() # Noncompliant: optimizer.zero_grad() was not called in the training loop
----

==== Compliant solution

[source,python,diff-id=1,diff-type=compliant]
----
import torch
from my_data import data, labels

loss_fn = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

for epoch in range(100):
for i in range(len(data)):
optimizer.zero_grad()
output = model(data[i])
loss = loss_fn(output, labels[i])
loss.backward()
optimizer.step() # Compliant

The examples should probably use Dataset/DataLoader, to highlight the right way to do things.
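To make the suggestion concrete, a possible reworking of the compliant example with a Dataset/DataLoader (the synthetic tensors and tiny model below are placeholders, not taken from the rule):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Placeholder data standing in for a real Dataset implementation.
dataset = TensorDataset(torch.randn(100, 10), torch.randint(0, 2, (100,)))
loader = DataLoader(dataset, batch_size=16, shuffle=True)

model = torch.nn.Linear(10, 2)
loss_fn = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

for epoch in range(100):
    for inputs, targets in loader:
        optimizer.zero_grad()
        loss = loss_fn(model(inputs), targets)
        loss.backward()
        optimizer.step()  # Compliant
```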

This rule raises an issue when PyTorch `optimizer.step()` and `loss.backward()` is used without `optimizer.zero_grad()`.
== Why is this an issue?

In PyTorch the training loop of a neural network is comprised of a several steps:

of several steps


In PyTorch the training loop of a neural network is comprised of a several steps:
* Forward pass, to pass the data through the model and output predictions
* Loss computation, to compute the loss based and the predictions and the actual data

based on the predictions and the ground truth

* Gradients zeroed out, to prevent the gradients to accumulate with the `optimizer.zero_grad()` method

When training a model it is important to reset gradients for each training loop. Failing to do so will skew the
results as the update of the model's parameters will be done with the accumulated gradients from the previous iterations.

Just accumulating gradients is not a problem, as long as they are reset at least once in a while (we can't really detect whether the reset is implemented correctly, only whether it happens at all).

@joke1196 joke1196 changed the title Create rule S6986 Create rule S6986: "optimizer.zero_grad()" should be used in conjunction with "optimizer.step()" and "loss.backward()" Jun 10, 2024
Quality Gate passed for 'rspec-frontend'

Issues: 0 new issues, 0 fixed issues, 0 accepted issues

Measures: 0 security hotspots, no data about coverage, no data about duplication

See analysis details on SonarQube

Quality Gate passed for 'rspec-tools'

Issues: 0 new issues, 0 fixed issues, 0 accepted issues

Measures: 0 security hotspots, no data about coverage, no data about duplication

See analysis details on SonarQube

@joke1196 joke1196 requested a review from ghislainpiot June 10, 2024 08:27
@ghislainpiot ghislainpiot (Contributor) left a comment


LGTM!
