Create rule S6986: "optimizer.zero_grad()" should be used in conjunction with "optimizer.step()" and "loss.backward()" #3977
base: master
Conversation
Force-pushed from 9e9e5e2 to 8ab466d
…ion with "optimizer.step()" and "loss.backward()"
Force-pushed from 8ab466d to 1eb2c28
The rule looks good. I have a few comments about the classification of the rule and some assumptions about what users might do in the training loop.
rules/S6986/python/metadata.json
Outdated
@@ -0,0 +1,25 @@
{
  "title": "\"optimizer.zero_grad()\" should be used in conjunction with \"optimizer.step()\" and \"loss.backward()\"",
  "type": "CODE_SMELL",
I would probably go with BUG. If the gradients are never zeroed, they will eventually explode or just get too big and wreck the weights
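For illustration, a minimal sketch (hypothetical toy example, not part of the rule) of how gradients keep growing when `zero_grad()` is never called — each `backward()` adds to `param.grad` instead of replacing it:

```python
import torch

# Single trainable parameter and an optimizer over it
w = torch.ones(1, requires_grad=True)
optimizer = torch.optim.SGD([w], lr=0.1)

for i in range(3):
    loss = (2 * w).sum()   # d(loss)/dw == 2 on every iteration
    loss.backward()        # accumulates into w.grad because it is never zeroed
    print(w.grad)          # tensor([2.]), tensor([4.]), tensor([6.]) -> keeps inflating
    optimizer.step()       # update uses the accumulated (stale) gradient
```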
rules/S6986/python/metadata.json
Outdated
"impacts": { | ||
"RELIABILITY": "HIGH" | ||
}, | ||
"attribute": "CONVENTIONAL" |
I think it is more COMPLETE or LOGICAL than CONVENTIONAL.
rules/S6986/python/rule.adoc
Outdated
This rule raises an issue when PyTorch `optimizer.step()` and `loss.backward()` is used without `optimizer.zero_grad()`.

== Why is this an issue?

In PyTorch the training loop of a neural network is comprised of a several steps:
We should probably indicate that the steps are not necessarily in this order. You can call zero_grad before the whole loop, call zero_grad many times in one epoch (minibatches), or call it only every few minibatches (gradient accumulation, which allows a bigger effective batch size without taking up more (V)RAM).
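A rough sketch of that gradient-accumulation case (assuming `model`, `dataloader`, `loss_fn` and `optimizer` already exist — these names are placeholders, not from the rule), where zero_grad is deliberately not called on every iteration:

```python
accumulation_steps = 4  # effective batch size = 4 * the DataLoader batch size

optimizer.zero_grad()
for step, (inputs, targets) in enumerate(dataloader):
    loss = loss_fn(model(inputs), targets) / accumulation_steps
    loss.backward()                        # gradients accumulate across minibatches
    if (step + 1) % accumulation_steps == 0:
        optimizer.step()                   # update once every few minibatches
        optimizer.zero_grad()              # reset only here, not every iteration
```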
rules/S6986/python/rule.adoc
Outdated
== How to fix it

To fix the issue call the `optimizer.zero_grad()` method.
Add a comma
rules/S6986/python/rule.adoc
Outdated
import torch
from my_data import data, labels

loss_fn = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

for epoch in range(100):
    for i in range(len(data)):
        output = model(data[i])
        loss = loss_fn(output, labels[i])
        loss.backward()
        optimizer.step()  # Noncompliant: optimizer.zero_grad() was not called in the training loop
----
==== Compliant solution

[source,python,diff-id=1,diff-type=compliant]
----
import torch
from my_data import data, labels

loss_fn = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

for epoch in range(100):
    for i in range(len(data)):
        optimizer.zero_grad()
        output = model(data[i])
        loss = loss_fn(output, labels[i])
        loss.backward()
        optimizer.step()  # Compliant
The examples should probably be using Dataset/DataLoaders, to highlight the right way to do things.
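One possible DataLoader-based version of the compliant example (a sketch only: `model` is assumed to exist as in the original snippet, and wrapping the imported tensors in a `TensorDataset` is an assumption about what `my_data` provides):

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from my_data import data, labels

loss_fn = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)

# Batch and shuffle the data instead of indexing it manually
dataloader = DataLoader(TensorDataset(data, labels), batch_size=32, shuffle=True)

for epoch in range(100):
    for inputs, targets in dataloader:
        optimizer.zero_grad()            # reset gradients before each update
        output = model(inputs)
        loss = loss_fn(output, targets)
        loss.backward()
        optimizer.step()
```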
rules/S6986/python/rule.adoc
Outdated
This rule raises an issue when PyTorch `optimizer.step()` and `loss.backward()` is used without `optimizer.zero_grad()`.

== Why is this an issue?

In PyTorch the training loop of a neural network is comprised of a several steps:
of several steps
rules/S6986/python/rule.adoc
Outdated
In PyTorch the training loop of a neural network is comprised of a several steps:

* Forward pass, to pass the data through the model and output predictions
* Loss computation, to compute the loss based and the predictions and the actual data
based on the predictions and the ground truth
rules/S6986/python/rule.adoc
Outdated
* Gradients zeroed out, to prevent the gradients to accumulate with the `optimizer.zero_grad()` method

When training a model it is important to reset gradients for each training loop. Failing to do so will skew the
results as the update of the model's parameters will be done with the accumulated gradients from the previous iterations.
Just accumulating gradients is not a problem, as long as they are reset at least once in a while (we can't really detect whether the reset is correctly implemented, just whether it happens at all).
Quality Gate passed for 'rspec-frontend'
Quality Gate passed for 'rspec-tools'
LGTM!
You can preview this rule here (updated a few minutes after each push).
Review
A dedicated reviewer checked the rule description successfully for: