RuntimeError: Trying to backward through the graph a second time (or directly access saved tensors after they have already been freed). Saved intermediate values of the graph are freed when you call .backward() or autograd.grad(). Specify retain_graph=True if you need to backward through the graph a second time or if you need to access saved tensors after calling backward. #228

karnikkanojia · 2023-12-31T15:07:06Z

I have gone through the solutions already available on the forum. I'm using the torchxrayvision and torchcam libraries: I need the pretrained DenseNet121 weights and torchcam to generate GradCAM maps for the model. Please help me with this; the solutions available on the PyTorch forum don't look like they apply to my case.

The minimal reproducible example I can give you is this:

preds = model(rescaled_output.unsqueeze(0))
cam_extractors = [
    cam(class_idx=i, scores=preds) for i, _ in enumerate(model.pathologies)
]

rescaled_output is just an image stored as a Tensor of shape (1, 244, 244).
I'm encountering the following error:

{
	"name": "RuntimeError",
	"message": "Trying to backward through the graph a second time (or directly access saved tensors after they have already been freed). Saved intermediate values of the graph are freed when you call .backward() or autograd.grad(). Specify retain_graph=True if you need to backward through the graph a second time or if you need to access saved tensors after calling backward.",
	"stack": "---------------------------------------------------------------------------
RuntimeError                              Traceback (most recent call last)
Input In [24], in <cell line: 2>()
      1 preds = model(rescaled_output.unsqueeze(0))
----> 2 cam_extractors = [
      3     cam(class_idx=i, scores=preds) for i, _ in enumerate(model.pathologies)
      4 ]

Input In [24], in <listcomp>(.0)
      1 preds = model(rescaled_output.unsqueeze(0))
      2 cam_extractors = [
----> 3     cam(class_idx=i, scores=preds) for i, _ in enumerate(model.pathologies)
      4 ]

File /opt/homebrew/Caskroom/miniconda/base/envs/xray/lib/python3.10/site-packages/torchcam/methods/core.py:169, in _CAM.__call__(self, class_idx, scores, normalized, **kwargs)
    166 self._precheck(class_idx, scores)
    168 # Compute CAM
--> 169 return self.compute_cams(class_idx, scores, normalized, **kwargs)

File /opt/homebrew/Caskroom/miniconda/base/envs/xray/lib/python3.10/site-packages/torchcam/methods/core.py:193, in _CAM.compute_cams(self, class_idx, scores, normalized, **kwargs)
    178 \"\"\"Compute the CAM for a specific output class.
    179 
    180 Args:
   (...)
    190         the k-th element of the input batch for class index equal to the k-th element of `class_idx`.
    191 \"\"\"
    192 # Get map weight & unsqueeze it
--> 193 weights = self._get_weights(class_idx, scores, **kwargs)
    195 cams: List[Tensor] = []
    197 with torch.no_grad():

File /opt/homebrew/Caskroom/miniconda/base/envs/xray/lib/python3.10/site-packages/torchcam/methods/gradient.py:100, in GradCAM._get_weights(self, class_idx, scores, **kwargs)
     98 \"\"\"Computes the weight coefficients of the hooked activation maps.\"\"\"
     99 # Backpropagate
--> 100 self._backprop(scores, class_idx, **kwargs)
    102 self.hook_g: List[Tensor]  # type: ignore[assignment]
    103 # Global average pool the gradients over spatial dimensions

File /opt/homebrew/Caskroom/miniconda/base/envs/xray/lib/python3.10/site-packages/torchcam/methods/gradient.py:59, in _GradCAM._backprop(self, scores, class_idx, retain_graph)
     57     loss = scores.gather(1, torch.tensor(class_idx, device=scores.device).view(-1, 1)).sum()
     58 self.model.zero_grad()
---> 59 loss.backward(retain_graph=retain_graph)

File /opt/homebrew/Caskroom/miniconda/base/envs/xray/lib/python3.10/site-packages/torch/_tensor.py:492, in Tensor.backward(self, gradient, retain_graph, create_graph, inputs)
    482 if has_torch_function_unary(self):
    483     return handle_torch_function(
    484         Tensor.backward,
    485         (self,),
   (...)
    490         inputs=inputs,
    491     )
--> 492 torch.autograd.backward(
    493     self, gradient, retain_graph, create_graph, inputs=inputs
    494 )

File /opt/homebrew/Caskroom/miniconda/base/envs/xray/lib/python3.10/site-packages/torch/autograd/__init__.py:251, in backward(tensors, grad_tensors, retain_graph, create_graph, grad_variables, inputs)
    246     retain_graph = create_graph
    248 # The reason we repeat the same comment below is that
    249 # some Python versions print out the first line of a multi-line function
    250 # calls in the traceback and some print out the last line
--> 251 Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
    252     tensors,
    253     grad_tensors_,
    254     retain_graph,
    255     create_graph,
    256     inputs,
    257     allow_unreachable=True,
    258     accumulate_grad=True,
    259 )

RuntimeError: Trying to backward through the graph a second time (or directly access saved tensors after they have already been freed). Saved intermediate values of the graph are freed when you call .backward() or autograd.grad(). Specify retain_graph=True if you need to backward through the graph a second time or if you need to access saved tensors after calling backward."
}

frgfm · 2024-01-02T11:00:59Z

Maintainer

Hi @karnikkanojia 👋

Thanks for reporting this! This looks like a simple problem of cell execution in a notebook. For me to help, I'd need a minimal reproducible snippet. The one you provided is not complete (missing imports, cam definition and model definition).

With my limited knowledge of the setup right now, I think what's happening is that your enumerate is going through the outputs of a single model forward pass. You're using a gradient-based CAM method, so each call in your list comprehension runs a backprop.

I'd suggest looping so that the gradients are reset before the CAM computation for each pathology:

preds = model(rescaled_output.unsqueeze(0))
cam_outputs = []
for idx in range(len(model.pathologies)):
    model.zero_grad()
    preds.zero_grad()
    cam_outputs.append(cam(class_idx=idx, scores=preds))

But to confirm this, please share a complete minimal reproducible snippet 🙏

Cheers!

4 replies

karnikkanojia Jan 2, 2024
Author

Thanks for replying @frgfm. Here is a minimal reproducible snippet for your reference.

import torch
import torchxrayvision as xrv
from torchcam.methods import GradCAM

image = xrv.utils.load_image(<path>)
image = torch.from_numpy(image)
model = xrv.models.DenseNet(weights="densenet121-res224-all")
model.eval()
cam = GradCAM(model=model, target_layer=model.features[-2][-1][-1])
preds = model(image.unsqueeze(0))
cam_outputs = []
for idx in range(len(model.pathologies)):
    model.zero_grad()
    preds.zero_grad()
    cam_outputs.append(cam(class_idx=idx, scores=preds))

I tried your code and it gives me the following error: AttributeError: 'Tensor' object has no attribute 'zero_grad'. I then removed preds.zero_grad() and I still get the same RuntimeError as before.

frgfm Jan 3, 2024
Maintainer

Thanks! So the error mentioned can be avoided as the library allows low-level PyTorch options:

import torch
import torchxrayvision as xrv
from torchcam.methods import GradCAM

image = xrv.utils.load_image(<path>)
image = torch.from_numpy(image)
model = xrv.models.DenseNet(weights="densenet121-res224-all").eval()
cam = GradCAM(model=model, target_layer=model.features[-2][-1][-1])
preds = model(image.unsqueeze(0))
cam_outputs = [cam(class_idx=idx, scores=preds, retain_graph=True) for idx in range(len(model.pathologies))]

This piece of code doesn't crash on my end 👍
It might be a bit slow, as it performs a backprop for each pathology (18 apparently). One option that would use more RAM but run faster is to duplicate the tensor 18 times along the batch axis and pass list(range(18)) as class_idx.
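
For illustration, the batched idea can be sketched in plain PyTorch. This is only a toy sketch under stated assumptions: the `nn.Linear` model, `num_classes = 4`, and the random `image` are hypothetical stand-ins for the DenseNet, the 18 pathologies, and the X-ray. The loss line mirrors the `scores.gather(...)` call visible in the traceback above, but this is not torchcam's own code path.

```python
import torch
from torch import nn

torch.manual_seed(0)

num_classes = 4  # hypothetical stand-in for the 18 pathologies
# Toy classifier (assumption): flattens a (1, 2, 4) image into 8 features
model = nn.Sequential(nn.Flatten(), nn.Linear(8, num_classes)).eval()
image = torch.rand(1, 2, 4)  # (C, H, W)

# Duplicate the image along the batch axis: one copy per class
batch = image.unsqueeze(0).repeat(num_classes, 1, 1, 1).requires_grad_(True)
scores = model(batch)  # shape: (num_classes, num_classes)

# Pick score k for copy k, then run a single backward pass for all classes
class_idx = list(range(num_classes))
loss = scores.gather(1, torch.tensor(class_idx).view(-1, 1)).sum()
loss.backward()

# batch.grad[k] now holds the gradient of the class-k score w.r.t. copy k
print(batch.grad.shape)
```

Because every class gets its own copy of the input, one backward pass is enough and retain_graph is not needed, at the cost of a batch that is `num_classes` times larger.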

Answer selected by karnikkanojia

karnikkanojia Jan 4, 2024
Author

Thanks, it works nicely. Could you explain your suggestion? I couldn't fully follow it. @frgfm

frgfm Jan 4, 2024
Maintainer

Glad it worked!
Sure, the error you first mentioned:

RuntimeError: Trying to backward through the graph a second time (or directly access saved tensors after they have already been freed). Saved intermediate values of the graph are freed when you call .backward() or autograd.grad(). Specify retain_graph=True if you need to backward through the graph a second time or if you need to access saved tensors after calling backward.

is basically saying that backpropagation was attempted at least twice through the same graph. An internal PyTorch mechanism prevents that by default. I had already run into that problem earlier and exposed a low-level option of PyTorch backprop called retain_graph (as suggested in the error message).
This keeps the gradient graph after it has been used once. Passing retain_graph=True goes all the way down to the low-level backprop call in the library (cf. torch-cam/torchcam/methods/gradient.py, line 64 in b59be23):

loss.backward(retain_graph=retain_graph)
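
The mechanism described here can be reproduced outside of torchcam with a few lines of plain PyTorch. This is a minimal sketch with a hypothetical toy tensor `x` and `scores = x * x` standing in for the model output; it only illustrates the default graph freeing and the effect of retain_graph=True.

```python
import torch

x = torch.tensor([1.0, 2.0, 3.0], requires_grad=True)

# One "forward pass": the multiplication saves x for its backward
scores = x * x

# First backward uses the default retain_graph=False, which frees saved tensors
scores[0].backward()
try:
    scores[1].backward()  # second backward through the same graph
    second_backward_failed = False
except RuntimeError:
    second_backward_failed = True

# Same computation, but the graph is kept between backward calls
scores = x * x
grads = []
for idx in range(scores.shape[0]):
    x.grad = None  # reset accumulated gradients before each class
    scores[idx].backward(retain_graph=True)
    grads.append(x.grad.clone())

print(second_backward_failed)
print(torch.stack(grads))
```

Keeping the graph alive costs memory for as long as `scores` is referenced, which is the trade-off behind the default behaviour.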

Let me know if things are still unclear!
Side suggestion: for both speed and quality of results, I'd suggest using LayerCAM instead of GradCAM. It's more recent, much faster on CPU, and produces higher-quality outputs on my benchmarks!

Masateemah123 · 2024-11-04T23:06:53Z

Hey, I am experiencing the same RuntimeError. I got the error without retain_graph, and I still get the same error when I use retain_graph=True or retain_graph=retain_graph. (I am using snnTorch.) Here is the code snippet where the error occurs:

# Hyperparameters

input_size = inputs.shape[2]
hidden_size = 50
num_epochs = 100
learning_rate = 0.001

inputs_tensor = torch.tensor(inputs, dtype=torch.float32)
labels_tensor = torch.tensor(joint_labels, dtype=torch.float32)

model = ContactEstimationSNN(input_size, hidden_size)
criterion = nn.BCEWithLogitsLoss()
optimizer = optim.Adam(model.parameters(), lr=learning_rate)

# Training loop
for epoch in range(num_epochs):
    model.train()
    optimizer.zero_grad()
    
    # Forward pass
    outputs = model(inputs_tensor)
    loss = criterion(outputs[:, -1, 0], labels_tensor)  # Use last time step for prediction
    loss.backward(retain_graph=True)  # Do not retain the graph unless needed
    optimizer.step()
    
    if (epoch + 1) % 10 == 0:
        print(f'Epoch [{epoch + 1}/{num_epochs}], Loss: {loss.item():.4f}')

# Evaluation
# test_inputs_tensor = torch.tensor(test_inputs, dtype=torch.float32)
# test_labels_tensor = torch.tensor(test_labels, dtype=torch.float32)

model.eval()
with torch.no_grad():
    test_outputs = model(test_inputs_tensor)
    test_predictions = torch.sigmoid(test_outputs[:, -1, 0])  # Get the predicted probabilities
    predicted_labels = (test_predictions > 0.5).float()  # Binarize predictions

accuracy = (predicted_labels == labels_tensor).float().mean()
print(f'Test Accuracy: {accuracy.item():.4f}')

1 reply

frgfm Nov 6, 2024
Maintainer

Hey there 👋
I have two things in mind:

1. This is neither a script I can copy and run (it is missing some imports and definitions) nor a minimal one. Could you try to narrow it down to something that runs and reproduces the error, please? Also, for your environment, you can collect system info using https://github.com/frgfm/torch-cam/blob/main/.github/collect_env.py
2. Either there are things missing from the snippet, or it's not using this library. If the latter, you'd get a better answer from the corresponding library repo (PyTorch presumably).

Take care!
