Issue with computing Hessian vector products using gradients obtained via hooks in PyTorch #261

caihuaiguang · 2024-07-15T16:43:08Z

Hi everyone,
I'm trying to compute Hessian-vector products (HVPs) in PyTorch from gradients captured through hooks in a custom GradCAM class. However, the gradients captured by the hooks do not seem to propagate correctly when I compute the HVPs.

Here's a simplified version of my code:

import torch
import torch.nn as nn
import torch.nn.functional as F

# Define a simple convolutional neural network model
class SimpleModel(nn.Module):
    def __init__(self):
        super(SimpleModel, self).__init__()
        self.conv1 = nn.Conv2d(3, 16, 3)  # Output size: (224 - 3 + 1) = 222
        self.conv2 = nn.Conv2d(16, 32, 3)  # Output size: (222 - 3 + 1) = 220
        self.pool = nn.AdaptiveAvgPool2d((1, 1))  # Average pooling to (1, 1)
        self.fc = nn.Linear(32, 10)  # Adjusted to 32 to match the output after pooling

    def forward(self, x):
        x = F.relu(self.conv1(x))  # 224x224 -> 222x222
        x = F.relu(self.conv2(x))  # 222x222 -> 220x220
        x = self.pool(x)  # 220x220 -> 1x1
        x = x.view(x.size(0), -1)  # Flatten tensor to (batch_size, 32)
        x = self.fc(x)  # Fully connected layer
        return x

# Define GradCAM class
class GradCAM(nn.Module):
    def __init__(self, model, target_layer):
        super(GradCAM, self).__init__()
        self.model = model
        self.target_layer = target_layer
        self.gradients = None
        self.activation = None
        # Register forward hook
        self.target_layer.register_forward_hook(self.forward_hook)

    def forward_hook(self, module, input, output):
        self.activation = output
        output.register_hook(self.backward_hook)

    def backward_hook(self, grad):
        self.gradients = grad

    def forward(self, x):
        return self.model(x)

# Instantiate model and GradCAM
model = SimpleModel()
target_layer = model.conv2
gradcam = GradCAM(model, target_layer)

# Input tensor
input_tensor = torch.randn(1, 3, 224, 224, requires_grad=True)

# Forward pass
output = gradcam(input_tensor)

# Compute loss and perform backward pass
loss = output.sum()
gradcam.model.zero_grad()
loss.backward(retain_graph=True)

# Get gradients and activation
gradients = gradcam.gradients
activation = gradcam.activation

# Compute Hessian-Vector Product

# Ensure activation has requires_grad=True
# Ensure gradients have requires_grad=True
activation.requires_grad_(True)
gradients.requires_grad_(True)

# Compute Hessian-Vector Product
hvp = torch.autograd.grad(
    outputs=gradients,
    inputs=activation,
    grad_outputs=activation,
    retain_graph=True
)

print("Hessian-Vector Product:", hvp)

When attempting to compute the Hessian vector product using torch.autograd.grad, I encounter the following error:

Traceback (most recent call last):
  File "F:\code\torch-cam\torchcam\methods\try2.py", line 71, in <module>
    hvp = torch.autograd.grad(
  File "D:\program\anaconda3\envs\cfr\lib\site-packages\torch\autograd\__init__.py", line 412, in grad
    result = _engine_run_backward(
  File "D:\program\anaconda3\envs\cfr\lib\site-packages\torch\autograd\graph.py", line 744, in _engine_run_backward
    return Variable._execution_engine.run_backward(  # Calls into the C++ engine to run the backward pass
RuntimeError: One of the differentiated Tensors appears to not have been used in the graph. Set allow_unused=True if this is the desired behavior.

I've ensured that both activation and gradients have requires_grad=True, but the issue persists. How can I correctly compute the Hessian vector product using gradients obtained via hooks in PyTorch?

Any insights or suggestions would be greatly appreciated! Thanks in advance! @frgfm
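For readers who hit the same error: a gradient captured during a plain `backward()` call has no `grad_fn`, and calling `requires_grad_(True)` on it only turns it into a fresh leaf tensor that is still unrelated to the activation, which is consistent with the "not used in the graph" message. Below is a minimal sketch of that behavior, assuming a hypothetical toy model (a `Linear` layer followed by `Tanh`) and a hypothetical `Capture` helper that mirrors the hooks in the post. Passing `create_graph=True` to `backward()` keeps the captured gradient attached to the graph.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

class Capture:
    """Stores a layer's output and the gradient flowing back into it."""
    def __init__(self, layer):
        self.activation = None
        self.gradient = None
        layer.register_forward_hook(self._forward_hook)

    def _forward_hook(self, module, inputs, output):
        self.activation = output
        output.register_hook(self._store_grad)

    def _store_grad(self, grad):
        self.gradient = grad

# Hypothetical toy model: a smooth nonlinearity after the hooked layer,
# so the Hessian with respect to the activation is not identically zero.
layer = nn.Linear(4, 3)
model = nn.Sequential(layer, nn.Tanh())
capture = Capture(layer)
x = torch.randn(2, 4)

# 1) Plain backward: the captured gradient carries no graph history.
model(x).sum().backward()
plain_grad_fn = capture.gradient.grad_fn

# 2) Backward with create_graph=True: the captured gradient is differentiable.
model.zero_grad()
model(x).sum().backward(create_graph=True)
graph_grad_fn = capture.gradient.grad_fn

# Differentiate the gradient with respect to the activation, weighted by
# the activation itself (the same call pattern as in the post).
(hvp,) = torch.autograd.grad(
    outputs=capture.gradient,
    inputs=capture.activation,
    grad_outputs=capture.activation,
)
print(plain_grad_fn, graph_grad_fn, hvp.shape)
```

Note that `backward(create_graph=True)` may emit a warning about reference cycles between parameters and their gradients; it does not affect the result of this sketch.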

frgfm · 2024-07-17T22:26:08Z

Maintainer

Hey there @caihuaiguang 👋

Not sure I fully understand your end goal here, and I don't use torch.autograd.grad, so without a reference paper and the exact operation you're trying to achieve, I can only guess :/

Also, I'm not sure I'm the best person to ask: you aren't asking about this library but about a PyTorch mechanism. Unless I misunderstand and you're requesting support for an interpretability method?

1 reply

caihuaiguang Jul 19, 2024
Author

Hi @frgfm,

I want to let you know that I have resolved the issue. The grad_fn of self.hook_g was None because I was passing layer4 as the target instead of specifying a particular layer. The following command worked fine:

python scripts/cam_example.py --savefig "./resnet18_layer4_1_conv2" --arch resnet18 --target layer4.1.conv2 --rows 1

I also noticed in my experiments that when the target layer is set to conv2 (the layer before batch normalization), some CAM methods do not seem to perform well. Is this behavior normal?

By contrast, the output when using layer4.1.bn2 is:

Thank you for your assistance!

Best regards

Answer selected by caihuaiguang

caihuaiguang · 2024-07-18T00:30:06Z

Author

Hi @frgfm,

Thank you for your prompt response! I appreciate your time.

To clarify, I am developing a new method based on Grad-CAM. The corresponding paper is still in the works. The code I shared is a minimal reproducible example of an issue I ran into while modifying your Grad-CAM implementation.

Specifically, Grad-CAM uses the mean of the gradient $g$ of the loss with respect to the activation map $x$ as the weight for that map. I aim to use the mean of $x^T H$ as the new weight instead, where $H$ is the Hessian of the loss with respect to the target layer's activation. I have been struggling to implement this for over a week, so any help is greatly appreciated!
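The weighting described above can be sketched without hooks, using two calls to `torch.autograd.grad`. This is only an illustration under assumptions: `hvp_weights` and the cubic toy loss are hypothetical and not part of torchcam, and the sketch relies on the Hessian being symmetric, so that $x^T H$ equals $(Hx)^T$.

```python
import torch

def hvp_weights(loss_fn, activation):
    """Per-channel mean of H @ x, where H is the Hessian of loss_fn at x."""
    # Work on a fresh leaf so the sketch is independent of any model graph.
    x = activation.detach().requires_grad_(True)
    loss = loss_fn(x)
    # First derivative, kept differentiable through create_graph=True.
    (g,) = torch.autograd.grad(loss, x, create_graph=True)
    # Vector-Jacobian product of g with x, i.e. x^T H.
    (hx,) = torch.autograd.grad(g, x, grad_outputs=x.detach())
    # Average over spatial positions, as Grad-CAM does for plain gradients.
    return hx.flatten(2).mean(-1)

# Toy check: for loss = sum(x^3) the Hessian is diag(6x), so H @ x = 6 x^2.
activation = torch.arange(1.0, 17.0).reshape(1, 2, 2, 4)
weights = hvp_weights(lambda a: (a ** 3).sum(), activation)
print(weights)
```

In an actual CAM setting, `loss_fn` would stand for the part of the network after the target layer followed by the class-score selection; that wiring is left out here.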

Thank you again for your help!

1 reply

caihuaiguang Jul 19, 2024
Author

After some attempts, I found that the following code works successfully:

import torch
import torch.nn as nn
import torchvision.models as models

# Define GradCAM class
class GradCAM(nn.Module):
    def __init__(self, model, target_layer):
        super(GradCAM, self).__init__()
        self.model = model
        self.target_layer = target_layer
        self.gradients = None
        self.activation = None
        # Register forward hook
        self.target_layer.register_forward_hook(self.forward_hook)

    def forward_hook(self, module, input, output):
        self.activation = output
        output.register_hook(self.backward_hook)

    def backward_hook(self, grad):
        self.gradients = grad

    def forward(self, x):
        return self.model(x)

# Instantiate a pretrained model and GradCAM
model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)  # `pretrained=True` is deprecated in recent torchvision
target_layer = model.layer4[1].conv2
gradcam = GradCAM(model, target_layer)

input_tensor = torch.ones(1, 3, 224, 224, requires_grad=True)
output = gradcam(input_tensor)
loss = output.sum()
gradcam.model.zero_grad()
loss.backward(retain_graph=True, create_graph=True)

# Compute Hessian-Vector Product
hvp = torch.autograd.grad(
    outputs=gradcam.gradients,
    inputs=gradcam.activation,
    grad_outputs=gradcam.activation,
    retain_graph=True
)
print("Hessian-Vector Product:", hvp)

However, my own method built on this library keeps hitting the same issue: the grad_fn of self.hook_g is always None, which suggests that something breaks the computation graph. Here is my code:

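from functools import partial
from typing import Any, List, Tuple, Union

import torch
from torch import Tensor, nn

class ShapleyGradCAM(_GradCAM):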

    def _hook_a(self, _: nn.Module, _input: Tuple[Tensor, ...], output: Tensor, idx: int = 0) -> None:
        """Activation hook."""
        self.hook_a[idx] = output
        output.register_hook(partial(self._store_grad, idx=idx))

    def _store_grad(self, grad: Tensor, idx: int = 0) -> None:
        self.hook_g[idx] = grad
        print(grad.grad_fn)  # always None

    def _get_weights(
        self,
        class_idx: Union[int, List[int]],
        scores: Tensor,
        eps: float = 1e-8,
        **kwargs: Any,
    ) -> List[Tensor]:
        """Computes the weight coefficients of the hooked activation maps."""
        if isinstance(class_idx, int):
            loss = scores[:, class_idx].sum()
        else:
            loss = scores.gather(1, torch.tensor(class_idx, device=scores.device).view(-1, 1)).sum()
        self.model.zero_grad()
        loss.backward(retain_graph=True, create_graph=True)

        hvp = torch.autograd.grad(
            outputs=self.hook_g,
            inputs=self.hook_a,
            grad_outputs=self.hook_a,
            retain_graph=True
        )
        li = [
            (grad * act).flatten(2).mean(-1)
            for act, grad in zip(self.hook_a, hvp)
        ]
        return li

My method may be failing because some operation in the library breaks the computation graph, but I am not sure which one.
Do you have any suggestions on what might cause this or where I should look in my code? Thank you! @frgfm
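One way to narrow down where a graph gets lost is to inspect the tensors involved before calling `torch.autograd.grad`. The sketch below uses a hypothetical helper, `graph_status`, on a toy function rather than on torchcam internals. It compares a gradient computed without `create_graph=True` to one computed with it, and shows that `allow_unused=True` only silences the error by returning `None`. General causes worth checking, stated here as possibilities rather than as a diagnosis of the library, include a forward pass run under `torch.no_grad()` or a `.detach()` call somewhere between the hooked layer and the loss.

```python
import torch

def graph_status(t):
    """Summarize whether a tensor is attached to an autograd graph."""
    return {
        "requires_grad": t.requires_grad,
        "is_leaf": t.is_leaf,
        "grad_fn": None if t.grad_fn is None else type(t.grad_fn).__name__,
    }

x = torch.randn(3, requires_grad=True)
y = (x ** 3).sum()

# Without create_graph the gradient is a plain tensor with no history.
(g_detached,) = torch.autograd.grad(y, x, retain_graph=True)
# With create_graph=True the gradient can itself be differentiated.
(g_attached,) = torch.autograd.grad(y, x, create_graph=True)

detached = graph_status(g_detached)
attached = graph_status(g_attached)
print(detached)
print(attached)

# Forcing requires_grad on the detached gradient makes it a new leaf;
# it is still unrelated to x, so the derivative comes back as None.
g_detached.requires_grad_(True)
(unused,) = torch.autograd.grad(g_detached.sum(), x, allow_unused=True)
print(unused)
```

If the status of a hooked gradient looks like `detached` here, the backward pass that produced it most likely ran without graph creation, or the tensor was cut off from the graph earlier.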
