Issue with computing Hessian vector products using gradients obtained via hooks in PyTorch #261
-
Hi everyone,

Here's a simplified version of my code:

import torch
import torch.nn as nn
import torch.nn.functional as F

# Define a simple convolutional neural network model
class SimpleModel(nn.Module):
    def __init__(self):
        super(SimpleModel, self).__init__()
        self.conv1 = nn.Conv2d(3, 16, 3)  # Output size: (224 - 3 + 1) = 222
        self.conv2 = nn.Conv2d(16, 32, 3)  # Output size: (222 - 3 + 1) = 220
        self.pool = nn.AdaptiveAvgPool2d((1, 1))  # Average pooling to (1, 1)
        self.fc = nn.Linear(32, 10)  # Adjusted to 32 to match the output after pooling

    def forward(self, x):
        x = F.relu(self.conv1(x))  # 224x224 -> 222x222
        x = F.relu(self.conv2(x))  # 222x222 -> 220x220
        x = self.pool(x)  # 220x220 -> 1x1
        x = x.view(x.size(0), -1)  # Flatten tensor to (batch_size, 32)
        x = self.fc(x)  # Fully connected layer
        return x

# Define GradCAM class
class GradCAM(nn.Module):
    def __init__(self, model, target_layer):
        super(GradCAM, self).__init__()
        self.model = model
        self.target_layer = target_layer
        self.gradients = None
        self.activation = None
        # Register forward hook
        self.target_layer.register_forward_hook(self.forward_hook)

    def forward_hook(self, module, input, output):
        # Store the layer output and attach a tensor hook to capture its gradient
        self.activation = output
        output.register_hook(self.backward_hook)

    def backward_hook(self, grad):
        self.gradients = grad

    def forward(self, x):
        return self.model(x)

# Instantiate model and GradCAM
model = SimpleModel()
target_layer = model.conv2
gradcam = GradCAM(model, target_layer)
# Input tensor
input_tensor = torch.randn(1, 3, 224, 224, requires_grad=True)
# Forward pass
output = gradcam(input_tensor)
# Compute loss and perform backward pass
loss = output.sum()
gradcam.model.zero_grad()
loss.backward(retain_graph=True)
# Get gradients and activation
gradients = gradcam.gradients
activation = gradcam.activation
# Ensure activation and gradients have requires_grad=True
activation.requires_grad_(True)
gradients.requires_grad_(True)
# Compute Hessian-Vector Product
hvp = torch.autograd.grad(
    outputs=gradients,
    inputs=activation,
    grad_outputs=activation,
    retain_graph=True
)
print("Hessian-Vector Product:", hvp)

When attempting to compute the Hessian vector product using [...]

I've ensured that both `activation` and `gradients` have `requires_grad=True`, but the issue persists. How can I correctly compute the Hessian vector product using gradients obtained via hooks in PyTorch? Any insights or suggestions would be greatly appreciated!

Thanks in advance! @frgfm
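
One likely cause, offered as a sketch rather than a confirmed diagnosis: the snippet above calls `loss.backward(retain_graph=True)` without `create_graph=True`, so the backward pass itself is not recorded and the captured gradient has no graph linking it to `activation`. Calling `requires_grad_(True)` afterwards does not rebuild that link. The example below shows a double-backward Hessian-vector product with `torch.autograd.grad`. `TinyNet`, the `tanh` nonlinearity, the squared loss, and the random vector `v` are illustrative assumptions, not part of the original post; they are chosen because a ReLU network with a linear `sum()` loss has a second derivative that is zero almost everywhere.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)


class TinyNet(nn.Module):
    """Hypothetical stand-in for the model in the post (smooth nonlinearity)."""

    def __init__(self):
        super().__init__()
        self.conv = nn.Conv2d(3, 4, 3)
        self.pool = nn.AdaptiveAvgPool2d((1, 1))
        self.fc = nn.Linear(4, 2)

    def forward(self, x):
        self.activation = self.conv(x)  # keep the target-layer output
        h = torch.tanh(self.activation)
        return self.fc(self.pool(h).flatten(1))


model = TinyNet()
x = torch.randn(1, 3, 8, 8)
loss = model(x).pow(2).sum()  # nonlinear loss so the Hessian is non-trivial
activation = model.activation

# First-order gradient; create_graph=True records the backward pass
# so the result can be differentiated a second time.
(grad,) = torch.autograd.grad(loss, activation, create_graph=True)

# Hessian-vector product: differentiate <grad, v> w.r.t. the activation.
v = torch.randn_like(activation)
(hvp,) = torch.autograd.grad(grad, activation, grad_outputs=v)

print("grad has grad_fn:", grad.grad_fn is not None)
print("hvp shape:", tuple(hvp.shape))
```

The same idea should carry over to a hook-based setup as long as the backward pass that feeds the hook is run with `create_graph=True`.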
Replies: 2 comments 2 replies
-
Hey there @caihuaiguang 👋

Not sure I fully understand your end goal here, and I don't use [...]

But also, I'm not sure I'm the best person to ask this: you aren't asking about this library but about a PyTorch mechanism. Unless I misunderstand and you're requesting to add support for an interpretability method?
-
Hi @frgfm,

Thank you for your prompt response! I appreciate your time.

To clarify, I am currently developing a new method for Grad-CAM; the corresponding paper is still in the works. The code I shared is a minimal reproducible example of an issue I encountered while modifying your Grad-CAM implementation. Specifically, in Grad-CAM, the mean of the gradient (denote the gradient as [...]

Thank you again for your help!
Hi @frgfm,

I want to let you know that I have resolved the issue. `self.hook_g.grad_fn` was `None` because I was passing `layer4` as the value of the `target` parameter instead of specifying a particular layer. The following command worked fine:

Additionally, I have observed a phenomenon in my experiments: when the target layer is set to `conv2` (the layer before batch normalization), some CAM methods do not seem to perform well. Is this behavior normal?

In contrast, the output when using `layer4.1.bn2` is:

Thank you for your assistance!

Best regards
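
For readers hitting a similar `grad_fn is None` symptom, a quick diagnostic is to inspect the gradient that a tensor hook receives. The sketch below is a toy setup, not the library's code: `layer`, `head`, and `captured_grad` are hypothetical names. It compares a plain `backward()` with `backward(create_graph=True)`; only the latter yields a hooked gradient that can be differentiated again. Note that PyTorch emits a warning about reference cycles when `backward()` is used with `create_graph=True`.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
layer = nn.Linear(4, 3)  # hypothetical "target layer"
head = nn.Linear(3, 1)   # hypothetical classifier head


def captured_grad(create_graph):
    """Run one forward/backward pass and return (activation, hooked gradient)."""
    store = {}
    x = torch.randn(2, 4)
    act = torch.tanh(layer(x))
    # Tensor hook: records the gradient flowing into `act` during backward.
    act.register_hook(lambda g: store.update(grad=g))
    loss = head(act).pow(2).sum()
    loss.backward(create_graph=create_graph)
    return act, store["grad"]


_, g_plain = captured_grad(create_graph=False)
act, g_graph = captured_grad(create_graph=True)

print("plain backward grad_fn:", g_plain.grad_fn)
print("create_graph grad_fn is set:", g_graph.grad_fn is not None)

# With the graph recorded, a second differentiation through the hooked
# gradient is possible (here with a vector of ones).
(hvp,) = torch.autograd.grad(g_graph, act, grad_outputs=torch.ones_like(g_graph))
print("hvp shape:", tuple(hvp.shape))
```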