
Non scalar loss #134

Open
janglinko-dac opened this issue Dec 14, 2022 · 1 comment


@janglinko-dac

Hi!
I'm training a network with two separate heads (something like a HydraNet).
How should I deal with non-scalar losses?
With the standard PyTorch backward pass I just feed torch.autograd.backward() with the grad tensors (per the docs, "the 'vector' in the Jacobian-vector product, usually gradients w.r.t. each element of the corresponding tensors"):

# One grad tensor of 1.0 per scalar loss, acting as the "vector" in the Jacobian-vector product
loss_seq = [loss_head_1, loss_head_2]
grad_seq = [torch.tensor(1.0).cuda(device) for _ in range(len(loss_seq))]
torch.autograd.backward(loss_seq, grad_seq)

Is it possible to handle this scenario with higher? What should I pass to diffopt.step()? Is it enough to invoke diffopt.step(loss_seq)?

Thanks in advance for your help!

@HamedHemati

If you have multiple loss terms, you can just add them together to obtain a single scalar and then call diffopt.step(.). It is equivalent to backpropagating through each loss term separately. Just note that the gradients for the shared modules in the model will be aggregated, which is the default PyTorch behavior.
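
For reference, a minimal sketch of that suggestion using higher's innerloop_ctx context manager. The two-headed model, data, and loss functions below are illustrative assumptions, not taken from this issue:

import torch
import torch.nn as nn
import higher

# Toy two-headed model with a shared trunk (all names here are illustrative).
class TwoHeadNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.trunk = nn.Linear(8, 16)
        self.head_1 = nn.Linear(16, 4)
        self.head_2 = nn.Linear(16, 2)

    def forward(self, x):
        h = torch.relu(self.trunk(x))
        return self.head_1(h), self.head_2(h)

model = TwoHeadNet()
opt = torch.optim.SGD(model.parameters(), lr=0.1)

x = torch.randn(32, 8)
y_1, y_2 = torch.randn(32, 4), torch.randn(32, 2)

with higher.innerloop_ctx(model, opt) as (fmodel, diffopt):
    out_1, out_2 = fmodel(x)
    loss_head_1 = nn.functional.mse_loss(out_1, y_1)
    loss_head_2 = nn.functional.mse_loss(out_2, y_2)
    # Sum the per-head losses into one scalar; gradients flowing into the
    # shared trunk are accumulated across both heads, matching what separate
    # backward() calls on each loss would produce.
    diffopt.step(loss_head_1 + loss_head_2)

Passing a list like diffopt.step(loss_seq) is not needed: the single summed scalar plays the same role as the grad_seq of ones in the torch.autograd.backward() snippet above, since gradients from the two losses simply add up.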
