In detection tasks, the backward scaling factor often becomes NaN #6

Open
chenzx921020 opened this issue Nov 14, 2021 · 1 comment

Comments

@chenzx921020

When I train YOLOX with EWGS and use_hessian enabled, the final delta can become NaN:

def backward(ctx, g):
    diff = ctx.saved_tensors[0]
    delta = ctx._scaling_factor
    scale = 1 + delta * torch.sign(g) * diff
    return g * scale, None, None

where delta is computed from the Hessian factor.

@junghyup-lee
Contributor

Since we have tested our code only on image classification tasks, we are not sure about the cause.
Our guesses are:

  1. Computing the Hessian trace could be unstable for YOLO.
    We referred to PyHessian when implementing the Hessian-based update process. There could be other options, but we have not tried them so far.
  2. Our implementation could be problematic when gradients are extremely small, since we divide the Hessian trace by the standard deviation of the gradients. In this case, relaxing the division by adding a small value to the denominator might help.
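
The relaxation suggested in point 2 could look roughly like the sketch below. It is not the repository's code: `safe_scaling_factor`, `eps`, and the fallback to zero are all assumptions made for illustration, and it uses plain Python floats rather than tensors so that it has no dependencies. The same guard would apply to the tensor version.

```python
import math
import statistics


def safe_scaling_factor(hessian_trace, grads, eps=1e-8):
    """Hypothetical helper: Hessian trace divided by the gradient std, guarded against NaN/Inf."""
    # Population standard deviation of the gradient values; 0.0 if they are all equal.
    grad_std = statistics.pstdev(grads)
    # Assumption: a small eps in the denominator avoids dividing by (almost) zero.
    delta = hessian_trace / (grad_std + eps)
    # If the trace itself is NaN/Inf, fall back to delta = 0.
    if not math.isfinite(delta):
        return 0.0
    return delta
```

With delta = 0 the scale in `backward` reduces to 1, so the gradient passes through unchanged. Note that a tiny standard deviation still yields a very large (though finite) delta, so clamping delta to a maximum value may also be worth trying.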
