Since we have tested our code only on image classification tasks, we are not sure of the cause.
We guess the reasons are:
Computing Hessian trace could be unstable for YOLO.
We referred to PyHessian when implementing the Hessian-based update process. There could be other options, but we have not tried them so far.
Our implementation could be problematic when gradients are extremely small, since we divide the Hessian trace by the standard deviation of the gradients. In this case, relaxing the division by adding a small value to the denominator might help.
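The relaxation mentioned above could look roughly like the sketch below. It is only an illustration, not the repository's code: the helper name `safe_delta`, the `eps` and `fallback` parameters, and the assumption that delta is a plain ratio of a Hessian trace to a gradient standard deviation (both already converted to Python floats, e.g. via `.item()`) are all hypothetical.

```python
import math


def safe_delta(hessian_trace: float, grad_std: float,
               eps: float = 1e-8, fallback: float = 0.0) -> float:
    """Hypothetical helper: Hessian trace divided by gradient std,
    with a small eps added to the denominator."""
    # eps keeps the denominator away from zero when gradients are tiny
    delta = hessian_trace / (grad_std + eps)
    # If the trace or std is already NaN/inf, eps cannot help,
    # so return the fallback value instead of propagating it
    if not math.isfinite(delta):
        return fallback
    return delta
```

With `fallback=0.0`, a non-finite delta makes `scale` in `backward` equal to 1, so the layer would fall back to passing the gradient through unchanged. Whether that is the right behavior for YOLOX is untested.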
When I train YOLOX with EWGS and use_hessian enabled, the final delta can become NaN:
    def backward(ctx, g):
        diff = ctx.saved_tensors[0]
        delta = ctx._scaling_factor
        scale = 1 + delta * torch.sign(g) * diff
        return g * scale, None, None
where delta is calculated from the Hessian factor.