https://github.com/naya0000/cs231n/blob/e1192dc8cbaf078c3cfb691e12b8d6d2ec40c8fa/assignment1/cs231n/classifiers/linear_svm.py#L110

Can someone explain why this subtraction is done? An explanation of the derivative calculation would be appreciated.
@mmuneeburahman, please see the figure for the computational graph of the hinge loss. Performing backprop along this graph gives exactly the result the code computes; the subtraction term comes from the part I have circled in red.
Code-wise, since $W$ (`W`) is used to calculate both $\hat{Y}$ (`Y_hat`) and $\mathbf{\hat{y}}$ (`y_hat_true`), both contribute to the gradient $\frac{dL}{dW}$ (`dW`), as you can see from this line:
`margins = np.maximum(0, Y_hat - y_hat_true + 1)`
Computing `(margins > 0).sum(axis=1)` counts, for each example, how many margins are positive, i.e., how many loss terms `y_hat_true` appears in. The count is negated because `y_hat_true` enters `margins` with a minus sign, so each positive margin contributes $-x_i$ to the correct class's column of the gradient.