https://github.com/naya0000/cs231n/blob/e1192dc8cbaf078c3cfb691e12b8d6d2ec40c8fa/assignment1/cs231n/classifiers/linear_svm.py#L110

Can someone explain why this subtraction is done? An explanation of the derivative calculation would be appreciated.
@mmuneeburahman, please see the figure for the computational graph of the hinge loss. Performing backprop along this graph gives exactly the result the code computes; the subtraction term comes from the part I have circled in red.
Code-wise, since $W$ (`W`) is used to calculate both $\hat{Y}$ (`Y_hat`) and $\mathbf{\hat{y}}$ (`y_hat_true`), both contribute to the gradient $\frac{dL}{dW}$ (`dW`), as you can see from this line:
`margins = np.maximum(0, Y_hat - y_hat_true + 1)`
Computing `(margins > 0).sum(axis=1)` counts, for each example, how many margins are positive, i.e., how many loss terms `y_hat_true` appears in. The count is negated because `y_hat_true` enters `margins` with a minus sign, so each positive margin contributes $-x_i$ to the correct class's column of the gradient.