As noted in hw2.ipynb: "Be careful to explicitly broadcast the bias term to the correct shape -- Needle does not support implicit broadcasting."
As also noted in this Forum discussion, Needle does not support implicit broadcasting because such broadcasts are not tracked in the computational graph, which leads to incorrect gradient computations in the backward pass.
As a result, we need to explicitly broadcast the bias tensor in nn.Linear.forward().
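To illustrate why the broadcast must be tracked (a plain-NumPy sketch, not Needle's actual API): in the forward pass the bias is explicitly expanded to the output shape, and the matching backward step must sum the output gradient over the broadcast batch axis to recover the bias gradient. That summation is exactly the information an untracked implicit broadcast would lose.

```python
import numpy as np

def linear_forward(X, W, b):
    """Forward pass with an explicit bias broadcast.

    X: (batch, in_features), W: (in_features, out_features),
    b: (1, out_features). In Needle this broadcast would be a
    tracked broadcast_to node; np.broadcast_to stands in here.
    """
    out = X @ W
    b_bcast = np.broadcast_to(b, out.shape)  # explicit (batch, out_features)
    return out + b_bcast

def bias_grad(dout):
    """Bias gradient: sum dout over the broadcast batch axis.

    This reduction is what the computational graph must record;
    skipping it leaves the bias gradient with the wrong shape.
    """
    return dout.sum(axis=0, keepdims=True)  # back to (1, out_features)

X = np.random.randn(4, 3)
W = np.random.randn(3, 2)
b = np.random.randn(1, 2)
out = linear_forward(X, W, b)
dout = np.ones_like(out)
print(out.shape, bias_grad(dout).shape)
```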
There should arguably be no restriction on the shape used to store the bias term after initialization, since we explicitly broadcast it during the forward pass anyway. The bias can be stored either as a 2D tensor of shape (1, out_features) or as a 1D tensor of shape (out_features,).
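Either storage convention works if forward() normalizes the bias before broadcasting. A minimal sketch (plain NumPy; the helper name is hypothetical, not part of Needle):

```python
import numpy as np

def normalize_bias(b, out_features):
    """Accept a bias stored as (out_features,) or (1, out_features)
    and return it reshaped to (1, out_features), ready to be
    broadcast against an output of shape (batch, out_features)."""
    assert b.size == out_features, "bias has the wrong number of elements"
    return b.reshape(1, out_features)

b1 = np.zeros(5)        # 1D storage: (out_features,)
b2 = np.zeros((1, 5))   # 2D storage: (1, out_features)
print(normalize_bias(b1, 5).shape, normalize_bias(b2, 5).shape)
```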
However, there is an ambiguity in test_nn_and_optim.py:
test_nn_linear_bias_init_1() asserts that the bias is initialized as a 2D tensor of shape (1, out_features),
but the linear_forward() and linear_backward() helpers (used by the test_nn_linear_forward_* and test_nn_linear_backward_* tests, respectively) assign a 1D tensor of shape (out_features,) to the bias: f.bias.data = get_tensor(lhs_shape[-1])
Question: by the way, why do we call get_tensor to assign a new value to the bias at all? Can't we use the f.bias value that was created during initialization?
I think test_nn_linear_bias_init_1() should accept either of the valid shapes, (1, out_features) or (out_features,).
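A shape-agnostic check for the test could look like this (a sketch; the helper name is hypothetical, and the shapes are the two conventions discussed above):

```python
def assert_valid_bias_shape(shape, out_features):
    """Accept either of the two valid bias storage conventions:
    a 1D tensor (out_features,) or a 2D tensor (1, out_features)."""
    valid = {(out_features,), (1, out_features)}
    assert tuple(shape) in valid, f"unexpected bias shape {shape}"

# both conventions pass; anything else fails
assert_valid_bias_shape((7,), 7)
assert_valid_bias_shape((1, 7), 7)
```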