[Relay, TOPI] Add negative log likelihood loss (nll_loss) op #8056
Conversation
Thanks for submitting this op. I know some people are looking forward to having it available. I've left some comments for things to fix (mostly around documentation).
Thanks for the PR, excited to see others working on training support! I left some requests. A few more things:
- please add unit tests for the op
- are you planning on registering a gradient for this operator?
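For reference, a gradient registration in Relay would usually take roughly the shape below. This is only a hedged sketch: it assumes register_gradient is exposed from tvm.relay.op, and it deliberately leaves out the actual nll_loss gradient math (which was planned as a follow-up PR); the returned gradients here are placeholders only.

from tvm import relay
from tvm.relay.op import register_gradient

@register_gradient("nn.nll_loss")
def nll_loss_grad(orig, grad):
    # Return one gradient expression per argument of the original call.
    predictions, targets, weights = orig.args
    # Placeholder gradients only: a real implementation would scatter
    # -weights[target] * grad into the predictions gradient; targets and
    # weights get dummy gradients since they are not differentiable
    # (see the stop_gradient discussion below).
    return [relay.zeros_like(predictions),
            relay.ones_like(targets),
            relay.ones_like(weights)]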
targets : tvm.relay.Expr
    The target value of each prediction.

weights : tvm.relay.Expr
can we make weights optional, like PyTorch? weights=1 is a pretty common case I believe and we could add a fast path implementation that skips the scaling
@altanh We can make weights an optional parameter. I wonder if there are any examples of a relay op with an optional tensor parameter that I can learn from. Also, how should we deal with the gradient of an optional parameter? BTW, is there any better way to mark a parameter as "no need for gradient" instead of returning a ones_like grad?
Hmm, not sure, that's a good point. Let's just keep the weights for now. As for not needing a gradient, currently there is no other way than putting in some dummy value. It might make sense for us to introduce a stop_gradient dummy op which cuts the gradient computation off at undifferentiable arguments (this can be a future PR). Thanks!
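If the optional-weights idea is picked up later, one possible shape of it is a thin Python wrapper like the hedged sketch below. The num_classes argument is hypothetical, and the relay.nn.nll_loss argument order is taken from this PR; a real fast path could instead skip the scaling inside the op entirely.

from tvm import relay

def nll_loss_with_default(predictions, targets, weights=None,
                          reduction="mean", ignore_index=-100,
                          num_classes=None):
    # If no weights are given, fall back to unit weights of length num_classes.
    if weights is None:
        weights = relay.ones(shape=(num_classes,), dtype="float32")
    return relay.nn.nll_loss(predictions, targets, weights,
                             reduction, ignore_index)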
src/relay/op/nn/nn.cc
<< "weights shape = " << weights->shape; | ||
ICHECK(predictions->dtype == weights->dtype && predictions->dtype.is_float()) | ||
<< "NLLLossRel: predictions and weights should be of the same floating type."; | ||
ICHECK(targets->dtype.is_int()) << "NLLLossRel: targets should be of int type."; |
can you replace these ICHECKs with Diagnostics?
Is there a plan on changing all ICHECKs to diagnostics? Or any rules on when to use ICHECK and when to use diagnostics?
Basically, if the error can happen due to user input (e.g. using wrong shapes), we should definitely use diagnostics. ICHECK should be reserved only for internal compiler checks that should basically never fail unless there's a bug somewhere. The diagnostic framework is fairly new, so a lot of old code still uses ICHECK incorrectly; we just need to slowly go through and update them, unfortunately.
*/
inline Tensor nll_loss(const Tensor& predictions, const Tensor& targets, const Tensor& weights,
                       std::string reduction = "mean", int ignore_index = -100,
                       const std::string name = "nll_loss", const std::string tag = kBroadcast) {
Should the tag be kOpaque to match the Relay pattern?
@altanh I am confused by the tag in topi (the ones in topi/tags.h) and the OpPatternKind in relay/op_attr_types.h. It seems that they do not match. Could you tell me how they are used in tvm? Thank you~
I see, that is confusing.. I'll get back to you on this soon
@altanh Could you take a look at the tests? Thank you~ And I wonder if there is any update on the tag and OpPatternKind?

I don't have much of an update for the tag; maybe you could try leaving it as an empty string?
The tag here is topi-level; sometimes we use it to identify a specific compute operation during scheduling, otherwise we can leave it empty.
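As a small illustration of that point (made-up names, not code from this PR): a compute carries a tag on its op, and a schedule can branch on that tag while traversing the graph.

import tvm
from tvm import te

n = te.var("n")
A = te.placeholder((n,), name="A")
# the tag travels with the compute op
B = te.compute((n,), lambda i: A[i] * 2, name="B", tag="my_special_compute")

s = te.create_schedule(B.op)
if B.op.tag == "my_special_compute":
    # a schedule could pick a dedicated strategy for ops carrying this tag
    s[B].parallel(B.op.axis[0])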
@tkonolige Sorry about the incomplete formula... I don't know what I was thinking about yesterday 😱
I found that the ops are categorized into different levels. Could you tell me how to decide the level of an op? Then I can add the test to the correct place.
@altanh Yes, I'm trying to write one for this op. I will try to implement one like the CPU version of PyTorch or onnxruntime.
Yep, sorry that the documentation of support levels is lacking! We do have a doc at https://tvm.apache.org/docs/langref/relay_op.html#overview-of-operators but it seems outdated and missing quite a bit, so this needs to be improved. For this op, I think we can go with level 10, which matches similar existing ops. Thanks!
@altanh Thanks. I'll add the test soon. Could you also check the comments above related to the optional weights, Diagnostics, and the tag?
replied
@altanh @tkonolige Sorry for the late update. I've just added the test for nll_loss.
It looks like we have some CI problems currently.
@altanh @tkonolige @tqchen Is there anything I can do to help move this PR along? It has been stuck for a while...
Sorry for being slow on this. Could you make just a couple more changes.
@tkonolige Thank you for the reviews! I've updated those parts based on your comments. Could you take another look?
One last change and then I think this will be good to go.
@tkonolige Could you point out the change that I need to make? Thank you~
Sorry, I forgot to include the actual change I wanted.
@@ -577,6 +577,49 @@ def _verify(input_shape, diagonal_shape, dtype, k=0, align="RIGHT_LEFT"):
    _verify((2, 3, 4), (2, 4, 3), "int32", (-1, 2), "RIGHT_RIGHT")


@tvm.testing.uses_gpu
def test_nll_loss():
Switch this to use parameterize_targets.
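A minimal sketch of the requested pattern. The helper is spelled tvm.testing.parametrize_targets in the codebase, and the target/dev fixture names are assumed from other topi tests rather than from this PR.

import tvm
import tvm.testing

@tvm.testing.parametrize_targets
def test_nll_loss(target, dev):
    # pytest runs this once per enabled target, passing the target string
    # and its device, so no explicit GPU loop or @tvm.testing.uses_gpu
    # decorator is needed.
    assert isinstance(target, str)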
@tkonolige I've updated the test, but there seems to be some error with the cuda target. Could you give me some help? Thank you! Part of the error message in the CI is listed below:
[CI error log omitted]
@tkonolige Could you take another look? Thank you~
The topi test on cuda didn't pass. It looks like there are some issues with scheduling: when you called register_reduction_schedule, this schedule is used on the cuda target: https://github.com/apache/tvm/blob/main/python/tvm/topi/cuda/reduction.py#L26. Could you check if anything is wrong there?
I think this error is due to using the default schedule (from tvm.te.create_schedule) in the test.
nll_loss_result = topi.nn.nll_loss(predictions, targets, weights, reduction, ignore_index)

with tvm.target.Target(target):
    s = tvm.te.create_schedule(nll_loss_result.op)
The default schedule is used here, which caused the CI errors. Suggested change:
-        s = tvm.te.create_schedule(nll_loss_result.op)
+        fschedule = tvm.topi.testing.get_reduce_schedule(target)
+        s = fschedule([nll_loss_result])
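For context, a hedged sketch of how the suggested lines sit in the topi test, with example shapes and names assumed here. get_reduce_schedule picks the per-target reduce schedule (e.g. the CUDA one linked above) instead of the default schedule from te.create_schedule, which has no GPU thread binding and therefore fails on cuda.

import tvm
import tvm.topi.testing
from tvm import te, topi

predictions = te.placeholder((4, 6), name="predictions", dtype="float32")
targets = te.placeholder((4,), name="targets", dtype="int32")
weights = te.placeholder((6,), name="weights", dtype="float32")
nll_loss_result = topi.nn.nll_loss(predictions, targets, weights, "mean", -100)

target = "llvm"  # any target enabled in the test environment
with tvm.target.Target(target):
    # dispatch to the target-specific reduce schedule instead of the default one
    fschedule = tvm.topi.testing.get_reduce_schedule(target)
    s = fschedule([nll_loss_result])
f = tvm.build(s, [predictions, targets, weights, nll_loss_result], target)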
@vinx13 @altanh Thank you for your help!
If I change the value checked at tvm/python/tvm/topi/x86/reduction.py line 84 (at d0791d3), I'm not sure if I need to adjust somewhere else...

in this case we need to change the tag to ...

@vinx13 Changing the tag to ... was triggering the branch at tvm/python/tvm/topi/x86/reduction.py line 94 (at d0791d3), whereas the empty tag was triggering traverse_before_reduce....
It's worth noticing that the tag is attached to:
auto T = tvm::te::compute(
    targets->shape,
    ...
    name, tag);
which is an element-wise operation on targets. Therefore, as ...
LGTM
This is merged, thanks @zhuzilin @altanh @tkonolige @tqchen
@vinx13 @altanh @tkonolige @tqchen Thank you for your help! I'll submit a PR for the gradient of nll_loss soon.
[Relay, TOPI] Add negative log likelihood loss (nll_loss) op (#8056): add nll_loss; enrich the doc and rename parameters; update upon review; add tests; update based on reviews; update upon reviews; update upon reviews
This PR adds the nll_loss op to relay and topi, so that we can translate aten::nll_loss in the pytorch frontend. nll_loss is the underlying function for cross entropy in pytorch and is important for supporting training in tvm. Thank you for your time reviewing this PR :).
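A short usage sketch at the Relay level. The argument order follows the signature quoted above; the shapes are just an example, and predictions are assumed to be log-probabilities, matching PyTorch's nll_loss.

import tvm
from tvm import relay

N, C = 4, 3
predictions = relay.var("predictions", shape=(N, C), dtype="float32")  # log-probabilities
targets = relay.var("targets", shape=(N,), dtype="int32")
weights = relay.var("weights", shape=(C,), dtype="float32")

loss = relay.nn.nll_loss(predictions, targets, weights,
                         reduction="mean", ignore_index=-100)
func = relay.Function([predictions, targets, weights], loss)
mod = tvm.IRModule.from_expr(func)
# type inference shows the scalar loss output for reduction="mean"
print(relay.transform.InferType()(mod))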