[Vote] Softmax and Loss Convention #434
I vote for 2
I vote for 2 with the following name change:
+1 for 2
+1 for 2
Sorry, currently traveling. I think 1 is a more unified behavior but requires more changes to the internals (e.g. attach-loss, etc.). I'm generally OK with either design.
@hjk41 can you make a Windows build for the most recent version?
pluskid added a commit to dmlc/MXNet.jl that referenced this issue on Nov 2, 2015: all sides updated
According to discussions in #426, we agreed that in all cases the multi-output and single-output softmax can be combined into one class, maybe overloaded by the shape of the label.
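As an illustration of what "overloaded by shape of label" could mean, here is a minimal NumPy sketch (the function names and shape conventions are hypothetical, not the MXNet operators): the class axis stays fixed, and the label's shape decides whether the op is treated as single-output or multi-output.

```python
import numpy as np

def softmax(x, axis=1):
    # numerically stable softmax along the class axis
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)

def combined_softmax(data, label):
    # Hypothetical combined op, dispatched on label shape:
    #   data (batch, classes)          + label (batch,)         -> single-output
    #   data (batch, classes, d1, ...) + label (batch, d1, ...) -> multi-output
    # The forward pass is identical; the label shape only decides how the
    # loss/gradient is applied per element in backward.
    expected = (data.shape[0],) + data.shape[2:]
    if label.shape != expected:
        raise ValueError("label shape %r does not match data shape %r"
                         % (label.shape, data.shape))
    return softmax(data, axis=1)

# single-output: 4 samples, 3 classes
p1 = combined_softmax(np.random.randn(4, 3), np.array([0, 2, 1, 1]))
# multi-output (e.g. per-pixel): 4 samples, 3 classes, 5x5 spatial map
p2 = combined_softmax(np.random.randn(4, 3, 5, 5), np.zeros((4, 5, 5), dtype=int))
```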
Option 1:
- `SoftmaxOutput` for output without gradient attached.
- `Softmax` remains the same, with loss already attached.

Option 2:
- `SoftmaxOutput` for output, with the specific backward behavior of cross-entropy loss (`MulticlassProbOutput` as an alternative name, to be clear to the user).
- `Softmax` behaves normally (only takes the input) and can propagate gradient back from any output source (being able to compose as an internal node).
- `CrossEntropyLoss` can be composed with anything, including `Softmax`, to get the loss in forward and the gradient in backward (see the sketch below).

Please edit this post to add more options.
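For concreteness, here is a small NumPy sketch of the backward semantics the two conventions imply (illustrative only, assuming plain logits/labels and a batch-averaged loss; this is not the actual operator code): `SoftmaxOutput` hard-codes the cross-entropy gradient `p - one_hot(y)` with respect to the logits, while a plain `Softmax` composed with `CrossEntropyLoss` reaches the same gradient through composition.

```python
import numpy as np

def softmax(x):
    shifted = x - x.max(axis=1, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=1, keepdims=True)

def one_hot(label, num_classes):
    return np.eye(num_classes)[label]

# Convention "SoftmaxOutput": forward is softmax, backward hard-codes the
# cross-entropy gradient w.r.t. the input logits: dL/dx = (p - one_hot(y)) / n.
def softmax_output_forward(x):
    return softmax(x)

def softmax_output_backward(prob, label):
    return (prob - one_hot(label, prob.shape[1])) / prob.shape[0]

# Convention "Softmax + CrossEntropyLoss": Softmax is a normal node that can
# back-propagate any incoming gradient, and the loss is a separate composable op.
def softmax_backward(prob, grad_out):
    # Jacobian-vector product of softmax: p * (g - sum(g * p))
    dot = (grad_out * prob).sum(axis=1, keepdims=True)
    return prob * (grad_out - dot)

def cross_entropy_backward(prob, label):
    # dL/dprob for L = -mean(log(prob[range(n), label]))
    n = prob.shape[0]
    grad = np.zeros_like(prob)
    grad[np.arange(n), label] = -1.0 / (n * prob[np.arange(n), label])
    return grad

# Composing the two gives (numerically) the same gradient as SoftmaxOutput:
x = np.random.randn(4, 3)
y = np.array([0, 2, 1, 1])
p = softmax(x)
g1 = softmax_output_backward(p, y)
g2 = softmax_backward(p, cross_entropy_backward(p, y))
assert np.allclose(g1, g2)
```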