Hi there!
I am trying to implement your model RUGE in PyTorch. I have read your paper, checked your code, and reproduced your results in Java. I have some questions after comparing your implementation with the paper:
1. Which loss did you use in the implementation? It looks like you did not use the cross-entropy loss. I am claiming this based on how you calculate the gradients in StochasticGradient.java's calculateGradient method; it looks like you used MSE or something similar.
CORRECTION: I checked it again, and after taking the derivative correctly I found that you did use the gradient of the cross-entropy loss in the implementation, as in the paper. (A small sketch of why the two gradients are easy to confuse is included after the code below.)
2. Also, is it theoretically OK to use the cross-entropy loss for non-binary targets? Say the soft label is supposed to be 0.8 and the model predicted the unlabeled score as 0.8 as well. With the soft label as the target and the unlabeled score as the output, the loss should ideally be 0, since the model predicted the target exactly. However, if we use cross-entropy, the loss comes out to about 0.5004 (a numeric check is sketched below, after the code). So my worry is that using cross-entropy might mislead the model during the unlabeled-loss calculation. It works fine for the labeled-loss calculation, since the labeled targets are either 1s (true triples) or 0s (negative samples). I believe that is why you do not use cross-entropy in your implementation.
The cross-entropy method that I used, where x is the output and y is the target:
out = torch.mean(-y * torch.log(x + 1e-10) - (1 - y) * torch.log(1 - x + 1e-10))
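For reference on point 1, here is a minimal sketch (my own check, not taken from the RUGE code) of why the cross-entropy gradient is easy to mistake for an MSE-style one: with a sigmoid output x = sigmoid(z), the gradient of the cross-entropy with respect to the logit z simplifies to (x - y), which looks just like the gradient of a squared-error term. The values 0.7 and 1.0 below are arbitrary example inputs.

```python
import torch

# Sketch only: with x = sigmoid(z), d(BCE)/dz simplifies to (x - y),
# which is why gradient code for cross-entropy can look MSE-like at first glance.
z = torch.tensor(0.7, requires_grad=True)  # arbitrary example logit
y = torch.tensor(1.0)                      # arbitrary example target

x = torch.sigmoid(z)
loss = -y * torch.log(x) - (1 - y) * torch.log(1 - x)
loss.backward()

print(z.grad)            # tensor(-0.3318)
print((x - y).detach())  # same value: sigmoid(0.7) - 1.0
```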
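And a quick numeric check of the 0.5004 figure from point 2, using the same formula as above: when the target equals the prediction (y = x = 0.8), the cross-entropy bottoms out at the entropy of the soft label, H(0.8) ≈ 0.5004, rather than 0. Whether that constant floor actually misleads training is exactly the question I am asking above.

```python
import torch

# Check of the loss value quoted above: target y = 0.8, prediction x = 0.8.
x = torch.tensor(0.8)
y = torch.tensor(0.8)

loss = -y * torch.log(x + 1e-10) - (1 - y) * torch.log(1 - x + 1e-10)
print(loss)  # tensor(0.5004) -- the Bernoulli entropy H(0.8), not 0
```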