
Sounds like a lucky result comes from a wrong formula deduction #9

Open
edfall opened this issue Feb 26, 2020 · 2 comments

@edfall

edfall commented Feb 26, 2020

I read the paper carefully, and the formula in the paper is fundamentally wrong.

  • Under formulas (2) and (3), the probability output has a Gaussian distribution. However, a probability can't be Gaussian-distributed, since it lies in [0, 1] rather than (-infty, +infty).

  • Under the independence assumption (formula (4)) and the Gaussian distribution mentioned above, formula (7) is correct. However, looking only at the first line of formula (7), if the independence assumption holds, then -log p(y1, y2 | f(w, x)) = -log p(y1 | f(w, x)) - log p(y2 | f(w, x)), which is just a sum of cross-entropy losses over the different tasks. This apparently contradicts the result obtained under the additional Gaussian assumption.

  • Somehow, the paper replaces the cross-entropy loss with MSE, which finally reaches the result that tasks with higher loss should have higher theta weights. If the paper's reported results are correct, I think the benefit here comes from loss re-balancing. In other words, does re-balancing the task losses benefit multi-task performance? (A sketch of the weighted objective in question follows below.)
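
For concreteness, here is a minimal numpy sketch (with made-up loss values and fixed log-variances) of the kind of uncertainty-weighted objective I am reading formula (7) as; the exact constants may differ from the paper, so this is only meant to make the "re-balancing" question concrete:

```python
import numpy as np

# Hypothetical per-task losses for one batch (made-up numbers).
L1 = 0.8   # e.g. squared error of task 1
L2 = 3.2   # e.g. squared error of task 2

# Learned log-variances s_i = log(sigma_i^2), here fixed for illustration.
s1, s2 = np.log(0.5), np.log(2.0)

# Uncertainty-weighted objective in the spirit of formula (7):
# each task loss is scaled by 1/(2 sigma_i^2), and a log(sigma_i) term
# penalises making the noise arbitrarily large.
total = 0.5 * np.exp(-s1) * L1 + 0.5 * np.exp(-s2) * L2 + 0.5 * (s1 + s2)
print(total)
```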

@yaringal
Owner

"Under the formula (2) and (3), the probility output has a gaussian distribution."

  • No, the random variable y follows a Gaussian distribution under (2) only. It follows a categorical distribution under (3); see the sketch below.
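
A minimal sketch of the distinction, assuming (2) is a Gaussian likelihood over a continuous output and (3) is a softmax (categorical) likelihood over class labels with the logits scaled by the noise term, in the spirit of the paper; all values are made up:

```python
import numpy as np
from scipy.stats import norm

# Made-up network outputs and noise scales for a two-head model.
f_reg = 1.3          # regression head output f(w, x)
sigma1 = 0.5         # regression observation noise

logits = np.array([1.3, -0.2, 0.4])   # classification head output f(w, x)
sigma2 = 0.8                          # classification "temperature"

# Under (2): y1 is a continuous random variable, y1 ~ N(f_reg, sigma1^2).
y1 = 0.9
density = norm.pdf(y1, loc=f_reg, scale=sigma1)

# Under (3): y2 is a categorical random variable over class labels,
# with probabilities given by a softmax of the scaled logits.
scaled = logits / sigma2**2
probs = np.exp(scaled - scaled.max())
probs /= probs.sum()

print(density)   # a density value for a continuous outcome
print(probs)     # probabilities over discrete classes, summing to 1
```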

"However, the probility can't be a gaussian distribution as it distributed in [0,1] rather than (-infty, +infty)."

  • No, the outcomes a random variable can take are in (-infty, +infty) for a Gaussian, and "classes" (categories) for a categorical rv. These are different from the probability of the rv taking those values (a density, in the continuous case). Also note that the density can't be negative (it is a measure) and that it can be larger than 1 for continuous random variables (see the check below).
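
A quick numerical check of the last point (a density value above 1), using scipy:

```python
from scipy.stats import norm

# Density of N(0, 0.1^2) at its mean is 1 / (sigma * sqrt(2*pi)),
# which exceeds 1 whenever sigma < 1/sqrt(2*pi) ~= 0.399.
print(norm(loc=0.0, scale=0.1).pdf(0.0))   # ~3.989, a perfectly valid density
```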

"if we just look at the first line in formula (7), if independent assumption is established, -log p(y1, y2|f(w,x)) = -log p(y1|f(w,x)) - log(y2|f(w,x)); which is just a sum of cross-entropy loss over different tasks."

  • No, for Gaussian likelihoods the negative log Gaussian is a scaled MSE loss (and not a cross-entropy); see the numerical check below.
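
A small check of this relation (with made-up numbers): the negative log Gaussian density equals the squared error scaled by 1/(2 sigma^2), plus terms that depend only on sigma.

```python
import numpy as np
from scipy.stats import norm

y, f, sigma = 0.9, 1.3, 0.5   # made-up target, prediction, and noise scale

nll = -norm.logpdf(y, loc=f, scale=sigma)
scaled_se = 0.5 * ((y - f) / sigma) ** 2 + np.log(sigma) + 0.5 * np.log(2 * np.pi)

print(np.isclose(nll, scaled_se))   # True
```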

"This is apparently contradicted with the result under additional gaussian assumption."

  • I don't understand this sentence.

"Somehow, the paper repalce the cross entrophy loss with mse"

  • No, the negative log of a Gaussian density is (up to scale and an additive constant) the squared Euclidean distance. Have a look here
    https://en.wikipedia.org/wiki/Gaussian_function
    and here:
    John Denker and Yann LeCun. "Transforming neural-net output levels to probability distributions". In Advances in Neural Information Processing Systems 3, 1991.

"which finally reach the result that higher loss task should have higher theta weights."

  • what's theta?

@jiazhiguan

fabulous!
