
Bugs in Metrics #6731

Closed
Roffild opened this issue Feb 24, 2021 · 9 comments

Comments

@Roffild (Contributor) commented Feb 24, 2021

gamma-nloglik:

bst_float psi = 1.0;
bst_float c = 1. / psi * std::log(y/psi) - std::log(y) - common::LogGamma(1. / psi);

With psi hard-coded to 1, c is always 0.
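
As a worked check of that claim (my own expansion; note that \log\Gamma(1) = \log 1 = 0):

  c(y, \psi) = \frac{1}{\psi}\log\frac{y}{\psi} - \log y - \log\Gamma\!\left(\frac{1}{\psi}\right),
  \qquad
  c(y, 1) = \log y - \log y - \log\Gamma(1) = 0.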

logloss:

  XGBOOST_DEVICE bst_float EvalRow(bst_float y, bst_float py) const {
    const bst_float eps = 1e-16f;
    const bst_float pneg = 1.0f - py;
    if (py < eps) {
      return -y * std::log(eps) - (1.0f - y)  * std::log(1.0f - eps);
    } else if (pneg < eps) {
      return -y * std::log(1.0f - eps) - (1.0f - y)  * std::log(eps);
    } else {
      return -y * std::log(py) - (1.0f - y) * std::log(pneg);
    }
  }

std::log(1.0f - eps) == std::log(1.0f) == 0, because 1e-16f is below single-precision resolution, so the (1.0f - y) term silently drops out of the clamped branches.
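
A minimal standalone check of that observation (my own snippet, not XGBoost code); it also shows the finite value the clamp substitutes for an otherwise infinite log(0):

  #include <cmath>
  #include <cstdio>

  int main() {
    const float eps = 1e-16f;
    std::printf("%d\n", (1.0f - eps) == 1.0f);   // prints 1: eps is below float precision
    std::printf("%f\n", std::log(1.0f - eps));   // prints 0.000000
    std::printf("%f\n", -std::log(eps));         // ~36.84, used in place of an infinite log(0)
    return 0;
  }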

gamma-deviance needs to be removed because the formula is not correct! (#6728)

I don't understand what math formula was used in poisson-nloglik.
-log( Poisson_regression )

tweedie-nloglik is also unclear, and a test for it is missing.

Tests for regression metrics with weights (#6729).

If metrics are used in forest creation...

@Roffild mentioned this issue Feb 24, 2021
@Roffild (Contributor, Author) commented Feb 24, 2021

I wanted to use the metrics from XGBoost for PyTorch models.

But for now I only use sklearn.metrics for all models!

@trivialfis (Member)

I need to take a closer look.

@trivialfis (Member)

The bug in gamma deviance is fixed. Better documentation for other metrics will be a different topic. Thanks for raising the issue!

@trivialfis (Member) commented Mar 20, 2021

Just a quick note for everyone who has been following this thread. I believe these metrics and objectives are derived from the generalized linear model.

@Roffild (Contributor, Author) commented Mar 20, 2021

All metrics are calculated for each result separately.

A loss is calculated for each individual result, but a metric should be calculated over the entire result matrix. Therefore, the metrics in XGBoost are approximate.

@Roffild (Contributor, Author) commented Mar 20, 2021

pytorch/pytorch#22439

@trivialfis (Member)

"I don't understand what math formula was used in poisson-nloglik."

See the "Evaluating the Poisson distribution" section of https://en.wikipedia.org/wiki/Poisson_distribution
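
For reference, the per-row quantity described there is the Poisson negative log-likelihood -log P(y | λ) = λ − y·log λ + log Γ(y + 1), with λ the predicted mean. A minimal sketch (my own code and naming, with an eps guard assumed; not the XGBoost source):

  #include <algorithm>
  #include <cmath>

  // -log P(y | lambda = py) = py - y * log(py) + log Gamma(y + 1)
  float poisson_nloglik_row(float y, float py) {
    const float eps = 1e-16f;   // guard against log(0)
    py = std::max(py, eps);
    return py - y * std::log(py) + std::lgamma(y + 1.0f);
  }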

@trivialfis (Member) commented Mar 22, 2021

gamma-nloglik:

Yeah, this one is a bit confusing. I tracked down the PR for -log(\gamma): #1369, which hard-coded the dispersion to 1. Not entirely sure why.

Original PR for adding gamma regression: #1258.
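
For context, my reading of the quoted code is the standard exponential-dispersion form of the gamma negative log-likelihood (an assumption about the intended formula, not a statement about the current source):

  -\log f(y;\mu,\psi) = -\frac{y\theta - b(\theta)}{a(\psi)} - c(y,\psi),
  \qquad
  \theta = -\frac{1}{\mu}, \quad b(\theta) = -\log(-\theta), \quad a(\psi) = \psi,

with c(y, \psi) as in the snippet at the top of this issue. Hard-coding \psi = 1 makes c vanish, and the per-row value reduces to y/\mu + \log\mu, i.e. the negative log-likelihood of an exponential distribution with mean \mu.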

@trivialfis (Member)

The weird logloss you see is just a way to work around numerical issues.
