pool.scalar() returns wrong df (NaN) whenever n = Inf #441

Closed
huftis opened this issue Nov 5, 2021 · 1 comment

huftis commented Nov 5, 2021

According to the documentation, pool.scalar() assumes an infinite sample (n = Inf) by default. The actual behaviour does not match: with the default, the returned degrees of freedom are NaN. Example:

library(mice)
pool.scalar(13:17, 3:7)$df
#> [1] NaN

The expected result would be approx. the df one gets when one uses a very large n, e.g.:

pool.scalar(13:17, 3:7, n = 10^6)$df
#> [1] 28.44315

The bug is caused by the barnard.rubin() function (which pool.scalar() uses internally):

barnard.rubin <- function(m, b, t, dfcom = 999999) {
  lambda <- (1 + 1 / m) * b / t
  lambda[lambda < 1e-04] <- 1e-04
  dfold <- (m - 1) / lambda^2
  dfobs <- (dfcom + 1) / (dfcom + 3) * dfcom * (1 - lambda)
  dfold * dfobs / (dfold + dfobs)
}

When dfcom = Inf, (dfcom + 1) / (dfcom + 3) in the dfobs <- line equals Inf/Inf, which is NaN (not 1), and it is still NaN when multiplied by dfcom * (1 - lambda). It should instead be Inf.
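
The arithmetic can be checked directly in base R. This is only a minimal illustration of the failing expression; the value of lambda is an illustrative assumption, not taken from the package.

```r
# Reproduce the problematic dfobs expression with an infinite dfcom.
dfcom <- Inf
lambda <- 0.375  # illustrative value (assumption)

ratio <- (dfcom + 1) / (dfcom + 3)     # evaluates Inf / Inf
print(is.nan(ratio))                   # NaN, not 1

dfobs <- ratio * dfcom * (1 - lambda)  # NaN propagates through the product
print(is.nan(dfobs))
```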

Since the factor dfobs / (dfold + dfobs) in the last line tends to 1 as dfobs tends to Inf, the correct behaviour would be to just output dfold whenever dfcom is Inf (and perhaps the default value dfcom = 999999 should be changed to dfcom = Inf). For the above example, the resulting value is (exactly) 28.44444…, which is in line with what you get with the large value n = 10^6 (28.44315).
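
The limit can be written out as follows. This is a sketch that assumes pool.scalar() computes the total variance as t = ubar + (1 + 1/m) b, with ubar the mean of the within-imputation variances (3:7) and b the variance of the estimates (13:17); under that assumption m = 5, b = 2.5, ubar = 5 and t = 8.

$$
\lim_{\mathrm{dfcom} \to \infty} \frac{\mathrm{dfold} \cdot \mathrm{dfobs}}{\mathrm{dfold} + \mathrm{dfobs}}
= \lim_{\mathrm{dfobs} \to \infty} \frac{\mathrm{dfold}}{\mathrm{dfold} / \mathrm{dfobs} + 1}
= \mathrm{dfold}
$$

$$
\lambda = \left(1 + \tfrac{1}{5}\right) \frac{2.5}{8} = 0.375,
\qquad
\mathrm{dfold} = \frac{m - 1}{\lambda^2} = \frac{4}{0.375^2} = \frac{256}{9} \approx 28.4444
$$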

Summary:

  • Whenever dfcom = Inf, barnard.rubin() should output dfold instead of dfold * dfobs / (dfold + dfobs).
  • The arbitrary default value dfcom = 999999 should be changed to dfcom = Inf.
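
A minimal sketch of what such a change could look like is given below. The name barnard_rubin_fixed is hypothetical and this is not necessarily the fix that was applied in mice; the way b and t are computed from the example data is likewise an assumption about what pool.scalar() does internally, and dfcom is assumed to be a single number.

```r
# Hypothetical patched variant of barnard.rubin(); assumes scalar dfcom.
barnard_rubin_fixed <- function(m, b, t, dfcom = Inf) {
  lambda <- (1 + 1 / m) * b / t
  lambda[lambda < 1e-04] <- 1e-04
  dfold <- (m - 1) / lambda^2
  # With an infinite complete-data df, dfobs is infinite and the
  # combined df reduces to dfold, so return it directly.
  if (is.infinite(dfcom)) {
    return(dfold)
  }
  dfobs <- (dfcom + 1) / (dfcom + 3) * dfcom * (1 - lambda)
  dfold * dfobs / (dfold + dfobs)
}

# Example data from the report; b and t are computed here under the
# assumption t = mean(u) + (1 + 1/m) * b.
q <- 13:17
u <- 3:7
m <- length(q)
b <- var(q)
t <- mean(u) + (1 + 1 / m) * b

df_inf <- barnard_rubin_fixed(m, b, t)                    # dfcom = Inf
df_big <- barnard_rubin_fixed(m, b, t, dfcom = 10^6 - 1)  # large finite dfcom
print(c(df_inf, df_big))
```

Returning early keeps the finite-dfcom branch unchanged, so existing results for finite samples would not be affected by this sketch.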

stefvanbuuren (Member) commented

Thanks for alerting.

As statisticians we may be inclined to think that infinite starts at 1000. You showed that's not quite true. :-)

Now repaired.
