pool.scalar() returns wrong df (NaN) whenever n = Inf #441

huftis · 2021-11-05T12:04:40Z

According to the documentation, pool.scalar() assumes an infinite sample size (n = Inf) by default. The actual behaviour does not match: with the default, the returned degrees of freedom are NaN. Example:

library(mice)
pool.scalar(13:17, 3:7)$df
#> [1] NaN

The expected result is approximately the df obtained with a very large n, e.g.:

pool.scalar(13:17, 3:7, n = 10^6)$df
#> [1] 28.44315

The bug is caused by the barnard.rubin() function (which pool.scalar() uses internally):

barnard.rubin <- function(m, b, t, dfcom = 999999) {
  lambda <- (1 + 1 / m) * b / t
  lambda[lambda < 1e-04] <- 1e-04
  dfold <- (m - 1) / lambda^2
  dfobs <- (dfcom + 1) / (dfcom + 3) * dfcom * (1 - lambda)
  dfold * dfobs / (dfold + dfobs)
}

When dfcom = Inf, the factor (dfcom + 1) / (dfcom + 3) in the dfobs <- line evaluates to Inf/Inf, which is NaN (not 1), and the product with dfcom * (1 - lambda) is then also NaN. The value of dfobs should instead be Inf.
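The floating-point behaviour described above can be checked in base R without mice. This is only an illustrative sketch; the value lambda = 0.375 is the one that arises for the example data and is hard-coded here as an assumption.

```r
# Reproduce the dfobs computation from barnard.rubin() with dfcom = Inf
dfcom <- Inf
lambda <- 0.375  # assumed value, taken from the 13:17 / 3:7 example

ratio <- (dfcom + 1) / (dfcom + 3)  # Inf / Inf
is.nan(ratio)
#> [1] TRUE

dfobs <- ratio * dfcom * (1 - lambda)  # NaN propagates through the product
is.nan(dfobs)
#> [1] TRUE
```

Note that even if dfobs were Inf, the final expression dfold * dfobs / (dfold + dfobs) would again be Inf/Inf, so the infinite case needs to be handled explicitly.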

Since the factor dfobs / (dfold + dfobs) in the last line tends to 1 as dfobs goes to Inf, the correct behaviour would be to just output dfold whenever dfcom is Inf (and perhaps the default value dfcom = 999999 should be changed to dfcom = Inf). For the above example, the resulting value is (exactly) 28.44444…, which is in line with what you get with the large value n = 10^6 (28.44315).
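The 28.44444… figure can be reproduced by hand in base R. The sketch below assumes that the total variance t is formed by Rubin's rules (mean within-imputation variance plus (1 + 1/m) times the between-imputation variance); the variable names are illustrative and not taken from the mice source.

```r
Q <- 13:17  # the m point estimates from the example
U <- 3:7    # their variances

m <- length(Q)
ubar <- mean(U)                # within-imputation variance: 5
b <- var(Q)                    # between-imputation variance: 2.5
t <- ubar + (1 + 1 / m) * b    # total variance (assumed Rubin's rules): 8

lambda <- (1 + 1 / m) * b / t  # 0.375
dfold <- (m - 1) / lambda^2    # 4 / 0.375^2 = 256 / 9
dfold
#> [1] 28.44444
```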

Summary:

Whenever dfcom = Inf, barnard.rubin() should output dfold instead of dfold * dfobs / (dfold + dfobs).
The default and arbitrary value of dfcom = 999999 should be changed to dfcom = Inf.
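A minimal sketch of what the two proposed changes could look like is given below. The function name barnard_rubin_sketch is hypothetical, the code assumes a scalar dfcom, and it is not necessarily how the fix was implemented in mice.

```r
# Hypothetical patched variant of barnard.rubin(); assumes dfcom is a scalar
barnard_rubin_sketch <- function(m, b, t, dfcom = Inf) {
  lambda <- (1 + 1 / m) * b / t
  lambda[lambda < 1e-04] <- 1e-04
  dfold <- (m - 1) / lambda^2
  # With infinite complete-data df, the adjustment factor is 1 in the limit
  if (is.infinite(dfcom)) {
    return(dfold)
  }
  dfobs <- (dfcom + 1) / (dfcom + 3) * dfcom * (1 - lambda)
  dfold * dfobs / (dfold + dfobs)
}

# Values m = 5, b = 2.5, t = 8 correspond to the 13:17 / 3:7 example
df_inf <- barnard_rubin_sketch(m = 5, b = 2.5, t = 8)
df_big <- barnard_rubin_sketch(m = 5, b = 2.5, t = 8, dfcom = 999999)
```

Here df_inf is dfold (256/9), and df_big uses the unchanged finite-dfcom formula, so existing results for finite n are not affected.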

stefvanbuuren · 2021-11-05T15:08:42Z

Thanks for alerting.

As statisticians, we may be inclined to think that infinity starts at 1000. You showed that's not quite true. :-)

Now repaired.

stefvanbuuren added a commit that referenced this issue Nov 5, 2021

Resolve error in df calculation when n is infinite (#441)

4b2d0df

stefvanbuuren closed this as completed Nov 5, 2021
