Preserve the stochastic nature of mice results (#426) #459
Comments
Thanks for noting. I wasn't aware of the repetitive nature of the imputations in this case. If I understand correctly, the problem occurs when no … Perhaps a simple fix would be to add

# set local seed, reset random state generator after function aborts
if (is.na(seed)) {
  kick <- runif(1L)                # advance the global stream so consecutive calls differ
  withr::local_preserve_seed()
} else {
  withr::local_seed(seed)
}

Would that solve your case? Could there be any unfortunate side effects?
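The effect of the proposed kick can be seen in a minimal base-R sketch. It assumes that `rnorm(3)` is an acceptable stand-in for the imputation draws. It also uses `on.exit()` in place of the withr helpers. `impute_sketch()` is a hypothetical name and is not part of mice.

```r
# Hypothetical stand-in for mice(): rnorm(3) plays the role of the imputation
# draws. on.exit() emulates the restore step of withr::local_preserve_seed()
# and withr::local_seed(); only base R is used.
impute_sketch <- function(seed = NA) {
  genv <- globalenv()
  if (is.na(seed)) {
    kick <- runif(1L)  # advance the global stream (creating it if absent)
    old <- get(".Random.seed", envir = genv)
    # restore the state *after* the kick, so the caller's stream moves on by one draw
    on.exit(assign(".Random.seed", old, envir = genv), add = TRUE)
  } else {
    had_seed <- exists(".Random.seed", envir = genv, inherits = FALSE)
    if (had_seed) old <- get(".Random.seed", envir = genv)
    # put the caller's stream back exactly as it was (or remove it if absent)
    on.exit({
      if (had_seed) assign(".Random.seed", old, envir = genv)
      else rm(".Random.seed", envir = genv)
    }, add = TRUE)
    set.seed(seed)
  }
  rnorm(3)
}

set.seed(1)
a <- impute_sketch()
b <- impute_sketch()        # differs from a: the kick moved the stream on
set.seed(1)
a_again <- impute_sketch()  # same as a: an outer set.seed() still controls it
```

In this sketch every seedless call consumes exactly one draw from the caller's stream. As a result, an outer `set.seed()` still reproduces the results, while consecutive calls differ.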
An added note. My simulations usually set the …

library(mice)
for (i in 1:5) {
  imp <- mice(nhanes, m = 1, printFlag = FALSE, seed = i)
  cat(complete(imp)$bmi |> mean(), "\n")
}

This is perhaps a bit safer.
I was looking into this just now and thought that the following might be the solution. Forgive me if I am totally wrong, since I do not fully understand the …

if (is.na(seed)) {
  withr::local_preserve_seed()  # restores .Random.seed on exiting scope
  set.seed(NULL)                # reinitialize .Random.seed
} else {
  withr::local_seed(seed)
}

If this is totally wrong, so be it; if not, this appears to be more elegant.
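The `set.seed(NULL)` variant can be sketched in base R in the same way. The function name `impute_sketch_null()` is hypothetical, and `rnorm(3)` again stands in for the imputation draws. Because the draws are re-initialised from the current time and process ID, they vary between calls. For the same reason, a `set.seed()` issued by the caller no longer determines them, although the caller's own stream is left untouched.

```r
# Hypothetical stand-in for mice() using the set.seed(NULL) variant;
# rnorm(3) plays the role of the imputation draws.
impute_sketch_null <- function() {
  genv <- globalenv()
  had_seed <- exists(".Random.seed", envir = genv, inherits = FALSE)
  if (had_seed) old <- get(".Random.seed", envir = genv)
  # like withr::local_preserve_seed(): restore the caller's stream on exit
  on.exit({
    if (had_seed) assign(".Random.seed", old, envir = genv)
    else rm(".Random.seed", envir = genv)
  }, add = TRUE)
  set.seed(NULL)  # re-initialise from the current time and process ID
  rnorm(3)
}

set.seed(123)
before <- .Random.seed
draws <- impute_sketch_null()
identical(before, .Random.seed)  # TRUE: the caller's stream is untouched
```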
Yes, thanks, that is better as that will restore … I just pushed an update. Thanks a lot for your contribution.
Pretty late to the party, but this new implementation also results in unexpected, and potentially unwanted, behavior: calling set.seed(123) before mice() no longer reproduces the imputations.

library(mice)
#>
#> Attaching package: 'mice'
#> The following object is masked from 'package:stats':
#>
#> filter
#> The following objects are masked from 'package:base':
#>
#> cbind, rbind
library(magrittr)
set.seed(123)
imp1 <- mice(boys, print = F)
imp1 %$%
lm(hgt ~ bmi + tv) %>%
pool() %>%
summary()
#> term estimate std.error statistic df p.value
#> 1 (Intercept) 45.750588 7.8251969 5.846573 173.68503 2.431348e-08
#> 2 bmi 2.990458 0.4926343 6.070341 92.33349 2.795360e-08
#> 3 tv 3.738712 0.2099027 17.811641 26.77784 2.220446e-16
set.seed(123)
imp2 <- mice(boys, print = F)
imp2 %$%
lm(hgt ~ bmi + tv) %>%
pool() %>%
summary()
#> term estimate std.error statistic df p.value
#> 1 (Intercept) 42.567502 10.7210055 3.970477 12.82719 1.638943e-03
#> 2 bmi 3.165119 0.6533964 4.844103 12.59753 3.503675e-04
#> 3 tv 3.746043 0.2230530 16.794411 19.05428 7.056578e-13

Created on 2022-07-08 by the reprex package (v2.0.1)

I am actually unsure whether the …
Ah... what a can of worms this is... We have in mice.R:
We probably need some alternative to …
The function … I see three usage scenarios, always assuming the default …
As I understand the present state … Maybe a … Please note: I do not feel strongly about this at all; I am just naively trying to be of some use.
I am no expert on random seeds, but my intuition says that these three aspects are incompatible, because if after every … Similarly to @GBA19, I do not really understand why you would like to preserve the global seed when you call mice(). To me, this seems a bit like asking …
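"Preserving the global seed" can be made concrete with two hypothetical functions, `with_global_seed()` and `with_local_seed()`. Both names are made up for illustration, and `rnorm(3)` stands in for the imputation draws. The first calls `set.seed()` directly. The second emulates what `withr::local_seed()` does and restores the caller's stream on exit. Both return the same draws for the same seed; they differ only in what they leave behind for the caller.

```r
# Two hypothetical ways of honouring a user-supplied seed; rnorm(3) stands
# in for the imputation draws.
with_global_seed <- function(seed) {
  set.seed(seed)  # overwrites the caller's stream for good
  rnorm(3)
}

with_local_seed <- function(seed) {
  genv <- globalenv()
  had_seed <- exists(".Random.seed", envir = genv, inherits = FALSE)
  if (had_seed) old <- get(".Random.seed", envir = genv)
  on.exit({  # like withr::local_seed(): restore the caller's stream on exit
    if (had_seed) assign(".Random.seed", old, envir = genv)
    else rm(".Random.seed", envir = genv)
  }, add = TRUE)
  set.seed(seed)
  rnorm(3)
}

set.seed(99); untouched <- runif(1)
set.seed(99); invisible(with_global_seed(1)); after_global <- runif(1)
set.seed(99); invisible(with_local_seed(1));  after_local  <- runif(1)
# after_local equals untouched; after_global continues the stream started by set.seed(1)
```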
Since the local seed gave various unforeseen problems, …
The change requested in #426 and implemented in #432 has the side effect that mice returns exactly the same set of imputed values on every call. The following example illustrates this, using the mean for simplicity:
library(mice)
replicate(5, {imp <- mice(nhanes, m = 1, printFlag = FALSE); complete(imp)$bmi |> mean()})
# [1] 27.144 27.144 27.144 27.144 27.144
Given the stochastic nature of multiple imputation, this seems unreasonable. It may break existing scripts; it certainly broke one of mine.
There is a simple workaround: call runif(1) before each call to mice() so that the global random number generator moves on.

replicate(5, {runif(1); imp <- mice(nhanes, m = 1, printFlag = FALSE); complete(imp)$bmi |> mean()})
# [1] 27.032 26.984 26.900 26.900 26.900

However, it is preferable to have mice work reasonably 'out of the box'.
If someone does need exact replication across calls, as with the present behavior in 3.14.0, the solution is as simple as supplying a seed:

replicate(5, {runif(1); imp <- mice(nhanes, m = 1, printFlag = FALSE, seed = 12345); complete(imp)$bmi |> mean()})
# [1] 26.408 26.408 26.408 26.408 26.408
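The three behaviours described above can be reproduced without mice in a base-R sketch. The sketch assumes, as the snippets in this thread suggest, that 3.14.0 restores the global RNG state on exit and calls `set.seed()` only when a seed is given. `mice_3140_sketch()` is hypothetical, and the numbers it produces are unrelated to the nhanes output.

```r
# Hypothetical stand-in for mice() 3.14.0 (assumed behaviour, see above);
# mean(rnorm(10)) plays the role of complete(imp)$bmi |> mean().
mice_3140_sketch <- function(seed = NA) {
  genv <- globalenv()
  had_seed <- exists(".Random.seed", envir = genv, inherits = FALSE)
  if (had_seed) old <- get(".Random.seed", envir = genv)
  # restore the caller's stream on exit, like withr::local_preserve_seed()
  on.exit({
    if (had_seed) assign(".Random.seed", old, envir = genv)
    else rm(".Random.seed", envir = genv)
  }, add = TRUE)
  if (!is.na(seed)) set.seed(seed)
  mean(rnorm(10))
}

set.seed(2022)  # make sure a global RNG state exists
no_seed   <- replicate(5, mice_3140_sketch())                          # five identical values
with_kick <- replicate(5, {runif(1); mice_3140_sketch()})              # should differ per call
fixed     <- replicate(5, {runif(1); mice_3140_sketch(seed = 12345)})  # five identical values
```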
Hopefully, this is something you are able to solve.
R version 4.1.2 (2021-11-01)
Package  Version
mice     3.14.0
withr    2.4.3