Bug when running rstanarm in parallel future backend in SLURM in drake version 7.0 and above #929

Closed
csetraynor opened this issue Jul 4, 2019 · 5 comments
csetraynor commented Jul 4, 2019

I am not sure about this and would like another SLURM user to try it out, but I could not resolve an error that occurs when fitting an rstanarm model (I do not know whether it also happens in other cases) with drake's parallel future backend on SLURM. The error does not occur when running the model locally or with the earlier drake version 5.3.0.
Here is a reproducible example:

library(drake)
library(rstanarm)
library(future.batchtools)
example_model_1 <- function(){
  stan_glmer(cbind(incidence, size - incidence) ~ size + period + (1|herd),
             data = lme4::cbpp, family = binomial, QR = TRUE,
             # this next line is only to keep the example small in size!
             chains = 4, cores = 4, seed = 12345, iter = 20000)
}

plan <- drake_plan(
  mod1 = example_model_1()
)


future::plan(batchtools_slurm, template = "SLURM/batchtools_slurm.tmpl")

make(plan, parallelism = "future", jobs = 1)

Error: Target mod1 failed. Call diagnose(mod1) for details. Error message:
trying to get slot "mode" from an object (class "try-error") that is not an S4 object

sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS: /opt/scp/software/mro/3.5.1-foss-2017a/lib64/R/lib/libRblas.so
LAPACK: /opt/scp/software/mro/3.5.1-foss-2017a/lib64/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods  
[7] base     

other attached packages:
[1] future.batchtools_0.7.1 future_1.9.0           
[3] rstanarm_2.18.1         Rcpp_1.0.1.3           
[5] drake_7.4.0.9000        RevoUtils_11.0.1       
[7] RevoUtilsMath_11.0.0   

loaded via a namespace (and not attached):
  [1] nlme_3.1-131          fs_1.2.5             
  [3] matrixStats_0.54.0    xts_0.11-0           
  [5] progress_1.2.0        filelock_1.0.1       
  [7] threejs_0.3.1         splines2_0.2.8       
  [9] rstan_2.18.2          tools_3.5.1          
 [11] backports_1.1.2       R6_2.4.0             
 [13] DT_0.4                lazyeval_0.2.1       
 [15] colorspace_1.3-2      withr_2.1.2          
 [17] prettyunits_1.0.2     tidyselect_0.2.5.9000
 [19] gridExtra_2.3         processx_3.1.0       
 [21] compiler_3.5.1        git2r_0.23.0         
 [23] cli_1.0.0             shinyjs_1.0          
 [25] colourpicker_1.0      checkmate_1.8.5      
 [27] scales_0.5.0          dygraphs_1.1.1.6     
 [29] ggridges_0.5.0        callr_2.0.4          
 [31] rappdirs_0.3.1        stringr_1.3.1        
 [33] digest_0.6.15         StanHeaders_2.18.1-10
 [35] txtq_0.1.4            minqa_1.2.4          
 [37] base64enc_0.1-3       pkgconfig_2.0.2      
 [39] htmltools_0.3.6       lme4_1.1-17          
 [41] htmlwidgets_1.2       rlang_0.4.0.9000     
 [43] rstudioapi_0.7        shiny_1.1.0          
 [45] zoo_1.8-3             crosstalk_1.0.0      
 [47] gtools_3.8.1          dplyr_0.8.1.9000     
 [49] inline_0.3.15         magrittr_1.5         
 [51] loo_2.1.0.9000        bayesplot_1.5.0      
 [53] Matrix_1.2-14         munsell_0.5.0        
 [55] stringi_1.2.4         yaml_2.2.0           
 [57] tidyproject_0.3.1     debugme_1.1.0        
 [59] MASS_7.3-47           storr_1.2.0          
 [61] pkgbuild_1.0.0        plyr_1.8.4           
 [63] grid_3.5.1            listenv_0.7.0        
 [65] parallel_3.5.1        promises_1.0.1       
 [67] crayon_1.3.4          miniUI_0.1.1.1       
 [69] lattice_0.20-35       splines_3.5.1        
 [71] hms_0.4.2             batchtools_0.9.10    
 [73] zeallot_0.1.0         pillar_1.4.1.9000    
 [75] igraph_1.2.2          markdown_0.8         
 [77] base64url_1.4         shinystan_2.5.0      
 [79] reshape2_1.4.3        codetools_0.2-15     
 [81] stats4_3.5.1          rstantools_1.5.1.9000
 [83] glue_1.3.1.9000       data.table_1.11.4    
 [85] vctrs_0.1.0.9005      nloptr_1.0.4         
 [87] httpuv_1.4.5          gtable_0.2.0         
 [89] purrr_0.2.5           assertthat_0.2.1.9000
 [91] ggplot2_3.2.0         mime_0.5             
 [93] xtable_1.8-2          later_0.7.3          
 [95] rsconnect_0.8.8       survival_2.41-3      
 [97] tibble_2.1.3.9000     shinythemes_1.1.1    
 [99] globals_0.12.1        brew_1.0-6   


csetraynor changed the title from "Bug when fitting an rstanarm model using future backend in SLURM in drake version 7.4" to "Bug when running rstanarm in parallel future backend in SLURM in drake version 7.4" on Jul 4, 2019

csetraynor commented Jul 4, 2019

I tried version 7.0.0 and got the same error.
Here is a more complete error log:

diagnose(mod1)
$name
[1] "mod1"

$target
[1] "mod1"

$imported
[1] FALSE

$missing
[1] TRUE

$seed
[1] 1109102315

$time_start
   user  system elapsed 
 41.329  10.876 110.802 

$file_out
NULL

$isfile
[1] FALSE

$trigger
$trigger$command
[1] TRUE

$trigger$depend
[1] TRUE

$trigger$file
[1] TRUE

$trigger$condition
[1] FALSE

$trigger$change
NULL

$trigger$mode
[1] "whitelist"


$command
[1] "example_model_1()"

$dependency_hash
[1] "a30afda18b12ab32"

$input_file_hash
[1] ""

$output_file_hash
[1] ""

$time_command
$time_command$target
[1] "mod1"

$time_command$elapsed
[1] 17.529

$time_command$user
[1] 4.685

$time_command$system
[1] 2.204


$time_build
$time_build$target
[1] "mod1"

$time_build$elapsed
[1] 18.377

$time_build$user
[1] 5.435

$time_build$system
[1] 2.285


$warnings
[1] "4 function calls resulted in an error"

$error
<simpleError in FUN(X[[i]], ...): trying to get slot "mode" from an object (class "try-error") that is not an S4 object >

csetraynor commented

With version 5.3.0, by contrast, everything works fine (but I would really like to be able to use the new functionality for large plans in my projects!):

make(plan, parallelism = "future", jobs = 1)
target mod1
> diagnose(mod1)
$name
[1] "mod1"

$target
[1] "mod1"

$imported
[1] FALSE

$foreign
[1] TRUE

$missing
[1] TRUE

$seed
[1] 1029698858

$output_files
NULL

$input_files
NULL

$command
[1] "{\n example_model_1() \n}"

$dependency_hash
[1] "7fa2973d0e9771deadaebd1d77f72fb2e70069a86e5d81cef95786ef07c42e86"

$input_file_hash
[1] "01234566814cd2a2b3cd7c19c907964b4a4f018a19ec224788d9edb74d7376df"

$output_file_hash
[1] "01234566814cd2a2b3cd7c19c907964b4a4f018a19ec224788d9edb74d7376df"

$start
   user  system elapsed 
  4.825   0.477   7.018 

$time_command
  item   type elapsed   user system
1 mod1 target   25.47 14.007  0.392

$time_build
  item   type elapsed   user system
1 mod1 target  27.297 15.757  0.417

> sessionInfo()
R version 3.5.1 (2018-07-02)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: CentOS Linux 7 (Core)

Matrix products: default
BLAS: /opt/scp/software/mro/3.5.1-foss-2017a/lib64/R/lib/libRblas.so
LAPACK: /opt/scp/software/mro/3.5.1-foss-2017a/lib64/R/lib/libRlapack.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods  
[7] base     

other attached packages:
[1] future.batchtools_0.7.1 future_1.9.0           
[3] rstanarm_2.18.1         Rcpp_1.0.1.3           
[5] drake_5.3.0             RevoUtils_11.0.1       
[7] RevoUtilsMath_11.0.0   

loaded via a namespace (and not attached):
  [1] minqa_1.2.4           colorspace_1.3-2     
  [3] ggridges_0.5.0        rsconnect_0.8.8      
  [5] rprojroot_1.3-2       markdown_0.8         
  [7] base64enc_0.1-3       fs_1.2.5             
  [9] rstudioapi_0.7        listenv_0.7.0        
 [11] rstan_2.18.2          DT_0.4               
 [13] codetools_0.2-15      splines_3.5.1        
 [15] R.methodsS3_1.7.1     knitr_1.20           
 [17] shinythemes_1.1.1     zeallot_0.1.0        
 [19] bayesplot_1.5.0       nloptr_1.0.4         
 [21] R.oo_1.21.0           shiny_1.1.0          
 [23] compiler_3.5.1        backports_1.1.2      
 [25] assertthat_0.2.1.9000 Matrix_1.2-14        
 [27] lazyeval_0.2.1        later_0.7.3          
 [29] formatR_1.5           htmltools_0.3.6      
 [31] prettyunits_1.0.2     tools_3.5.1          
 [33] igraph_1.2.2          gtable_0.2.0         
 [35] glue_1.3.1.9000       reshape2_1.4.3       
 [37] dplyr_0.8.1.9000      batchtools_0.9.10    
 [39] rappdirs_0.3.1        tidyproject_0.3.1    
 [41] vctrs_0.1.0.9005      debugme_1.1.0        
 [43] nlme_3.1-131          crosstalk_1.0.0      
 [45] stringr_1.3.1         globals_0.12.1       
 [47] testthat_2.0.0        lme4_1.1-17          
 [49] mime_0.5              miniUI_0.1.1.1       
 [51] gtools_3.8.1          MASS_7.3-47          
 [53] zoo_1.8-3             scales_0.5.0         
 [55] colourpicker_1.0      hms_0.4.2            
 [57] promises_1.0.1        parallel_3.5.1       
 [59] inline_0.3.15         shinystan_2.5.0      
 [61] yaml_2.2.0            gridExtra_2.3        
 [63] ggplot2_3.2.0         loo_2.1.0.9000       
 [65] StanHeaders_2.18.1-10 stringi_1.2.4        
 [67] dygraphs_1.1.1.6      checkmate_1.8.5      
 [69] filelock_1.0.1        pkgbuild_1.0.0       
 [71] storr_1.2.0           rlang_0.4.0.9000     
 [73] pkgconfig_2.0.2       matrixStats_0.54.0   
 [75] evaluate_0.11         lattice_0.20-35      
 [77] purrr_0.2.5           bindr_0.1.1          
 [79] splines2_0.2.8        rstantools_1.5.1.9000
 [81] htmlwidgets_1.2       processx_3.1.0       
 [83] tidyselect_0.2.5.9000 plyr_1.8.4           
 [85] magrittr_1.5          R6_2.4.0             
 [87] base64url_1.4         txtq_0.0.4           
 [89] pillar_1.4.1.9000     withr_2.1.2          
 [91] xts_0.11-0            survival_2.41-3      
 [93] tibble_2.1.3.9000     crayon_1.3.4         
 [95] progress_1.2.0        grid_3.5.1           
 [97] data.table_1.11.4     callr_2.0.4          
 [99] git2r_0.23.0          threejs_0.3.1        
[101] digest_0.6.15         xtable_1.8-2         
[103] httpuv_1.4.5          brew_1.0-6           
[105] R.utils_2.5.0         stats4_3.5.1         
[107] munsell_0.5.0         shinyjs_1.0     

csetraynor changed the title from "Bug when running rstanarm in parallel future backend in SLURM in drake version 7.4" to "Bug when running rstanarm in parallel future backend in SLURM in drake version 7.0 and above" on Jul 4, 2019

wlandau commented Jul 4, 2019

I was able to reproduce the error on SGE, and it went away when I set lock_envir to FALSE in make(). I suspect environment locking in drake >= 7.0.0 clashes with rstan's automatic multicore parallelism. Related: https://ropensci.github.io/drake/reference/make.html#self-invalidation, #619, #675.
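
For anyone landing here later, a minimal sketch of that workaround, assuming the same model, plan, and SLURM template path as in the original post (the template path is site-specific, and this is not verified on any particular cluster). The only change from the reproducible example is the `lock_envir = FALSE` argument to `make()`:

```r
library(drake)
library(rstanarm)
library(future.batchtools)

# Same model function as in the reproducible example above.
example_model_1 <- function() {
  stan_glmer(cbind(incidence, size - incidence) ~ size + period + (1 | herd),
             data = lme4::cbpp, family = binomial, QR = TRUE,
             chains = 4, cores = 4, seed = 12345, iter = 20000)
}

plan <- drake_plan(
  mod1 = example_model_1()
)

# Template path taken from the original post; adjust it for your cluster.
future::plan(batchtools_slurm, template = "SLURM/batchtools_slurm.tmpl")

# lock_envir = FALSE switches off the environment locking that
# drake >= 7.0.0 applies while targets are being built.
make(plan, parallelism = "future", jobs = 1, lock_envir = FALSE)
```

Note that turning the lock off also removes the safeguard it provides: drake will no longer stop a target from modifying the environment while the plan is running.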


wlandau commented Jul 4, 2019

Well, maybe the parallelism itself does not clash, but rstan or rstanarm somehow tries to modify the global environment after drake locks it for reproducibility.
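
To illustrate the kind of failure this would produce, here is a base R sketch. It is an assumption-laden stand-in, not drake's or rstan's actual code: `env` is a hypothetical substitute for the locked environment, and the slot access at the end only mimics code that expects an S4 result. Writes to a locked environment fail, `try()` turns those failures into "try-error" objects, and reading a slot such as `mode` from one of them fails in turn.

```r
# Hypothetical stand-in for the environment that gets locked.
env <- new.env()
assign("existing", 1, envir = env)
lockEnvironment(env, bindings = TRUE)

# Adding a new binding to a locked environment is an error.
res_new <- try(assign("tmp", 2, envir = env), silent = TRUE)

# With bindings = TRUE, overwriting an existing binding is an error too.
res_old <- try(assign("existing", 3, envir = env), silent = TRUE)

# Both attempts come back as "try-error" objects rather than values.
failed <- c(inherits(res_new, "try-error"), inherits(res_old, "try-error"))

# Code that assumes an S4 object and reads a slot from the result
# then fails as well; capture that second error message.
slot_msg <- tryCatch(res_new@mode, error = function(e) conditionMessage(e))

print(failed)
print(slot_msg)
```

The exact wording of the slot error depends on the R version, so the printed message may not match the one in the log above character for character.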

csetraynor commented

Ah, OK, thank you. You are completely right! It is also working fine for me now.
Thanks a bunch!
