rmarkdown::render crashes in parallel with parallel::makeForkCluster on macOS and Apple Silicon #2561

Open
4 of 5 tasks
huguesmercier opened this issue Jul 1, 2024 · 6 comments
Labels
feature (a feature request or enhancement), help wanted (❤️ we'd love your help!)

Comments

@huguesmercier

Problem description

I am knitting multiple HTML documents from .Rmd files in parallel. Here is a small example that reproduces the bug:

library(foreach)
library(doParallel)
library(rmarkdown)

cluster <- parallel::makeForkCluster(bigstatsr::nb_cores())
# cluster <- parallel::makePSOCKcluster(bigstatsr::nb_cores())

doParallel::registerDoParallel(cluster)

foreach(temp_counter = seq(1, 10)) %dopar% {

  html_file <- (paste0(temp_counter, ".html"))
  Rmd_file <- (paste0(temp_counter, ".Rmd"))
  
  text_string <- paste0("---\n", "title: 'TEST'\n", "---\n", "```{r setup, echo=FALSE}\n","print(temp_counter)\n","```")
  write(text_string, file = Rmd_file, append = FALSE)
  
  rmarkdown::render(Rmd_file, output_format = "html_document", output_file = html_file)
}

parallel::stopCluster(cluster)

I receive error messages like this:

The process has forked and you cannot use this CoreFoundation functionality safely. You MUST exec().
Break on __THE_PROCESS_HAS_FORKED_AND_YOU_CANNOT_USE_THIS_COREFOUNDATION_FUNCTIONALITY___YOU_MUST_EXEC__() to debug.
objc[19006]: +[__NSPlaceholderDate initialize] may have been in progress in another thread when fork() was called. We cannot safely call it or ignore it in the fork() child process. Crashing instead. Set a breakpoint on objc_initializeAfterForkError to debug.
  • parallel::makePSOCKcluster works fine.
  • It is only rmarkdown::render that does not work with parallel::makeForkCluster. The .Rmd files are generated properly with parallel::makeForkCluster.

System information

The bug appears on multiple versions of RStudio / rmarkdown / pandoc, including this one:

R version 4.3.2 (2023-10-31)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Sonoma 14.5, RStudio 2024.4.2.764

Locale: en_US.UTF-8 / en_US.UTF-8 / en_US.UTF-8 / C / en_US.UTF-8 / en_US.UTF-8

Package version:
base64enc_0.1.3 bslib_0.6.0 cachem_1.0.8 cli_3.6.2 digest_0.6.33 ellipsis_0.3.2 evaluate_0.23
fastmap_1.1.1 fontawesome_0.5.2 fs_1.6.3 glue_1.7.0 graphics_4.3.2 grDevices_4.3.2 highr_0.10
htmltools_0.5.7 jquerylib_0.1.4 jsonlite_1.8.7 knitr_1.43 lifecycle_1.0.4 magrittr_2.0.3 memoise_2.0.1
methods_4.3.2 mime_0.12 R6_2.5.1 rappdirs_0.3.3 rlang_1.1.4 rmarkdown_2.22 sass_0.4.7
stats_4.3.2 stringi_1.8.2 stringr_1.5.1 tinytex_0.49 tools_4.3.2 utils_4.3.2 vctrs_0.6.5
xfun_0.41 yaml_2.3.7

Pandoc version: 3.1.11

Checklist

When filing a bug report, please check the boxes below to confirm that you have provided us with the information we need. Have you:

  • formatted your issue so it is easier for us to read?

  • included a minimal, self-contained, and reproducible example?

  • pasted the output from xfun::session_info('rmarkdown') in your issue?

  • upgraded all your packages to their latest versions (including your versions of R, the RStudio IDE, and relevant R packages)?

  • installed and tested your bug with the development version of the rmarkdown package using remotes::install_github("rstudio/rmarkdown") ?

@cderv
Collaborator

cderv commented Jul 3, 2024

I don't know enough about using parallel with fork logic, so this will be hard to investigate. I do know that running rmarkdown::render in parallel is not ideal unless you copy the files and use an external process to render. render() needs to access and create intermediate files, and their names are not randomized, so there could be conflicts.

Related parallel issue

So this could be a new occurrence of the problems with using rmarkdown in parallel. I would say this is a limitation.

If you can get to the bottom of what is happening we could think of a fix. Any help appreciated on this one.
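
To illustrate the "copy files and use an external process" idea from this comment, here is a minimal sketch, not a confirmed fix for this issue. The helper name `render_isolated` is hypothetical. It assumes the xfun package is installed and relies on the `intermediates_dir` argument of `rmarkdown::render()` to give each document its own scratch directory. It has not been tested against the crash reported here.

```r
# Hypothetical helper: render one .Rmd file in a fresh R session with a
# private intermediates directory, so that concurrent renders do not
# share scratch files.
render_isolated <- function(rmd_file, html_file) {
  # One private scratch directory per document, removed on exit.
  scratch <- tempfile("render-")
  dir.create(scratch)
  on.exit(unlink(scratch, recursive = TRUE), add = TRUE)

  # xfun::Rscript_call() runs the function in a new R process started
  # with Rscript, so nothing is inherited through fork().
  xfun::Rscript_call(
    rmarkdown::render,
    list(
      input = rmd_file,
      output_format = "html_document",
      output_file = html_file,
      intermediates_dir = scratch,
      quiet = TRUE
    )
  )
}
```

On a PSOCK cluster this could be called as `parallel::clusterMap(cluster, render_isolated, rmd_files, html_files)`, where `rmd_files` and `html_files` are assumed to be character vectors of paths.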

@cderv added the "help wanted" and "feature" labels Jul 3, 2024
@huguesmercier
Author

huguesmercier commented Jul 3, 2024

Thanks for the reply. I did have a look at these earlier issues that you mentioned and tried some of the proposed fixes, to no avail. Two additional notes:

  • The code I submitted has a different .Rmd file per thread. I also tried to put these files in a different folder for each thread, but this did not fix the issue.
  • The issue seems specific to macOS; we could not reproduce it on Linux.

@huguesmercier
Author

  • One last note: the bug is also present if I run the script from a terminal in R, outside of RStudio.

@cderv
Collaborator

cderv commented Jul 3, 2024

The issue seems specific to macOS; we could not reproduce it on Linux.

This is really interesting. @yihui, as a Mac user, do you know anything about this type of run?

@huguesmercier
Author

huguesmercier commented Sep 16, 2024

One more comment after further tests: the bug remains if I run only one instance in parallel:

foreach(temp_counter = seq(1, 1)) %dopar% { ... }

The error message makes little sense in this context.

@yihui
Member

yihui commented Sep 19, 2024

This problem is very deep... At the bottom, it's caused by the default options(bitmapType = 'quartz') on macOS, and quartz is an on-screen device. Then the default device for R Markdown is png() when the output format is HTML, and png() uses options(bitmapType) as the default value for its type argument. As a result, you ended up calling an on-screen device in the forked processes, which led to the errors you saw.

From ?parallel::mcfork:

Child processes should never use on-screen graphics devices.

If you set options(bitmapType = 'cairo') before you call rmarkdown::render(), the previous errors should disappear. However, I get these errors:

pandoc: /var/folders/.../T//Rtmp.../rmarkdown-str....html: withBinaryFile: does not exist (No such file or directory)

This might have revealed a bug in rmarkdown when running render() in parallel.
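
The bitmapType workaround described in this comment could be wrapped as follows. This is a sketch only: the helper name `with_cairo_bitmap` is hypothetical, and it assumes the R build has cairo support (which `capabilities("cairo")` reports). It addresses only the fork crash, not the pandoc error quoted above.

```r
# Hypothetical helper: evaluate `expr` with the bitmap backend switched
# to cairo, then restore the previous setting. png() falls back to
# getOption("bitmapType") when no `type` argument is given, so this keeps
# forked workers away from the on-screen quartz device.
with_cairo_bitmap <- function(expr) {
  old <- options(bitmapType = "cairo")
  on.exit(options(old), add = TRUE)
  force(expr)  # `expr` is lazy, so it is evaluated only now
}

# The option is changed only while the expression runs.
inside <- with_cairo_bitmap(getOption("bitmapType"))
print(inside)
```

Inside the `%dopar%` body of the original example, the render call would then become `with_cairo_bitmap(rmarkdown::render(Rmd_file, output_format = "html_document", output_file = html_file))`.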


3 participants