wiki: best practices for creating dockerfiles #515

Open
nick-youngblut opened this issue Jan 16, 2023 · 6 comments

Comments

@nick-youngblut

It would be helpful to include docs on how to create dockerfiles building off of rocker/r-base (or similar). For instance:

  • How should one keep the build time to a minimum?
    • e.g., how to use pre-built R packages?
  • How should one keep the image size to a minimum?
    • e.g., use clean=TRUE in install.packages()?
  • How should one provide a long list of R packages (and specific versions)?
    • e.g., use install2.r as in RUN install2.r --ncpus 2 --error argparse dplyr tidyr && rm -rf /tmp/downloaded_packages?
  • How does one install specific R package versions?
@nick-youngblut
Author

As an example, https://github.com/rocker-org/rocker-versioned2/pkgs/container/tidyverse#install-r-packages states:

Please install R packages from source using the install.packages() R function or the install2.r script, and use apt only to install necessary system libraries (e.g. libxml2). Do not use apt install r-cran-* to install R packages.

...but a build that involves installing bioconductor packages from source (& the MANY dependencies required for any bioconductor package) can take >1.5 hours. There must be a better way. For instance, how is rocker/tidyverse built in order to minimize the build time?

Also, it would be helpful if the docs include installBioc.r and not just install2.r

@eddelbuettel
Member

Briefest possible answer: start with eddelbuettel/r2u, which comes in Ubuntu jammy and focal flavours with over 20k CRAN binaries and over 200 BioC binaries. See more at https://eddelbuettel.github.io/r2u/ (and this will eventually be a part of rocker once I get around to reorganising this).

Note that it is NOT a direct descendant of rocker/r-base as the latter is Debian based, and nobody has access to all of CRAN premade for Debian whereas I am able to provide it for Ubuntu; see the r2u docs for more.
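
A minimal sketch of what a Dockerfile built on r2u might look like. The `jammy` tag and the package names are assumptions for illustration; check the r2u docs for the tags currently published. The idea is that `install.packages()` in this image resolves to prebuilt Ubuntu binaries instead of compiling from source.

```dockerfile
# Sketch only: the tag is an assumption, see the r2u docs for current tags.
FROM eddelbuettel/r2u:jammy

# Package names are illustrative. The apt index is refreshed first because
# binary installs in r2u go through apt, and removed again in the same layer
# to keep the image small.
RUN apt-get update \
    && Rscript -e 'install.packages(c("argparse", "dplyr", "tidyr"))' \
    && rm -rf /var/lib/apt/lists/*
```

Keeping the refresh, the install, and the cleanup in a single `RUN` avoids baking the apt index into an intermediate layer.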

@cboettig
Member

Thanks for raising the issue and apologies for the confusion here. Note that there are essentially two separate stacks in rocker that serve different needs, as noted in the README in this repo. Dirk summarizes above one of the approaches, in what the README calls the un-versioned stack.

The versioned stack, which you linked in your example, includes those images (r-ver, rstudio, tidyverse, etc.) built from sources in rocker-org/rocker-versioned2, and the best practices are indeed the ones you cite, e.g. install R packages with install.packages() or the install2.r script wrapper. Please note that the versioned stack uses Ubuntu-based images configured with RSPM (RStudio Package Manager) as the default mirror, along with the appropriate headers, which means that install.packages() will install prebuilt binaries. This is how packages are installed on rocker/tidyverse. You can try building the tidyverse Dockerfiles yourself to confirm (or just look at the logs; it's a bit buried in there, but it looks like it takes about 113 seconds).
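
As a rough sketch of that pattern for the versioned stack: system libraries come from apt, and R packages come from install2.r. The image tag, the system library, and the package list below are illustrative assumptions, not a recommended set.

```dockerfile
# Sketch only: tag, system library, and R packages are illustrative.
FROM rocker/r-ver:4.2.2

# System libraries via apt (libxml2-dev stands in for whatever your packages need).
RUN apt-get update \
    && apt-get install -y --no-install-recommends libxml2-dev \
    && rm -rf /var/lib/apt/lists/*

# R packages via install2.r. With the image's default repository these are
# expected to arrive as prebuilt binaries. --error fails the build if a
# package cannot be installed; --skipinstalled avoids reinstalling packages
# that are already present.
RUN install2.r --error --skipinstalled --ncpus 2 argparse dplyr tidyr \
    && rm -rf /tmp/downloaded_packages
```

Removing /tmp/downloaded_packages in the same `RUN` keeps the downloaded archives out of the final layer.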

Regarding versioning, note that the Rocker versioned stack locks images based on their R version tag. Once an image is no longer the latest version (e.g. rocker/tidyverse:4.2.1, say), packages are locked by using the RSPM frozen snapshot dated immediately before the release of the latest version. This allows latest to act as a rolling version, always containing the latest version up until the day the R version rolls over, at which point everything is frozen. This is done by setting the default CRAN repo, meaning that, again, users don't have to do anything to install a consistent version. Using rocker/tidyverse:4.2.1, or any other previous version, ensures the build will always have identical versions of all packages, and that those packages are all concurrent. Hope that description makes sense. Naturally there are cases where users want to install specific versions of packages, where a tool like renv may be appropriate.
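
To illustrate both routes, here is a hedged sketch: pinning the image tag so that packages come from the frozen snapshot, and using renv where exact per-package versions are needed. The `/project` directory and the `renv.lock` file are hypothetical; the lockfile is assumed to exist in your build context.

```dockerfile
# Sketch only: a non-latest tag, so the default repo should be a frozen snapshot.
FROM rocker/tidyverse:4.2.1

# Print the configured repositories to the build log to see which snapshot is in use.
RUN Rscript -e 'print(getOption("repos"))'

# Packages installed here come from that repository, so rebuilding later
# should yield the same versions. The package name is illustrative.
RUN install2.r --error argparse \
    && rm -rf /tmp/downloaded_packages

# Optional: exact per-package versions via renv.
# renv.lock is a hypothetical lockfile copied from the build context.
WORKDIR /project
COPY renv.lock renv.lock
RUN Rscript -e 'install.packages("renv"); renv::restore(lockfile = "renv.lock", prompt = FALSE)'
```

If the frozen snapshot is enough for your use case, the renv steps can be dropped entirely.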

@eitsupi
Member

eitsupi commented Jan 17, 2023

Have you seen the Rocker Project website? https://rocker-project.org/use/extending.html
Although the content is not yet very detailed, I believe we covered the basics in last year's renewal of the site.
(By the way, I noticed that the link from this repository was to DockerHub, not the Rocker website, so I updated it.)

@eddelbuettel @cboettig The wiki content in this repository is outdated and I believe much of the content has been ported over to the website.
So I think it would be better to make the wiki read-only and direct people to the website.
What do you think?

@nick-youngblut
Author

Thank you all so much for your rapid feedback! I hope that I have not been too annoying with my list of documentation requests. I would be happy to help with PRs, if you'd like (I just need to understand best-practices myself).

@eitsupi
Member

eitsupi commented Jan 19, 2023

@nick-youngblut Thanks, PRs for https://github.com/rocker-org/website are very welcome!


4 participants