Hi everyone, we are currently working on Databricks and would like to set up a Docker image so that we don't have to reinstall the library every time we start a cluster. Unfortunately, the library cannot be loaded after installation and we have not yet been able to identify the exact cause.
Below you can find the Dockerfile to reproduce the error.
Important: the installation takes about 70 minutes.
In addition, you must download the following file and place it in the same folder as the Dockerfile: Rprofile.site
FROM databricksruntime/minimal:13.3-LTS
# Label version in case we need to force reinstall
LABEL version="1.1"
# Suppress interactive configuration prompts
ENV DEBIAN_FRONTEND=noninteractive
# Set the CRAN mirror
ENV R_CRAN_MIRROR=https://cran.uni-muenster.de/
# Install python 3.10 and virtualenv for Spark and Notebooks
RUN apt-get update \
&& apt-get install -y \
python3.10 \
virtualenv
# We add RStudio's debian source to install the latest r-base version (4.x)
# We are using the more secure long form of pgp key ID of marutter@gmail.com
# based on these instructions (avoiding firewall issue for some users):
# https://cran.rstudio.com/bin/linux/ubuntu/#secure-apt
RUN apt-get update \
&& apt-get install --yes software-properties-common apt-transport-https \
&& gpg --keyserver hkp://keyserver.ubuntu.com:80 --recv-keys E298A3A825C0D65DFD57CBB651716619E084DAB9 \
&& gpg -a --export E298A3A825C0D65DFD57CBB651716619E084DAB9 | sudo apt-key add - \
&& add-apt-repository -y "deb [arch=amd64,i386] https://cran.rstudio.com/bin/linux/ubuntu $(lsb_release -cs)-cran40/" \
&& apt-get update \
&& apt-get install --yes \
libssl-dev \
r-base \
r-base-dev \
&& add-apt-repository -r "deb [arch=amd64,i386] https://cran.rstudio.com/bin/linux/ubuntu $(lsb_release -cs)-cran40/" \
&& apt-key del E298A3A825C0D65DFD57CBB651716619E084DAB9 \
&& apt-get clean \
&& rm -rf /var/lib/apt/lists/* /tmp/* /var/tmp/*
# hwriterPlus is used by Databricks to display output in notebook cells
# hwriterPlus has been removed for newer versions of R, so we hardcode the dependency to the archived version
# Rserve allows Spark to communicate with a local R process to run R code
RUN R -e "options(repos = list(CRAN = 'https://cloud.r-project.org/')); install.packages(c('hwriter', 'TeachingDemos', 'htmltools'))" \
&& R -e "install.packages('https://cran.r-project.org/src/contrib/Archive/hwriterPlus/hwriterPlus_1.0-3.tar.gz', repos=NULL, type='source')" \
&& R -e "install.packages('Rserve', repos='http://rforge.net/')"
# Additional instructions to set up RStudio. If you don't need RStudio, you can
# omit the commands below in your Dockerfile. Even with them, you still need to use
# an init script to start the RStudio daemon (see README.md for details).
# Databricks configuration for RStudio sessions.
COPY Rprofile.site /usr/lib/R/etc/Rprofile.site
# Rstudio installation.
RUN apt-get update \
# Install wget and gdebi-core.
&& apt-get install -y wget gdebi-core \
# Download the RStudio Server 2022.12.1 package for Ubuntu 22.04 (jammy) and install it.
&& wget https://s3.amazonaws.com/rstudio-ide-build/server/jammy/amd64/rstudio-server-2022.12.1-366-amd64.deb \
&& gdebi -n rstudio-server-2022.12.1-366-amd64.deb \
&& rstudio-server version
# Initialize the default environment that Spark and notebooks will use
RUN virtualenv -p python3.10 --system-site-packages /databricks/python3
# install relevant packages
RUN R -e "install.packages('rstan', repos = c(CRAN = Sys.getenv('R_CRAN_MIRROR')))"
RUN R -e "install.packages('remotes', repos = c(CRAN = Sys.getenv('R_CRAN_MIRROR')))"
RUN R -e "install.packages('rstanarm', repos = c(CRAN = Sys.getenv('R_CRAN_MIRROR')))"
RUN R -e "install.packages('dplyr', repos = c(CRAN = Sys.getenv('R_CRAN_MIRROR')))"
# Load the installed packages to check that they work
RUN R -e "library('rstan')"
RUN R -e "library('dplyr')"
RUN R -e "library('rstanarm')"
# verify rstan installation
#RUN R -e "example(stan_model, package = 'rstan', run.dontrun = TRUE)"
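One way to narrow this down might be to make the build stop as soon as a package is missing or fails to load. The following is only a sketch, assuming the same `R_CRAN_MIRROR` variable as above. The single combined layer, the `Ncpus` setting and the `installed.packages()` check are additions that are not part of our current Dockerfile, and we have not confirmed that they resolve the problem.

```dockerfile
# Sketch only, not the original setup: one layer installs every package,
# compiling in parallel, and stops the build if a package is missing afterwards.
# install.packages() only warns when a source build fails, hence the explicit check.
RUN R -e "pkgs <- c('rstan', 'remotes', 'rstanarm', 'dplyr'); \
    install.packages(pkgs, repos = c(CRAN = Sys.getenv('R_CRAN_MIRROR')), Ncpus = parallel::detectCores()); \
    stopifnot(all(pkgs %in% rownames(installed.packages())))"

# Load each package in turn; an error from library() ends R with a non-zero
# exit status, so the build stops at the first package that fails to load.
RUN R -e "for (p in c('rstan', 'rstanarm', 'dplyr')) library(p, character.only = TRUE)"
```

Because a failed source build only produces a warning from `install.packages()`, the explicit check should show whether the failure happens during installation or only when loading.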