Instructions for setting up RStudio running in a Docker container on Jetstream or Digital Ocean, including multiple user logins
Dana Miller
- Background - two cloud compute providers referenced in this guide
- Set up a new instance for the first time
- Start a Docker container with R and RStudio
- Configure git with
git config
- Optional next step - adding multiple users and configuring their permissions
- Optional next steps - installing
curl
,pip3
andaws cli
in the terminal after RStudio is running - Optional next steps - installing most R packages
- Optional next steps - installing
sparklyr
andspark
- More references
Last updated Febuary 24, 2020 - suggestions to improve these instructions welcome
With thanks to the Rocker Project for R on Docker, Indiana University and the National Science Foundation for cloud compute resources through Jetstream, DIBSCI at UC Davis for documentation, and Carl Boettiger at UC Berkeley for reproducible research inspiration and an initial version of these instructions
-
Jetstream is an National Science Foundation supported cloud compute resource that runs on the open-source software Atmosphere and OpenStack. Compute credit allocations are available by applying through XCSEDE. If you can contact campus research computing staff, they may be able to provide advice and example applications (eg at UC Berkeley this is the Research Computing group)
-
Digital Ocean is a commercial cloud compute resource (similar to AWS or Azure) offering similar instances to those available on Jetstream. Compute credits are billed hourly, depending on the size of the resource.
Running RStudio in Docker and configuring multiple user logins works on both of these platforms. Most of the instructions below apply to both platforms, anything that is Jetstream or Digital Ocean-specific will be noted.
If you already have a Jetstream login and allocation, to run R Studio on Jetstream:
- Log in and go to ‘Create a new project’
- Enter project name and description
- Select the size and image you would like to use (images have various software preinstalled): select this Ubuntu image that includes Docker and GUI support
- Once the new instance for your project is active (green dot), click the “Open web shell” link on the right hand side
- After creating a Digital Ocean account, create a new instance using this Ubuntu + Docker image (click ‘create Droplet’)
- Recommend setting up SSH authentication
- Once the droplet is set up, connect to the droplet’s shell interface by opening a terminal and connecting via ssh with
ssh root@[IP address of your droplet here]
-
In the web shell, first confirm you are able to run docker commands:
docker run hello-world
If configured correctly, this will say
Hello from Docker!...
If you get an error about permission denied and docker daemon, try adding yourself to root privileges with
sudo usermod -aG docker example_username
, whereexample_username
is your username- Start a new Docker container that has R, R Studio, and the
tidyverse
preinstalled (you could also install these and all the required dependencies without Docker but it takes a long time and is potentially error-prone)
- Start a new Docker container that has R, R Studio, and the
docker run -d -p 8787:8787 --rm -e PASSWORD=example_password rocker/tidyverse
or, if in addition you also want TeX installed and tools for publishing,
use the larger verse
image:
docker run -d -p 8787:8787 --rm -e PASSWORD=example_password rocker/verse
What do all those flags mean?
-
“-d” for detach means the container will run ‘in the background’ , and return a command prompt (without this flag your container will be running but you won’t be able to run anything else at the command prompt until the container is stopped)
-
“-p” for port is a port and also the suffix for the URL your RStudio session will be accessible at
-
“-e” for environment is passing the
PASSWORD
variable to the created computational environment -
“–rm” for remove , which will automatically delete this container after it exits (note it does not delete the underlying image the container was based on)
-
rocker/tidyverse
- this specifies the exact Docker image (already created by the Rocker project) to use. Using the same image, you can set up the exact same computational environment many different times, or many different people can all use it to set up the same environment - More details about this image at https://www.rocker-project.org/images/ -
To find the URL you can use to log into the new R Studio session, run
echo My RStudio Web server is running at: http://$(hostname):8787/
Note the URL above is http not https, the latter is also possible to set up but requires a custom domain name
Note to copy and paste text from the Jetstream web shell, you have you highlight it and then press Ctrl + Alt + Shift
-
Paste the URL above into your web browser, and to login:
- The default username is “rstudio”
- The password is whatever you chose in the
docker run...
command above forexample_password
Congrats, you are now running RStudio on Jetstream!
The
rstudio
user has permission to install packages.
Optional UNIX configuration notes:
whoami
in at the command line is not the same as the user for logging in to the R Studio browser prompt- Specifying a username with
docker run -u...
also doesn’t work for the R Studio login prompt
If you will be using git
to incrementally save your work on the
instance and then push the saved changes to a remote repository like
GitHub, each user who separtely logs into the RStudio instance will need
to run the two git configuration commands below in the RStudio Terminal
pane.
git config --global user.name "Your Name"
git config --global user.email your_username@users.noreply.github.com
Note that for the user email, you can either use your own email, or use the ‘noreply’ email generated for you by GitHub to avoid exposing your personal email address in your commit history. See here for details.
There are two ways to do this:
- Add users to your existing instance, or
- Launch a separate R Studio instance bound to a different port
-
With your Jetstream instance active, launch the web shell
-
See the ids of your running Docker container with RStudio by running
docker ps
. You will need to write down or remember the first 5 digits of your CONTAINER ID (a long string of numbers and letters) -
To run commands inside that container, run
docker exec -ti CONTAINER_ID bash
Where CONTAINER_ID is eg “904b2”, the first 5 digits. Notice your prompt will change now that you are inside the container.
- Now you can use regular UNIX commands to create a new user (this is
not Docker-specific), where
example_user
is the user name you are creating (the name can be anything you choose).
adduser example_user
Enter a password, press ‘Enter’ to leave the other questions blank (eg Name, Work Phone…), they are not required, and press Y for “Yes”
-
At this point, someone could go to the same URL you set up above, and login with this new username and password but they wouldn’t be able to install any additional R packages (no root access)
-
To give the new user the right to download packages, add them to the
staff
group. In the command below,-aG
means ‘append to group’
usermod -aG staff example_user
To see a list of all the users (this includes some other system info) as well
cat /etc/passwd
To change a password (note passwd
below is not a typo)
sudo passwd example_username
When you are done configuring the user info, type
exit
to return to the command line outside of that container.
And the login URL for the new users same as above, at:
echo My RStudio Web server is running at: http://$(hostname):8787/
Alternatively, start an additional Docker container on the same Jetstream instance with a different port number
docker run -d -p 8888:8787 -rm -e PASSWORD=newpassword rocker/tidyverse
The default username will still be “rstudio”, and the password will be whatever you just specified, but the login URL will be different (note the port number is changed above and accordingly also changed below)
echo My RStudio Web server is running at: http://$(hostname):8888/
curl
is a common command line utility that you will probably need if you are downloading data or files from URLspip3
is the package manager for thepython
programming language. In this case, it’s required to installaws cli
aws cli
is a command line interface for Amazon Web Services
The following steps were tested on a fresh verse
Docker image.
In the ‘Terminal’ pane of RStudio :
- Install curl, eg following instructions here
sudo apt-get update
sudo apt-get install curl
curl --version
- Install pip - use python 3
- See “Install pip” instructions
here
-
verse
image has python three, so use the python3 variants, egpython3 get-pip.py --user
, notpython get-pip.py --user
-
For the step about editing your bash profile, it will probably be called
.profile
. -
You can use
vim
to edit the file (typevim .profile
to open the file:i
to go into “insert” (editing) mode- add
export PATH=~/.local/bin:$PATH
to the file below anything in it already (you can start a line with#
as a comment to explain why it’s there) - press
:
to exit editing mode andwq
to save (write) and quit to go back to the terminal
-
Per the AWS docs, make sure you source the file after editing it!
source ~/.bash_profile
-
- Install
aws cli
Continue to follow instructions here
- The
verse
image has thetidyverse
packages preinstalled, which you can see with
library(tidyverse)
- Most additional R packages can be installed as usual from CRAN in the R console, eg with
install.packages('skimr') # skimr is the name of the package
# package names must be in quotes
- Packages we commonly install that are not pre-installed in the
verse
Docker image include:skimr
,cowplot
,here
, andjanitor
-
If you encounter a package that is not available on CRAN, eg
aws.s3
from thecloudyr
project (which as of this writing was removed from CRAN in January 2019 for lack of an ongoing maintainer ), 1) An archive may be available from CRAN (check the package’s old CRAN page), 2) It is probably available on GitHub. -
To install
aws.s3
from GitHub, run (and can see more detailed instructions here)
install.packages("aws.s3", repos = c("cloudyr" = "http://cloudyr.github.io/drat"))
First, install sparklyr
, and use it to install spark
install.packages(`sparklyr`)
sparklyr::spark_install()
If this works, great! If you try and create a spark connection and get an error about installing Java 8, see below
- As documented by other sparklyr users
here, and the
‘Mastering Spark in R’ book
here,
sparklyr
requires the Java Runtime Environment, version Java 8. - However, the Docker container from the Rocker project that we just set up RStudio in comes with a different version, Java 11. Specifically, the container is using the Debian version of the Linux, and (as of this writing) the current version of Debian, Debian 10, comes with OpenJDK version 11, which is an open-source version of the Java Runtime Environment.
- Note this does mean that even though your Jetstream or Digital Ocean instance is running the Ubuntu version of Linux, the RStudio session running inside the Docker container is running in the Debian version of Linux. Installing software on Debian and Ubuntu can be different via the command line, so make sure to specify the version of Linux when you are searching for installation help.
You can check your java version in the terminal with:
java -version
The steps below follow the steps suggested in the ‘Java errors with sparklyr?’ issue linked above
- Here are the official Java instructions on uninstalling Java on Linux
- The Docker container appeared not to have an RPM install, after
searching for the java install location with
which java
I removed it with
sudo rm -r /usr/lib/jvm/java-11-openjdk-amd64
and confirmed java was uninstalled with
java -version
- Unfortunately, this will not work, since apparently it’s not available by default for Debian 10
# Seems like it might work, but will not
apt-get install openjdk-8-jdk
-
Fortunately, there is an alternative. I followed instructions from AdoptOpenDK here for installing OpenJDK on Debian here (specifically the “DEB installation on Debian or Ubuntu”)
-
Per the messages from following those instructions, first I needed to install
gnupg2
, which is “a complete and free implementation of the OpenPGP standard” (link to project), which is needed for the step where you get the GPG key below
sudo apt-get install gnupg2
- Then could follow these steps from the instructions above
# Get AdoptOpenJDK GPG key
wget -qO - https://adoptopenjdk.jfrog.io/adoptopenjdk/api/gpg/key/public | sudo apt-key add -
# Import AdoptOpenJDK DEB repository
sudo add-apt-repository --yes https://adoptopenjdk.jfrog.io/adoptopenjdk/deb/
#refresh package list
apt-get update
#install
apt-get install adoptopenjdk-8-hotspot
Check your java version again:
java -version
Now it should say openjdk version 1.8
, eg
openjdk version "1.8.0_242"
OpenJDK Runtime Environment (AdoptOpenJDK)(build 1.8.0_242-b08)
OpenJDK 64-Bit Server VM (AdoptOpenJDK)(build 25.242-b08, mixed mode)
Now 1) Restart your R session 2) Try re-running config <- spark_config()
- hopefully this resolves the issue for you too 🎉
- DIBSCI docs on setting up a Jetstream
instance
and installing and running RStudio on
Jetstream
- note some of the instructions are specific to their workshop
- See more details from the Rocker Project on managing user authentication here
- Docker docs