Instructions for setting up RStudio running in a Docker container on Jetstream or Digital Ocean, including multiple user logins

Dana Miller

Last updated February 24, 2020 - suggestions to improve these instructions welcome

With thanks to the Rocker Project for R on Docker, Indiana University and the National Science Foundation for cloud compute resources through Jetstream, DIBSCI at UC Davis for documentation, and Carl Boettiger at UC Berkeley for reproducible research inspiration and an initial version of these instructions

Background - two cloud compute providers referenced in this guide

  • Jetstream is a National Science Foundation-supported cloud compute resource that runs on the open-source software Atmosphere and OpenStack. Compute credit allocations are available by applying through XSEDE. If you can contact campus research computing staff, they may be able to provide advice and example applications (e.g. at UC Berkeley this is the Research Computing group)

  • Digital Ocean is a commercial cloud compute resource (similar to AWS or Azure) offering similar instances to those available on Jetstream. Compute credits are billed hourly, depending on the size of the resource.

Running RStudio in Docker and configuring multiple user logins works on both of these platforms. Most of the instructions below apply to both platforms; anything that is specific to Jetstream or Digital Ocean will be noted.

Set up a new instance for the first time

On Jetstream

If you already have a Jetstream login and allocation, to run R Studio on Jetstream:

  • Log in and go to ‘Create a new project’
  • Enter project name and description
  • Select the size and image you would like to use (images have various software preinstalled): select this Ubuntu image that includes Docker and GUI support
  • Once the new instance for your project is active (green dot), click the “Open web shell” link on the right hand side

On Digital Ocean

  • After creating a Digital Ocean account, create a new instance using this Ubuntu + Docker image (click ‘create Droplet’)
  • Setting up SSH key authentication is recommended
  • Once the droplet is set up, connect to the droplet’s shell interface by opening a terminal and connecting via ssh with
ssh root@[IP address of your droplet here]

Start a Docker container with R and RStudio

  • In the web shell, first confirm you are able to run docker commands:

    docker run hello-world

    If configured correctly, this will say Hello from Docker!...

    If you get an error about permission denied and the Docker daemon, try adding yourself to the docker group with sudo usermod -aG docker example_username, where example_username is your username (log out and back in for the new group membership to take effect)

    • Start a new Docker container that has R, R Studio, and the tidyverse preinstalled (you could also install these and all the required dependencies without Docker but it takes a long time and is potentially error-prone)
    docker run -d -p 8787:8787 --rm -e PASSWORD=example_password rocker/tidyverse

or, if you also want TeX and publishing tools installed, use the larger verse image:

    docker run -d -p 8787:8787 --rm -e PASSWORD=example_password rocker/verse

What do all those flags mean?

  • “-d” for detach means the container will run ‘in the background’ and return a command prompt (without this flag your container will be running but you won’t be able to run anything else at the command prompt until the container is stopped)

  • “-p” for publish maps a port on the host to a port inside the container, written as host:container. Here 8787:8787 exposes RStudio’s port 8787 on the same host port, which is also the suffix of the URL your RStudio session will be accessible at

  • “-e” for environment passes the PASSWORD variable into the container’s environment

  • “--rm” for remove, which will automatically delete this container after it exits (note it does not delete the underlying image the container was based on)

  • rocker/tidyverse - this specifies the exact Docker image (already created by the Rocker project) to use. Using the same image, you can set up the exact same computational environment many different times, or many different people can all use it to set up the same environment - More details about this image at https://www.rocker-project.org/images/
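
To keep the flags above straight, here is a minimal sketch that assembles the same command one flag at a time, with a comment per flag. The function name annotated_run_cmd is made up for illustration, and it only prints the command rather than running it.

```shell
#!/bin/sh
# Build the docker run command piece by piece so every flag has its own comment.
# annotated_run_cmd is a hypothetical helper; it prints the command, it does not run it.
annotated_run_cmd() {
  set -- run
  set -- "$@" -d                            # detach: run in the background
  set -- "$@" -p 8787:8787                  # publish host port 8787 to container port 8787
  set -- "$@" --rm                          # remove the container when it exits
  set -- "$@" -e PASSWORD=example_password  # pass the RStudio password into the container
  set -- "$@" rocker/tidyverse              # the Rocker image to start from
  echo docker "$@"
}

annotated_run_cmd
```

Once the printed command looks right, replacing echo docker with docker in the last line of the function would run it for real.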

  • To find the URL you can use to log into the new R Studio session, run

    echo My RStudio Web server is running at: http://$(hostname):8787/

Note the URL above is http, not https. The latter is also possible to set up but requires a custom domain name.

Note that to copy and paste text from the Jetstream web shell, you have to highlight it and then press Ctrl + Alt + Shift

  • Paste the URL above into your web browser, and to login:

    • The default username is “rstudio”
    • The password is whatever you chose in the docker run... command above for example_password

    Congrats, you are now running RStudio on Jetstream!

    The rstudio user has permission to install packages.

Optional UNIX configuration notes:

  • The user shown by whoami at the command line is not the same as the user for logging in at the RStudio browser prompt
  • Specifying a username with docker run -u... also doesn’t work for the RStudio login prompt

Configure git with git config

If you will be using git to incrementally save your work on the instance and then push the saved changes to a remote repository like GitHub, each user who separately logs into the RStudio instance will need to run the two git configuration commands below in the RStudio Terminal pane.

 git config --global user.name "Your Name"
 git config --global user.email your_username@users.noreply.github.com

Note that for the user email, you can either use your own email, or use the ‘noreply’ email generated for you by GitHub to avoid exposing your personal email address in your commit history. See here for details.

Optional next step - adding multiple users and configuring their permissions

There are two ways to do this:

  1. Add users to your existing instance, or
  2. Launch a separate R Studio instance bound to a different port

1. Add more users to an existing instance

  • With your Jetstream instance active, launch the web shell

  • Find the ID of your running RStudio Docker container by running docker ps. You will need to write down or remember the first 5 characters of your CONTAINER ID (a long string of numbers and letters)

  • To run commands inside that container, run

    docker exec -ti CONTAINER_ID bash

Where CONTAINER_ID is e.g. “904b2”, the first 5 characters. Notice your prompt will change now that you are inside the container.

  • Now you can use regular UNIX commands to create a new user (this is not Docker-specific), where example_user is the user name you are creating (the name can be anything you choose).
 adduser example_user

Enter a password, press ‘Enter’ to leave the other questions blank (eg Name, Work Phone…), they are not required, and press Y for “Yes”

  • At this point, someone could go to the same URL you set up above, and login with this new username and password but they wouldn’t be able to install any additional R packages (no root access)

  • To give the new user the right to download packages, add them to the staff group. In the command below, -aG means ‘append to group’

 usermod -aG staff example_user

To see a list of all the users (the output also includes system accounts and other info)

cat /etc/passwd
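
Since /etc/passwd also lists system accounts, a small filter can make the output easier to read. This is a sketch that assumes regular users have UIDs from 1000 to 59999, which is the Debian convention for accounts made with adduser. The function name list_regular_users is made up for illustration.

```shell
#!/bin/sh
# Print only the account names whose UID falls in the usual range for regular users.
# Assumption: UIDs 1000-59999 are regular users (Debian adduser default range).
# The optional argument is the passwd file to read; it defaults to /etc/passwd.
list_regular_users() {
  awk -F: '$3 >= 1000 && $3 < 60000 { print $1 }' "${1:-/etc/passwd}"
}

list_regular_users
```

Run inside the container, this should show rstudio plus any users you created, one name per line.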

To change a password (note passwd below is not a typo)

sudo passwd example_user

When you are done configuring the user info, type

exit

to return to the command line outside of that container.

The login URL for the new users is the same as above. To print it again, run:

echo My RStudio Web server is running at: http://$(hostname):8787/
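
If you have several users to add, answering the adduser prompts for each one gets tedious. Below is a hedged sketch of a non-interactive version of the steps above, run from outside the container. It is not part of the original instructions: it assumes Debian's adduser options --disabled-password and --gecos, and uses chpasswd to set the password. The function name, the user names alice and bob, and the CHANGE_ME password are placeholders. It only prints the commands so you can review them first.

```shell
#!/bin/sh
# Print the commands that would create one RStudio user inside a running container.
# print_add_user_cmds is a hypothetical helper; nothing is executed here.
# $1 = container ID (or its first few characters), $2 = new user name.
print_add_user_cmds() {
  container=$1
  user=$2
  # Create the account without prompting for a password or contact details.
  echo "docker exec $container adduser --disabled-password --gecos '' $user"
  # chpasswd reads user:password pairs from standard input.
  echo "echo '$user:CHANGE_ME' | docker exec -i $container chpasswd"
  # The staff group is what allows the user to install R packages.
  echo "docker exec $container usermod -aG staff $user"
}

for u in alice bob; do
  print_add_user_cmds 904b2 "$u"
done
```

Replace CHANGE_ME with a real password per user before running any of the printed commands.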

2. Launch a separate R Studio instance bound to a different port

Alternatively, start an additional Docker container on the same Jetstream instance with a different port number

docker run -d -p 8888:8787 --rm -e PASSWORD=newpassword rocker/tidyverse

The default username will still be “rstudio”, and the password will be whatever you just specified, but the login URL will be different (note the port number is changed above and accordingly also changed below)

echo My RStudio Web server is running at: http://$(hostname):8888/
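
The same pattern extends to more than one extra container, as long as each gets its own host port. Here is a minimal sketch that prints one docker run command per container on consecutive host ports. The function name, its two arguments, and the change_me_PORT passwords are placeholders made up for illustration.

```shell
#!/bin/sh
# Print one docker run command per extra container, each on its own host port.
# print_extra_containers is a hypothetical helper; nothing is executed here.
# $1 = first host port to use, $2 = number of containers.
print_extra_containers() {
  first_port=$1
  count=$2
  i=0
  while [ "$i" -lt "$count" ]; do
    port=$((first_port + i))
    # Inside every container RStudio still listens on 8787; only the host port changes.
    echo "docker run -d -p ${port}:8787 --rm -e PASSWORD=change_me_${port} rocker/tidyverse"
    i=$((i + 1))
  done
}

print_extra_containers 8888 2
```

Each container is then reachable at the host name followed by its own port number, with the username “rstudio” and the password given for that port.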

Optional next steps - installing curl, pip3 and aws cli in the terminal after RStudio is running

  • curl is a common command line utility that you will probably need if you are downloading data or files from URLs
  • pip3 is the package manager for the python programming language. In this case, it’s required to install aws cli
  • aws cli is a command line interface for Amazon Web Services

The following steps were tested on a fresh verse Docker image.

In the ‘Terminal’ pane of RStudio :

  1. Install curl, eg following instructions here
sudo apt-get update

sudo apt-get install curl

curl --version
  2. Install pip - use python 3
  • See “Install pip” instructions here
    • The verse image has Python 3, so use the python3 variants, e.g. python3 get-pip.py --user, not python get-pip.py --user

    • For the step about editing your bash profile, the file will probably be called .profile.

    • You can use vim to edit the file (type vim ~/.profile to open it):

      • press i to go into “insert” (editing) mode
      • add export PATH=~/.local/bin:$PATH to the file below anything in it already (you can start a line with # as a comment to explain why it’s there)
      • press Esc to leave insert mode, then type :wq and press Enter to save (write) and quit back to the terminal
    • Per the AWS docs, make sure you source the file after editing it! For example, source ~/.profile (or source ~/.bash_profile if that is the file you edited)

  3. Install aws cli

Continue to follow instructions here
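
For the .profile step above, vim is not the only option. Here is a minimal sketch that appends the PATH line from the shell, and only if it is not already in the file, so running it twice is harmless. The function name append_once is made up for illustration, and the demo writes to a scratch file rather than your real ~/.profile.

```shell
#!/bin/sh
# Append a line to a file only if that exact line is not already present.
# append_once is a hypothetical helper: $1 = line to add, $2 = file to edit.
append_once() {
  line=$1
  file=$2
  touch "$file"
  # -x matches whole lines only, -F treats the line as plain text, not a pattern.
  grep -qxF -- "$line" "$file" || printf '%s\n' "$line" >> "$file"
}

# Demonstrated on a scratch file; the second call adds nothing.
demo_profile=$(mktemp)
append_once 'export PATH=~/.local/bin:$PATH' "$demo_profile"
append_once 'export PATH=~/.local/bin:$PATH' "$demo_profile"
cat "$demo_profile"
```

To apply it for real, pass "$HOME/.profile" as the second argument and then source that file.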

Optional next steps - installing most R packages

Packages on CRAN (most R packages you’ll need)

  • The verse image has the tidyverse packages preinstalled, which you can see with
library(tidyverse)
  • Most additional R packages can be installed as usual from CRAN in the R console, eg with
install.packages('skimr')  # skimr is the name of the package
                           # package names must be in quotes
  • Packages we commonly install that are not pre-installed in the verse Docker image include: skimr, cowplot, here, and janitor

Packages on GitHub

  • If you encounter a package that is not available on CRAN, e.g. aws.s3 from the cloudyr project (which as of this writing was removed from CRAN in January 2019 for lack of an ongoing maintainer), there are two options: 1) an archive may be available from CRAN (check the package’s old CRAN page), or 2) it is probably available on GitHub.

  • To install aws.s3 from GitHub, run the command below (more detailed instructions are available here)

install.packages("aws.s3", repos = c("cloudyr" = "http://cloudyr.github.io/drat"))

Optional next steps - installing sparklyr and spark

First, install sparklyr, and use it to install spark

install.packages("sparklyr")

sparklyr::spark_install()

If this works, great! If you try and create a spark connection and get an error about installing Java 8, see below

If you need to troubleshoot Java 8 vs Java 11 (requires reinstalling Java 😱, but you can do it 💪 )

  • As documented by other sparklyr users here, and the ‘Mastering Spark in R’ book here, sparklyr requires the Java Runtime Environment, version Java 8.
  • However, the Docker container from the Rocker project that we just set up RStudio in comes with a different version, Java 11. Specifically, the container is using the Debian distribution of Linux, and (as of this writing) the current version of Debian, Debian 10, comes with OpenJDK version 11, which is an open-source version of the Java Runtime Environment.
  • Note this does mean that even though your Jetstream or Digital Ocean instance is running the Ubuntu version of Linux, the RStudio session running inside the Docker container is running in the Debian version of Linux. Installing software on Debian and Ubuntu can be different via the command line, so make sure to specify the version of Linux when you are searching for installation help.

You can check your java version in the terminal with:

java -version

The steps below follow the steps suggested in the ‘Java errors with sparklyr?’ issue linked above

Removing OpenJDK Java 11
  • Here are the official Java instructions on uninstalling Java on Linux
  • The Docker container appeared not to have an RPM install. After finding the java install location with which java, I removed it with
sudo rm -r /usr/lib/jvm/java-11-openjdk-amd64

and confirmed java was uninstalled with

java -version
Installing OpenJDK Java 8 on Debian 10
  • Unfortunately, the obvious apt-get command will not work, since openjdk-8-jdk is apparently not available by default for Debian 10
# Seems like it might work, but will not
apt-get install openjdk-8-jdk
  • Fortunately, there is an alternative. I followed the instructions from AdoptOpenJDK here for installing OpenJDK on Debian (specifically the “DEB installation on Debian or Ubuntu” section)

  • Per the messages from following those instructions, first I needed to install gnupg2, which is “a complete and free implementation of the OpenPGP standard” (link to project), which is needed for the step where you get the GPG key below

sudo apt-get install gnupg2
  • Then I could follow these steps from the instructions above
# Get the AdoptOpenJDK GPG key
wget -qO - https://adoptopenjdk.jfrog.io/adoptopenjdk/api/gpg/key/public | sudo apt-key add -

# Import the AdoptOpenJDK DEB repository
sudo add-apt-repository --yes https://adoptopenjdk.jfrog.io/adoptopenjdk/deb/

# Refresh the package list
sudo apt-get update

# Install Java 8
sudo apt-get install adoptopenjdk-8-hotspot

Check your java version again:

java -version

Now it should say openjdk version 1.8, eg

openjdk version "1.8.0_242"
OpenJDK Runtime Environment (AdoptOpenJDK)(build 1.8.0_242-b08)
OpenJDK 64-Bit Server VM (AdoptOpenJDK)(build 25.242-b08, mixed mode)
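
If you would rather check the version in a script than read the banner by eye, here is a minimal sketch that pulls the major version out of the first line of java -version output. The function name java_major is made up for illustration. It assumes the banner has the quoted-version format shown above, where Java 8 reports itself as 1.8 and later releases report the major version first.

```shell
#!/bin/sh
# Extract the Java major version from a banner line such as:
#   openjdk version "1.8.0_242"
# java_major is a hypothetical helper; $1 is the first line of the banner.
java_major() {
  version=${1#*\"}         # drop everything up to the opening quote
  version=${version%%\"*}  # drop the closing quote and anything after it
  case $version in
    1.*) version=${version#1.} ;;  # old scheme: 1.8.0_242 means Java 8
  esac
  echo "${version%%.*}"
}

java_major 'openjdk version "1.8.0_242"'   # prints 8

# With Java installed, feed in the real banner (java -version writes to stderr).
if command -v java >/dev/null 2>&1; then
  java_major "$(java -version 2>&1 | head -n 1)"
fi
```

sparklyr needs the result to be 8; a result of 11 means the original OpenJDK 11 is still the one on your PATH.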

Now 1) Restart your R session 2) Try re-running config <- spark_config() - hopefully this resolves the issue for you too 🎉

More references