Prepare the Virtual Machine



Note 1: replace whatever is between <> with the proper value. For example, in <VM.IP> use your Virtual Machine (VM) IP provided (something like 193.166.24.142).
Note 2: check the number of CPUs of your VM using htop (the CPUs available will be displayed at the top as dynamic horizontal bars, numbered sequentially).
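
As a non-interactive alternative to htop, the CPU count can also be read with nproc. This is a minimal sketch that assumes GNU coreutils is present on the VM; the variable name n_cpus is only illustrative.

```shell
# Count the processing units available to the current process
n_cpus=$(nproc)
echo "CPUs available: ${n_cpus}"
```

The value can later be reused for tools that accept a number of threads.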

The key for both Unix (mgmc.key) and Windows (mgmc.ppk) users is available for download; the password for accessing the Dropbox directory will be available during the course.


Connect to VM

In your computer

  • Note: The SSH key was produced in Linux using ssh-keygen -t rsa -f /home/cloud-user/mgmc.key. More information here.

Unix

Using terminal

# Give the user read-only permission to the SSH key
chmod 400 </path/to/provided/private/ssh/key/mgmc.key>

ssh -i </path/to/provided/private/ssh/key/mgmc.key> cloud-user@<VM.IP>
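
To avoid retyping the key path and IP on every connection, an entry can be added to the OpenSSH client configuration. The sketch below only prints such an entry; the alias mgmc-vm and the function name print_ssh_host_entry are hypothetical and not part of the course material.

```shell
# Print an OpenSSH client configuration entry for the VM.
# "mgmc-vm" is a hypothetical alias; key path and IP are given as arguments.
print_ssh_host_entry() {
    key_path=$1
    vm_ip=$2
    printf 'Host mgmc-vm\n'
    printf '    HostName %s\n' "$vm_ip"
    printf '    User cloud-user\n'
    printf '    IdentityFile %s\n' "$key_path"
}

# Example with placeholder values (replace them as in the commands above)
print_ssh_host_entry '</path/to/provided/private/ssh/key/mgmc.key>' '<VM.IP>'
```

If the printed entry is appended to ~/.ssh/config, `ssh mgmc-vm` should then behave like the full ssh command above, and the same alias can be used with scp.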

Windows

Using PuTTY

  • Note 1: use putty.exe and puttygen.exe from "Alternative binary files" (Download PuTTY)
  • Note 2: to paste text in PuTTY, right-click inside the PuTTY terminal

For information on how to use SSH Keys with PuTTY see here (specifically "Use Existing Public and Private Keys" and "Connect to Server with Private Key" sections)

Briefly: open PuTTY; in the Category panel on the left, go to Connection > SSH > Auth and load the .ppk key (mgmc.ppk) in "Private key file for authentication"; then follow the instructions below.

Connection settings (after preparing PuTTY to use SSH key):

Figure: PuTTY connection settings (putty_connection_info)
The steps highlighted in yellow are optional. They save the session so that steps 1-3 do not have to be repeated every time. Once the session is saved, connecting to the VM only requires steps 6-8.


Transfer data between your computer and the VM

Unix

Using terminal

From the Local computer to the VM

scp -i </path/to/provided/private/ssh/key/mgmc.key> </path/to/local/data/file> cloud-user@<VM.IP>:</path/to/data/file>

From the VM to the Local computer

scp -i </path/to/provided/private/ssh/key/mgmc.key> cloud-user@<VM.IP>:</path/to/data/file> </path/to/local/data/file>
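
After a transfer it can be useful to confirm that the file arrived intact by comparing checksums on both sides. The sketch below assumes sha256sum (GNU coreutils) is available on both machines; file_sha256 is a hypothetical helper name, and cp stands in for scp so that the demonstration runs locally.

```shell
# Print only the SHA-256 digest of a file
file_sha256() {
    sha256sum "$1" | cut -d ' ' -f 1
}

# Local demonstration: cp stands in for scp here
demo_dir=$(mktemp -d)
printf 'ACGT\n' > "$demo_dir/original.txt"
cp "$demo_dir/original.txt" "$demo_dir/copy.txt"
if [ "$(file_sha256 "$demo_dir/original.txt")" = "$(file_sha256 "$demo_dir/copy.txt")" ]; then
    echo "digests match"
else
    echo "digests differ"
fi
rm -r "$demo_dir"
```

In practice, the digest would be computed once on the local computer and once in the VM, and the two values compared.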

Using client software

For information on how to use SSH Keys with FileZilla or WinSCP, please use the following link


Prepare the VM

Give colour to your terminal

In the VM

Edit ~/.bashrc (using for example nano ~/.bashrc) and uncomment force_color_prompt=yes by removing the #. More information here.

  • Note: After editing, exit with Ctrl + X; type y to save the changes; keep the file name unchanged by just pressing Enter.
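
The same edit can be made without opening an editor. This is a sketch assuming GNU sed (as shipped with Ubuntu) and a ~/.bashrc that contains the commented line; enable_color_prompt is a hypothetical helper name, and the demonstration works on a temporary file rather than on the real ~/.bashrc.

```shell
# Uncomment force_color_prompt=yes in the given file (GNU sed in-place edit)
enable_color_prompt() {
    sed -i 's/^#[[:space:]]*force_color_prompt=yes/force_color_prompt=yes/' "$1"
}

# Demonstration on a temporary file
demo_rc=$(mktemp)
printf '%s\n' '# some setting' '#force_color_prompt=yes' > "$demo_rc"
enable_color_prompt "$demo_rc"
grep -n 'force_color_prompt' "$demo_rc"
rm "$demo_rc"
```

In the VM, the call would be `enable_color_prompt ~/.bashrc`; the change takes effect in new shells.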

Install generic programs

  • htop
    • An interactive process viewer for Unix systems
    • Allows monitoring VM activity (CPU and memory usage, running processes, etc.)
  • GNU Parallel
    • Shell tool for executing jobs in parallel using one or more computers.

In the VM

sudo apt-get install -y htop parallel
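
To check that the installation worked, one option is to test whether each program can be found on the PATH. A minimal sketch; missing_tools is a hypothetical helper name.

```shell
# Print the name of every given tool that is not found on the PATH
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

missing=$(missing_tools htop parallel)
if [ -z "$missing" ]; then
    echo "all tools found"
else
    echo "not found: $missing"
fi
```

The same helper can be reused after installing the other tools in this guide.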

Organize your VM

In the VM

Give the right permissions and ownership to the extra volume

sudo chmod 755 /media/volume/
sudo chown cloud-user:cloud-user /media/volume/

Prepare some folders

# Create a folder where all the tools to be used will be placed
mkdir ~/NGStools

# Create a folder to store different databases
mkdir /media/volume/DBs
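
Plain mkdir reports an error if the folder already exists. When these steps may be repeated, mkdir -p is a safer variant, sketched here with a hypothetical helper named prepare_folders.

```shell
# Create the tools and databases folders; -p makes the call safe to repeat
prepare_folders() {
    tools_dir=$1
    dbs_dir=$2
    mkdir -p "$tools_dir" "$dbs_dir"
}

# In the VM this would be called as:
#   prepare_folders ~/NGStools /media/volume/DBs
```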

Install Docker

What is Docker?

"Docker is a tool that can package an application and its dependencies in a virtual container that can run on any Linux server," Lyman explained. "This helps enable flexibility and portability on where the application can run, whether on premise, public cloud, private cloud, bare metal, etc." From here.

Installation

Information on how to install Docker for Ubuntu is available at this link

In the VM

# Remove older Docker versions, if any
sudo apt-get remove -y docker docker-engine docker.io
sudo apt-get update
# Extra kernel packages that allow Docker to use the aufs storage driver
sudo apt-get install -y linux-image-extra-$(uname -r) linux-image-extra-virtual
# Packages that allow apt to use a repository over HTTPS
sudo apt-get install -y apt-transport-https ca-certificates curl software-properties-common
# Add Docker's official GPG key
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo apt-key add -
# Verify that the key has the fingerprint 9DC8 5822 9FC7 DD38 854A E2D8 8D81 803C 0EBF CD88
sudo apt-key fingerprint 0EBFCD88

# Set up the stable repository and install Docker CE
sudo add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"
sudo apt-get update
sudo apt-get install -y docker-ce

# Check that Docker is correctly installed
sudo docker run hello-world

Run Docker without sudo

More information here.

sudo groupadd docker
sudo gpasswd -a $USER docker

# Log out and log in again to activate the group changes (close the terminal and open it again)

docker run hello-world
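
If docker run still asks for sudo, the group change may not be active in the current session yet. A small sketch to check this; in_group is a hypothetical helper name.

```shell
# Succeed if the current user belongs to the named group in this session
in_group() {
    id -nG | tr ' ' '\n' | grep -qxF -- "$1"
}

if in_group docker; then
    echo "docker group active"
else
    echo "docker group not active yet (log out and log in again)"
fi
```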

Install getSeqENA

What is getSeqENA?

Download sequences from ENA/SRA databases

Install dependencies

Aspera Connect

In your computer

From the webpage:

  • See all installers > Linux > Linux - Select Version > Direct download
  • Copy Link Location (Direct download)

In the VM

wget http://download.asperasoft.com/download/sw/connect/3.7.4/aspera-connect-3.7.4.147727-linux-64.tar.gz
tar xf aspera-connect-3.7.4.147727-linux-64.tar.gz
bash aspera-connect-3.7.4.147727-linux-64.sh
rm aspera-connect-3.7.4.147727-linux-64.sh aspera-connect-3.7.4.147727-linux-64.tar.gz
mv ~/.aspera/ ~/NGStools/aspera/
echo "export PATH=$HOME/NGStools/aspera/connect/bin"':$PATH' >> ~/.profile
  • Information on adding a path to the PATH environment variable here
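
In the echo command above, the double-quoted part expands $HOME immediately, while the single-quoted ':$PATH' is written literally so that it is only expanded when ~/.profile is read. Running the command twice adds the line twice; the sketch below, using a hypothetical helper named add_to_path_once, appends the line only if it is not already present. It is demonstrated on a temporary file instead of the real ~/.profile.

```shell
# Append an "export PATH=..." line to a profile file unless it is already there
add_to_path_once() {
    new_dir=$1
    profile_file=$2
    # Double quotes expand $new_dir now; single quotes keep $PATH literal
    line="export PATH=${new_dir}"':$PATH'
    grep -qxF -- "$line" "$profile_file" 2>/dev/null || echo "$line" >> "$profile_file"
}

# Demonstration on a temporary file
demo_profile=$(mktemp)
add_to_path_once "$HOME/NGStools/aspera/connect/bin" "$demo_profile"
add_to_path_once "$HOME/NGStools/aspera/connect/bin" "$demo_profile"   # second call adds nothing
cat "$demo_profile"
rm "$demo_profile"
```

Changes to ~/.profile take effect at the next login, or immediately in the current shell after running `. ~/.profile`.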

NCBI SRA Toolkit

In your computer

In the webpage:

  • Ubuntu Linux 64 bit architecture
  • Copy Link Location (Ubuntu Linux 64 bit architecture)

In the VM

wget https://ftp-trace.ncbi.nlm.nih.gov/sra/sdk/2.8.2-1/sratoolkit.2.8.2-1-ubuntu64.tar.gz
tar xf sratoolkit.2.8.2-1-ubuntu64.tar.gz
rm sratoolkit.2.8.2-1-ubuntu64.tar.gz
mv sratoolkit.2.8.2-1-ubuntu64/ ~/NGStools/
echo "export PATH=$HOME/NGStools/sratoolkit.2.8.2-1-ubuntu64/bin"':$PATH' >> ~/.profile

Install getSeqENA

In your computer

In the webpage:

  • Clone or download > Copy to clipboard

In the VM

git clone https://github.com/B-UMMI/getSeqENA.git
mv getSeqENA/ ~/NGStools/
echo "export PATH=$HOME/NGStools/getSeqENA"':$PATH' >> ~/.profile

Get INNUca

What is INNUca?

INNUENDO quality control of reads, de novo assembly and contigs quality assessment, and possible contamination detection

In your computer

From the webpage:

  • Docker

In the VM

docker pull ummidock/innuca:3.1

Install ReMatCh

What is ReMatCh?

Reads mapping against target sequences, checking mapping and consensus sequences production

In your computer

In the webpage:

  • Clone or download > Copy to clipboard

In the VM

git clone https://github.com/B-UMMI/ReMatCh.git
mv ReMatCh/ ~/NGStools/
echo "export PATH=$HOME/NGStools/ReMatCh"':$PATH' >> ~/.profile

Get ABRicate

What is ABRicate?

Mass screening of contigs for antimicrobial resistance or virulence genes. It comes bundled with multiple databases: Resfinder, CARD, ARG-ANNOT, NCBI BARRGD, NCBI, EcOH, PlasmidFinder and VFDB.

In your computer

In UMMI Docker Hub webpage:

  • ummidock/abricate

In the VM

docker pull ummidock/abricate:latest

Get Prokka

What is Prokka?

Prokka is a software tool to annotate bacterial, archaeal and viral genomes quickly and produce standards-compliant output files

In your computer

In UMMI Docker Hub webpage:

  • ummidock/prokka

In the VM

docker pull ummidock/prokka:1.12

Get Roary

What is Roary?

Roary is a high speed stand alone pan genome pipeline, which takes annotated assemblies in GFF3 format (produced by Prokka (Seemann, 2014)) and calculates the pan genome

In your computer

In Sanger Pathogens Docker Hub webpage:

  • sangerpathogens/roary

In the VM

docker pull sangerpathogens/roary:latest
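
The Docker images used in this guide (INNUca, ABRicate, Prokka and Roary) can also be pulled in a single loop. This is only a sketch: pull_images and the DOCKER override variable are hypothetical conveniences, the latter included so the commands can be previewed without contacting Docker Hub.

```shell
# Pull every image given as argument; stop at the first failure.
# Setting DOCKER=echo previews the commands instead of running them.
pull_images() {
    for image in "$@"; do
        ${DOCKER:-docker} pull "$image" || return 1
    done
}

# Preview only, in a subshell so that DOCKER=echo does not persist
(
    DOCKER=echo
    pull_images ummidock/innuca:3.1 ummidock/abricate:latest \
        ummidock/prokka:1.12 sangerpathogens/roary:latest
)
```

Calling pull_images with the same arguments outside the preview subshell performs the actual pulls.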

Get Scoary

What is Scoary?

Scoary is designed to take the gene_presence_absence.csv file from Roary as well as a traits file created by the user and calculate the associations between all genes in the accessory genome and the traits. It reports a list of genes sorted by strength of association per trait.

In the VM

sudo apt-get install -y python-pip
pip install scoary