Installing the workflows with pip
This page describes how to install the workflows and their dependencies from pip packages for the wmgx workflow when bypassing the strain profiling tasks. For workflows that require strain profiling, please see the StrainPhlAn documentation for how to install StrainPhlAn.
First, install Python 3.6 or 3.7 if needed. These are the two Python versions we use for development and testing. In this step, also install the compilers required to build the tool dependencies, plus pip if it is not already available, since it is needed to install the bioBakery Python packages. Please note: these are the only two commands in the installation that require root. If your operating system uses a different package manager, replace yum accordingly.
$ sudo yum update
$ sudo yum install python36 python36-devel python36-pip gcc gcc-c++
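If you are not sure which package manager your system provides, a small check like the following may help. This is only a sketch: the function name detect_pkg_manager and the list of candidates are illustrative assumptions, not part of the bioBakery tooling.

```shell
# Sketch: report the first common package manager found on this system.
# The candidate list is an assumption; extend it for your distribution.
detect_pkg_manager() {
    for pm in dnf yum apt-get zypper; do
        if command -v "$pm" >/dev/null 2>&1; then
            echo "$pm"
            return 0
        fi
    done
    echo "unknown"
    return 1
}

# Print the result without aborting if nothing was found.
detect_pkg_manager || true
```

Package names differ between distributions. On Debian or Ubuntu, for example, the equivalent packages are typically python3, python3-dev, python3-venv, python3-pip, gcc and g++, though exact names can vary by release.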
Next, create a virtualenv with Python 3.6. It will contain all of your packages and keep them and their dependencies in a static environment, so other Python packages installed on your system are less likely to affect your workflow's install environment.
$ python3 -m venv install_workflows
$ source install_workflows/bin/activate
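Since every later pip command should run inside this environment, it can be worth confirming that it is active. The sketch below relies on the VIRTUAL_ENV variable that the activate script exports; the helper name in_venv is hypothetical.

```shell
# Sketch: confirm that a virtual environment is active before installing.
# The activate script exports VIRTUAL_ENV; it is unset otherwise.
in_venv() {
    [ -n "${VIRTUAL_ENV:-}" ]
}

if in_venv; then
    echo "Active environment: ${VIRTUAL_ENV}"
else
    echo "No virtual environment is active; run: source install_workflows/bin/activate"
fi
```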
Now install the latest KneadData from source so that its dependencies (e.g. bowtie2, trf, trimmomatic) are installed as well. Installing bioBakery tools from source installs all of the required dependencies; these are not always installed if your default pip settings install packages from wheels. Please note that some of these dependencies are compiled during the installation, so they require the compilers listed above plus at least 3 GB of available memory.
$ pip install kneaddata==0.10.0 --no-binary :all:
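If a source build fails partway through, insufficient memory is one possible cause. The following sketch checks the 3 GB figure mentioned above on Linux by reading MemAvailable from /proc/meminfo; the helper names available_kb and has_enough_memory are hypothetical, and the check is skipped on systems without that file or field.

```shell
# Sketch: check available memory before compiling dependencies (Linux only).
# Prints the MemAvailable value (in kB) from /proc/meminfo.
available_kb() {
    awk '/^MemAvailable:/ {print $2}' /proc/meminfo
}

# Succeeds when the kB value in $1 is at least $2 GB.
has_enough_memory() {
    [ "$1" -ge $(( $2 * 1024 * 1024 )) ]
}

if [ -r /proc/meminfo ]; then
    kb="$(available_kb)"
    if [ -n "$kb" ] && has_enough_memory "$kb" 3; then
        echo "At least 3 GB of memory is available"
    else
        echo "Less than 3 GB available (or MemAvailable not reported)"
    fi
fi
```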
Next, install HUMAnN from source to get its dependencies (e.g. bowtie2, blast, diamond).
$ pip install humann==3.0.0.a.4 --no-binary :all:
Now install MetaPhlAn plus its database (set --nproc to the total number of cores to use when building the database). Building the database will take some time, depending on the number of cores. The database is installed in the lib folder containing the MetaPhlAn software and is approximately 2 GB.
$ pip install metaphlan==3.0.7
$ metaphlan --install --nproc 2
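Rather than hard-coding the core count, you could detect it. The sketch below only prints the resulting command so that you can review it before running it; the CORES variable name is an arbitrary choice, and nproc (GNU coreutils) may not be present on every system, hence the getconf fallback.

```shell
# Sketch: detect the number of available cores to pass to --nproc.
# nproc is part of GNU coreutils; getconf is used as a fallback.
CORES="$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN)"

# Print the command instead of running it, so it can be reviewed first.
echo "metaphlan --install --nproc ${CORES}"
```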
Now test the HUMAnN and MetaPhlAn installs. Biom-format is optional for these tools but is needed for some HUMAnN functional tests (cython is required for the biom-format install).
$ pip install cython==0.29.17
$ pip install biom-format==2.1.8-1
$ humann_test --run-all-tests
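If the tests report a missing dependency, a quick check of which executables are on your PATH can narrow down the cause. This is a sketch only: the helper name missing_tools is hypothetical, and the list of executables is an assumption based on the tools and dependencies named on this page.

```shell
# Sketch: list which of the given executables are NOT found on PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# The tool list below is an assumption drawn from the steps on this page.
missing="$(missing_tools kneaddata humann metaphlan bowtie2 diamond)"
if [ -z "$missing" ]; then
    echo "All tools found on PATH"
else
    echo "Not found on PATH:"
    echo "$missing"
fi
```

Run this inside the activated virtualenv, since the tools are installed into its bin folder.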
Now install AnADAMA2 and the bioBakery workflows software.
$ pip install anadama2==0.8.0
$ pip install biobakery_workflows==3.0.0a7
Finally, install all of the databases for the wmgx workflow. These files are large, so downloading will take some time.
$ biobakery_workflows_databases --install wmgx
Prior to a full run, test the install with a demo run. The full instructions can be found in the bioBakery workflow tutorial.
First, download the small demo input files.
$ mkdir input_test_run
$ cd input_test_run
$ wget https://github.com/biobakery/biobakery_workflows/raw/master/examples/tutorial/input/HD32R1_subsample.fastq.gz
$ wget https://github.com/biobakery/biobakery_workflows/raw/master/examples/tutorial/input/HD42R4_subsample.fastq.gz
$ cd ../
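An interrupted download can leave a truncated file that only fails later in the workflow. As a sketch, the downloaded files can be checked with gzip -t, which tests archive integrity without extracting; the helper name verify_gz is hypothetical.

```shell
# Sketch: test each gzip file for integrity without extracting it.
# Returns non-zero if any file is corrupt or missing.
verify_gz() {
    status=0
    for f in "$@"; do
        if gzip -t "$f" 2>/dev/null; then
            echo "ok: $f"
        else
            echo "corrupt or missing: $f"
            status=1
        fi
    done
    return "$status"
}

# Check the demo inputs if the folder from the previous step exists.
if [ -d input_test_run ]; then
    verify_gz input_test_run/*.fastq.gz || echo "Re-download the files listed above"
fi
```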
Finally, run the workflow, setting --local-jobs and --threads based on the total number of cores available.
$ biobakery_workflows wmgx --input input_test_run --output output_test_run --bypass-strain-profiling --local-jobs 2 --threads 8
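One way to choose these two values is sketched below. It assumes that --local-jobs is the number of tasks run at once and --threads is the number of threads per task, so that their product should not exceed the total cores (the example above, 2 jobs with 8 threads, would then suit a 16-core machine). The helper name threads_per_job and the variable names are hypothetical, and the command is only printed for review.

```shell
# Sketch: derive a per-job thread count from the total cores.
# Assumes local jobs x threads should not exceed the total cores.
threads_per_job() {
    total="$1"
    njobs="$2"
    threads=$(( total / njobs ))
    # Never go below one thread per job.
    [ "$threads" -ge 1 ] || threads=1
    echo "$threads"
}

CORES="$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN)"
JOBS=2
THREADS="$(threads_per_job "$CORES" "$JOBS")"

# Print the command instead of running it, so it can be reviewed first.
echo "biobakery_workflows wmgx --input input_test_run --output output_test_run --bypass-strain-profiling --local-jobs ${JOBS} --threads ${THREADS}"
```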