Installation
BioGraph software is designed to run on modern Linux systems. For a full list of requirements, see System Requirements.
The BioGraph program and classifier model are distributed in separate files.
- BioGraph-7.0.0.tgz: the BioGraph program and support libraries, available via GitHub.
- biograph_model-7.0.0.ml: the variant classifier model, downloadable via https or s3 at s3://spiral-archive/models/biograph_model-7.1.0.ml
The BioGraph program is required for all installations. The classifier model is approximately 7 GB and is required for the qual_classifier step.
If your system supports Docker, you can run BioGraph directly via Docker Hub:
$ docker run spiralgenetics/biograph
The installation procedure below works on most Internet-connected Linux systems. While BioGraph itself does not require Internet access, its dependencies are downloaded automatically at install time. If you are installing on a cluster without direct Internet access, see Installing without Internet access.
BioGraph requires Python 3.6 or later and should be installed in a Python virtualenv or venv. You will use this Python environment any time you run biograph commands.
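Before creating the environment, it can help to confirm that the interpreter meets the version requirement. The following is a minimal sketch, not part of BioGraph: the function name `python_ok` is hypothetical, and it assumes the interpreter to test is reachable by the name or path you pass in.

```shell
# Succeeds only if the given interpreter is Python 3.6 or later.
# Usage: python_ok [interpreter]   (defaults to python3)
python_ok() {
    "${1:-python3}" -c 'import sys; sys.exit(0 if sys.version_info >= (3, 6) else 1)'
}

if python_ok python3; then
    echo "python3 is new enough for BioGraph"
else
    echo "python3 is missing or older than 3.6" >&2
fi
```

The check relies on the interpreter's own exit status, so a missing interpreter and an old one are both reported as failures.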
If the virtualenv command is available on your system, create and activate a new environment:
$ virtualenv --python=python3.6 bg7
$ . bg7/bin/activate
(bg7)$
If the virtualenv command is not available on your system, you can create an environment using the built-in venv Python module. The packaging tools in the new environment need to be upgraded after creation and activation:
$ python3.6 -mvenv bg7
$ . bg7/bin/activate
(bg7)$ pip install --upgrade pip wheel setuptools
Collecting pip
...
(bg7)$
Finally, install the BioGraph tarball using pip in the active Python environment.
(bg7)$ pip install BioGraph-7.0.0.tgz
Processing BioGraph-7.0.0.tgz
...
(bg7)$
This will install BioGraph and all required python libraries.
On some systems, a few python dependencies may require compilation. If you encounter installation issues, see Compiling Python Dependencies.
The biograph_model-7.0.0.ml file may be kept anywhere convenient. The path to this file will be provided as an option when running BioGraph commands.
These additional open source tools are also required to run the full BioGraph pipeline. They may be installed in any directory in your PATH.
- vcf-sort (from VCFtools)
  - Recommended: v0.1.16
- bgzip (from samtools)
- tabix (from samtools)
- bcftools (from samtools)
  - Recommended: v1.12
To install these packages on Ubuntu 18.04:
$ sudo apt install -y vcftools tabix bcftools
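To confirm that the tools listed above are actually reachable from your PATH, a check along these lines may be useful. This is an illustrative sketch only: the helper name `missing_tools` is hypothetical and it verifies presence, not the recommended versions.

```shell
# Print the name of every tool given as an argument that is not found in PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

missing=$(missing_tools vcf-sort bgzip tabix bcftools)
if [ -n "$missing" ]; then
    # $missing is left unquoted on purpose so the names print on one line
    echo "Not found in PATH:" $missing >&2
else
    echo "All required tools found"
fi
```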
You should now be able to run the biograph command:
(bg7)$ biograph
usage: biograph [-h]
biograph v7.0.0 - the BioGraph genome processing pipeline
Pipeline Commands:
full_pipeline Run the full BioGraph single-sample pipeline
reference Build a BioGraph reference from FASTA
create Convert reads to the BioGraph format
discovery Discover variants on a BioGraph vs. a reference
coverage Calculate coverage for VCF entries
qual_classifier Assign quality scores and filter variants
vdb Access the variant database (beta)
Utility Commands:
license Check license status
stats Get basic QC stats from a BioGraph
version Print the BioGraph version and exit
refhash Identify the reference in a VCF, FASTA, SAM, or
BioGraph refdir
For help on any command, use the --help option:
$ biograph full_pipeline --help
For full documentation: https://www.spiralgenetics.com/user-documentation
optional arguments:
-h, --help show this help message and exit
Congratulations, you're ready to go.
If you encounter problems, check the test_*/log.txt file and contact Spiral support for assistance.
Some systems require additional steps for installation. See the sections below, or contact Spiral support for assistance.
Compiling Python Dependencies
Several supporting Python packages will be installed in addition to BioGraph itself. While we take care to avoid the need to compile code on most systems, some dependencies (htslib, pysam, numpy, pandas, scipy) may require compilation.
If you see an error when running the pip install command, be sure that a working compiler is installed. You will also need the development libraries python3-dev, liblzma, libbz2-dev, and zlib. On some systems, liblapack-dev and libblas-dev may also be required. The following command will install these dependencies on Ubuntu systems:
(bg7)$ sudo apt install -y build-essential python3-dev liblzma-dev zlib1g-dev libbz2-dev liblapack-dev libblas-dev
Installing without Internet access
In some cluster computing environments, direct Internet access is restricted or prohibited from worker nodes. In this case, the BioGraph dependencies can be downloaded on a separate system that has Internet access (such as a laptop) and then installed manually.
Note that the download machine and the worker node must be running the same operating system (for example, Ubuntu 18.04) and have the same architecture (e.g., x86_64).
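One way to compare the two machines is to print a short summary on each and check that the lines match. The sketch below is an assumption-laden illustration: the function name `platform_summary` is hypothetical, and it assumes `/etc/os-release` is present, as it is on most modern Linux distributions (it falls back to "unknown" otherwise).

```shell
# Print a one-line summary of this machine: kernel name, architecture, distro.
platform_summary() {
    distro="unknown"
    if [ -r /etc/os-release ]; then
        # Source the file in a subshell so its variables do not leak out.
        distro=$(. /etc/os-release && echo "${PRETTY_NAME:-unknown}")
    fi
    echo "$(uname -s) $(uname -m) $distro"
}

platform_summary
```

Run it on both the download machine and a worker node. If the output differs, packages downloaded on one may not install on the other.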
To begin, create a python environment as described above on the machine with Internet access. Be sure to use the same version of Python for this virtualenv (for example, Python 3.6) as will be used on the cluster. Then activate the environment and run the following commands:
(bg7)$ mkdir install_me
(bg7)$ cd install_me
(bg7)$ pip download /path/to/BioGraph-7.0.0.tgz
This will download several tarballs to the current directory.
Next, transfer the install_me/ folder to the cluster. Log into the cluster, make a new Python environment, and activate it as described above. Finally, cd into the install_me directory and install the packages with the following commands:
(bg7)$ cd install_me
(bg7)$ pip install *
BioGraph and all dependencies are now installed to your python environment.
With a verified working BioGraph installation, you are ready to run your first dataset.
If you have any issues or concerns about your BioGraph installation, don't hesitate to contact Spiral support.
Next: Quick Start