This is the Nextstrain build for Ebola, visible at nextstrain.org/ebola.
The build encompasses fetching data, preparing it for analysis, doing quality control, performing analyses, and saving the results in a format suitable for visualization (with auspice). This involves running components of Nextstrain such as fauna and augur.
All Ebola-specific steps and functionality for the Nextstrain pipeline should be housed in this repository.
Input sequences and metadata can be retrieved from data.nextstrain.org.
Note that these data are generously shared by many labs around the world. If you analyze these data and plan to publish the results, please contact these labs first.
Within the analysis pipeline, these data are fetched from data.nextstrain.org and written to data/ with:
nextstrain build . data/sequences.fasta data/metadata.tsv
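After the fetch step, a quick sanity check on the downloaded files can catch an empty or truncated download before the rest of the build runs. The following is a minimal sketch, not part of the Nextstrain tooling: the helper names `count_fasta_records` and `count_metadata_rows` are hypothetical, and it assumes that metadata.tsv has exactly one header line.

```shell
# Hypothetical helpers for illustration; they are not part of the Nextstrain CLI.

# Print the number of FASTA records, i.e. lines starting with ">".
count_fasta_records() {
  grep -c '^>' "$1"
}

# Print the number of metadata rows, assuming the first line is a header.
count_metadata_rows() {
  awk 'NR > 1 { n++ } END { print n + 0 }' "$1"
}

# Report the counts only if both files are present.
if [ -f data/sequences.fasta ] && [ -f data/metadata.tsv ]; then
  echo "sequences: $(count_fasta_records data/sequences.fasta)"
  echo "metadata rows: $(count_metadata_rows data/metadata.tsv)"
fi
```

The two counts do not have to match exactly, but a large gap between them, or a count of zero, suggests the fetch did not complete as expected.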
Run the pipeline to produce the "overview" tree for /ebola with:
nextstrain build .
View results with:
nextstrain view auspice/
Configuration takes place entirely within the Snakefile. It can be read top to bottom: each rule
specifies its file inputs and outputs as well as its parameters. There is little indirection, and each
rule should be understandable on its own.
This build starts by pulling sequences from our live fauna database (a RethinkDB instance). This
requires the environment variables RETHINK_HOST and RETHINK_AUTH_KEY to be set.
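As a minimal sketch of setting these variables in a POSIX shell, the snippet below exports both and then fails early if either is unset or empty. The host name and key shown are placeholders, not real credentials; substitute the values issued with your database access.

```shell
# Placeholder values (assumptions): replace with your actual host and key.
export RETHINK_HOST="rethink.example.org"
export RETHINK_AUTH_KEY="replace-with-your-key"

# ${VAR:?message} prints the message and aborts a non-interactive shell
# if VAR is unset or empty, so a missing value is caught before the build.
: "${RETHINK_HOST:?RETHINK_HOST must be set}"
: "${RETHINK_AUTH_KEY:?RETHINK_AUTH_KEY must be set}"
```

Run the build from the same shell session so that the exported variables are visible to it. How the variables are passed on to a containerized runtime is not covered by this sketch.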
If you don't have access to our database, you can run the build using the example data provided in
this repository. Before running the build, copy the example sequences into the data/ directory
like so:
mkdir -p data/
cp example_data/ebola.fasta data/