pull data from https instead of fauna #3

Merged
merged 1 commit into from
Apr 1, 2022
17 changes: 15 additions & 2 deletions .travis.yml
@@ -7,11 +7,24 @@ python:
before_install:
- python3 -m pip install --upgrade pip setuptools wheel
install:
- pip3 install git+https://github.com/nextstrain/cli
# https://docs.conda.io/projects/conda/en/latest/user-guide/tasks/use-conda-with-travis-ci.html#the-travis-yml-file
- wget https://repo.continuum.io/miniconda/Miniconda3-latest-Linux-x86_64.sh -O miniconda.sh;
- bash miniconda.sh -b -p $HOME/miniconda
- source "$HOME/miniconda/etc/profile.d/conda.sh"
- hash -r
- conda config --set always_yes yes --set changeps1 no
- conda update -q conda
# Useful for debugging any issues with conda
- conda info -a
# Install nextstrain cli
- conda install -n base -c conda-forge mamba --yes
- conda activate base
- mamba create -n nextstrain -c conda-forge -c bioconda nextstrain-cli augur auspice nextalign snakemake git --yes
- conda activate nextstrain
- nextstrain version
- nextstrain check-setup
- nextstrain update
Comment on lines -10 to 26
Member

This diff seems like a step in the wrong direction to me. What was a single line turned into 14. :-/ There's also a lot of unnecessary stuff installed (since Docker is supported and what's used here), like augur, auspice, nextalign, etc. This feels like it was copy/pasted from elsewhere (where maybe it was all necessary, maybe not) without adjusting it for the new context, which is a recipe for degrading the codebase quality over time.

Contributor Author

From a "code smell" standpoint I agree with you @tsibley. The current expansion was a workaround until pip3 install was fixed for .travis testing.

See nextstrain/zika#14 for more details. Open to other suggestions.

Member

Suggestions:

  1. Move to GitHub Actions where we're using conda-incubator/setup-miniconda@v2 in repos like augur
  2. Only do mamba install nextstrain-cli
  3. Make it more clear that Docker is being used. Maybe nextstrain check-setup --set-default? Unless there is a way to do something like nextstrain --set-runtime docker

Contributor Author

I like option 1, to move toward a consistent CI across repos :)

Member

Ah, I didn't see nextstrain/zika#14. That's useful context.

I don't care too much about specifically conda/mamba vs. pip and Travis CI vs. GitHub Actions here, but good to be consistent when it makes sense (e.g. prefer GitHub Actions over Travis CI) and trim things down to minimal necessary steps.

Specifically here, I would wonder why did pip install start failing? Does it work now (i.e. was it transient)? It should work, and it's the simplest step (can be installed from latest PyPI release, doesn't have to be git and probably shouldn't be).

Member

> Make it more clear that Docker is being used

I think adding the --docker flag to the nextstrain build invocation would suffice ± a comment.

Contributor Author

Lol, I think we were both taxing .travis test at the same time.

As long as it's working now, I'm happy

script:
- mkdir -p data/
- cp -v example_data/measles.fasta data/
- cp -v example_data/* data/.
- nextstrain build .
11 changes: 6 additions & 5 deletions README.md
@@ -6,7 +6,7 @@ This is the [Nextstrain](https://nextstrain.org) build for measles virus, visibl
The build encompasses fetching data, preparing it for analysis, doing quality
control, performing analyses, and saving the results in a format suitable for
visualization (with [auspice][]). This involves running components of
Nextstrain such as [fauna][] and [augur][].
Nextstrain such as [augur][].

All measles-specific steps and functionality for the Nextstrain pipeline should be
housed in this repository.
@@ -42,22 +42,23 @@ Configuration takes place entirely with the `Snakefile`. This can be read top-to
specifies its file inputs and output and also its parameters. There is little redirection and each
rule should be able to be reasoned with on its own.


<!--
### fauna / RethinkDB credentials

This build starts by pulling sequences from our live [fauna][] database (a RethinkDB instance). This
requires environment variables `RETHINK_HOST` and `RETHINK_AUTH_KEY` to be set.
-->

If you don't have access to our database, you can run the build using the
If you don't have access to our https endpoints, you can run the build using the
example data provided in this repository. Before running the build, copy the
example sequences into the `data/` directory like so:

mkdir -p data/
cp example_data/measles.fasta data/
cp example_data/* data/.


[Nextstrain]: https://nextstrain.org
[fauna]: https://github.com/nextstrain/fauna
<!-- [fauna]: https://github.com/nextstrain/fauna -->
[augur]: https://github.com/nextstrain/augur
[auspice]: https://github.com/nextstrain/auspice
[snakemake cli]: https://snakemake.readthedocs.io/en/stable/executable.html#all-options
47 changes: 19 additions & 28 deletions Snakefile
@@ -13,40 +13,31 @@ rule files:
files = rules.files.params

rule download:
message: "Downloading sequences from fauna"
message: "Downloading sequences and metadata from data.nextstrain.org"
output:
sequences = "data/measles.fasta"
sequences = "data/sequences.fasta.xz",
metadata = "data/metadata.tsv.gz"
params:
fasta_fields = "strain virus accession collection_date region country division location source locus authors url title journal puburl"
sequences_url = "https://data.nextstrain.org/files/measles/sequences.fasta.xz",
metadata_url = "https://data.nextstrain.org/files/measles/metadata.tsv.gz"
shell:
"""
python3 ../fauna/vdb/download.py \
--database vdb \
--virus measles \
--fasta_fields {params.fasta_fields} \
--resolve_method choose_genbank \
--path $(dirname {output.sequences}) \
--fstem $(basename {output.sequences} .fasta)
curl -fsSL --compressed {params.sequences_url:q} --output {output.sequences}
curl -fsSL --compressed {params.metadata_url:q} --output {output.metadata}
"""

rule parse:
message: "Parsing fasta into sequences and metadata"
rule decompress:
message: "Decompressing sequences and metadata"
input:
sequences = rules.download.output.sequences
sequences = "data/sequences.fasta.xz",
metadata = "data/metadata.tsv.gz"
output:
sequences = "results/sequences.fasta",
metadata = "results/metadata.tsv"
params:
fasta_fields = "strain virus accession date region country division city db segment authors url title journal paper_url",
prettify_fields = "region country division city"
sequences = "data/sequences.fasta",
metadata = "data/metadata.tsv"
shell:
"""
augur parse \
--sequences {input.sequences} \
--output-sequences {output.sequences} \
--output-metadata {output.metadata} \
--fields {params.fasta_fields} \
--prettify-fields {params.prettify_fields}
gzip --decompress --keep {input.metadata}
xz --decompress --keep {input.sequences}
"""

rule filter:
@@ -59,8 +50,8 @@ rule filter:
- minimum genome length of {params.min_length}
"""
input:
sequences = rules.parse.output.sequences,
metadata = rules.parse.output.metadata,
sequences = rules.decompress.output.sequences,
metadata = rules.decompress.output.metadata,
exclude = files.dropped_strains
output:
sequences = "results/filtered.fasta"
@@ -128,7 +119,7 @@ rule refine:
input:
tree = rules.tree.output.tree,
alignment = rules.align.output,
metadata = rules.parse.output.metadata
metadata = rules.decompress.output.metadata
output:
tree = "results/tree.nwk",
node_data = "results/branch_lengths.json"
@@ -190,7 +181,7 @@ rule export:
message: "Exporting data files for for auspice"
input:
tree = rules.refine.output.tree,
metadata = rules.parse.output.metadata,
metadata = rules.decompress.output.metadata,
branch_lengths = rules.refine.output.node_data,
nt_muts = rules.ancestral.output.node_data,
aa_muts = rules.translate.output.node_data,
Binary file added example_data/metadata.tsv.gz
Binary file not shown.
Binary file added example_data/sequences.fasta.xz
Binary file not shown.
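
For readers who want to see what the new `decompress` rule does to files such as `example_data/metadata.tsv.gz` and `example_data/sequences.fasta.xz`, here is a minimal Python sketch of the same idea. It is not part of this PR. The helper name `decompress_keep` is hypothetical, and the sketch assumes that only `.gz` and `.xz` suffixes need handling. Like `gzip --decompress --keep` and `xz --decompress --keep`, it writes the output next to the input with the final suffix removed and leaves the compressed file in place.

```python
import gzip
import lzma
import shutil
from pathlib import Path


def decompress_keep(src: Path) -> Path:
    """Decompress a .gz or .xz file next to itself and keep the original.

    Hypothetical helper for illustration only; the Snakefile in this PR
    calls the gzip and xz command-line tools instead.
    """
    # Map each supported suffix to the stdlib function that opens it.
    openers = {".gz": gzip.open, ".xz": lzma.open}
    opener = openers.get(src.suffix)
    if opener is None:
        raise ValueError(f"unsupported compression suffix: {src.suffix!r}")

    # Drop only the last suffix: data/metadata.tsv.gz -> data/metadata.tsv
    dest = src.with_suffix("")

    # Stream the decompressed bytes so large FASTA files are not held in memory.
    with opener(src, "rb") as compressed, open(dest, "wb") as plain:
        shutil.copyfileobj(compressed, plain)
    return dest
```

Calling `decompress_keep(Path("data/metadata.tsv.gz"))` would write `data/metadata.tsv`. One difference from the command-line tools: this sketch silently overwrites an existing output file, whereas `gzip` and `xz` refuse to do so unless `--force` is given.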