Update build to work with ViPR data
trvrb committed Aug 16, 2019
1 parent ab70cbd commit 081f9f8
Showing 4 changed files with 48 additions and 15 deletions.
5 changes: 5 additions & 0 deletions README.md
@@ -54,6 +54,11 @@ example sequences into the `data/` directory like so:
     mkdir -p data/
     cp example_data/dengue* data/

+## AWS
+
+With access to AWS, this can be more quickly run as:
+
+    nextstrain build --aws-batch --aws-batch-cpus 4 --aws-batch-memory 7200 . --jobs 4

 [Nextstrain]: https://nextstrain.org
 [fauna]: https://github.com/nextstrain/fauna
30 changes: 15 additions & 15 deletions Snakefile
@@ -13,23 +13,23 @@ rule files:

 files = rules.files.params

-def download_serotype_integer(w):
-    serotype_integer = {
+def download_serotype(w):
+    serotype = {
         'all': 'all',
-        'denv1': '1',
-        'denv2': '2',
-        'denv3': '3',
-        'denv4': '4'
+        'denv1': 'Dengue_virus_1',
+        'denv2': 'Dengue_virus_2',
+        'denv3': 'Dengue_virus_3',
+        'denv4': 'Dengue_virus_4'
     }
-    return serotype_integer[w.serotype]
+    return serotype[w.serotype]

 def filter_sequences_per_group(w):
     sequences_per_group = {
         'all': '10',
-        'denv1': '30',
-        'denv2': '30',
-        'denv3': '30',
-        'denv4': '30'
+        'denv1': '36',
+        'denv2': '36',
+        'denv3': '36',
+        'denv4': '36'
     }
     return sequences_per_group[w.serotype]
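
Note on the hunk above: Snakemake calls a function given under `params:` with the job's wildcards, so the renamed lookup now yields a ViPR-style label instead of a bare integer. A minimal standalone sketch of that lookup follows. The `SimpleNamespace` object is only a hypothetical stand-in for Snakemake's wildcards, and `SEROTYPE_NAMES` is an illustrative name that does not appear in the Snakefile.

```python
from types import SimpleNamespace

# Mapping copied from the new side of the hunk above.
SEROTYPE_NAMES = {
    'all': 'all',
    'denv1': 'Dengue_virus_1',
    'denv2': 'Dengue_virus_2',
    'denv3': 'Dengue_virus_3',
    'denv4': 'Dengue_virus_4',
}

def download_serotype(w):
    """Return the serotype label for the build named by w.serotype."""
    return SEROTYPE_NAMES[w.serotype]

# Hypothetical stand-in for the wildcards object Snakemake passes in.
wildcards = SimpleNamespace(serotype='denv2')

# Mirrors how the download rule interpolates the value into --select.
selector = "serotype:" + download_serotype(wildcards)
print(selector)  # serotype:Dengue_virus_2
```

An unknown serotype raises `KeyError`, which surfaces a misspelled build target early rather than sending an empty selector to the download step.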

@@ -59,7 +59,7 @@ rule download:
         sequences = "data/dengue_{serotype}.fasta"
     params:
         fasta_fields = "strain virus accession collection_date region country division location source locus authors url title journal puburl",
-        serotype_integer = download_serotype_integer
+        download_serotype = download_serotype
     run:
         if wildcards.serotype == 'all':
             shell("""
@@ -76,7 +76,7 @@ rule download:
                     --database vdb \
                     --virus dengue \
                     --fasta_fields {params.fasta_fields} \
-                    --select serotype:{params.serotype_integer} \
+                    --select serotype:{params.download_serotype} \
                     --path $(dirname {output.sequences}) \
                     --fstem $(basename {output.sequences} .fasta)
             """)
@@ -150,7 +150,7 @@ rule align:
             --output {output.alignment} \
             --fill-gaps \
             --remove-reference \
-            --nthreads auto
+            --nthreads 1
         """

 rule tree:
@@ -164,7 +164,7 @@ rule tree:
             augur tree \
             --alignment {input.alignment} \
             --output {output.tree} \
-            --nthreads auto
+            --nthreads 1
         """

 rule refine:
6 changes: 6 additions & 0 deletions config/clades_genotypes.tsv
@@ -10,10 +10,16 @@ DENV1/IV E 339 S
DENV1/IV M 72 E
DENV1/IV E 88 T
DENV1/V NS1 324 R
DENV1/V NS2A 142 P
DENV1/V NS3 185 K
DENV1/V NS5 834 E
DENV2/AM E 71 D
DENV2/AM E 81 T
DENV2/AM E 129 I
DENV2/AM NS1 21 V
DENV2/AM NS1 73 S
DENV2/AM NS1 99 V
DENV2/AM NS1 170 R
DENV2/AA E 491 A
DENV2/AA M 15 G
DENV2/AA M 39 I
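
Note on the rows above: each one reads as a clade name, a gene, a site, and the residue expected at that site. The sketch below groups such rows by clade so the additions are easier to inspect. It assumes a four-column, whitespace-separated layout (clade, gene, site, alt), which this diff does not show in a header; `parse_clade_definitions` is a hypothetical helper, not part of augur.

```python
from collections import defaultdict

def parse_clade_definitions(lines):
    """Group (gene, site, alt) tuples by clade name.

    Assumes four whitespace-separated columns per row; rows that do not
    fit (blank lines, a header row) are skipped.
    """
    clades = defaultdict(list)
    for line in lines:
        fields = line.split()
        if len(fields) != 4 or not fields[2].isdigit():
            continue
        clade, gene, site, alt = fields
        clades[clade].append((gene, int(site), alt))
    return dict(clades)

# Rows taken from the hunk above.
rows = [
    "DENV1/V NS1 324 R",
    "DENV1/V NS2A 142 P",
    "DENV2/AM NS1 21 V",
]
print(parse_clade_definitions(rows))
```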
22 changes: 22 additions & 0 deletions config/dropped_strains.txt
@@ -26,3 +26,25 @@ DENV2/TRINIDAD_AND_TOBAGO/NA/1953
 DENV4/MALAYSIA/P215/1975
 DENV4/MALAYSIA/P514/1975
 DENV4/MALAYSIA/P731120/1973
+D2Sab2015 # miscategorized
+QML22 # miscategorized
+DAK_Ar_A1247 # sylvatic
+Dak_Ar_2039 # sylvatic
+Dak_Ar_578 # sylvatic
+DAK_Ar_510 # sylvatic
+PM33974 # sylvatic
+Dak_Ar_A2022 # sylvatic
+Dak_Ar_141069 # sylvatic
+Dak_Ar_141070 # sylvatic
+Dak_Ar_D75505 # sylvatic
+Dak_HD_10674 # sylvatic
+Dak_Ar_D20761 # sylvatic
+IBH11664 # sylvatic
+IBH11208 # sylvatic
+IBH11234 # sylvatic
+P8_1407 # sylvatic
+P75_514 # sylvatic
+P73_1120 # sylvatic
+P75_215 # sylvatic
+DKD811 # sylvatic
+ZS01/01 # metadata issue
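
Note on the entries above: the new lines carry a trailing `#` comment giving the reason for exclusion. How the build consumes this file is not shown in this diff. The sketch below only illustrates one way to separate strain names from such inline comments, and `read_strain_names` is a hypothetical helper.

```python
def read_strain_names(lines):
    """Return strain names, ignoring inline '#' comments and blank lines."""
    names = []
    for line in lines:
        # Keep only the text before the first '#', then trim whitespace.
        name = line.split('#', 1)[0].strip()
        if name:
            names.append(name)
    return names

# Entries taken from the hunk above, plus a blank and a comment-only line.
example = [
    "D2Sab2015 # miscategorized",
    "ZS01/01 # metadata issue",
    "",
    "# comment-only line",
]
print(read_strain_names(example))  # ['D2Sab2015', 'ZS01/01']
```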
