ignore md line length rule
ktmeaton committed May 13, 2020
1 parent 27a3a6b commit a758dda
Showing 2 changed files with 24 additions and 22 deletions.
1 change: 1 addition & 0 deletions .pre-commit-config.yaml
@@ -24,6 +24,7 @@ repos:
rev: master
hooks:
- id: markdownlint
args: ['--help']
name: markdownlint
language: node
types: [markdown]
45 changes: 23 additions & 22 deletions README.md
@@ -20,7 +20,7 @@ conda env create -f phylo-env.yaml --name phylo-env
conda activate phylo-env
```

## Reproduce from previously generated database
## Run full pipeline to reproduce previous analysis

```bash
nextflow run pipeline.nf \
@@ -30,49 +30,50 @@ nextflow run pipeline.nf \

## Step By Step (From Scratch)

### Build, Update, Join NCBImeta database
### Build NCBImeta metadata database

```bash
nextflow run pipeline.nf \
--ncbimeta_create ncbimeta.yaml \
--outdir results \
--ncbimeta_update ncbimeta.yaml \
--skip_assembly_download \
--skip_ncbimeta_update \
--skip_reference_download
```

### Customize and Curate the Annotations

1. Create a metadata TSV file with just the metadata columns of interest (ie. for NextStrain visualization)
Curate metadata with a DB Browser (SQLite), example:
* ex. Add "REMOVE: Not Yersinia pestis" to the column BioSampleComment.
* ex. Add collection data, geographic location, host etc. from literature.

### Update and Join Database Tables

```bash
scripts/sqlite_NextStrain_tsv.py \
--database test/ncbimeta_db/update/latest/output/database/yersinia_pestis_db.sqlite \
--query "SELECT BioSampleAccession,AssemblyFTPGenbank,SRARunAccession,BioSampleStrain,BioSampleCollectionDate,BioSampleHost,BioSampleGeographicLocation,BioSampleBiovar,PubmedArticleTitle,PubmedAuthorsLastName,AssemblyContigCount,AssemblyTotalLength,NucleotideGenes,NucleotideGenesTotal,NucleotidePseudoGenes,NucleotidePseudoGenesTotal,NucleotiderRNAs,AssemblySubmissionDate,SRARunPublishDate,BioSampleComment FROM Master" \
--no-data-char ? \
--output ncbimeta_default_annot.txt
nextflow run pipeline.nf \
--ncbimeta_update ncbimeta.yaml \
--outdir results \
--skip_sqlite_import \
--skip_reference_download
```

2. Curate/Add metadata with a text-editor, example:
Add "REMOVE: Not Yersinia pestis" to the BioSampleComment column to any rows that are the wrong organism.
Edit the collection data, geographic location, host etc. based on associated publication.
### Export metadata for downstream visualization

3. Replace ? with empty "" for NCBImeta annotation script
NextStrain metadata file preparation

```bash
sed 's/?//g' ncbimeta_default_annot.txt > ncbimeta_annot.txt
scripts/sqlite_NextStrain_tsv.py \
--database test/ncbimeta_db/update/latest/output/database/yersinia_pestis_db.sqlite \
--query "SELECT BioSampleAccession,AssemblyFTPGenbank,SRARunAccession,BioSampleStrain,BioSampleCollectionDate,BioSampleHost,BioSampleGeographicLocation,BioSampleBiovar,PubmedArticleTitle,PubmedAuthorsLastName,AssemblyContigCount,AssemblyTotalLength,NucleotideGenes,NucleotideGenesTotal,NucleotidePseudoGenes,NucleotidePseudoGenesTotal,NucleotiderRNAs,AssemblySubmissionDate,SRARunPublishDate,BioSampleComment FROM Master" \
--no-data-char ? \
--output nextstrain_annot.txt
```

### Update Database With Annotations

Remember that this drops/deletes the Master tables every time it's rerun:

### Run the sqlite import command to see what samples will be run
```bash
nextflow run pipeline.nf \
--ncbimeta_update ncbimeta.yaml \
--ncbimeta_annot ncbimeta_annot.txt \
--sqlite results/ncbimeta_db/update/latest/output/database/yersinia_pestis_db.sqlite \
--outdir results \
--skip_sqlite_import \
--skip_assembly_download \
--skip_reference_download \
-resume
```
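The `sed 's/?//g'` step in this README removes every `?` character, including any that legitimately appear inside text fields such as `PubmedArticleTitle`. Below is a minimal sketch of a stricter alternative, assuming the exported annotation file is tab-separated and that `?` is used only as a whole-cell `--no-data-char`. The function name `blank_no_data` is hypothetical and is not part of the repository.

```bash
#!/usr/bin/env bash
# Hypothetical stricter alternative to: sed 's/?//g' in.txt > out.txt
# Blanks only cells that consist solely of "?" (the assumed no-data char),
# so question marks inside text fields (e.g. article titles) are kept.
# Assumes tab-separated input; reads the named file, or stdin if none given.
blank_no_data() {
  awk 'BEGIN { FS = OFS = "\t" }
       { for (i = 1; i <= NF; i++) if ($i == "?") $i = ""; print }' "${1:--}"
}
```

For example, `blank_no_data ncbimeta_default_annot.txt > ncbimeta_annot.txt` could stand in for the `sed` line. Matching whole fields rather than characters is the design choice here: it trades the one-liner's simplicity for not silently editing free-text metadata.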