screen usage and readme update
Dmitry-Antipov committed Sep 17, 2024
1 parent 2708d29 commit f73c4ac
Showing 2 changed files with 4 additions and 7 deletions.
3 changes: 1 addition & 2 deletions README.md
@@ -183,8 +183,7 @@ You can pass through snakemake options to restrict CPU/memory/cluster resources

### Filtering common contaminants:

-Verkko has the ability to filter common contaminants from an assembly using the `--screen` option. For human samples, you can specify `--screen human` which will automatically filter the mitochonrdia, rDNA, and EBV sequences. For other samples you can specify an arbitrary number of targets using `--screen exampleN exampleN.fasta`. For each contaminant, verkko will remove all sequences matching the target from the main assembly output. It will also identify a 'cannonical' reprentative by coverage and circularize it to remove self-similarity at the start/end.
-
+Verkko has the ability to filter common contaminants from an assembly using the `--screen` option. You can specify an arbitrary number of targets by passing multiple `--screen <contaminant_N_name> <contaminant_N_sequence.fasta>` options. For each contaminant, verkko will remove all sequences matching the target from the main assembly output. It will also identify a ‘canonical’ representative by coverage and circularize it to remove self-similarity at the start/end. For typical contaminants of human assemblies there is a special option, `--screen-human-contaminants`, which takes no parameters and is a shortcut for `--screen rDNA rdna.fasta --screen mitochondria mito.fasta --screen EBV ebv.fasta`.
## Outputs:
The final assembly result is under `asm/assembly.fasta`. The final graph (in homopolymer-compressed space) is under `asm/assembly.homopolymer-compressed.gfa` along with coverage files in `asm/assembly*csv`. There is also an `asm/assembly.scfmap` file which translates the final sequence name in `assembly.fasta` to graph nodes. You can find intermediate graphs and coverage files under `asm/*/unitig-*gfa` and `asm/*/unitig-*csv`.
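
The README describes `--screen-human-contaminants` as shorthand for three explicit `--screen` options. The POSIX `sh` sketch below shows that expansion under stated assumptions: `expand_screen_shortcut` is a hypothetical helper, not part of verkko, and `rdna.fasta`, `mito.fasta` and `ebv.fasta` are the README's placeholder names rather than files verkko is known to ship under those paths.

```sh
# Hypothetical helper, not part of verkko: print the argument list with
# --screen-human-contaminants replaced by the three explicit --screen
# options the README says it is shorthand for. Other arguments pass through.
# rdna.fasta, mito.fasta and ebv.fasta are the README's placeholder names.
expand_screen_shortcut() {
  n=$#                      # number of original arguments still to visit
  while [ "$n" -gt 0 ]; do
    arg=$1
    shift                   # drop the argument from the front ...
    if [ "$arg" = "--screen-human-contaminants" ]; then
      # ... and append its expansion to the back
      set -- "$@" --screen rDNA rdna.fasta \
                  --screen mitochondria mito.fasta \
                  --screen EBV ebv.fasta
    else
      set -- "$@" "$arg"    # ... or append it unchanged
    fi
    n=$((n - 1))
  done
  printf '%s\n' "$@"        # one argument per line, in original order
}

# Example: expand a command line that uses the shortcut.
expand_screen_shortcut -d asm --screen-human-contaminants
```

Rotating the positional parameters (shift from the front, `set --` onto the back) keeps the sketch portable to plain `sh`, which has no arrays.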

8 changes: 3 additions & 5 deletions src/verkko.sh
@@ -947,11 +947,9 @@ if [ "x$help" = "xhelp" -o "x$errors" != "x" ] ; then
echo " --uneven-depth Disable coverage-based heuristics in homozygous node detection for Hi-C/PoreC phasing."
echo " --haplo-divergence Estimate of the maximum divergence between haplotypes; used only with Hi-C/PoreC data. Should be increased for species with divergence significantly higher than in human. Default: 0.05, min 0, max 0.2"
echo ""
-echo " --screen <option> Identify common contaminants and remove from the assembly, saving 1 (circularized) exemplar."
-echo " For human, '--screen human' will attempt to remove rDNA, mitochondria, and EBV."
-echo " Arbitrary contaminants are supported by supplying a name and fasta:"
-echo " '--screen contaminant /full/path/to/contaminant.fasta'"
-echo " Multiple screen commands are allowed and are additive."
+echo " --screen <contaminant_name> </full/path/to/contaminant.fasta>"
+echo " Screens the named contaminant using the provided FASTA file. Multiple screen commands are allowed and are additive."
+echo " To screen typical contaminants of human assemblies, use the --screen-human-contaminants option (no parameters), which removes rDNA, mitochondria, and EBV."
echo ""
echo " --paths <gaf paths> No assembly, generate consensus given paths and an existing assembly."
echo " The gaf file must be formatted as follows: 'name >utig4-1<utig4-2 HAPLOTYPE1'. One per line."
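
The new help text says each `--screen` takes a contaminant name and a FASTA path, and that repeated options are additive. The POSIX `sh` sketch below shows one way such additive parsing can work; `parse_screens` and the `screens` variable are hypothetical names, and this is not the logic verkko.sh actually uses. Each pair is stored as one space-separated line, so the sketch assumes names and paths contain no whitespace.

```sh
# Hypothetical sketch, not verkko's actual parser: gather every
# "--screen <contaminant_name> <contaminant.fasta>" pair so that repeated
# options add up. Each pair becomes one "name path" line (assumes no
# whitespace in names or paths). All other words are skipped one at a time.
parse_screens() {
  screens=""
  while [ "$#" -gt 0 ]; do
    case "$1" in
      --screen)
        # Both a name and a FASTA path must follow --screen.
        if [ "$#" -lt 3 ]; then
          echo "ERROR: --screen needs <contaminant_name> <contaminant.fasta>" >&2
          return 1
        fi
        screens="${screens}$2 $3
"
        shift 3
        ;;
      *)
        shift
        ;;
    esac
  done
  printf '%s' "$screens"
}

# Example: two --screen options yield two entries.
parse_screens -d asm --screen chrM /data/chrM.fasta --screen phiX /data/phiX.fasta
```

Failing fast when fewer than two words follow `--screen` keeps a missing FASTA path from being silently read as the next option.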