v1.18.0

@tomkinsc tomkinsc released this 14 Oct 01:03
· 553 commits to master since this release

New:

  • The Snakemake pipeline can now source database files from S3, GS, or SFTP if given protocol-prefixed paths (s3://, gs://, sftp://) and if the system is preconfigured with credentials.
  • The config.yaml file has been changed to include s3://* paths for pre-built databases, rather than Broad Institute-specific paths (and files listed are live and available for all!)
  • Kraken is now enabled on OSX, though significant RAM is required to use it
  • The reports.py::align_and_plot_coverage and read_utils.py::align_and_fix functions now expose an optional argument, --minScoreToFilter. When using bwa, this option calculates an alignment score for each query by summing the scores across the query's alignments, and keeps only the queries whose summed score is at least the specified threshold.
  • sample sheets can now be specified in *.csv.gz format
  • For debugging or more bespoke analysis, temp files can now be kept more easily by setting the VIRAL_NGS_TMP_DIRKEEP environment variable
  • The cd-hit-dup tool has been added as an alternative to mvicuna for removing duplicate reads, via a new CLI function read_utils.py::rmdup_cdhit_bam. Note that this is not currently used in the pipeline by default.
  • The Gap2Seq tool has been added for filling gaps between contigs. It is exposed via the new CLI command: assembly.py::gapfill_gap2seq. Note that this is not currently used in the pipeline by default.
  • The Spades assembler has been added as an alternative to Trinity for de novo assembly. Note that this is not currently used in the pipeline by default.
  • The blastn --chunkSize option is now exposed in taxon_filter.deplete_human.
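
To illustrate the --minScoreToFilter behavior described above, here is a minimal sketch of the summing-and-thresholding step. It is not the viral-ngs implementation: the function name `filter_queries_by_total_score` and the `(query_name, score)` input format are assumptions made for this example, and real alignment scores would come from the bwa output.

```python
from collections import defaultdict


def filter_queries_by_total_score(alignments, min_score):
    """Return the set of query names whose summed alignment score >= min_score.

    `alignments` is an iterable of (query_name, alignment_score) pairs;
    a query may appear more than once (one pair per alignment).
    This is a hypothetical helper, not part of viral-ngs.
    """
    totals = defaultdict(int)
    for query_name, score in alignments:
        # Sum the scores across all alignments of the same query.
        totals[query_name] += score
    # Keep only queries whose total meets the threshold.
    return {name for name, total in totals.items() if total >= min_score}


# Made-up scores: read1 totals 55, read2 totals 40, read3 totals 60.
alignments = [("read1", 30), ("read1", 25), ("read2", 40), ("read3", 60)]
kept = filter_queries_by_total_score(alignments, 50)
print(sorted(kept))  # ['read1', 'read3']
```

Note that read1 passes only because its two alignments are summed; neither alignment reaches the threshold on its own.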

Changed:

  • metagenomics rules in the Snakemake pipeline now break out kraken files as separate targets
  • improvements to speed of automated tests
  • The source and binaries for mvicuna and v-phaser2 have been removed from this repository since they now reside in their own repositories
  • As of this release, viral-ngs is no longer tested against or distributed for Python 3.4. This should not impact users, since the package is typically installed in an isolated conda environment with Python 3.5 or 2.7.
  • The Snakemake rules and cluster-submitter have been updated to reflect changes to the UGER cluster system at the Broad Institute, which now requires that -l h_rt hh:mm:ss be passed to schedule max runtime for each job
  • performance improvements to lastal filtering
  • lastal database is now built automatically if not supplied pre-built
  • SPAdes wrapper more resilient to empty fastq inputs
  • samtools.filterByCigarString has been reimplemented using pysam instead of samtools
  • Kraken for OSX is now available on broad-viral; it is enabled in the OSX git hooks, with all tests turned on
  • The optional lastal outputs have been removed from taxon_filter.deplete_human
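
As a rough illustration of the hh:mm:ss runtime format mentioned in the UGER item above, the following sketch converts a runtime limit in seconds into that form. The helper name `format_h_rt` is hypothetical and is not taken from the viral-ngs cluster-submitter; how the resulting string is passed to the scheduler is left out.

```python
def format_h_rt(total_seconds):
    """Format a runtime limit in seconds as hh:mm:ss.

    Hours are not wrapped at 24, so long jobs yield values like 25:00:00.
    Hypothetical helper for illustration only.
    """
    if total_seconds < 0:
        raise ValueError("runtime must be non-negative")
    hours, remainder = divmod(int(total_seconds), 3600)
    minutes, seconds = divmod(remainder, 60)
    # str.format is used so the sketch also runs on older Python versions.
    return "{:02d}:{:02d}:{:02d}".format(hours, minutes, seconds)


print(format_h_rt(5400))   # 01:30:00
print(format_h_rt(90000))  # 25:00:00
```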

Fixed:

  • In the Snakemake pipeline, code that reads sample sheets and barcode files is now more tolerant of different formats, including files formatted with Windows-style newlines (\r\n for Windows vs. \n for Linux/Unix/macOS)
  • fixed handling of empty subtrees when importing *.yaml files within *.yaml config files (for config includes/composition)
  • fixed other edge cases related to config imports
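
The newline tolerance described in the first item above can be sketched with the Python standard library alone. This is an illustrative example, not the pipeline's actual reader: the function name `read_sample_sheet`, the comma-separated layout, and the sample values are assumptions. It also accepts gzip-compressed input, in the spirit of the *.csv.gz support noted under "New".

```python
import csv
import gzip
import os
import tempfile


def read_sample_sheet(path):
    """Return non-empty rows as lists of fields.

    Accepts Windows (CRLF) or Unix (LF) line endings, and reads
    gzip-compressed files when the path ends in .gz.
    Hypothetical helper for illustration only.
    """
    opener = gzip.open if path.endswith(".gz") else open
    # newline="" passes raw line endings to the csv module, which
    # recognizes both CRLF and LF as row terminators.
    with opener(path, "rt", newline="") as handle:
        return [row for row in csv.reader(handle) if row]


with tempfile.TemporaryDirectory() as tmp:
    # A plain file with Windows-style newlines.
    win_path = os.path.join(tmp, "samples.csv")
    with open(win_path, "wb") as f:
        f.write(b"sample,barcode\r\nS1,ACGT\r\n")

    # The same content, gzip-compressed, with Unix-style newlines.
    gz_path = os.path.join(tmp, "samples.csv.gz")
    with gzip.open(gz_path, "wb") as f:
        f.write(b"sample,barcode\nS1,ACGT\n")

    rows_windows = read_sample_sheet(win_path)
    rows_gzip = read_sample_sheet(gz_path)

print(rows_windows == rows_gzip)  # True
```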

Upgraded:

  • last 719 -> 876
  • samtools updated to 1.5
  • pysam updated to 0.12.0.1