Changelog

All notable changes to this project will be documented in this file.

The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.

2.1.0 (2024-08-19)

Features

  • [aln] add program (@PG) entry to header (0123e54)
  • log seed used when --seed not passed (6e1f37d)
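Logging the seed when none is passed means any run can be repeated exactly. A minimal Python sketch of that idea, assuming the standard `random` and `logging` modules; `resolve_seed` is a hypothetical helper, not rasusa's code:

```python
import logging
import random

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("subsample")


def resolve_seed(seed=None):
    """Return the seed to use, drawing and logging a fresh one if none was given (hypothetical helper)."""
    if seed is None:
        # No seed supplied: draw one from system entropy and log it so the run can be replayed
        seed = random.SystemRandom().randrange(2**64)
        log.info("No seed provided; using seed %d", seed)
    return seed


seed = resolve_seed()
rng = random.Random(seed)
# Re-seeding with the logged value reproduces the same random draws
assert random.Random(seed).random() == rng.random()
```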

2.0.0 (2024-05-03)

⚠ BREAKING CHANGES

  • paired reads require --output once for each file

Bug Fixes

  • paired reads require --output once for each file (1427a0b)

1.0.0 (2024-04-29)

⚠ BREAKING CHANGES

  • move fastq functionality to reads subcommand

Features

  • add cite command to get citation (db17612)
  • add subcommand aln to subsample alignments (b92979a)
  • move fastq functionality to reads subcommand (f48d47b)

Bug Fixes

  • deal with chromosomes with no alignments (14aa15e)

0.8.0 (2024-01-03)

Features

  • add logging message with coverage of input before downsampling (79445fc)
  • support zstd (cfa50f8)
  • use the default compression level for the chosen output compression type (cfa50f8)

Bug Fixes

  • update logging so colour is not sent to file (bc62c3f)

Added

  • Install script and support for more binary target triples

Changed

  • Updated needletail dependency due to a dependency deprecation

Added

  • Fraction (--frac) and number (--num) options. This allows users to replicate the functionality of seqtk sample [#34]
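To illustrate what `--frac` and `--num` select, here is a minimal Python sketch. It assumes a fraction is converted into a fixed read count by rounding down; `sample_indices` is a hypothetical helper, and this is not necessarily how rasusa or seqtk implement it (seqtk, for instance, may keep each read independently with probability equal to the fraction).

```python
import random


def sample_indices(n_reads, *, num=None, frac=None, seed=0):
    """Return sorted indices of reads to keep, chosen by count (num) or by fraction (frac)."""
    if (num is None) == (frac is None):
        raise ValueError("pass exactly one of num or frac")
    if frac is not None:
        # Assumption: a fraction becomes a read count by rounding down
        num = int(n_reads * frac)
    num = min(num, n_reads)  # cannot keep more reads than exist
    rng = random.Random(seed)
    # Sample without replacement, then sort so reads come out in input order
    return sorted(rng.sample(range(n_reads), num))


print(len(sample_indices(10, frac=0.5)))  # 5
print(len(sample_indices(10, num=3)))     # 3
```

Sorting the sampled indices lets the kept reads be written back out in their original file order during a second pass.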

Added

  • Warning if the actual coverage of the file(s) is less than the requested coverage [#36]
  • JOSS manuscript
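The coverage check amounts to dividing the total bases in the input by the genome size and comparing the result with the request. A minimal Python sketch, assuming the standard `logging` module; `check_coverage` is a hypothetical helper and the message wording is illustrative only:

```python
import logging

log = logging.getLogger("subsample")


def check_coverage(total_bases, genome_size, requested_coverage):
    """Compute the input's actual coverage and warn if it is below the requested coverage."""
    actual = total_bases / genome_size
    if actual < requested_coverage:
        log.warning(
            "Requested coverage (%.2fx) is greater than the actual coverage of the input (%.2fx)",
            requested_coverage,
            actual,
        )
    return actual


print(check_coverage(5000, 100, 30))  # 50.0, no warning
print(check_coverage(1000, 100, 30))  # 10.0, logs a warning
```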

Changed

  • Use rasusa as the entry command for docker container [#35]

Added

  • --bases option to allow for manually setting the target number of bases to keep [#30]
  • --genome-size can now take a FASTA/Q index file and the sum of all reference sequences will be used as the genome size [#31]
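A FASTA/Q index (`.fai`, as produced by `samtools faidx`) is tab-separated with the sequence length in the second column, so the genome size is the sum of that column. A minimal Python sketch; `genome_size_from_fai` is a hypothetical helper and the index contents below are made up for illustration:

```python
import io


def genome_size_from_fai(handle):
    """Sum the sequence lengths (column 2) of a samtools-style FASTA/Q index."""
    total = 0
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        fields = line.split("\t")
        total += int(fields[1])  # column 2 holds the sequence length
    return total


# Hypothetical two-sequence index: a chromosome and a plasmid
fai = io.StringIO("chr1\t1000\t6\t60\t61\nplasmid\t250\t1030\t60\t61\n")
print(genome_size_from_fai(fai))  # 1250
```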

Added

  • Support for LZMA, Bzip, and Gzip output compression (thanks to niffler). The format is either inferred from the file extension or set manually via the -O option.
  • Option to specify the compression level for the output via -l
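Inferring the output compression from the file extension, with an explicit option taking precedence, can be sketched in a few lines of Python. The extension-to-format table and `infer_output_format` are hypothetical illustrations, not niffler's or rasusa's actual mapping:

```python
from pathlib import Path

# Hypothetical mapping from file extension to output compression format
EXTENSION_FORMATS = {
    ".gz": "gzip",
    ".bz2": "bzip2",
    ".xz": "lzma",
    ".lzma": "lzma",
}


def infer_output_format(path, override=None):
    """Pick the output compression: an explicit override (like -O) wins, else use the extension."""
    if override is not None:
        return override
    return EXTENSION_FORMATS.get(Path(path).suffix.lower(), "uncompressed")


print(infer_output_format("reads.fq.gz"))        # gzip
print(infer_output_format("reads.fq"))           # uncompressed
print(infer_output_format("reads.fq", "bzip2"))  # bzip2
```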

Changed

  • Use a Vec<bool> instead of a HashSet to store the indices of reads to keep. This gives a nice little speedup (see #28). A big thank you to @natir for this.
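To illustrate why a boolean vector can beat a hash set here, consider a Python analogue (not rasusa's Rust code; `select_kept` is a hypothetical helper). With one flag per read, deciding whether to keep a read during the output pass is a direct index into a list rather than a hash lookup:

```python
def select_kept(reads, keep_indices):
    """Return the reads whose index was selected, using a boolean mask like a Vec<bool>."""
    # One flag per read; marking and checking are plain list indexing, no hashing
    keep = [False] * len(reads)
    for i in keep_indices:
        keep[i] = True
    # Single pass over the reads in file order
    return [read for read, flag in zip(reads, keep) if flag]


print(select_kept(["r0", "r1", "r2", "r3"], [3, 1]))  # ['r1', 'r3']
```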

Fixed

  • Restore compression of output files [#27]

Fixed

  • I had stupidly forgotten to merge the fix for #22 onto master 🤦

Fixed

  • Releasing cross-compiled binaries didn't work for version 0.4.0
  • Docker image is now correctly built

Changed

  • Switched from snafu and failure to anyhow and thiserror for error handling, based on the procedure outlined in this excellent blog post.
  • Switched FASTA/Q parsing to use needletail instead of rust-bio. See benchmark for the improvement in runtimes.
  • Changed the way Illumina paired reads are subsampled. Previously, we assumed that both reads of a pair were the same length as the R1 read. We are now more careful and look at each read's length individually [#22]
  • Moved container hosting to quay.io
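The paired-read change affects how many bases each pair contributes towards the target. A minimal Python sketch contrasting the two counting rules, assuming pairs are taken in an already shuffled order until the target is reached; all function names are hypothetical:

```python
def pair_bases_old(r1, r2):
    # Old assumption: both mates are as long as R1
    return 2 * len(r1)


def pair_bases(r1, r2):
    # Current behaviour: count each mate's own length
    return len(r1) + len(r2)


def pairs_to_keep(pairs, target_bases):
    """Take pairs in the given (assumed pre-shuffled) order until the target number of bases is reached."""
    kept, total = [], 0
    for i, (r1, r2) in enumerate(pairs):
        if total >= target_bases:
            break
        kept.append(i)
        total += pair_bases(r1, r2)
    return kept


r1, r2 = "A" * 100, "A" * 80
print(pair_bases_old(r1, r2), pair_bases(r1, r2))  # 200 180
```

With mates of unequal length, the old rule overcounts bases, so fewer pairs than needed would be kept to reach the target.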

Version 0.3.0 may give different results from previous versions. If so, the differences will likely be a handful of extra reads (possibly none). This is because --coverage is now treated as a float, whereas previously it was immediately rounded down to the nearest integer. The number of reads to keep is based on the target total number of bases, which is coverage * genome size. So if coverage is 10.7 and genome size is 100, our target number of bases would previously have been 1000, whereas now it is 1070.
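The arithmetic above can be checked with a short Python sketch. The function names are hypothetical, and Python floats are 64-bit whereas the changelog notes rasusa uses an f32, so tiny rounding differences are possible:

```python
import math


def target_bases_old(coverage, genome_size):
    # Before 0.3.0: coverage was rounded down to an integer first
    return math.floor(coverage) * genome_size


def target_bases(coverage, genome_size):
    # From 0.3.0: coverage is kept as a float, so the fractional part counts
    return coverage * genome_size


print(target_bases_old(10.7, 100))     # 1000
print(round(target_bases(10.7, 100)))  # 1070
```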

Changed

  • --coverage is now treated as an f32 instead of being converted immediately to an integer [#19]
  • Updated rust-bio to version 0.31.0. This means rasusa now handles wrapped fastq files.
  • Preallocate fastx records instead of using an iterator. This gives a marginal speedup.
  • Added bash to the docker image b47a8b75943098bdd845b7758cf2eab01ef5a3d8

Added

  • Support for paired Illumina reads [#15]