All notable changes to this project will be documented in this file.
The format is based on Keep a Changelog, and this project adheres to Semantic Versioning.
2.1.0 (2024-08-19)
2.0.0 (2024-05-03)
- paired reads require --output once for each file
- paired reads require --output once for each file (1427a0b)
1.0.0 (2024-04-29)
- move fastq functionality to reads subcommand
- add cite command to get citation (db17612)
- add subcommand aln to subsample alignments (b92979a)
- move fastq functionality to reads subcommand (f48d47b)
- deal with chromosomes with no alignments (14aa15e)
0.8.0 (2024-01-03)
- add logging message with coverage of input before downsampling (79445fc)
- support zstd (cfa50f8)
- use default compression level for compression output type (cfa50f8)
- update logging so colour not sent to file (bc62c3f)
- Install script and support for more binary triple targets
- Updated needletail dependency due to dependency deprecation
- Fraction (--frac) and number (--num) options. This allows users to replicate the functionality of seqtk sample [#34]
- Warning if the actual coverage of the file(s) is less than the requested coverage [#36]
- JOSS manuscript
- Use rasusa as the entry command for docker container [#35]
- --bases option to allow for manually setting the target number of bases to keep [#30]
- --genome-size can now take a FASTA/Q index file and the sum of all reference sequences will be used as the genome size [#31]
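As a rough illustration of the --genome-size entry above: a FASTA/Q index (as produced by samtools faidx) is a tab-separated file whose second column holds each sequence's length, so the genome size is the sum of that column. The sketch below is not rasusa's actual implementation; the function name `genome_size_from_index` and the use of `String` for errors are assumptions made for the example.

```rust
/// Sum the sequence lengths (second tab-separated column) of a FASTA/Q index.
/// Hypothetical helper for illustration; rasusa's real code may differ.
fn genome_size_from_index(index_contents: &str) -> Result<u64, String> {
    let mut total = 0u64;
    for (n, line) in index_contents.lines().enumerate() {
        // Skip blank lines, e.g. a trailing newline at the end of the file.
        if line.trim().is_empty() {
            continue;
        }
        // Column 1 is the sequence name, column 2 its length in bases.
        let length_field = line
            .split('\t')
            .nth(1)
            .ok_or_else(|| format!("line {}: missing length column", n + 1))?;
        let length: u64 = length_field
            .trim()
            .parse()
            .map_err(|e| format!("line {}: {}", n + 1, e))?;
        total += length;
    }
    Ok(total)
}
```

For an index with two sequences of 1000 and 500 bases, this returns a genome size of 1500.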
- Support for LZMA, Bzip, and Gzip output compression (thanks to niffler). This is either inferred from the file extension or set manually via the -O option.
- Option to specify the compression level for the output via -l
- Use a Vec<bool> instead of HashSet to store the indices of reads to keep. This gives a nice little speedup (see #28). A big thank you to @natir for this.
- Restore compression of output files [#27]
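To illustrate the Vec<bool> change above (see #28): a boolean mask with one slot per read lets each "keep this read?" check be a plain index lookup, whereas a HashSet has to hash every index it is asked about. The following is a minimal sketch of the idea only, not the code from rasusa; all function names are hypothetical and reads are stood in for by string slices.

```rust
use std::collections::HashSet;

/// Build a mask with one slot per read; `true` marks a read to keep.
/// Panics if an index is out of range for `total_reads`.
fn keep_mask(indices_to_keep: &[usize], total_reads: usize) -> Vec<bool> {
    let mut mask = vec![false; total_reads];
    for &i in indices_to_keep {
        mask[i] = true;
    }
    mask
}

/// Filter reads with the mask: each check is a direct lookup by position.
fn filter_with_mask<'a>(reads: &[&'a str], mask: &[bool]) -> Vec<&'a str> {
    reads
        .iter()
        .zip(mask)
        .filter(|(_, keep)| **keep)
        .map(|(read, _)| *read)
        .collect()
}

/// The same filter using a HashSet: each check hashes the read's index.
fn filter_with_set<'a>(reads: &[&'a str], keep: &HashSet<usize>) -> Vec<&'a str> {
    reads
        .iter()
        .enumerate()
        .filter(|(i, _)| keep.contains(i))
        .map(|(_, read)| *read)
        .collect()
}
```

Both filters select the same reads; the mask trades one byte of memory per input read for cheaper lookups.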
- I had stupidly forgotten to merge the fix for #22 onto master 🤦
- Releasing cross-compiled binaries didn't work for version 0.4.0
- Docker image is now correctly built
- Switch from using snafu and failure for error handling to anyhow and thiserror. Based on the procedure outlined in this excellent blog post.
- Switched fasta/q parsing to use needletail instead of rust-bio. See benchmark for improvement in runtimes.
- Changed the way Illumina paired reads are subsampled. Previously, there was an assumption made that the reads of a pair were both the same length as the R1 read. We are now more careful and look at each read's length individually [#22]
- Moved container hosting to quay.io
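The effect of the paired-read change above (#22) can be sketched as follows. This is an illustrative simplification rather than rasusa's code: the function names are hypothetical, and pairs are taken in the order given here, with no random shuffling, to keep the example deterministic.

```rust
/// Old assumption: both reads of a pair are as long as the R1 read.
fn pair_bases_assuming_r1(r1_len: u64, _r2_len: u64) -> u64 {
    2 * r1_len
}

/// New behaviour: each read's length is counted individually.
fn pair_bases_individual(r1_len: u64, r2_len: u64) -> u64 {
    r1_len + r2_len
}

/// Count how many pairs are kept before the target number of bases is reached.
fn pairs_needed(
    pairs: &[(u64, u64)],
    target_bases: u64,
    pair_bases: fn(u64, u64) -> u64,
) -> usize {
    let mut total = 0;
    let mut kept = 0;
    for &(r1_len, r2_len) in pairs {
        if total >= target_bases {
            break;
        }
        total += pair_bases(r1_len, r2_len);
        kept += 1;
    }
    kept
}
```

When R2 reads are shorter than R1 reads, the old assumption overestimates the bases kept and stops early; counting each read individually keeps more pairs for the same target.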
Version 0.3.0 may give different results to previous versions. If so, the differences
will likely be a handful of extra reads (possibly none). The reason is that --coverage
is now treated as a float, whereas previously we immediately rounded coverage down to
the nearest integer. The number of reads to keep is based on the target total number
of bases, which is coverage * genome size. So if coverage is 10.7 and genome size is
100, our target number of bases would previously have been 1000, whereas now it is
1070.
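The arithmetic in this note can be sketched as below. It is only an illustration of the two calculations, not the project's actual code: the function names are hypothetical, and rounding the final product to the nearest integer is an assumption made so that floating-point representation error does not affect the example.

```rust
/// Pre-0.3.0 behaviour as described in the note: round coverage down first.
fn target_bases_old(coverage: f32, genome_size: u64) -> u64 {
    (coverage.floor() as u64) * genome_size
}

/// 0.3.0 behaviour: keep coverage as a float through the multiplication.
/// Rounding the product to the nearest integer is an assumption of this sketch.
fn target_bases_new(coverage: f32, genome_size: u64) -> u64 {
    (f64::from(coverage) * genome_size as f64).round() as u64
}
```

With a coverage of 10.7 and a genome size of 100, the first function gives 1000 and the second gives 1070, matching the figures in the note.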
- --coverage is now treated as an f32 instead of being converted immediately to an integer #19.
- Updated rust-bio to version 0.31.0. This means rasusa now handles wrapped fastq files.
- Preallocate fastx records instead of using iterator. Gives marginal speedup.
- Added bash to the docker image b47a8b75943098bdd845b7758cf2eab01ef5a3d8
- Support paired Illumina #15