04 Jan 16:40

bkille

1a07d0e

v3.1.3 Latest

Previously, the -f one-to-one plane-sweep filter was applied to all mappings at once. When users mapped multiple query genomes to one or more target sequences with the --skipPrefix # flag, the one-to-one filter therefore treated all query sequences as part of the same genome/group.

This patch applies the one-to-one plane-sweep filter to each pair of query and reference groups independently, ensuring that -n mappings are retained for each pair. A "group" is the set of sequences that share the same prefix up to the last occurrence of the character c, where --skipPrefix c is specified.
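The grouping rule above can be sketched in a few lines of Python. This is only an illustration of the definition, not MashMap's implementation: the function names, the example sequence names, and the choice to keep the delimiter in the group key and to treat names without the delimiter as their own group are all assumptions.

```python
from collections import defaultdict


def group_key(name: str, delim: str) -> str:
    """Return the prefix of `name` up to and including the last `delim`.

    Assumption: a name without the delimiter forms its own group.
    """
    idx = name.rfind(delim)
    return name if idx == -1 else name[: idx + 1]


def group_sequences(names, delim="#"):
    """Bucket sequence names by their prefix group."""
    groups = defaultdict(list)
    for name in names:
        groups[group_key(name, delim)].append(name)
    return dict(groups)


# Hypothetical sequence names, as if --skipPrefix '#' were specified.
names = ["sampleA#1#chr1", "sampleA#1#chr2", "sampleB#1#chr1"]
groups = group_sequences(names)
for key, members in groups.items():
    print(key, members)
```

With the filter applied per pair of groups, mappings from sampleA and sampleB would no longer compete with each other for the same reference region.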

30 Nov 16:45

bkille

v3.1.2

cc49b9f

MashMap v3.1.2

#64 - Fixes a bug where the kc tag was incorrect for chained mappings

22 Aug 13:25

bkille

v3.1.1

28ea0b8

MashMap v3.1.1

In order to maintain a default build that requires the same libraries as previous versions, building with htslib is now optional.
To build with htslib, call cmake with -DUSE_HTSLIB=ON
htslib is useful for the --targetList and --targetPrefix options, which allow users to only index specific contigs from a reference fasta file.

21 Aug 20:31

bkille

v3.1.0

5094738

MashMap v3.1.0

When filtering matches, the "score" of a match no longer takes into account the length. Previously, the score for a mapping was len*ANI, meaning that a 1000bp mapping with 100% identity would be tossed out in favor of a 1112bp mapping with 90% identity.
Fixes a rare bug that caused a crash when the very first minmer in the index is a hit.
Fixes a bug where the --kmerThreshold CLI option ignored the user's argument and always used 1.
Low complexity segments are tossed out before stage 1 mapping.
Mappings use 32-bit integers to store positions now instead of 64-bit integers. If you need mashmap to work with contigs larger than 2^31, you can pass -DLARGE_CONTIG=1 to CMake when building.
Reads shorter than the block length are now split, instead of being aligned in one piece.
Added --targetPrefix and --targetList CLI options, which allow users to specify subsets of the reference file to be indexed. Requires htslib!
Added --lowerTriangular CLI option which only computes mappings between sequence i and sequence j if i > j (meant to be used when reference and query files are identical).
Limits the size of the DP filter so that large sketch sizes don't incur a huge setup time.
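The scoring change in the first item can be checked with a small worked example. The sketch below reproduces the two mappings from the note. The old score len*ANI comes from the note itself; scoring by ANI alone is an assumption made for illustration, since the note only says that length is no longer taken into account.

```python
def old_score(length_bp: int, ani: float) -> float:
    """Pre-3.1.0 score as described in the note: length times ANI."""
    return length_bp * ani


def new_score(length_bp: int, ani: float) -> float:
    """Assumed length-independent score: ANI alone (illustrative only)."""
    return ani


# (length in bp, ANI) for the two competing mappings from the note.
mappings = [(1000, 1.00), (1112, 0.90)]

best_old = max(mappings, key=lambda m: old_score(*m))
best_new = max(mappings, key=lambda m: new_score(*m))

print("old winner:", best_old)
print("new winner:", best_new)
```

Under the old score the longer 90% mapping narrowly wins (about 1000.8 versus 1000), while a length-independent score keeps the 100% identity mapping.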

10 Jul 18:14

bkille

v3.0.6

4f4df5d

MashMap v3.0.6

Changelog:

Uses the chaining algorithm from wfmash
--splitPrefix now performs the filtering on each prefix-group independently.
Does not sketch or winnow k-mers w/ ambiguous nucleotides.
Added kc:f tag for the estimated k-mer complexity, defined as the ratio of the estimated number of distinct k-mers in a segment to the total number of k-mers in a segment (this estimate can be greater than 1.0).
Added a flag --kmerComplexity x to filter out segments with estimated kmer complexity less than x.
Mapping progress now updates for each segment mapped, as opposed to each contig mapped.
Added --reportPercentage option to report ANI as a percentage instead of in [0, 1] range (necessary for use w/ wfmash)
Fixes #54
Does not split sequences smaller than the block length.
Added address sanitizer for Debug build
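As a rough illustration of the k-mer complexity definition above, the following sketch computes the exact ratio of distinct k-mers to total k-mers in a segment. MashMap reports an estimate of this quantity (which, as noted, can exceed 1.0), so the exact value here is a simplification, and the function names and threshold handling are hypothetical.

```python
def kmer_complexity(segment: str, k: int) -> float:
    """Exact ratio of distinct k-mers to total k-mers in `segment`.

    Returns 0.0 when the segment is shorter than k (an assumption).
    """
    total = len(segment) - k + 1
    if total <= 0:
        return 0.0
    distinct = {segment[i : i + k] for i in range(total)}
    return len(distinct) / total


def keep_segment(segment: str, k: int, threshold: float) -> bool:
    """Mimic the idea of --kmerComplexity x: drop low-complexity segments."""
    return kmer_complexity(segment, k) >= threshold


print(kmer_complexity("AAAAAAAAAA", 3))  # one distinct 3-mer out of eight
print(kmer_complexity("ACGTTGCA", 3))    # every 3-mer is distinct
print(keep_segment("AAAAAAAAAA", 3, 0.5))
```

A homopolymer run scores close to zero and would be filtered out, while a segment with no repeated k-mers scores 1.0.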

28 Jun 05:37

bkille

v3.0.5

c6978dd

MashMap v3.0.5

Changelog:

Removed sanity-check filters that were actually dropping desired mappings
Sort query minmers upon recruitment using the heap as opposed to sorting for every stage 1 hit
Add -DPROFILE flag to compile w/ debug symbols and no inline (also removed inline keyword from some functions)
Cast jaccard to float now that it is no longer multiplied by 100.0.

18 May 17:28

bkille

v3.0.4

7ba5173

MashMap v3.0.4

Add --legacy flag for MashMap2 style output
Add -v/--version flag
Output id and jc tags in [0,1] range instead of [0, 100]
Improves stderr header output
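For downstream scripts, the range change means the id and jc values can be read directly as fractions. Below is a minimal sketch of reading these tags from a PAF line, assuming they are written as float-typed tags (id:f:..., jc:f:...) after the 12 mandatory PAF columns. The example line is made up and the function name is hypothetical.

```python
def parse_paf_tags(line: str) -> dict:
    """Parse optional TAG:TYPE:VALUE fields that follow the 12 PAF columns."""
    fields = line.rstrip("\n").split("\t")
    tags = {}
    for field in fields[12:]:
        tag, typ, value = field.split(":", 2)
        if typ == "f":
            tags[tag] = float(value)
        elif typ == "i":
            tags[tag] = int(value)
        else:
            tags[tag] = value
    return tags


# Made-up PAF record: 12 mandatory columns followed by id and jc tags.
line = "\t".join([
    "query1", "5000", "0", "5000", "+",
    "ref1", "100000", "200", "5200", "4500", "5000", "60",
    "id:f:0.95", "jc:f:0.4",
])

tags = parse_paf_tags(line)
ani_percent = tags["id"] * 100.0  # convert back to the old [0, 100] scale
print(tags, round(ani_percent, 2))
```

Scripts written against the older [0, 100] output would need a conversion like the last step, or a threshold rescaled to the [0, 1] range.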

Conda package (and MacOS binary) to be added once Bioconda CI is fixed

16 May 16:46

bkille

v3.0.2

85347a7

MashMap v3.0.2

Clarified block-length help string
Fixed bug for block-length filter
Removed some optimization flags

12 Apr 16:18

bkille

v3.0.1

d4cf81c

MashMap v3.0.1

MashMap3 Changelog

Instead of indexing the locations of minimizers, we index the windows in which a k-mer is one of the s lowest hashes, where s is the sketch size. These k-mers are termed "minmers."
The first-pass filtering stage computes the number of shared minmers for each candidate mapping in linear time. Regions with significantly high counts of shared minmers are passed on to stage 2.
The second stage of filtering, where the minhash score of each mapping in the candidate region is calculated, uses a std::vector to keep track of the rolling minhash score as opposed to the std::map used in MashMap2. The details can be seen in slidingMap.hpp.
While the mapping stage is faster, particularly for lower ANI cutoffs (90% and below), the indexing stage does require a bit more time than before. To avoid spending time recomputing the index, users can save the index via --saveIndex PREFIX, and then reuse it in a later run with --loadIndex PREFIX.
The default parameter for the sketch size depends on the value of the minimum ANI threshold (pi) and the segment length (L). Decreasing the sketch size will decrease runtime in a linear fashion at the cost of increasing the variance in the ANI estimation error.
Frequent seeds are filtered out based on how many minmer-intervals they have as opposed to how many times the k-mer actually occurs in the reference. This adds some noise to frequent-kmer filtering, as it's possible for a less frequent k-mer to have more intervals than a more frequent k-mer.
The binomial model is used to estimate ANI from Jaccard instead of the Poisson model.
k-mer size is no longer limited to <=16, as the hash values are 64 bits instead of 32 bits. The default kmer size is now 19.
Numerous interface updates were copied over from wfmash, including a progress meter and usage of the samtools .fai index.
The output of MashMap3 is now in PAF format, with id and jc tags which represent the estimated ANI and the estimated Jaccard similarity, respectively. The jc tag is only present for mappings where chaining is disabled.
There is now an option for significantly denser sketching, --dense.
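The minmer definition from the first item can be illustrated with a naive sketch: for every window of w consecutive k-mers, the k-mers with the s lowest hashes are that window's minmers. This is not how MashMap computes them (it does so far more efficiently). The CRC32 hash, the tie-breaking by position, and all names and parameter values below are assumptions chosen to keep the example short and deterministic.

```python
import zlib


def kmer_hash(kmer: str) -> int:
    """Stand-in hash function (assumption); MashMap uses its own 64-bit hash."""
    return zlib.crc32(kmer.encode())


def minmer_windows(seq: str, k: int, w: int, s: int) -> dict:
    """Map each k-mer start position to the windows in which it is a minmer.

    A window is w consecutive k-mers; its minmers are the k-mers with the
    s lowest hashes (ties broken by position, a simplification).
    """
    hashes = [(kmer_hash(seq[i : i + k]), i) for i in range(len(seq) - k + 1)]
    result = {}
    for start in range(len(hashes) - w + 1):
        window = hashes[start : start + w]
        for _, pos in sorted(window)[:s]:
            result.setdefault(pos, []).append(start)
    return result


seq = "ACGTACGGTCAGTTCAGGAC"  # arbitrary example sequence
windows = minmer_windows(seq, k=4, w=5, s=2)
for pos in sorted(windows):
    print(pos, seq[pos : pos + 4], windows[pos])
```

Each k-mer that is ever a minmer is listed with the windows it covers, which is the kind of window-based record the index keeps instead of bare minimizer positions.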

03 Feb 21:32

cjain7

v2.0

ba24e5c

MashMap v2.0

Now generalized for computing approximate local alignments between long DNA sequences. This will be useful for fast genome to genome mapping or split-read mapping of long reads.

Releases: marbl/MashMap
