--haplotype rework and metadata loading flag #329

jmcbroome · 2023-02-09T23:33:08Z

This PR addresses two issues.

First, it addresses #303. Typically, only metadata for samples in the user's query set is loaded into memory. This was originally implemented to reduce the memory footprint of our approach. However, with -N, -K, and similar options, users may want full metadata to be available for every sample in their output, including non-query context samples. Accordingly, I have added a flag, --load-all-metadata (no single-letter short form), to matUtils extract. It indicates that all available metadata should be loaded and made available for output.
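As a rough illustration of the tradeoff, here is a minimal Python sketch of the two loading behaviors. It is not the matUtils code: the function name `load_metadata`, the `load_all` parameter, and the assumption that the metadata is a TSV keyed on its first column are all hypothetical.

```python
import csv
import io


def load_metadata(handle, query_samples, load_all=False):
    """Read a metadata TSV into a dict keyed by sample name.

    Hypothetical helper: by default only rows for query samples are kept
    (smaller memory footprint); load_all=True keeps every row, so context
    samples outside the query set also have metadata available.
    """
    reader = csv.DictReader(handle, delimiter="\t")
    key = reader.fieldnames[0]  # assumption: first column is the sample name
    wanted = set(query_samples)
    return {row[key]: row for row in reader if load_all or row[key] in wanted}


# Toy metadata with three samples; only "A" is in the query set.
tsv = "strain\tcountry\nA\tUS\nB\tUK\nC\tFR\n"
subset = load_metadata(io.StringIO(tsv), ["A"])
everything = load_metadata(io.StringIO(tsv), ["A"], load_all=True)
```

In this sketch, `subset` holds metadata for `A` only, while `everything` also covers `B` and `C`, which is the situation a context sample pulled in by -N or -K would need.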

Second, it addresses #326. This is a significant rework of the implementation and output of matUtils summary --haplotype. Haplotypes are now computed dynamically, significantly reducing runtime. Instead of being represented as unordered mutational paths, they are now represented as sets of location-state strings (e.g. '56A,60G' denotes a haplotype where position 56 is A, position 60 is G, and all other positions are reference).


jmcbroome added 9 commits February 7, 2023 16:30

change haplotype handling to exclude simple reversions

8ba6068

change handling of haplotypes to be state-based rather than tracking explicit mutations

0080505

remove commented code

705006b

add draft code for loading all sample metadata flag

beb6693

draft code for dynamic haplotype tracking

08ba5be

rework haplotype code significantly

8e04de4

update to haplotype code

444e5dd

cleanup

0b271d9

commit header update

5d56255

yatisht merged commit d90bc9f into yatisht:master Feb 10, 2023
