Commit
[MRG] add manifests to support fast Index.select(...) and lazy loading (#1590)

* various cleanups of sourmash_args
* cleanup flakes errors
* clean up sourmash.sig submodule
* initial picklist implementation
* integrate picklists into sourmash sig extract
* basic tests for picklist functionality
* track found etc
* add picklists to selectors
* split pickfile out a little bit
* split column_type out of SignaturePicklist a bit
* picklist tests for .signatures() methods on Index classes
* split pickfile out a little bit
* split column_type out of SignaturePicklist a bit
* test 'Index.find' on picklists for SBTs and LCAs
* factor out picklist checks to 'passes_all_picklists' fn
* support special picklist interactions with zipfile collections
* special case md5 prefixes, for prefetch
* try out manifests
* hacky but functional manifest support
* add missing manifest CLI file
* build out a manifest class a bit
* provide 'select' more generically on manifests
* get started adding manifests to MultiIndex
* work through manifests for MultiIndex
* update comment about picklist.found
* more comment
* try making manifests obligatory for MultiIndex
* create LoadedCollection to replace MultiIndex non-lazy loading
* cleanup/simplification of LoadedCollection
* fix all the tests
* fix test names for new LoadedCollection
* remove MultiIndex
* more cleanup
* misc cleanup
* shift signature metadata matching from manifests over to picklist
* cleanup and simplification of ZipFile stuff
* more cleanup and docs
* create LazyMultiIndex
* move manifest stuff into manifest class
* add manifests to SBTs
* CSV output function
* done, I think?
* fix tests
* update comments, constructor, etc.
* fix tests :)
* more picklist tests
* verify output
* add --picklist-require-all &c
* documentation
* test with --md5 selector
* cover untested code with tests
* trap errors and be nice to users
* remove comment
* fix tests for new SignaturePicklist
* move picklist.py from sourmash.sig into sourmash
* move picklist reporting into sourmash_args
* fix space
* add picklist args throughout, eek.
* add picklists and tests for search, gather, index
* add picklists to prefetch
* add picklists to sourmash compare
* add picklists to lca index
* block multiple picklists on SBTs and LCAs, for now
* add picklist test that checks indexing-and-then-search == index
* add a test for using prefetch CSV as picklist
* remove debugging print
* add docs
* fix coltypes
* remove order dependence from test
* only match picklist at end of 'select'
* further attempt to fix test
* remove @ctb comments
* cleanup of comments etc.
* fix test for manifests
* add manifest versions
* add a test for sig manifest
* add manifest tests
* add save/load test
* rename matches_siginfo to matches_manifest_row
* add docstring
* reverse order of adding to seen set
* fix header writing
* change LoadedCollection back over to MultiIndex; remove LazyMultiIndex
* revert collection to multiindex
* remove print
* move manifest stuff to manifest.py
* add manifests to default zip collection output
* rename signatures_with_internal to _signatures_with_internal
* update sig manifest to error when manifests cannot be generated
* add sig manifest tests for other file types
* add use_manifest fixture, refactor manifest loading
* more manifest testing for zipfiles
* check compatibility in MinHash.intersection_and_union
* refactor zipfile select
* more refactor zipfile select
* don't test manifest content
* update test files to have manifest, update tests
* remove print statements
* add test for multiple selects
* update docstring
* fix tests for a CLEAN test-data/prot/ directory
* [MRG] compress manifests in .zip/.sbt.zip files, and set default file permissions in zip (#1633)
* w/zip, compress manifests, and set good default file permissions
* [MRG] alias `--nucleotide`, `--no-nucleotide` for moltype args. (#1632)
* add --nucleotide to moltype args
* test --nucleotide; update bad moltypes error msg

Co-authored-by: Tessa Pierce Ward <bluegenes@users.noreply.github.com>
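
For readers unfamiliar with the idea named in the commit title, the following is a rough sketch of how a manifest can make `select(...)` fast and loading lazy. It is not sourmash's actual code: `ToyManifest`, `ToyLazyCollection`, `fake_loader`, and the CSV columns and values are all hypothetical names chosen for illustration. The assumption is that a manifest is a small table with one metadata row per signature, so that selection (including picklist matching on md5) filters rows only, and signatures are read only when iterated.

```python
import csv
import io

# Hypothetical manifest: one CSV row of metadata per stored signature.
MANIFEST_CSV = """internal_location,md5,ksize,moltype,name
sigs/a.sig,aaa111,31,DNA,genome A
sigs/b.sig,bbb222,21,DNA,genome B
sigs/c.sig,ccc333,31,protein,proteome C
"""


class ToyManifest:
    """Illustrative stand-in for a manifest; not the sourmash class."""

    def __init__(self, rows):
        self.rows = list(rows)

    @classmethod
    def from_csv(cls, text):
        return cls(csv.DictReader(io.StringIO(text)))

    def select(self, ksize=None, moltype=None, picklist=None):
        # Filter on metadata only; no signature data is touched here.
        def keep(row):
            if ksize is not None and int(row["ksize"]) != ksize:
                return False
            if moltype is not None and row["moltype"] != moltype:
                return False
            if picklist is not None and row["md5"] not in picklist:
                return False
            return True

        return ToyManifest(r for r in self.rows if keep(r))

    def locations(self):
        return [r["internal_location"] for r in self.rows]


class ToyLazyCollection:
    """Holds a manifest plus a loader; loads only when iterated."""

    def __init__(self, manifest, loader):
        self.manifest = manifest
        self.loader = loader

    def select(self, **kwargs):
        # Selection narrows the manifest and returns a new collection.
        return ToyLazyCollection(self.manifest.select(**kwargs), self.loader)

    def signatures(self):
        for location in self.manifest.locations():
            yield self.loader(location)


loaded = []  # records which locations were actually read


def fake_loader(location):
    # Hypothetical loader; a real one would parse a signature file.
    loaded.append(location)
    return f"<signature from {location}>"


collection = ToyLazyCollection(ToyManifest.from_csv(MANIFEST_CSV), fake_loader)
subset = collection.select(ksize=31, moltype="DNA")
```

In this sketch, calling `select` leaves `loaded` empty; only iterating `subset.signatures()` invokes the loader, and only for rows that survived the metadata filter.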