[MRG] add manifests to support fast Index.select(...) and lazy loading (#1590)

* various cleanups of sourmash_args

* cleanup flakes errors

* clean up sourmash.sig submodule

* initial picklist implementation

* integrate picklists into sourmash sig extract

* basic tests for picklist functionality

* track found etc

* add picklists to selectors

* split pickfile out a little bit

* split column_type out of SignaturePicklist a bit

* picklist tests for .signatures() methods on Index classes

* split pickfile out a little bit

* split column_type out of SignaturePicklist a bit

* test 'Index.find' on picklists for SBTs and LCAs

* factor out picklist checks to 'passes_all_picklists' fn

* support special picklist interactions with zipfile collections

* special case md5 prefixes, for prefetch

* try out manifests

* hacky but functional manifest support

* add missing manifest CLI file

* build out a manifest class a bit

* provide 'select' more generically on manifests

* get started adding manifests to MultiIndex

* work through manifests for MultiIndex

* update comment about picklist.found

* more comment

* try making manifests obligatory for MultiIndex

* create LoadedCollection to replace MultiIndex non-lazy loading

* cleanup/simplification of LoadedCollection

* fix all the tests

* fix test names for new LoadedCollection

* remove MultiIndex

* more cleanup

* misc cleanup

* shift signature metadata matching from manifests over to picklist

* cleanup and simplification of ZipFile stuff

* more cleanup and docs

* create LazyMultiIndex

* move manifest stuff into manifest class

* add manifests to SBTs

* CSV output function

* done, I think?

* fix tests

* update comments, constructor, etc.

* fix tests :)

* more picklist tests

* verify output

* add --picklist-require-all &c

* documentation

* test with --md5 selector

* cover untested code with tests

* trap errors and be nice to users

* remove comment

* fix tests for new SignaturePicklist

* move picklist.py from sourmash.sig into sourmash

* move picklist reporting into sourmash_args

* fix space

* add picklist args throughout, eek.

* add picklists and tests for search, gather, index

* add picklists to prefetch

* add picklists to sourmash compare

* add picklists to lca index

* block multiple picklists on SBTs and LCAs, for now

* add picklist test that checks indexing-and-then-search == index

* add a test for using prefetch CSV as picklist

* remove debugging print

* add docs

* fix coltypes

* remove order dependence from test

* only match picklist at end of 'select'

* further attempt to fix test

* remove @ctb comments

* cleanup of comments etc.

* fix test for manifests

* add manifest versions

* add a test for sig manifest

* add manifest tests

* add save/load test

* rename matches_siginfo to matches_manifest_row

* add docstring

* reverse order of adding to seen set

* fix header writing

* change LoadedCollection back over to MultiIndex; remove LazyMultiIndex

* revert collection to multiindex

* remove print

* move manifest stuff to manifest.py

* add manifests to default zip collection output

* rename signatures_with_internal to _signatures_with_internal

* update sig manifest to error when manifests cannot be generated

* add sig manifest tests for other file types

* add use_manifest fixture, refactor manifest loading

* more manifest testing for zipfiles

* check compatibility in MinHash.intersection_and_union

* refactor zipfile select

* more refactor zipfile select

* don't test manifest content

* update test files to have manifest, update tests

* remove print statements

* add test for multiple selects

* update docstring

* fix tests for a CLEAN test-data/prot/ directory

* [MRG] compress manifests in .zip/.sbt.zip files, and set default file permissions in zip (#1633)

* w/zip, compress manifests, and set good default file permissions

* [MRG] alias `--nucleotide`, `--no-nucleotide` for moltype args. (#1632)

* add --nucleotide to moltype args

* test --nucleotide; update bad moltypes error msg

Co-authored-by: Tessa Pierce Ward <bluegenes@users.noreply.github.com>

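The commit message describes manifests as per-signature metadata that let `Index.select(...)` narrow a collection, and picklists that restrict it to named signatures, without loading every signature. The following is a minimal standalone sketch of that idea, not sourmash's implementation: the `ManifestRow` fields, the `select`, `write_manifest` and `read_manifest` helpers, the md5-set picklist and the `manifest.csv` archive name are all hypothetical. It also shows one common way to store a compressed CSV with explicit Unix permissions inside a zip (the topic of #1633), using only the standard `zipfile` module.

```python
import csv
import io
import zipfile
from dataclasses import asdict, dataclass, fields


# Hypothetical manifest row: metadata for one signature in a collection.
@dataclass(frozen=True)
class ManifestRow:
    internal_location: str  # where the signature lives inside the collection
    md5: str                # signature md5sum
    name: str
    ksize: int
    moltype: str


FIELDNAMES = [f.name for f in fields(ManifestRow)]


def select(rows, ksize=None, moltype=None, picklist=None):
    """Filter on manifest metadata only; no signature file is opened.

    `picklist` is a hypothetical set of md5 values to keep.
    """
    kept = []
    for row in rows:
        if ksize is not None and row.ksize != ksize:
            continue
        if moltype is not None and row.moltype != moltype:
            continue
        if picklist is not None and row.md5 not in picklist:
            continue
        kept.append(row)
    return kept


def write_manifest(zf, rows, arcname="manifest.csv"):
    """Store the manifest as a deflate-compressed CSV with rw-r--r-- permissions."""
    buf = io.StringIO(newline="")
    writer = csv.DictWriter(buf, fieldnames=FIELDNAMES)
    writer.writeheader()
    for row in rows:
        writer.writerow(asdict(row))
    info = zipfile.ZipInfo(arcname)
    info.compress_type = zipfile.ZIP_DEFLATED
    info.external_attr = 0o644 << 16  # Unix mode bits live in the high 16 bits
    zf.writestr(info, buf.getvalue())


def read_manifest(zf, arcname="manifest.csv"):
    """Load manifest rows back from the archive, restoring the int column."""
    text = zf.read(arcname).decode("utf-8")
    reader = csv.DictReader(io.StringIO(text, newline=""))
    return [ManifestRow(**{**d, "ksize": int(d["ksize"])}) for d in reader]


rows = [
    ManifestRow("sigs/a.sig", "aaa111", "genome A", 31, "DNA"),
    ManifestRow("sigs/b.sig", "bbb222", "genome B", 21, "DNA"),
    ManifestRow("sigs/c.sig", "ccc333", "genome C", 31, "protein"),
]
archive = io.BytesIO()
with zipfile.ZipFile(archive, "w") as zf:
    write_manifest(zf, rows)

with zipfile.ZipFile(archive) as zf:
    loaded = read_manifest(zf)
picked = select(loaded, ksize=31, picklist={"aaa111", "ccc333"})
print([r.name for r in picked])  # ['genome A', 'genome C']
```

In this sketch `select` reads only the small CSV, so narrowing a large zip collection by k-mer size, molecule type or picklist never opens a signature file; loading can be deferred to the rows that survive, which is the kind of lazy loading the commit title refers to.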
ctb and bluegenes authored Jun 24, 2021
1 parent 7e8df62 commit 9dbd8b5
Showing 22 changed files with 923 additions and 175 deletions.
1 change: 1 addition & 0 deletions src/sourmash/cli/sig/__init__.py
@@ -12,6 +12,7 @@
 from . import filter
 from . import flatten
 from . import intersect
+from . import manifest
 from . import merge
 from . import rename
 from . import subtract
4 changes: 4 additions & 0 deletions src/sourmash/cli/sig/cat.py
@@ -19,6 +19,10 @@ def subparser(subparsers):
         '-u', '--unique', action='store_true',
         help='keep only distinct signatures, removing duplicates (based on md5sum)'
     )
+    subparser.add_argument(
+        '-f', '--force', action='store_true',
+        help='try to load all files as signatures'
+    )
 
 
 def main(args):
27 changes: 27 additions & 0 deletions src/sourmash/cli/sig/manifest.py
@@ -0,0 +1,27 @@
+"""create a manifest for a collection of signatures"""
+
+import sourmash
+from sourmash.logging import notify, print_results, error
+
+
+def subparser(subparsers):
+    subparser = subparsers.add_parser('manifest')
+    subparser.add_argument('location')
+    subparser.add_argument(
+        '-q', '--quiet', action='store_true',
+        help='suppress non-error output'
+    )
+    subparser.add_argument(
+        '-o', '--output', '--csv', metavar='FILE',
+        help='output information to a CSV file',
+        required=True,
+    )
+    subparser.add_argument(
+        '-f', '--force', action='store_true',
+        help='try to load all files as signatures'
+    )
+
+
+def main(args):
+    import sourmash
+    return sourmash.sig.__main__.manifest(args)
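The new `subparser()` function only declares arguments; the real work is delegated to `sourmash.sig.__main__.manifest(args)`, which is not shown in this diff. As a rough sketch, the argument definitions can be exercised with the standard `argparse` module to see what `args` looks like. The top-level parser (`prog='sig'`, `dest='cmd'`) is a hypothetical stand-in for sourmash's actual CLI wiring.

```python
import argparse


def subparser(subparsers):
    # Argument definitions copied from src/sourmash/cli/sig/manifest.py.
    subparser = subparsers.add_parser('manifest')
    subparser.add_argument('location')
    subparser.add_argument(
        '-q', '--quiet', action='store_true',
        help='suppress non-error output'
    )
    subparser.add_argument(
        '-o', '--output', '--csv', metavar='FILE',
        help='output information to a CSV file',
        required=True,
    )
    subparser.add_argument(
        '-f', '--force', action='store_true',
        help='try to load all files as signatures'
    )


# Hypothetical stand-in for the top-level `sourmash sig` parser.
parser = argparse.ArgumentParser(prog='sig')
subparser(parser.add_subparsers(dest='cmd'))

# '--csv' is an alias of '--output', so the value lands in args.output.
args = parser.parse_args(['manifest', 'sigs.zip', '--csv', 'manifest.csv'])
print(args.location, args.output, args.force)  # sigs.zip manifest.csv False

# The output option is required: omitting it makes argparse exit with status 2.
try:
    parser.parse_args(['manifest', 'sigs.zip'])
    missing_output_status = 0
except SystemExit as exc:
    missing_output_status = exc.code
```

Because `-o`, `--output` and `--csv` are aliases for one option and `required=True` is set, the subcommand cannot be invoked without naming an output CSV file.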