some bulk sequence loading tests that nail down current ACGTN behavior. #1633

ctb · 2017-02-19T20:02:12Z

These tests nail down behavior prior to #1590, which, when merged, will alter how we handle non-ACGT characters. Note that no behavior is changed in this PR; it only adds (lots of) new tests.

This explicitly puts in place tests for sequences that contain lowercase letters, Ns, or non-ACGTN characters, for:

- consume_fasta and all other bulk-sequence loading functions on Hashtables and derived classes;
- trim_on_abundance, trim_below_abundance and find_spectral_error_positions.
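To make the three input categories concrete, here is a small classifier over test sequences. `sequence_categories` is a hypothetical helper written for illustration only. It is not part of khmer, and it says nothing about how khmer itself treats these characters, which is exactly what the new tests pin down.

```python
ACGT = frozenset("ACGT")


def sequence_categories(seq):
    """Return the test categories a sequence falls into.

    Hypothetical helper for illustration; this is not khmer code.
    Categories: "lowercase" (any lowercase letter), "N" (contains N or n),
    and "non-ACGTN" (any character other than A, C, G, T or N,
    compared case-insensitively).
    """
    categories = set()
    if any(c.islower() for c in seq):
        categories.add("lowercase")
    upper = seq.upper()
    if "N" in upper:
        categories.add("N")
    if set(upper) - ACGT - {"N"}:
        categories.add("non-ACGTN")
    return categories


for seq in ["ACGTACGT", "acgtACGT", "ACGTNACGT", "ACGTXACGT"]:
    print(seq, sorted(sequence_categories(seq)))
```

A single sequence can fall into several categories at once (for example `"acgnx"`), which is why the helper returns a set rather than one label.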

This PR includes #1661.

Adds new test file tests/test_sequence_validation.py and data file tests/test-data/valid-read-testing.fq.

- Is it mergeable?
- `make test` Did it pass the tests?
- `make clean diff-cover` If it introduces new functionality in `scripts/` is it tested?
- `make format diff_pylint_report cppcheck doc pydocstyle` Is it well formatted?
- Did it change the command-line interface? Only backwards-compatible additions are allowed without a major version increment. Changing file formats also requires a major version number increment.
- For substantial changes or changes to the command-line interface, is it documented in CHANGELOG.md? See keepachangelog for more details.
- Was a spellchecker run on the source code and documentation after changes were made?
- Do the changes respect streaming IO? (Are they tested for streaming IO?)

codecov-io · 2017-02-19T20:18:54Z

Codecov Report

Merging #1633 into master will increase coverage by 0.1%.
The diff coverage is 25%.

@@            Coverage Diff            @@
##           master    #1633     +/-   ##
=========================================
+ Coverage   69.82%   69.93%   +0.1%     
=========================================
  Files          66       66             
  Lines        8974     8976      +2     
  Branches     3060     3062      +2     
=========================================
+ Hits         6266     6277     +11     
+ Misses       1025     1018      -7     
+ Partials     1683     1681      -2

| Impacted Files | Coverage Δ | |
|---|---|---|
| lib/read_parsers.hh | `71.42% <25%> (+5.8%)` | ⬆️ |
| khmer/_khmer.cc | `57.48% <0%> (+0.05%)` | ⬆️ |
| khmer/_cpy_hashgraph.hh | `54.15% <0%> (+0.07%)` | ⬆️ |
| lib/hashgraph.cc | `46.96% <0%> (+0.34%)` | ⬆️ |
| lib/hashtable.cc | `57.54% <0%> (+2.14%)` | ⬆️ |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 8f00640...3cb741b.

ctb · 2017-02-19T20:22:25Z

Wow, these tests are like a showcase of horrible inconsistency in loading sequences.

betatim · 2017-02-20T14:10:31Z

tests/test_read_parsers.py

+    kmer = "caggcgcccaccacc".upper()
+    assert x.get(kmer) == 1
+
+    # the 2nd read with this k-mer in it has an N in it; 'consume' will ignore.


I'd rephrase the comment. At the moment I am very puzzled after reading it (ah, it's being ignored, so the count should be one, but it is asserted to be two...??)

Maybe "consume will ignore the invalid base and continue consuming the read, so this kmer after the N should have abundance 2"
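The suggested wording describes a consume that keeps going past an invalid base, so a k-mer that occurs after the N in the second read is still counted. Below is a toy pure-Python model of that behavior. It is an assumption-laden sketch, not khmer's implementation: `count_kmers` is hypothetical, it uppercases each read and simply skips every k-mer window that overlaps a non-ACGT character, and the reads and k=15 are made up for the example.

```python
from collections import Counter

VALID_BASES = frozenset("ACGT")


def count_kmers(reads, k):
    """Toy k-mer counter (hypothetical; not khmer's consume()).

    Each read is uppercased, then every length-k window made only of
    A/C/G/T is counted. Windows that overlap an invalid base such as N
    are skipped, but counting carries on with the rest of the read.
    """
    counts = Counter()
    for read in reads:
        seq = read.upper()
        for i in range(len(seq) - k + 1):
            kmer = seq[i:i + k]
            if VALID_BASES.issuperset(kmer):
                counts[kmer] += 1
    return counts


reads = [
    "caggcgcccaccacc",       # contains the k-mer once, in lowercase
    "ACGTNcaggcgcccaccacc",  # same k-mer again, located after an N
]
counts = count_kmers(reads, 15)
print(counts["CAGGCGCCCACCACC"])  # prints 2
```

Under this model the N only removes the windows that overlap it; the k-mer that follows it contributes to the count, giving an abundance of 2.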

fixed in fe48787

betatim · 2017-02-20T14:29:28Z

I'd move these tests to a new file, maybe test_sequence_loading.py? Then we would have one file that deals exclusively with verifying the assumptions we make about who cleans what, when and how.

Because it is easy to do, I would parametrise the tests (where that makes sense) on the class, so we test all combinations of Count/Node and table/graph. In that case we should take the parametrisation setup from test_tabletype.py (or put these tests there; I'm -0 on that).
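Parametrising on the class means writing a test body once and running it against every combination of Count/Node storage and table/graph structure. The sketch below shows that idea in plain Python so it runs without pytest or khmer: `_StandInTable`, `make_tabletype` and `check_repeated_kmer` are hypothetical stand-ins, not the real khmer classes. In a pytest suite the explicit loop would instead be a fixture declared with `@pytest.fixture(params=...)` that returns `request.param`.

```python
from itertools import product


class _StandInTable:
    """Hypothetical stand-in for a khmer k-mer table (not khmer code)."""

    counting = True  # "Count" types store abundances; "Node" types only presence

    def __init__(self, ksize):
        self.ksize = ksize
        self._data = {}

    def count(self, kmer):
        if self.counting:
            self._data[kmer] = self._data.get(kmer, 0) + 1
        else:
            self._data[kmer] = 1

    def get(self, kmer):
        return self._data.get(kmer, 0)


def make_tabletype(storage, structure):
    # e.g. ("Node", "graph") -> a class named "Nodegraph"
    return type(storage + structure, (_StandInTable,),
                {"counting": storage == "Count"})


# Every combination of Count/Node x table/graph.
TABLETYPES = [make_tabletype(s, t)
              for s, t in product(("Count", "Node"), ("table", "graph"))]


def check_repeated_kmer(tabletype):
    """A single test body, run once per table type."""
    table = tabletype(4)
    table.count("ACGT")
    table.count("ACGT")
    expected = 2 if tabletype.counting else 1
    assert table.get("ACGT") == expected, tabletype.__name__


for tabletype in TABLETYPES:
    check_repeated_kmer(tabletype)
    print(tabletype.__name__, "ok")
```

The design point is that expectations which differ between counting and presence-only types (here, 2 versus 1) are derived from the type rather than duplicated across four near-identical tests.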

betatim · 2017-02-20T14:30:59Z

tests/test_read_parsers.py

+
+    x.output_partitions(infile, savepath)
+
+    read_names = [ read.name for read in ReadParser(savepath) ]


The pep8 🚓 doesn't like the extra whitespace (hence the Travis failure).

fixed 756454e

betatim · 2017-02-20T14:35:34Z

+1 on merging this before 2.1; then #1590 can update the behaviour and fix these tests (removing the "in the future ..." comments).

ctb · 2017-02-25T05:12:11Z

Tests moved to test_sequence_validation.py and parameterized as per @betatim's suggestion! Of course, now a bunch of tests are failing... :)

betatim · 2017-02-28T09:30:27Z

Linux build fails because of a pep8 violation. The OSX build fails because some tests fail and then we exit ungracefully because there are too many open files. Will take a look at the latter.

betatim · 2017-02-28T10:15:30Z

Ha, locally on OSX all tests pass as well, and there are no "too many open files" warnings.

With the fixture we can explicitly close all the ReadParser objects, which might help with the "too many open files" error on OSX Travis.

betatim · 2017-02-28T11:03:45Z

Switched to using a fixture for ReadParser which allows us to explicitly close it after using it. Not sure this fixes the problem, but it is my best guess as I can't reproduce it locally.

betatim · 2017-02-28T13:03:57Z

🎉

ctb · 2017-03-19T14:48:58Z

This is now ready for review & merge (although #1661 should be merged first :)

@luizirber @betatim @standage

ctb · 2017-03-19T14:53:06Z

(Upon merge, we should switch #1590 over to be against master.)

betatim · 2017-03-20T08:13:44Z

tests/test_sequence_validation.py

+    savepath = utils.get_temp_filename('foo')
+
+    # read this in using "approved good" behavior w/cleaned_seq
+    x = _Nodegraph(8, PRIMES_1m)


-> graphtype?

betatim

LGTM, except for that one comment. Nitpick: if you feel like switching graphtype to Graphtype so that types start with an upper case letter I would ❤️ that.

betatim · 2017-03-20T08:32:09Z

tests/test_sequence_validation.py

+    return request.param
+
+
+@pytest.fixture


Now that I've looked this up: can we change this back to yield and change the decorator to @pytest.yield_fixture? Link to the pytest 2.9 docs: http://doc.pytest.org/en/2.9.2/fixture.html#fixture-finalization-executing-teardown-code
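For reference, the yield-style fixture pattern being requested looks roughly like the sketch below. `FakeReadParser` and `reads_fixture` are hypothetical stand-ins; the real fixture would wrap khmer's `ReadParser`. To keep the sketch runnable without pytest, the generator is driven by hand the way pytest drives a yield fixture: `next()` runs the setup and hands the parser to the test, and resuming the generator afterwards runs the teardown after `yield`. On pytest 2.9, as in the docs linked above, the function would be decorated with `@pytest.yield_fixture`; from pytest 3.0 on, plain `@pytest.fixture` also supports `yield`.

```python
class FakeReadParser:
    """Hypothetical stand-in for khmer's ReadParser that tracks open handles."""

    open_parsers = 0

    def __init__(self, path):
        self.path = path
        self.closed = False
        FakeReadParser.open_parsers += 1

    def __iter__(self):
        return iter(())  # a real parser would yield reads from self.path

    def close(self):
        if not self.closed:
            self.closed = True
            FakeReadParser.open_parsers -= 1


def reads_fixture(path):
    # Under pytest this generator would carry @pytest.yield_fixture
    # (pytest 2.9) or @pytest.fixture (pytest >= 3.0).
    parser = FakeReadParser(path)
    try:
        yield parser      # setup ends here; the test receives `parser`
    finally:
        parser.close()    # teardown: runs once the test has finished


# Drive the generator manually, mimicking what pytest does.
fixture = reads_fixture("reads.fq")
reads = next(fixture)        # setup
list(reads)                  # the test body would iterate over the reads
next(fixture, None)          # teardown; the final StopIteration is swallowed
print(reads.closed, FakeReadParser.open_parsers)
```

Wrapping the teardown in `try`/`finally` also closes the parser if the generator is discarded early (for example via `close()`), so no handle is left open whichever way the fixture ends.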

ctb · 2017-03-20T13:02:58Z

done

ctb · 2017-03-20T13:05:04Z

all comments addressed in e419a7b

betatim · 2017-03-20T16:57:10Z

🎉

ctb added 2 commits February 19, 2017 12:01

some bulk loading/cleaning tests

9eed9ae

nail down existing ACTGN behavior

c0c5d2c

ctb mentioned this pull request Feb 19, 2017

[MRG] remove redundant ACGT-checking code. #1590

Merged

ctb added 4 commits February 19, 2017 13:56

add tests for various trim functions w/bad DNA input

e77f1e1

fixed

33a7702

test current output_partitions behavior in subset.cc

adfe3b9

add test for trim_on_stoptags

5adf7bf

betatim reviewed Feb 20, 2017

betatim added this to the 2.1 milestone Feb 20, 2017

ctb added 6 commits February 24, 2017 20:42

Merge branch 'master' of github.com:dib-lab/khmer into remove/is_valid_dna_tests

dac7996

fix comments to be clearer

fe48787

fix pep8

756454e

moved read validity foo over to test_sequence_validation.py

2db8180

split tests out to test_sequence_validation

870678e

parameterized with tabletype

b2b22d7

add exceptions

12b88a5

Add a fixture for ReadParser use

d303d46

With the fixture we can explicitly close all the ReadParser which might help with the too many open files error on OSX Travis

ctb added 2 commits March 18, 2017 19:27

Merge remote-tracking branch 'origin/master' into remove/is_valid_dna_tests

28a1c7b

fix setup.py for offline operation

08d2f28

ctb added 9 commits March 18, 2017 19:51

check the number of tags created by consume_fasta_and_tag

4c9c63b

add partition IDs to valid-dna testing fq

5e01a81

add test for output_partitions

85b8fa2

increase ksize to 15 for partition reading tests

f61e37c

add tests for labelhash

b1519a3

uncomment pytest-runner req

08eca98

remove unnecessary imports

caa85b4

remove line spacing

bf94884

distinguished tabletype from countingtype

15bc92c

ctb mentioned this pull request Mar 19, 2017

py.test fixture used in #1633 causes problems #1660

Closed

ctb added 7 commits March 19, 2017 07:18

remove commented-out code

6304e2f

add a test where bad DNA is at beginning of string

d2a20d8

write test for consume_partitioned_fasta on unpartitioned file

86e7081

Merge branch 'fix/consume_partitioned_err' into remove/is_valid_dna_tests

4010999

add space in reporting

f47a360

Merge branch 'fix/consume_partitioned_err' into remove/is_valid_dna_tests

a294688

add partition ID to new 'bad' sequence

7e9c7a0

betatim reviewed Mar 20, 2017

betatim approved these changes Mar 20, 2017

betatim mentioned this pull request Mar 20, 2017

write test for consume_partitioned_fasta on unpartitioned file #1661

Merged

betatim reviewed Mar 20, 2017

address review comments <- minor changes

e419a7b

re-added reads.close to yield_fixture

3cb741b

ctb merged commit 6d6ef9f into master Mar 20, 2017
