In the trim-low-abund.py odyssey (#732), I got it all done only to find out that when I ran it on some real data, the script broke with an uncaught exception when len(read) < K. In retrospect this was an obvious problem that is handled explicitly by lots of other scripts, but I just didn't think about it.
So, we should check for this kind of thing in our script tests. It helps that broken_paired_reader now lets you specify a minimum length of sequences to return.
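A minimal sketch of the kind of check meant here, in plain Python rather than the actual khmer API. `K`, `kmers`, and `split_by_length` are hypothetical names made up for illustration; the point is only that a read with len(read) < K should be skipped or passed through deliberately instead of raising.

```python
# Hypothetical stand-ins for a script's per-read logic; none of these
# names come from khmer itself.
K = 20


def kmers(seq, k):
    """Yield each k-mer in seq; yields nothing when len(seq) < k."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def split_by_length(records, k):
    """Separate (name, seq) records into usable reads and too-short reads."""
    kept, too_short = [], []
    for name, seq in records:
        if len(seq) >= k:
            kept.append((name, seq))
        else:
            too_short.append((name, seq))
    return kept, too_short


def test_short_reads_do_not_crash():
    records = [("long", "ACGT" * 6), ("short", "ACGT")]
    kept, too_short = split_by_length(records, K)
    assert [name for name, _ in kept] == ["long"]
    assert [name for name, _ in too_short] == ["short"]
    # A read shorter than K simply produces no k-mers.
    assert list(kmers("ACGT", K)) == []
```

A script test along these lines would feed at least one read shorter than K through the script and assert that it exits cleanly.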
Hmm, maybe the right way to move forward is to add a list of "common problems to test" into the script/ and sandbox/ requirements. I'm leery of making that too long but it could include "think about how Ns are handled" as well as how short sequences are handled and how sequence names are dealt with (ref Casava stuff #818). We could also go through and turn writing tests for this into
As part of the resolution for #1434 we should have a test input file for the various scripts that contains short sequences, sequences with non-ACGT, lowercase, etc. and make sure that when we run a script/ file on it the code leaves the badness unchanged.
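Roughly what such a test could look like, as a sketch only. `BAD_RECORDS`, `write_fasta`, `read_fasta`, `passthrough_script`, and `check_unchanged` are all hypothetical names, and `passthrough_script` is just a stand-in for whichever script/ file is under test. The reader assumes one sequence line per record.

```python
import os
import tempfile

# Deliberately awkward records: short, contains Ns, lowercase, non-ACGT (R).
BAD_RECORDS = [
    ("short", "ACG"),
    ("has_n", "ACGTNNACGTACGTACGTACGT"),
    ("lower", "acgtacgtacgtacgtacgtac"),
    ("iupac", "ACGTRACGTACGTACGTACGTA"),
]


def write_fasta(path, records):
    """Write (name, seq) records as FASTA, one sequence line per record."""
    with open(path, "w") as fp:
        for name, seq in records:
            fp.write(">%s\n%s\n" % (name, seq))


def read_fasta(path):
    """Read FASTA written by write_fasta (single-line sequences only)."""
    records = []
    name = None
    with open(path) as fp:
        for line in fp:
            line = line.rstrip("\n")
            if line.startswith(">"):
                name = line[1:]
            elif name is not None:
                records.append((name, line))
                name = None
    return records


def passthrough_script(inpath, outpath):
    """Hypothetical stand-in for the script under test: copies records."""
    write_fasta(outpath, read_fasta(inpath))


def check_unchanged(script):
    """Run script on the bad input; True if every record comes out as-is."""
    with tempfile.TemporaryDirectory() as tmp:
        inpath = os.path.join(tmp, "bad.fa")
        outpath = os.path.join(tmp, "out.fa")
        write_fasta(inpath, BAD_RECORDS)
        script(inpath, outpath)
        return read_fasta(outpath) == BAD_RECORDS
```

For a script that legitimately drops or trims reads, the comparison would need to be against the expected subset rather than the full input.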
And then we should write a script that says "fix this file, okay?" and just outputs stats for things that were fixed -- e.g. "this many Ns replaced", etc.
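A sketch of what that fixer could do, under some loud assumptions: `fix_sequence` and `fix_records` are hypothetical names, replacing N with "A" is an illustrative default rather than a decided policy, and dropping reads shorter than `min_length` is only one possible way to handle short sequences.

```python
from collections import Counter


def fix_sequence(seq, replacement="A"):
    """Uppercase seq and replace Ns; return (fixed_seq, stats).

    ASSUMPTION: replacing N with `replacement` (default "A") is for
    illustration only; the real policy is not settled in this thread.
    """
    stats = Counter()
    fixed = []
    for ch in seq:
        if ch.islower():
            stats["lowercase_uppercased"] += 1
            ch = ch.upper()
        if ch == "N":
            stats["ns_replaced"] += 1
            ch = replacement
        fixed.append(ch)
    return "".join(fixed), stats


def fix_records(records, min_length=0, replacement="A"):
    """Fix every (name, seq) record and return (fixed_records, total_stats)."""
    total = Counter()
    out = []
    for name, seq in records:
        if len(seq) < min_length:
            # ASSUMPTION: too-short reads are dropped and counted.
            total["short_dropped"] += 1
            continue
        fixed, stats = fix_sequence(seq, replacement)
        total.update(stats)
        out.append((name, fixed))
    return out, total


if __name__ == "__main__":
    fixed, stats = fix_records([("r1", "acgN"), ("r2", "AC")], min_length=3)
    for key in sorted(stats):
        print("%s: %d" % (key, stats[key]))
```

Other non-ACGT characters are left alone here; whether to count or rewrite them is a separate decision.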