khmer should be graceful with respect to errors while processing multiple files #87
Comments
On Fri, Jul 26, 2013 at 11:33:42AM -0700, cswelcher wrote:
I'm skeptical that this is a good idea; this sort of situation should only [...] But I agree that diginorm shouldn't die. --t
C. Titus Brown, ctb@msu.edu
Agreed. screed is a parsing library; it has no idea what to do when it encounters malformed data, so the only thing it can do is throw an exception. While it is annoying for a long process to fail because of a bad sequence, I still think that should be the default behavior. A bad sequence could indicate a variety of major errors, and still providing an output file in those situations is a dubious thing to do by default.
I see your point. I think I'll write some sort of robust class that can sit on top of a screed parser and catalogue any bad reads. That way the user can choose whether the errors indicate a truly corrupted file, or just a random couple of mangled reads that can probably be dismissed.
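A rough sketch of what such a cataloguing reader might look like, in Python. This is an assumption-heavy illustration, not screed's API: the class name `BadReadCatalogue` and its `bad_reads` attribute are hypothetical, and the sketch reads four-line FASTQ records itself rather than wrapping a screed parser, since the thread does not establish whether a screed iterator can resume after raising.

```python
import io


class BadReadCatalogue:
    """Hypothetical tolerant reader: yields good FASTQ records and
    catalogues the bad ones instead of raising."""

    def __init__(self, handle):
        self.handle = handle
        self.bad_reads = []  # list of (record_number, reason)

    def __iter__(self):
        record_number = 0
        while True:
            # Assumes strict four-line FASTQ records (no wrapped sequences).
            lines = [self.handle.readline().rstrip("\n") for _ in range(4)]
            if not lines[0]:
                # An empty first line is treated as end of input here.
                return
            record_number += 1
            header, sequence, plus, quality = lines
            if not header.startswith("@"):
                self.bad_reads.append((record_number, "missing '@' header"))
            elif not plus.startswith("+"):
                self.bad_reads.append((record_number, "missing '+' separator"))
            elif len(sequence) != len(quality):
                self.bad_reads.append(
                    (record_number, "sequence/quality length mismatch"))
            else:
                yield header[1:], sequence, quality


if __name__ == "__main__":
    data = "@r1\nACGT\n+\nIIII\n@r2\nACGT\n+\nII\n"
    reader = BadReadCatalogue(io.StringIO(data))
    good = list(reader)
    print(len(good), "good;", len(reader.bad_reads), "bad")
```

After iterating, the caller can inspect `bad_reads` and decide whether a handful of mangled records is tolerable or whether the file looks truly corrupted. Note that a read whose damage changes the line count would desynchronize this fixed four-line chunking, which is one reason failing loudly remains a reasonable default.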
On Sat, Jul 27, 2013 at 09:31:43AM -0700, cswelcher wrote:
I'm still wondering why this is worthwhile :). The only source of FASTQ [...] --t
C. Titus Brown, ctb@msu.edu
On Sun, Jul 28, 2013 at 3:36 PM, C. Titus Brown notifications@github.com wrote:
Here is a user story that may be useful: CDubb, a bioinformatician, is processing 70 files of FASTQ data through [...] CDubb doesn't really care about that one read as he is under a research [...] (all names have been changed to protect the innocent)
On Sun, Jul 28, 2013 at 01:55:59PM -0700, mr-c wrote:
In an earlier comment, I agreed that we should be graceful wrt multiple [...] --t
Agreed
Is this handled by @cswelcher's recent push?
Yup.
It's now handled for normalize-by-median -- but what about the other scripts? Not sure what load-graph and load-into-counting should do. filter-abund should be tolerant. abundance-dist... should probably fail. Systematic examination needed.
I've decided this is, in general, a horrible idea. See #1057 (comment) for full rationale.
If we are processing multiple files and an error occurs then we should show that to the user and move on to the next file.
There should be a flag to disable this behavior, in which case we should tell the user there was an error in a specific file and quit gracefully.
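The behavior requested here could be sketched roughly as follows. This is only an illustration under stated assumptions, not khmer's code: `process_files`, `process_one`, and `stop_on_error` are hypothetical names, and which exception types a real script should catch is left open by the thread.

```python
import sys


def process_files(filenames, process_one, stop_on_error=False, log=sys.stderr):
    """Run process_one on each file; report failures instead of crashing.

    Returns the list of filenames that failed. With stop_on_error=True
    (the hypothetical opt-out flag), stop after the first failure.
    """
    failed = []
    for name in filenames:
        try:
            process_one(name)
        except (OSError, ValueError) as err:
            # Tell the user which file failed and why.
            print(f"** ERROR in {name}: {err}", file=log)
            failed.append(name)
            if stop_on_error:
                print("** Stopping at first failed file.", file=log)
                break
            print("** Skipping to the next file.", file=log)
    return failed


if __name__ == "__main__":
    def demo(name):
        # Stand-in for real per-file work; fails on one file.
        if name == "bad.fq":
            raise ValueError("malformed record")

    failures = process_files(["a.fq", "bad.fq", "c.fq"], demo)
    sys.exit(1 if failures else 0)
```

Returning the failed filenames lets the calling script set a nonzero exit status, so a skipped file is still visible to pipelines even when processing continues.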
"It's now handled for normalize-by-median -- but what about the other scripts? Not sure what load-graph and load-into-counting should do. filter-abund should be tolerant. abundance-dist... should probably fail. Systematic examination needed."