Investigate/implement streaming approaches, more generally #393
Comments
Yes.
|
On Sep 2, 2014, at 7:19 AM, Michael R. Crusoe notifications@github.com wrote:
The question of how to handle it at the command level is one component of the issue, but there are several others.

First, some of our algorithms don’t handle streaming properly yet. (I’m looking at you, filter-abund.) Here I think judicious refactoring of internal code to support iterator-style consumption and production of reads will be needed.

Second, some of our approaches are not single-pass, and will require “holding cells” for some data (filter-abund, again). Thinking about how to handle this cleanly has been a bit challenging, and will require some playing around.

And then, even if we figure all of this out, it’s not clear to me that supporting streaming by stdin/stdout is going to be terribly efficient. I’d like to support multi-threaded and multi-file reading (which is impossible via stdin). It should also be possible to support ad hoc composition of functions such that we could be flexible in distributed situations (e.g. machine A does diginorm, machine B does filter-abund, machine C does assembly).

All of this will require some design work, and the C++ read handling code could also use some refactoring... It’d be good to have someone prototype this out, but it seems beyond the scope of any current in-lab effort, at least for now. But it’s a neat CS-y research project.

--titus |
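A minimal sketch of what "iterator-style consumption and production of reads" might look like, using plain Python generators. This is not khmer's API: `read_fasta`, `drop_short`, and `write_fasta` are hypothetical names, and the length filter is only a stand-in for a real single-pass step. Each stage pulls one record at a time from the stage before it, so memory use does not grow with input size.

```python
def read_fasta(handle):
    """Yield (name, sequence) pairs one record at a time from a FASTA stream."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if name is not None:
                yield name, "".join(chunks)
            name, chunks = line[1:], []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def drop_short(reads, min_length):
    """Hypothetical single-pass filter: pass through reads of at least min_length."""
    for name, seq in reads:
        if len(seq) >= min_length:
            yield name, seq


def write_fasta(reads, handle):
    """Consume an iterator of reads and write each one out as it arrives."""
    for name, seq in reads:
        handle.write(f">{name}\n{seq}\n")
```

Composed as `write_fasta(drop_short(read_fasta(sys.stdin), 50), sys.stdout)`, this would run as a stdin/stdout filter. Because the stages only assume an iterator of reads, the same functions could just as well be fed from several files or from a queue filled by multiple threads, which is the flexibility the comment above asks for.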
Re the second issue (holding cells): see #601 for a proposed approach. |
+1 to holding cells. We will need to write up, for users, how to redirect them to a solid-state drive, RAM drive, et cetera. |
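One way a redirectable holding cell could work, sketched under stated assumptions: the first pass counts k-mers and spools every read to a temporary file, and the second pass replays the spool once the counts are complete. The function name `two_pass_filter`, the `spool_dir` parameter, and the exact-count `Counter` are all hypothetical; the keep-or-drop rule is a toy stand-in, not what filter-abund actually does, and this is not necessarily the approach proposed in #601.

```python
import tempfile
from collections import Counter


def two_pass_filter(reads, k, min_count, spool_dir=None):
    """Yield reads whose k-mers all occur at least min_count times in the input.

    Reads are spooled to a temporary "holding cell" file during the first
    pass. spool_dir (an assumed parameter) chooses where that file lives.
    """
    counts = Counter()
    with tempfile.TemporaryFile(mode="w+", dir=spool_dir) as cell:
        # Pass 1: count k-mers and park each read in the holding cell.
        for name, seq in reads:
            for i in range(len(seq) - k + 1):
                counts[seq[i:i + k]] += 1
            cell.write(f"{name}\t{seq}\n")

        # Pass 2: replay the holding cell now that the counts are final.
        cell.seek(0)
        for line in cell:
            name, seq = line.rstrip("\n").rsplit("\t", 1)
            kmers = [seq[i:i + k] for i in range(len(seq) - k + 1)]
            if kmers and all(counts[km] >= min_count for km in kmers):
                yield name, seq
```

With this shape, pointing the holding cell at faster storage is a matter of passing a directory on that device as `spool_dir`. When `spool_dir` is left as `None`, Python's `tempfile` module picks its default directory, which honors the `TMPDIR` environment variable, so users could also redirect it without any code change.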
To address https://github.com/ged-lab/khmer/pull/644/files#r19680360 |
This should be relevant: https://github.com/ctb/2015-experimental-graphalign/blob/master/khmer_api.py#L136 What fun! |
Streaming fixes & tests in #1186; this will finish off most of the straightforward practical issues. |
(Updated in #1206)
In partial response to ivory.idyll.org/blog/2014-pycon.html, can we put our protocols on a streaming basis by systematically introducing streaming functionality into khmer?
There are ways to do this even for e.g. Trimmomatic using clever Unix socket tricks.
Also see #149.