IO experiments #1554
Comments
Some example output from running on the ecoli file:
Nice! Do we know yet why @camillescott's Cython version of read parsing was so much faster? Is it just the call-into-Python problem?
One more note: there must be something wrong with how I used I gave the cython
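In light of the experiments in #1553 I'm thinking that worrying about how to read stuff from disk faster won't give us big wins compared to speeding up entering things into a countgraph and friends. I think we should do some thinking how to take advantage of the fact that a super simple bit of python can be ~2 times faster than `ReadParser` (even `read_plain2` is faster). For example, can we use a simple fast "parser" with no error handling until we encounter something that confuses the "parser" and only then switch to a robust/slow parser to try and recover from the error. Ideally all without making the code more complex.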
On Fri, Dec 16, 2016 at 07:12:13AM -0800, Tim Head wrote:
> In light of the experiments in #1553 I'm thinking that worrying about how to read stuff from disk faster won't give us big wins compared to speeding up entering things into a countgraph and friends.
> I think we should do some thinking how to take advantage of the fact that a super simple bit of python can be ~2 times faster than `ReadParser` (even `read_plain2` is faster). For example, can we use a simple fast "parser" with no error handling until we encounter something that confuses the "parser" and only then switch to a robust/slow parser to try and recover from the error. Ideally all without making the code more complex.
+1 for brainstorming and trying stuff out! I've thought idly about stuff
like this in the past but never gone anywhere with it; it'd be great to try
it out.
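As a starting point for that brainstorming, here is a minimal sketch of the "fast path first, robust parser only on trouble" idea. It assumes strict 4-line FASTQ records, which the thread does not specify, and the names `parse_reads` and `robust_parse` are hypothetical, not part of the project's API. The "robust" parser here only tolerates blank lines and stray whitespace; real recovery logic would need more thought.

```python
from itertools import chain, islice


def robust_parse(lines):
    """Slow path (hypothetical): ignore blank lines, validate every record."""
    stripped = (line.strip() for line in lines)
    nonblank = (line for line in stripped if line)
    while True:
        chunk = list(islice(nonblank, 4))
        if not chunk:
            return
        if len(chunk) < 4:
            raise ValueError("truncated record: %r" % (chunk,))
        header, seq, plus, qual = chunk
        if not (header.startswith("@") and plus.startswith("+")):
            raise ValueError("malformed record: %r" % (chunk,))
        if len(seq) != len(qual):
            raise ValueError("sequence/quality length mismatch in %r" % header)
        yield header[1:], seq, qual


def parse_reads(lines):
    """Fast path (hypothetical): assume clean 4-line records, check very little.

    As soon as a chunk looks wrong, hand that chunk plus the rest of the
    input to robust_parse and stay on the slow path from then on.
    """
    it = iter(lines)
    while True:
        chunk = list(islice(it, 4))
        if not chunk:
            return
        if len(chunk) == 4 and chunk[0].startswith("@") and chunk[2].startswith("+"):
            yield chunk[0][1:].rstrip(), chunk[1].rstrip(), chunk[3].rstrip()
        else:
            # Re-feed the lines already consumed so nothing is lost.
            yield from robust_parse(chain(chunk, it))
            return


if __name__ == "__main__":
    import io

    # The blank line after the first record confuses the fast path.
    data = "@r1\nACGT\n+\nIIII\n\n@r2\nGG\n+\nII\n"
    for record in parse_reads(io.StringIO(data)):
        print(record)
```

The switch is one-way in this sketch: once the fast path is confused, the rest of the file goes through the slow parser. Switching back after a run of clean records would be possible but adds the kind of complexity the comment above wants to avoid.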
This is a small study of different ways to handle the input. The idea is to gather some data on what works, what doesn't and what is slow/fast.
https://gist.github.com/betatim/d712d0b47a6136998c16561c8f1ca686
Interesting observations:
- a simple Python parser can be faster than `ReadParser` (but this is a dumb, non-robust Python version)
- `open(fname, 'rb')` beats everyone if you do not decode the bytes (treat everything as int)
- `open(.., 'r', encoding='ascii')` is faster than decoding each line ourselves
- `buffering=...` doesn't seem to do much

Not many conclusions yet.
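For anyone who wants to poke at the `open()` comparisons without the gist, a rough sketch of the three read modes is below. This is not the code from the gist: the function names, the synthetic FASTQ-like sample file, and the record count are all assumptions made for illustration, and timings will vary by machine and Python version.

```python
import os
import tempfile
import time


def count_bytes(fname, buffering=-1):
    """Binary mode, no decoding: each line stays a bytes object (items are ints)."""
    total = 0
    with open(fname, "rb", buffering=buffering) as f:
        for line in f:
            total += len(line)
    return total


def count_text(fname, buffering=-1):
    """Text mode: let open() decode ASCII for us."""
    total = 0
    with open(fname, "r", encoding="ascii", buffering=buffering) as f:
        for line in f:
            total += len(line)
    return total


def count_manual_decode(fname, buffering=-1):
    """Binary mode, decoding each line ourselves."""
    total = 0
    with open(fname, "rb", buffering=buffering) as f:
        for line in f:
            total += len(line.decode("ascii"))
    return total


def write_sample(n_records):
    """Write a synthetic FASTQ-like file (hypothetical test data) and return its path."""
    fd, path = tempfile.mkstemp(suffix=".fq")
    with os.fdopen(fd, "wb") as f:
        for i in range(n_records):
            f.write(f"@read{i}\nACGTACGT\n+\nIIIIIIII\n".encode("ascii"))
    return path


def timed(func, fname, **kwargs):
    """Return (result, elapsed seconds) for a single call."""
    start = time.perf_counter()
    result = func(fname, **kwargs)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    path = write_sample(10_000)
    try:
        for func in (count_bytes, count_text, count_manual_decode):
            _, elapsed = timed(func, path)
            print(f"{func.__name__}: {elapsed:.4f}s")
    finally:
        os.remove(path)
```

Passing a different `buffering=` value through `timed` (for example `timed(count_bytes, path, buffering=1 << 20)`) is one way to check the last observation on your own data.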