eliminate seek()s #11
Alas, this fix doesn't appear to be compatible with Python 2.6. How annoying. @ctb are you willing to ditch Python 2.6 support for screed to enable streaming? Otherwise one could build a workaround using the Python zlib library. Bleh.
What's not Python 2.6 compatible? Is it handled by future? And/or can we try the 2.7 way and fall back to 2.6 dynamically, thus enabling only on 2.7+?
Not fixable via a We could use http://bugs.python.org/file15619/gzip_7471_py27.diff to do a temporary monkey patch of Python 2.6. I'll try out a versioned fallback first.
Fallback is in place; could be a bit cleaner to allow for
@brtaylor92 & @ctb ready for review & merge
On Fri, Oct 31, 2014 at 10:07:03AM -0700, Michael R. Crusoe wrote:
ok, I'll try to take a look, but might not get to it 'til middle of
--t
C. Titus Brown, ctb@msu.edu
    if not line:
        return []
    if sys.version_info[1] >= 7:
Maybe check that this is Python 2 here, or at least put a comment in about Python 2 just to make it clear for casual readers?
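One way to make the guard explicit, as a minimal sketch: compare the (major, minor) pair instead of the minor number alone. The helper names below are hypothetical and not part of screed; the second function only mirrors the check quoted above so the difference is visible.

```python
import sys


def at_least_python_27(version_info=sys.version_info):
    """Return True on Python 2.7 or any later release, including 3.x.

    Hypothetical helper: compares (major, minor) as a tuple.
    """
    return tuple(version_info[:2]) >= (2, 7)


def minor_only_check(version_info=sys.version_info):
    """Mirror of the guard under review: looks at the minor number only."""
    return version_info[1] >= 7
```

With the tuple comparison, a version such as (3, 4) passes, whereas the minor-only check rejects it because 4 < 7.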
While testing in khmer there's a change in behaviour as empty files now throw an exception. Maybe we should have screed tests that test empty file behaviours. I'll write that when I'm done. Also, what's up with this test? https://github.com/ged-lab/khmer/blob/master/tests/test_scripts.py#L449
Thanks @bocajnotnef. Yeah, it is a bit weird that we expect diginorm to silently accept an empty file without a
    line = sequencefile.readline()

    if not line:
        return []
The current screed returns an empty list in the instance of an empty file. @ctb shall I retain this behaviour or throw an exception? (Like we do for an unknown file format)
On Tue, Nov 04, 2014 at 11:15:58AM -0800, Michael R. Crusoe wrote:
    compression = None
    for magic, ftype in magic_dict.items():
        if file_start.startswith(magic):
            compression = ftype
            break

    sequencefile = {
        'gz': lambda: gzip.open(filename),
        'bz2': lambda: bz2.BZ2File(filename),
        'zip': lambda: zipfile.ZipFile(filename),
        None: lambda: _open(filename)}[compression]()

    line = sequencefile.readline()

    if not line:
        return []
The current screed returns an empty list in the instance of an empty file. @ctb shall I retain this behaviour or throw an exception? (Like we do for an unknown file format)
+0 for returning empty list. I don't have a strong reason for this but
there is nothing technically wrong with an empty file, is there? :)
cheers,
--titus
C. Titus Brown, ctb@msu.edu
It isn't wrong to return an empty list; it is a bit uninformative, though.
Relevant to dib-lab/khmer#654
Tests have been added.
@ctb @camillescott @luizirber Status of CR on this?
I've retained the behavior of opening an empty file (returning an empty list).
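For illustration only, here is a minimal sketch of how the empty-file check can coexist with a seek-free reader: read the first line, return an empty list if there is none, and otherwise put that line back in front of the rest of the stream. The function name `lines_without_seek` is hypothetical and this is not screed's actual implementation.

```python
import io
import itertools


def lines_without_seek(sequencefile):
    """Return all lines of a stream without ever calling seek().

    Hypothetical helper: an empty stream gives an empty list, matching the
    behaviour discussed in this thread.
    """
    line = sequencefile.readline()
    if not line:
        return []
    # Replay the consumed first line, then continue with the stream itself.
    return itertools.chain([line], sequencefile)


empty = lines_without_seek(io.BytesIO(b""))
records = list(lines_without_seek(io.BytesIO(b">seq1\nACGT\n")))
```

Because nothing is rewound, the same approach should also work on pipes and other non-seekable inputs.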
Since we already have a buffered reader, there is no reason to further wrap the compressed file readers in another buffered reader. This frees us to support streaming with Python 2.6 as well.
@ctb this is ready for your review
    rawfile.close()
    bufferedfile = io.open(file=filename, mode='rb', buffering=8192)
    num_bytes_to_peek = max(len(x) for x in magic_dict)
    file_start = bufferedfile.peek(8192)
shouldn't 8192 be replaced by num_bytes_to_peek?
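A minimal sketch of what that suggestion could look like, assuming a `magic_dict` keyed by magic-number prefixes. The two entries and the `detect_compression` name are illustrative assumptions, not screed's actual code. Note that `peek()` may return more or fewer bytes than requested, so the result is sliced before matching.

```python
import io

# Assumed mapping of magic-number prefixes to compression types (illustrative).
magic_dict = {
    b"\x1f\x8b\x08": "gz",   # gzip with the deflate method
    b"\x42\x5a\x68": "bz2",  # "BZh"
}


def detect_compression(filename):
    """Guess the compression type from the first bytes, without seeking."""
    num_bytes_to_peek = max(len(magic) for magic in magic_dict)
    with io.open(filename, mode="rb", buffering=8192) as bufferedfile:
        # peek() does not advance the stream position; slice because it can
        # return more than the number of bytes asked for.
        file_start = bufferedfile.peek(num_bytes_to_peek)[:num_bytes_to_peek]
    for magic, ftype in magic_dict.items():
        if file_start.startswith(magic):
            return ftype
    return None
```

An empty file yields an empty peek result, so no prefix matches and the function returns `None`.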
Looks great apart from the few specific comments! Thanks!
Ugh: Python 2.6 & Python 2.7 allow one to stream from a zipfile via
I'm ripping out zipfile support for now and saving it in another branch for future development.
Harsh Jenkins: why are you judging me for not having fixed all the PEP8 errors when #17 hasn't been merged yet?
Good form? And I don't recall setting that. Jenkins should be very
On Fri, Dec 19, 2014, 13:26 Michael R. Crusoe notifications@github.com
Which reminds me, what's #17 still need?
On Fri, Dec 19, 2014, 16:33 Jake Fenton bocajnotnef@gmail.com wrote:
I think I had prematurely set that.
@ctb ready for review & merge
Nice work! |
Fixes dib-lab/khmer#633