Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

eliminate seek()s #11

Merged
merged 16 commits into from
Dec 19, 2014
Merged

eliminate seek()s #11

merged 16 commits into from
Dec 19, 2014

Conversation

mr-c
Copy link
Contributor

@mr-c mr-c commented Oct 10, 2014

@mr-c
Copy link
Contributor Author

mr-c commented Oct 10, 2014

Alas this fix doesn't appear to be compatible with Python 2.6. How annoying. @ctb are you willing to ditch Python2.6 support for screed to enable streaming?

Otherwise one could build a workaround using the Python zlib library. Bleh.

@ctb
Copy link
Member

ctb commented Oct 10, 2014

What's not python2.6 compat? Is it handled by future? And/or can we try the 2.7 way and fall back to 2.6 dynamically, thus enabling only on 2.7+?

@mr-c
Copy link
Contributor Author

mr-c commented Oct 10, 2014

The gzip module’s GzipFile now [...] implements the io.BufferedIOBase ABC, so you can wrap it with io.BufferedReader for faster processing (contributed by Nir Aides; issue 7471).

Not fixable via a __future__ import.

We could use http://bugs.python.org/file15619/gzip_7471_py27.diff to do a temporary MonkeyPatch of Python2.6.

I'll try out a versioned fall back first.

@mr-c
Copy link
Contributor Author

mr-c commented Oct 10, 2014

Fallback is in place; could be a bit cleaner to allow for seek-free parsing for non-gziped files under Python2.6.

@mr-c
Copy link
Contributor Author

mr-c commented Oct 31, 2014

@brtaylor92 & @ctb ready for review & merge

@ctb
Copy link
Member

ctb commented Nov 1, 2014

On Fri, Oct 31, 2014 at 10:07:03AM -0700, Michael R. Crusoe wrote:

@brtaylor92 & @ctb ready for review & merge

ok, I'll try to take a look, but might not get to it 'til middle of
week.

--t

C. Titus Brown, ctb@msu.edu


if not line:
return []
if sys.version_info[1] >= 7:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe check that this is Python 2 here, or at least put a comment in about Python 2 just to make it clear for casual readers?

@SensibleSalmon
Copy link
Contributor

While testing in khmer there's a change in behaviour as empty files now throw an exception.

Maybe we should have screed tests that test empty file behaviours.

I'll write that when I'm done.

Also, what's up with this test? https://github.com/ged-lab/khmer/blob/master/tests/test_scripts.py#L449

@mr-c
Copy link
Contributor Author

mr-c commented Nov 3, 2014

Thanks @bocajnotnef. Yeah, it is a bit weird that we expect diginorm to silently accept an empty file without a --force or similar. Seems to be contrary to how the other tools work.

@mr-c
Copy link
Contributor Author

mr-c commented Nov 3, 2014

  • write a test for unknown file format
  • port @bocajnotnef 's streaming tests to work directly with screed objects

line = sequencefile.readline()

if not line:
return []
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The current screed returns an empty list in the instance of an empty file. @ctb shall I retain this behaviour or throw an exception? (Like we do for an unknown file format)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On Tue, Nov 04, 2014 at 11:15:58AM -0800, Michael R. Crusoe wrote:

 compression = None
 for magic, ftype in magic_dict.items():
     if file_start.startswith(magic):
         compression = ftype
         break
  • sequencefile = {
  •    'gz': lambda: gzip.open(filename),
    
  •    'bz2': lambda: bz2.BZ2File(filename),
    
  •    'zip': lambda: zipfile.ZipFile(filename),
    

- None: lambda: _open(filename)}compression

- line = sequencefile.readline()

  • if not line:
  •    return []
    

The current screed returns an empty list in the instance of an empty file. @ctb shall I retain this behaviour or throw an exception? (Like we do for an unknown file format)

+0 for returning empty list. I don't have a strong reason for this but
there is nothing technically wrong with an empty file, is there? :)

cheers,

--titus

C. Titus Brown, ctb@msu.edu

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It isn't wrong to return an empty list; it is a bit uninformative though

@SensibleSalmon
Copy link
Contributor

Relevant to dib-lab/khmer#654

@mr-c
Copy link
Contributor Author

mr-c commented Dec 2, 2014

Tests have been added.

@SensibleSalmon
Copy link
Contributor

@ctb @camillescott @luizirber Status of CR on this....?

@mr-c
Copy link
Contributor Author

mr-c commented Dec 5, 2014

I've retained the behavior of opening an empty file (returning an empty list)

mr-c added 8 commits December 17, 2014 10:37
Since we already have a buffered reader there is no reason to further wrap the
compressed file readers in another buffered reader. This frees us to support
streaming with Python 2.6 as well
@mr-c
Copy link
Contributor Author

mr-c commented Dec 17, 2014

@ctb this is ready for your review

rawfile.close()
bufferedfile = io.open(file=filename, mode='rb', buffering=8192)
num_bytes_to_peek = max(len(x) for x in magic_dict)
file_start = bufferedfile.peek(8192)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

shouldn't 8192 be replaced by num_bytes_to_peek?

@ctb
Copy link
Member

ctb commented Dec 18, 2014

Looks great apart from the few specific comments! Thanks!

@mr-c
Copy link
Contributor Author

mr-c commented Dec 18, 2014

Ugh: Python 2.6 & Python 2.7 allow one to stream from a zipfile via /dev/stdin but not from a named pipe/FIFO. Excuse me while I find a new way to test this.

@mr-c
Copy link
Contributor Author

mr-c commented Dec 18, 2014

I"m ripping out zipfile support for now and saving it in another branch for future development.

@mr-c
Copy link
Contributor Author

mr-c commented Dec 19, 2014

Harsh Jenkins: why are you judging me for not having fixed all the PEP8 errors when #17 hasn't been merged yet?

@SensibleSalmon
Copy link
Contributor

Good form? And I don't recall setting that. Jenkins should be very
tolerant. Currently.

On Fri, Dec 19, 2014, 13:26 Michael R. Crusoe notifications@github.com
wrote:

Harsh Jenkins: why are you judging me for not having fixed all the PEP8
errors when #17 #17 hasn't been
merged yet?


Reply to this email directly or view it on GitHub
#11 (comment).

@SensibleSalmon
Copy link
Contributor

Which reminds me, what's #17 still need?

On Fri, Dec 19, 2014, 16:33 Jake Fenton bocajnotnef@gmail.com wrote:

Good form? And I don't recall setting that. Jenkins should be very
tolerant. Currently.

On Fri, Dec 19, 2014, 13:26 Michael R. Crusoe notifications@github.com
wrote:

Harsh Jenkins: why are you judging me for not having fixed all the PEP8
errors when #17 #17 hasn't been
merged yet?


Reply to this email directly or view it on GitHub
#11 (comment).

@mr-c
Copy link
Contributor Author

mr-c commented Dec 19, 2014

I think I had prematurely set that.

@mr-c
Copy link
Contributor Author

mr-c commented Dec 19, 2014

@ctb ready for review & merge

@ctb
Copy link
Member

ctb commented Dec 19, 2014

Nice work!

ctb added a commit that referenced this pull request Dec 19, 2014
@ctb ctb merged commit 8238d98 into master Dec 19, 2014
@ctb ctb deleted the fix/buffered branch December 19, 2014 22:40
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

normalize-by-median.py does not accept reads on the stdin
3 participants