-
-
Notifications
You must be signed in to change notification settings - Fork 382
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix issues #152, #153 and #154 #155
Conversation
Looks like moto doesn't work with boto anymore. Works fine with boto3.
The existing BufferedInputBase didn't override the .readline() method, forcing the superclass implementation to use .read() to read one byte at a time. This slowed reading down significantly. Also increased the buffer size to 256kB, this is consistent with s3transfer. http://boto3.readthedocs.io/en/latest/_modules/boto3/s3/transfer.html
smart_open/s3.py
Outdated
@@ -28,6 +28,10 @@ | |||
MODES = (READ, READ_BINARY, WRITE, WRITE_BINARY) | |||
"""Allowed I/O modes for working with S3.""" | |||
|
|||
BINARY_NEWLINE = b'\n' | |||
TEXT_NEWLINE = b'\n' |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Both are binary?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
You're right. We're not using TEXT_NEWLINE right now, so I removed it.
@@ -9,7 +9,7 @@ | |||
else: | |||
import unittest | |||
|
|||
import boto | |||
import boto3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This seems to be a big change; does it belong here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@piskvorky It's not really that big a change. This is test code, and it writes a mock object to a mock S3 bucket. The code for doing this with boto and boto3 is slightly different, but the end result is the same (the tests still pass without changing the code). If you look at the remainder of the changes in test_s3.py, you'll see the tests aren't strongly coupled to either boto or boto3.
The real benefit to using boto3 in tests is that it matches the implementation: our S3 implementation uses boto3 under the covers. Also, using boto3 is future-proof: newer versions of moto mock boto and boto3 separately. That is, objects mocked via boto are not visible to boto3 and vice versa. This means that if we upgrade to the most recent moto version (unlike the year-old version currently used in the test environment), our boto-based tests will break.
This change solves that problem before it happens.
When writing, check if the bucket exists and raise a ValueError if it doesn't.
This is no longer necessary since we dropped support for Python 2.6. The GzipFile from Py2.7 and above is already a context manager, so the closing is not required. https://docs.python.org/2/library/gzip.html
If we don't specify it, smart_open will use the system encoding. This may be ascii on some systems (e.g. Py2.7), which will fail the test because it contains non-ascii characters.
Renamed RawReader to SeekableRawReader. Renamed BufferedInputBase to SeekableBufferedInputBase. Introduced new, non-seekable RawReader and BufferedInputBase. Seeking functionality was strictly necessary while we were supporting Py2.6, because the gzip reader required it back then. The gzip reader from 2.7 onwards does not require seeking, so neither do we. Seeking is convenient, but appears to be slower, so disabling it for now is the right thing to do.
It appears Py2.7 gzip still requires seeking. Py3 gzip does not. We're still supporting Py2.7, so we need seeking to work if we continue to use the existing gzip reader.
Great fixes, thanks a lot @mpenkov 🥇 |
We didn't have a readline in our S3 reader, and that was slowing us down.
Here is a breakdown of the contents: