HdfsOpenWrite implementation similar to read #106
Looks promising.
Done
Could you add an example to the README, similar to the existing example for writing to WebHDFS?
    @mock.patch('smart_open.smart_open_lib.subprocess')
    def test_hdfs(self, mock_subprocess):
        """Is HDFS write called correctly"""
        smart_open_object = smart_open.HdfsOpenWrite(smart_open.ParseUri("hdfs:///tmp/test.txt"))
Would it be possible to check smart_open.smart_open("hdfs:///some/file.txt", "wb") instead?
I followed the example of the HdfsOpenRead test and asked myself why the class was called directly and not via smart_open.smart_open. My answer was that a unit test should preferably check the most direct API; the smart_open.smart_open path, meaning the call to ParseUri and the branching on the scheme, is tested elsewhere.
Yes, I agree that HdfsOpenRead doesn't have this test. There is a benefit in adding a test like that, though, to replace the exception test. As an aside, the ideal would be to mock HDFS the way the S3 tests do.
I did not understand what the outcome is. Do you want to leave it as is, or to replace the call to the HdfsOpenWrite constructor with a smart_open call?
I would like to add tests for smart_open.smart_open("hdfs:///some/file.txt", "wb")
and smart_open.smart_open("hdfs:///some/file.txt", "rb")
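For illustration only, here is a minimal sketch of what tests through a top-level entry point could look like. `open_hdfs_sketch` is a hypothetical stand-in written for this example, not smart_open's actual code, and the `hdfs dfs -cat` and `hdfs dfs -put -f -` command lines are assumptions about how the reader and writer shell out. The point is only that one mocked `subprocess.Popen` lets a test cover both the URI parsing and the branching on mode.

```python
import subprocess
from unittest import mock
from urllib.parse import urlsplit


def open_hdfs_sketch(uri, mode):
    """Hypothetical entry point: parse the URI, then branch on the mode."""
    parsed = urlsplit(uri)
    if parsed.scheme != "hdfs":
        raise NotImplementedError("scheme %r is not handled here" % parsed.scheme)
    if mode in ("r", "rb"):
        # Assumed read command: stream the file to stdout.
        return subprocess.Popen(
            ["hdfs", "dfs", "-cat", parsed.path], stdout=subprocess.PIPE
        )
    if mode in ("w", "wb"):
        # Assumed write command: "-" makes `put` read from stdin.
        return subprocess.Popen(
            ["hdfs", "dfs", "-put", "-f", "-", parsed.path], stdin=subprocess.PIPE
        )
    raise NotImplementedError("mode %r is not handled here" % mode)


def check_write_mode():
    """The "wb" call should spawn the assumed write command."""
    with mock.patch("subprocess.Popen") as popen:
        open_hdfs_sketch("hdfs:///some/file.txt", "wb")
    popen.assert_called_once_with(
        ["hdfs", "dfs", "-put", "-f", "-", "/some/file.txt"],
        stdin=subprocess.PIPE,
    )


def check_read_mode():
    """The "rb" call should spawn the assumed read command."""
    with mock.patch("subprocess.Popen") as popen:
        open_hdfs_sketch("hdfs:///some/file.txt", "rb")
    popen.assert_called_once_with(
        ["hdfs", "dfs", "-cat", "/some/file.txt"],
        stdout=subprocess.PIPE,
    )
```

Because `Popen` is mocked, neither check needs a Hadoop installation. In the real test suite the patch target would be the module that smart_open imports `subprocess` into, as in the `test_hdfs` snippet under review.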
It is a good idea to add these tests. But IMO it's better to keep this pull request focused on the write operation alone, as it was from the beginning, and to add the new tests for smart_open.smart_open in a separate pull request.
smart_open/smart_open_lib.py
Outdated
@@ -490,6 +492,33 @@ def __exit__(self, type, value, traceback):
        pass


class HdfsOpenWrite(object):
    """
    Implement streamed reader from HDFS, as an iterable & context manager.
You mean "Implement streamed writer to HDFS, as a context manager."?
OK, thanks for the PR!
I used the same idea which works for HdfsOpenRead to implement the write operation.
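As a rough sketch of that idea, assuming the reader works by piping data through an `hdfs dfs` subprocess (which the mocked `subprocess` in the test suggests), a writer could feed bytes to the stdin of `hdfs dfs -put -f - <path>`. The class name `HdfsWriteSketch`, the exact command line, and the error handling below are illustrative assumptions, not the code in this pull request.

```python
import subprocess


class HdfsWriteSketch(object):
    """Hypothetical streamed writer to HDFS, usable as a context manager."""

    def __init__(self, path):
        self.path = path
        # Assumed command: with "-" as the source, `hdfs dfs -put` reads
        # from stdin; "-f" overwrites an existing destination file.
        self.proc = subprocess.Popen(
            ["hdfs", "dfs", "-put", "-f", "-", path],
            stdin=subprocess.PIPE,
        )

    def write(self, data):
        """Send a chunk of bytes to the subprocess."""
        self.proc.stdin.write(data)

    def close(self):
        """Signal end of input and wait for the upload to finish."""
        self.proc.stdin.close()
        returncode = self.proc.wait()
        if returncode != 0:
            raise IOError(
                "hdfs put to %s exited with code %s" % (self.path, returncode)
            )

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.close()
```

Typical use would be `with HdfsWriteSketch("/tmp/test.txt") as fout: fout.write(b"hello")`, which needs the `hdfs` command on the PATH. Closing stdin before waiting matters here: the subprocess only finishes the upload once it sees end of input.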