HdfsOpenWrite implementation similar to read #106

Merged 3 commits on Feb 7, 2017

skibaa commented Feb 6, 2017

I used the same idea which works for HdfsOpenRead to implement the write operation.
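
As an illustration of the idea, here is a minimal sketch of a writer that streams bytes into the stdin of an `hdfs` command-line process. It is not the code from this pull request: the class name `HdfsPipeWriter`, the injectable `popen` parameter, and the exact `hdfs dfs -put -f -` command line are assumptions made for the example.

```python
import subprocess


class HdfsPipeWriter(object):
    """Hypothetical sketch: stream bytes to HDFS through the `hdfs` CLI."""

    def __init__(self, hdfs_path, popen=subprocess.Popen):
        # Assumed command: `hdfs dfs -put -f - <path>` reads the file body
        # from stdin and overwrites the destination if it exists.
        self._proc = popen(
            ["hdfs", "dfs", "-put", "-f", "-", hdfs_path],
            stdin=subprocess.PIPE,
        )

    def write(self, data):
        # Forward the bytes to the child process.
        self._proc.stdin.write(data)

    def close(self):
        # Closing stdin signals end of file; then wait for the CLI to exit.
        self._proc.stdin.close()
        self._proc.wait()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.close()
```

With a Hadoop client installed, usage would look like `with HdfsPipeWriter("/tmp/test.txt") as fout: fout.write(b"hello")`. The `popen` parameter exists only so the sketch can be exercised without a cluster.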


tmylk commented Feb 6, 2017

Looks promising.
May I ask you to update the existing tests and add new ones, like the one for [read with mocking](https://github.com/RaRe-Technologies/smart_open/blob/master/smart_open/tests/test_smart_open.py#L141)?


skibaa commented Feb 7, 2017

Done

tmylk left a comment

Could you add an example to the README, similar to the existing example for writing to WebHDFS?

    @mock.patch('smart_open.smart_open_lib.subprocess')
    def test_hdfs(self, mock_subprocess):
        """Is HDFS write called correctly"""
        smart_open_object = smart_open.HdfsOpenWrite(smart_open.ParseUri("hdfs:///tmp/test.txt"))
Contributor

Would it be possible to check this via smart_open.smart_open("hdfs:///some/file.txt", "wb")?

Contributor Author

I took the example from the HdfsOpenRead test and asked myself why the class was called directly and not via smart_open.smart_open. My answer was that in a unit test it is preferable to check the most direct API; smart_open.smart_open, the call to ParseUri, and the branching on the URI scheme are tested elsewhere.
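
To make the "most direct API" point concrete, the following sketch tests a writer class directly with `subprocess.Popen` patched out, so no URI parsing or scheme dispatch is involved. The class `HdfsPipeWriter` and its command line are hypothetical stand-ins, not the `HdfsOpenWrite` code under review.

```python
import subprocess
import unittest
from unittest import mock


class HdfsPipeWriter(object):
    """Hypothetical stand-in for the writer class under test."""

    def __init__(self, hdfs_path):
        # Assumed command; the real class may build a different one.
        self._proc = subprocess.Popen(
            ["hdfs", "dfs", "-put", "-f", "-", hdfs_path],
            stdin=subprocess.PIPE,
        )

    def write(self, data):
        self._proc.stdin.write(data)

    def close(self):
        self._proc.stdin.close()


class HdfsPipeWriterTest(unittest.TestCase):
    @mock.patch.object(subprocess, "Popen")
    def test_write_goes_to_subprocess_stdin(self, mock_popen):
        # Construct the class directly instead of going through a
        # top-level open function.
        writer = HdfsPipeWriter("/tmp/test.txt")
        writer.write(b"test")
        writer.close()

        # The process is started once with the expected arguments.
        mock_popen.assert_called_once_with(
            ["hdfs", "dfs", "-put", "-f", "-", "/tmp/test.txt"],
            stdin=subprocess.PIPE,
        )
        # The payload reaches the child's stdin, which is then closed.
        mock_popen.return_value.stdin.write.assert_called_once_with(b"test")
        mock_popen.return_value.stdin.close.assert_called_once_with()
```

A test at this level fails only if the writer itself changes, which is the trade-off described above: dispatch through the top-level function needs its own coverage elsewhere.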

Contributor

Yes, I agree that HdfsOpenRead doesn't have this test. There is a benefit in adding a test like that, though, to replace the exception test. As an aside, the ideal would be to mock HDFS similarly to the S3 tests.

Contributor Author

I did not understand what the outcome is. Do you want to leave it as is, or to replace the call to the HdfsOpenWrite constructor with the smart_open call?

Contributor

I would like to add tests for smart_open.smart_open("hdfs:///some/file.txt", "wb") and smart_open.smart_open("hdfs:///some/file.txt", "rb")

Contributor Author

It is a good idea to add these tests. But IMO it's better to keep this pull request focused on the write operation alone, as it was from the beginning, and add the new tests for smart_open.smart_open in a separate pull request.

@@ -490,6 +492,33 @@ def __exit__(self, type, value, traceback):
        pass


class HdfsOpenWrite(object):
    """
    Implement streamed reader from HDFS, as an iterable & context manager.
Contributor

You mean "Implement streamed writer to HDFS, as a context manager."?

tmylk merged commit 7f52776 into piskvorky:master on Feb 7, 2017

tmylk commented Feb 7, 2017

OK, thanks for the PR!

2 participants