-
-
Notifications
You must be signed in to change notification settings - Fork 383
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implemented possibility to read directly from WebHDFS using smart_open #29
Conversation
@@ -36,6 +36,10 @@ It is well tested (using `moto <https://github.com/spulec/moto>`_), well documen | |||
>>> for line in smart_open.smart_open('hdfs://user/hadoop/my_file.txt'): | |||
... print line | |||
|
|||
>>> # stream from WebHDFS | |||
>>> for line in smart_open.smart_open('webhdfs://host:port/user/hadoop/my_file.txt'): | |||
... print line |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
missing 1 space
Thanks @ziky90 ! Can you add some unit tests so we can merge? |
Added space + replaced os.path.join. |
Cool, thanks! |
I have added read and seek methods implementation + unit tests. For now I have at least added requests to the .travis.yml file + I'll try to fix the Python 3 related problem with tests. |
OK, thanks! |
@@ -140,9 +145,16 @@ def __init__(self, uri, default_scheme="file"): | |||
|
|||
if self.scheme == "hdfs": | |||
self.uri_path = parsed_uri.netloc + parsed_uri.path | |||
if self.uri_path[0] != "/": |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Safer to use lstrip('/')
here? (this will fail if there's no 0th element)
Or what is this for? When does this happen? Can uri_path
start with multiple leading //
?
Tests for Python3 fixed, though I don't understand error log for Python2.6. It seems to as travis just got stuck (same as for commit 15c7cfc)? |
Yeah I remember seeing that too. Not sure why it gets stuck -- seems something to do with multiprocessing. In any case, not related to your PR here. I re-ran the failing travis test and it passes. Are we good to merge or do you plan to add more commits? |
Thanks, from my point of view it's ready to merge. |
Implemented possibility to read directly from WebHDFS using smart_open
Thanks, merged! How would you describe this PR in one line (for the CHANGELOG)? |
probably the best description would be something like: |
This PR partially solves the issue #23. It implements reading from WebHDFS based on API described at https://hadoop.apache.org/docs/r1.0.4/webhdfs.html.
For reading from WebHDFS you can use:
webhdfs://host:port/path/file
NOTE: