Added support for viewfs:// URLs. #665
As @zhezh stated, hdfs supports viewfs, so we just need to convert the viewfs URI to an hdfs URI internally and it will work.
@mpenkov could you take a look?
smart_open/hdfs.py
    uri_path = split_uri.netloc + split_uri.path
    uri_path = "/" + uri_path.lstrip("/")
    if not uri_path:
        raise RuntimeError("invalid HDFS URI: %r" % uri_as_string)

-   return dict(scheme=SCHEME, uri_path=uri_path)
+   return dict(scheme="hdfs", uri_path=uri_path)
I think this is a bit hacky. It's OK to reuse the hdfs backend for viewfs, but here you're actually modifying the URL that the user passed. On top of violating the principle of least surprise, it's only marginally better than:
    url = 'viewfs://foo/bar/baz'
    with smart_open.open(url.replace('viewfs://', 'hdfs://')):
        ...
It'd be better to do:
-   return dict(scheme="hdfs", uri_path=uri_path)
+   return dict(scheme=split_uri.scheme, uri_path=uri_path)
and then treat viewfs and hdfs identically in the remaining code, while still keeping the two schemes distinct. This would be similar to how we handle e.g. http and https.
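To illustrate the suggestion, here is a minimal sketch of a parser that accepts both schemes and reports whichever one the user passed. It is not the actual smart_open code: the `SCHEMES` tuple and the use of `urllib.parse.urlsplit` are assumptions made to keep the example self-contained, and the empty-path check from the diff is left out for brevity.

```python
import urllib.parse

# Hypothetical constant for this sketch: both schemes map to the same backend.
SCHEMES = ("hdfs", "viewfs")


def parse_uri(uri_as_string):
    """Parse an hdfs:// or viewfs:// URI without rewriting its scheme."""
    split_uri = urllib.parse.urlsplit(uri_as_string)
    if split_uri.scheme not in SCHEMES:
        raise ValueError("unsupported scheme: %r" % split_uri.scheme)

    # Same path handling as in the diff: join netloc and path,
    # then normalise to a single leading slash.
    uri_path = split_uri.netloc + split_uri.path
    uri_path = "/" + uri_path.lstrip("/")

    # Keep the scheme the user supplied instead of hard-coding "hdfs".
    return dict(scheme=split_uri.scheme, uri_path=uri_path)
```

With this shape, `parse_uri('viewfs://foo/bar/baz')` reports `viewfs` as the scheme, and any later code that needs to treat the two alike can check membership in `SCHEMES` rather than comparing against a single value.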
Thank you for your contribution. It's a good start, but needs more work.
Please add some tests to verify that your contribution works and prevent future regressions. See smart_open/tests/test_hdfs.py and integration-tests/test_hdfs.py for the existing HDFS tests.
Left you a comment - please have a look and let me know when you're ready for another review.
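As a rough idea of what such a test could look like without a real Hadoop installation, the sketch below patches `subprocess.Popen` and checks that a viewfs URI reaches the command line unchanged. The `read_via_cli` helper is hypothetical and only stands in for the backend; it assumes the backend shells out to `hdfs dfs -cat`, which should be checked against the actual code before adapting this.

```python
import io
import subprocess
from unittest import mock


def read_via_cli(uri):
    """Hypothetical helper: read a whole object by shelling out to the hdfs CLI."""
    proc = subprocess.Popen(["hdfs", "dfs", "-cat", uri], stdout=subprocess.PIPE)
    try:
        return proc.stdout.read()
    finally:
        proc.stdout.close()
        proc.wait()


def check_viewfs_uri_is_passed_through():
    # Stand-in for the child process: only stdout and wait() are used.
    fake_proc = mock.Mock()
    fake_proc.stdout = io.BytesIO(b"hello")

    with mock.patch("subprocess.Popen", return_value=fake_proc) as popen:
        data = read_via_cli("viewfs://foo/bar/baz")

    # The URI must not be rewritten to hdfs:// on its way to the CLI.
    popen.assert_called_once_with(
        ["hdfs", "dfs", "-cat", "viewfs://foo/bar/baz"], stdout=subprocess.PIPE
    )
    return data
```

Because the child process is replaced by a mock, the check does not depend on an `hdfs` executable being on the path.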
@ChandanChainani Are you able to finish this PR?
@mpenkov I didn't get the chance to complete the test changes. Could you please guide me on how I can write test cases for [...]? I am thinking, since we are using [...]. Sorry if I phrased this poorly, my English is not good.
@mpenkov any suggestions on the above comment?
Getting this error: [WinError 6] The handle is invalid
I added some tests. Thank you for your work!
Fixes #645