[FSTORE-1183] Error downloading credentials for Kafka storage connector in Python mode #1213
python/hsfs/util.py
Outdated
@@ -133,6 +133,13 @@ def get_host_name():
    return host


def get_dataset_type(path: str):
    if re.match(r"^(?:dfs://|)/apps/hive/warehouse/*", path):
I really don't understand this regex. A Hopsworks path can start with either `/` or `hdfs://` (or `hopsfs://` for that matter), but certainly not with `dfs://`, so I'm not sure why we are checking whether it begins with `dfs://`.
If I try the regex in a Python interpreter, it doesn't match the string `hdfs:///apps/hive/warehouse/test`. That is a problem, considering that in the line `file = "hdfs://" + file` we add `hdfs://` at the beginning of the file path if it doesn't contain any scheme.
I also checked a couple of storage connectors: the REST API doesn't return any scheme, meaning `hdfs://` is added by the line above.
This also means that the default behaviour of uploading the credentials to the storage connector directory within the Hive warehouse directory is now broken with this PR, because the regex doesn't match `hdfs://`.
When you test this PR, please test both cases.
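To make the two cases concrete, here is a minimal sketch that can be pasted into a Python interpreter. `PATTERN_IN_PR` is the regex from the diff above. Everything else is an assumption for illustration only: `with_default_scheme` is a hypothetical helper mirroring the `file = "hdfs://" + file` line, and `PATTERN_SKETCH` is one possible correction, not necessarily the fix this PR should adopt. In particular, whether `hopsfs://` paths carry a host and port is assumed here.

```python
import re

# Regex exactly as it appears in the diff (note "dfs://", missing the "h").
PATTERN_IN_PR = r"^(?:dfs://|)/apps/hive/warehouse/*"

# Hypothetical correction: optional hdfs:// or hopsfs:// scheme with an
# optional authority (host:port), then the warehouse directory itself.
PATTERN_SKETCH = r"^(?:(?:hdfs|hopsfs)://[^/]*)?/apps/hive/warehouse(?:/|$)"


def with_default_scheme(path: str) -> str:
    """Hypothetical helper: prepend hdfs:// when the path has no scheme."""
    return path if "://" in path else "hdfs://" + path


def matches(pattern: str, path: str) -> bool:
    """Return True if the pattern matches at the start of the path."""
    return re.match(pattern, path) is not None


# Case 1: path as returned by the REST API (no scheme).
raw_path = "/apps/hive/warehouse/test"
# Case 2: the same path after the scheme has been prepended.
prefixed_path = with_default_scheme(raw_path)

for candidate in (raw_path, prefixed_path, "/Projects/demo/Resources/test"):
    print(
        candidate,
        "| PR regex:", matches(PATTERN_IN_PR, candidate),
        "| sketch:", matches(PATTERN_SKETCH, candidate),
    )
```

The regex from the diff matches only the scheme-less path, while the sketch matches both the raw and the prefixed warehouse path and rejects the path outside the warehouse. Note also that the trailing `/*` in the original means "zero or more slashes", so it does not require a path separator after `warehouse`.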
Yeah, sorry, I'm not sure how I lost the `h` from `hdfs`. I probably didn't catch it because of this if statement: https://github.com/logicalclocks/feature-store-api/blob/master/python/hsfs/engine/python.py#L908 (i.e. don't download the file if it already exists).
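As a rough illustration of how a "skip if it already exists" guard can hide a broken download path during testing, here is a minimal sketch. It is not the code at the linked line: `download_if_missing`, `fake_fetch`, and the file name are hypothetical names invented for this example.

```python
import os
import tempfile


def download_if_missing(local_path, fetch):
    """Write fetch() to local_path unless the file already exists.

    Returns True if a download happened, False if it was skipped.
    """
    if os.path.exists(local_path):
        return False
    with open(local_path, "wb") as handle:
        handle.write(fetch())
    return True


calls = []


def fake_fetch():
    """Stand-in for the real download; records each invocation."""
    calls.append(1)
    return b"credentials"


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "keystore.jks")  # hypothetical file name
    first = download_if_missing(target, fake_fetch)   # file absent: downloads
    second = download_if_missing(target, fake_fetch)  # file present: skipped
```

On a machine where the credentials file is already present, the download code never runs, so a regression in it goes unnoticed. Testing from a clean directory exercises the path that the guard otherwise skips.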