[FSTORE-1183] Error downloading credentials for Kafka storage connector in Python mode #1213

Merged
merged 14 commits into from
Feb 26, 2024

Conversation

bubriks

@bubriks bubriks commented Feb 6, 2024

This PR adds/fixes/changes...

  • please summarize your changes to the code
  • and make sure to include all changes to user-facing APIs

JIRA Issue: -

Priority for Review: -

Related PRs: -

How Has This Been Tested?

  • Unit Tests
  • Integration Tests
  • Manual Tests on VM

Checklist For The Assigned Reviewer:

- [ ] Checked if merge conflicts with master exist
- [ ] Checked if stylechecks for Java and Python pass
- [ ] Checked if all docstrings were added and/or updated appropriately
- [ ] Ran spellcheck on docstring
- [ ] Checked if guides & concepts need to be updated
- [ ] Checked if naming conventions for parameters and variables were followed
- [ ] Checked if private methods are properly declared and used
- [ ] Checked if hard-to-understand areas of code are commented
- [ ] Checked if tests are effective
- [ ] Built and deployed changes on dev VM and tested manually
- [x] (Checked if all type annotations were added and/or updated appropriately)

@bubriks bubriks added the WIP This issue or pull request is a work in progress label Feb 6, 2024
@bubriks bubriks removed the WIP This issue or pull request is a work in progress label Feb 7, 2024
@bubriks bubriks requested a review from SirOibaf February 7, 2024 15:47
@@ -133,6 +133,13 @@ def get_host_name():
    return host


def get_dataset_type(path: str):
    if re.match(r"^(?:dfs://|)/apps/hive/warehouse/*", path):

I really don't understand this regex. A Hopsworks path can start with either / or hdfs:// (or hopsfs:// for that matter), but certainly not with dfs://, so I'm not sure why we are checking whether it begins with dfs://.

If I try the regex in a Python interpreter it doesn't match the following string: hdfs:///apps/hive/warehouse/test which is a problem considering that here:

file = "hdfs://" + file

we add hdfs:// at the beginning of the file if the file doesn't contain any scheme.

I also checked a couple of storage connectors: the REST API doesn't return any scheme, meaning hdfs:// is added by the line above.

This also means that the default behaviour of uploading the credentials to the storage connector directory within the Hive warehouse directory is now broken by this PR, because the regex doesn't match hdfs://.

When you test this PR, please test both cases.
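
For reference, the mismatch described above can be reproduced with a short standalone check. This is only a sketch: `PATTERN_IN_DIFF` is copied from the hunk under review, while `PATTERN_WITH_HDFS` and the helper `is_hive_warehouse_path` are hypothetical names introduced here for illustration, and the second pattern simply assumes the intended scheme was hdfs://. It is not necessarily the pattern that was eventually merged.

```python
import re

# Pattern exactly as it appears in the diff under review (scheme is "dfs://").
PATTERN_IN_DIFF = r"^(?:dfs://|)/apps/hive/warehouse/*"

# Hypothetical variant, assuming the intended scheme was "hdfs://".
PATTERN_WITH_HDFS = r"^(?:hdfs://|)/apps/hive/warehouse/*"


def is_hive_warehouse_path(pattern: str, path: str) -> bool:
    """Return True if `path` matches `pattern` from the start of the string."""
    return re.match(pattern, path) is not None


# The two cases the review asks to test.
paths = [
    "/apps/hive/warehouse/test",         # no scheme
    "hdfs:///apps/hive/warehouse/test",  # after "hdfs://" has been prepended
]

for path in paths:
    print(
        path,
        is_hive_warehouse_path(PATTERN_IN_DIFF, path),
        is_hive_warehouse_path(PATTERN_WITH_HDFS, path),
    )
```

With the pattern from the diff, only the scheme-less path matches. With the assumed hdfs:// variant, both paths match. Note that the trailing `/*` in either pattern means "zero or more slashes", not "anything below this directory".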

@bubriks bubriks Feb 26, 2024

Yeah, sorry, I'm not sure how I lost the h from hdfs. I probably didn't catch it because of this if statement: https://github.com/logicalclocks/feature-store-api/blob/master/python/hsfs/engine/python.py#L908 (i.e. don't download the file if it already exists).

@bubriks bubriks requested a review from SirOibaf February 26, 2024 09:07
@SirOibaf SirOibaf merged commit 48bb5e0 into logicalclocks:master Feb 26, 2024
11 checks passed
SirOibaf pushed a commit to SirOibaf/feature-store-api that referenced this pull request Feb 26, 2024
SirOibaf pushed a commit that referenced this pull request Feb 26, 2024