@bubriks commented Feb 6, 2024

This PR adds/fixes/changes...

  • please summarize your changes to the code
  • and make sure to include all changes to user-facing APIs

JIRA Issue: -

Priority for Review: -

Related PRs: -

How Has This Been Tested?

  • Unit Tests
  • Integration Tests
  • Manual Tests on VM

Checklist For The Assigned Reviewer:

- [ ] Checked if merge conflicts with master exist
- [ ] Checked if style checks for Java and Python pass
- [ ] Checked if all docstrings were added and/or updated appropriately
- [ ] Ran spellcheck on docstrings
- [ ] Checked if guides & concepts need to be updated
- [ ] Checked if naming conventions for parameters and variables were followed
- [ ] Checked if private methods are properly declared and used
- [ ] Checked if hard-to-understand areas of code are commented
- [ ] Checked if tests are effective
- [ ] Built and deployed changes on dev VM and tested manually
- [x] (Checked if all type annotations were added and/or updated appropriately)

@bubriks added the WIP (work in progress) label Feb 6, 2024
@bubriks removed the WIP (work in progress) label Feb 7, 2024
@bubriks requested a review from SirOibaf February 7, 2024 15:47


def get_dataset_type(path: str):
    if re.match(r"^(?:dfs://|)/apps/hive/warehouse/*", path):

I really don't understand this regex. A Hopsworks path can start with / or hdfs:// (or hopsfs://, for that matter), but certainly not with dfs://, so I'm not sure why we check whether it begins with dfs://.
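Checking the pattern from the diff against each prefix named above shows which forms it accepts. The sample paths are illustrative; only the regex is copied from the diff.

```python
import re

# Pattern copied verbatim from get_dataset_type in the diff above.
HIVE_WAREHOUSE_PATTERN = r"^(?:dfs://|)/apps/hive/warehouse/*"

# Illustrative paths, one per prefix discussed in the review.
candidates = [
    "/apps/hive/warehouse/test",
    "hdfs:///apps/hive/warehouse/test",
    "hopsfs:///apps/hive/warehouse/test",
    "dfs:///apps/hive/warehouse/test",
]

# re.match anchors at the start of the string, so only the prefix matters.
results = {path: re.match(HIVE_WAREHOUSE_PATTERN, path) is not None for path in candidates}

for path, matched in results.items():
    print(f"{matched!s:5}  {path}")
```

Only the bare /apps/... path and the dfs:// form match; both hdfs:// and hopsfs:// are rejected.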

If I try the regex in a Python interpreter, it doesn't match the string hdfs:///apps/hive/warehouse/test, which is a problem considering that here:

file = "hdfs://" + file

we prepend hdfs:// to the file path if it doesn't contain any scheme.

I also checked a couple of storage connectors: the REST API doesn't return any scheme, so hdfs:// is added by the line above.
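The interaction between the prepend step and the regex can be reproduced in a few lines. This is a sketch: add_default_scheme is a hypothetical stand-in for the code quoted above, and testing for "://" is an assumption about how a missing scheme is detected.

```python
import re

# Pattern copied verbatim from get_dataset_type in the diff.
HIVE_WAREHOUSE_PATTERN = r"^(?:dfs://|)/apps/hive/warehouse/*"


def add_default_scheme(path: str) -> str:
    """Hypothetical stand-in for the line that prepends hdfs:// when no scheme is present."""
    # Assumption: a path "has a scheme" if it contains "://".
    if "://" not in path:
        path = "hdfs://" + path
    return path


# A scheme-less path, the form the REST API returns per the review.
api_path = "/apps/hive/warehouse/test"
resolved = add_default_scheme(api_path)

print(resolved)                                                # hdfs:///apps/hive/warehouse/test
print(re.match(HIVE_WAREHOUSE_PATTERN, api_path) is not None)  # True
print(re.match(HIVE_WAREHOUSE_PATTERN, resolved) is not None)  # False
```

The bare path would match, but once hdfs:// has been prepended the same location no longer does.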

This also means that the default behaviour, uploading the credentials to the storage connector directory inside the Hive warehouse directory, is now broken by this PR, because the regex doesn't match hdfs://.

When you test this PR, please test both cases.
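One way to cover both cases, offered only as a sketch (the pattern and the helper name is_hive_warehouse_path are assumptions, not necessarily what was merged), is to make the scheme optional, accept hdfs:// and hopsfs://, and check the scheme-less and prefixed forms side by side. The sample paths are illustrative.

```python
import re

# Sketch of a corrected pattern: an optional hdfs:// or hopsfs:// scheme, then the
# warehouse root followed by "/" or end of string. "(?:/|$)" replaces the original
# "/*", which also accepts zero slashes and so matches e.g. /apps/hive/warehouse_other.
HIVE_WAREHOUSE_PATTERN = re.compile(r"^(?:hdfs://|hopsfs://)?/apps/hive/warehouse(?:/|$)")


def is_hive_warehouse_path(path: str) -> bool:
    """Hypothetical helper: True if path lies under the Hive warehouse directory."""
    return HIVE_WAREHOUSE_PATTERN.match(path) is not None


cases = {
    "/apps/hive/warehouse/test": True,           # scheme-less, as returned by the REST API
    "hdfs:///apps/hive/warehouse/test": True,    # after hdfs:// has been prepended
    "hopsfs:///apps/hive/warehouse/test": True,
    "dfs:///apps/hive/warehouse/test": False,    # not a valid Hopsworks scheme
    "/apps/hive/warehouse_other/test": False,    # shares the prefix, different directory
    "/Projects/demo/Resources/test": False,      # outside the warehouse
}

for path, expected in cases.items():
    assert is_hive_warehouse_path(path) == expected, path
print("all cases pass")
```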

@bubriks Feb 26, 2024


Yeah, sorry, I'm not sure how I lost the h from hdfs. I probably didn't catch it because of this if statement: https://github.com/logicalclocks/feature-store-api/blob/master/python/hsfs/engine/python.py#L908 (i.e. don't download the file if it already exists).

@bubriks requested a review from SirOibaf February 26, 2024 09:07
@SirOibaf merged commit 48bb5e0 into logicalclocks:master Feb 26, 2024
SirOibaf pushed a commit to SirOibaf/feature-store-api that referenced this pull request Feb 26, 2024
SirOibaf pushed a commit that referenced this pull request Feb 26, 2024