Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Make all paths relative in kedro catalog resolve #3641

Merged
merged 6 commits into from
Feb 26, 2024

Conversation

ankatiyar
Copy link
Contributor

@ankatiyar ankatiyar commented Feb 21, 2024

Description

This bug was my bad! 😅
I'd suggested in #3561 to just remove the trimming of filepaths because we were only checking for filepath key in the catalog config but PartitionedDataset uses the key path which was causing an error.

But the filepaths in context.catalog._dataset_patterns are absolute paths (because of this

def _convert_paths_to_absolute_posix(
) . So when running kedro catalog resolve, the output for factory datasets had absolute paths but explicit datasets were all relative paths

Eg -

"{pattern}":
  type: partitions.PartitionedDataset
  path: data/01_raw
  dataset: pandas.CSVDataset
  metadata:
    path:  data/01_raw/metadata.yml # just to see what happens with nested dicts
  • kedro catalog resolve
y_train: # resolved from factory
  dataset: pandas.CSVDataset
  metadata:
    path: /Users/ankita_katiyar/kedro_projects/space/data/01_raw/metadata.yml
  path: /Users/ankita_katiyar/kedro_projects/space/data/01_raw
  type: partitions.PartitionedDataset

reviews: # explicit dataset entry
  filepath: data/01_raw/reviews.csv
  type: pandas.CSVDataset

Development notes

  • trimming of filepath now works for nested stuff also
  • refactored the tests a little bit, still haven't fully gotten rid of the mocking of the Context but it's testing the trimming .

Developer Certificate of Origin

We need all contributions to comply with the Developer Certificate of Origin (DCO). All commits must be signed off by including a Signed-off-by line in the commit message. See our wiki for guidance.

If your PR is blocked due to unsigned commits, then you must follow the instructions under "Rebase the branch" on the GitHub Checks page for your PR. This will retroactively add the sign-off to all unsigned commits and allow the DCO check to pass.

Checklist

  • Read the contributing guidelines
  • Signed off each commit with a Developer Certificate of Origin (DCO)
  • Opened this PR as a 'Draft Pull Request' if it is work-in-progress
  • Updated the documentation to reflect the code changes
  • Added a description of this change in the RELEASE.md file
  • Added tests to cover my changes
  • Checked if this change will affect Kedro-Viz, and if so, communicated that with the Viz team

Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com>
Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com>
Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com>
@ankatiyar ankatiyar marked this pull request as ready for review February 22, 2024 11:27
@ankatiyar ankatiyar self-assigned this Feb 22, 2024
Copy link
Contributor

@AhdraMeraliQB AhdraMeraliQB left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome work with the tests! 🌟

"""Trim the filepaths in the config dictionary, make them relative to the project path."""
conf_keys_with_path = ["filepath", "path", "filename"]
for key, value in config.items():
if isinstance(value, dict):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice recursive case!

Copy link
Contributor

@noklam noklam left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actually I have the question, do we want to show absolute path or relative path? Resolving to an absolute path doesn't seem wrong to me. Is this consistent to kedro catalog list?

From my understanding, we are basically trying to undo this line of code.

conf_dictionary[conf_key] = _convert_paths_to_absolute_posix(

Would it be easier to actually create the DataCatalog maunally without going context.catalog instead of creating a recursive call to undo this?

Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com>
@ankatiyar
Copy link
Contributor Author

@noklam you're right, I've updated it!

@merelcht merelcht mentioned this pull request Feb 26, 2024
4 tasks
Copy link
Member

@merelcht merelcht left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Tested and it works well! 👍

@ankatiyar ankatiyar enabled auto-merge (squash) February 26, 2024 16:54
@ankatiyar ankatiyar merged commit 7a0d7d0 into main Feb 26, 2024
28 checks passed
@ankatiyar ankatiyar deleted the fix/relative-paths-catalog-resolve branch February 26, 2024 17:13
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants