CLI Command 'kedro catalog resolve' fails on dataset factories that use PartitionedDataset #3560

Closed
MosaicMan opened this issue Jan 27, 2024 · 3 comments
Labels
  • Community (Issue/PR opened by the open-source community)
  • Issue: Bug Report 🐞 (Bug that needs to be fixed)

Comments

@MosaicMan
Contributor

Description

Running the CLI command 'kedro catalog resolve' fails with KeyError: 'filepath'.

Context

The error occurs when a dataset factory pattern is used with a PartitionedDataset. The resolve_patterns function in the kedro.framework.cli.catalog module appears to assume that every dataset config has an explicit 'filepath' key. That is not true for partitioned datasets, which are configured with 'path' instead.
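A minimal sketch of the failure mode described above, using a plain dict in place of a resolved catalog entry. The helper names `read_filepath_strict` and `read_filepath_tolerant` are hypothetical and are not part of Kedro; the sketch only shows why indexing a 'path'-based config with "filepath" raises.

```python
# Illustrative only: a plain dict standing in for a resolved catalog entry.
partitioned_config = {
    "type": "partitions.PartitionedDataset",
    "path": "/data/models/02_intermediate/items/daily",
    "filename_suffix": ".pq",
    "overwrite": False,
    "dataset": {"type": "pandas.ParquetDataset"},
}


def read_filepath_strict(ds_config):
    """Hypothetical helper: assumes every config has a 'filepath' key."""
    return ds_config["filepath"]


def read_filepath_tolerant(ds_config):
    """Hypothetical helper: returns None when 'filepath' is absent."""
    return ds_config.get("filepath")


# The strict lookup fails the same way the traceback in this issue does.
try:
    read_filepath_strict(partitioned_config)
    strict_error = None
except KeyError as exc:
    strict_error = exc.args[0]

# The tolerant lookup simply reports that there is no 'filepath' key.
tolerant_result = read_filepath_tolerant(partitioned_config)
```

Any code that post-processes dataset configs generically would need the tolerant form, or no 'filepath' handling at all, to cope with datasets that use a different key.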

Steps to Reproduce

  1. Define a dataset factory pattern in your catalog.yml:
"items_{timeframe}":
  type: partitions.PartitionedDataset
  path: /data/models/02_intermediate/items/{timeframe}
  filename_suffix: ".pq"
  overwrite: False
  dataset:
    type: pandas.ParquetDataset
  2. Make it a part of a pipeline:
...
node(
    func=nodes.extract_daily_items,
    inputs='items',
    outputs='items_daily',
    name='extract_daily_items',
),
node(
    func=nodes.extract_weekly_items,
    inputs='items',
    outputs='items_weekly',
    name='extract_weekly_items',
),
...
  3. Run 'kedro catalog resolve'

Expected Result

The different variations produced by the dataset factory should be displayed on screen:

items:
  dataset:
    type: pandas.ParquetDataset
  filename_suffix: .pq
  overwrite: true
  path: /data/models/02_intermediate/items
  type: partitions.PartitionedDataset
items_daily:
  dataset:
    type: pandas.ParquetDataset
  filename_suffix: .pq
  overwrite: false
  path: /data/models/02_intermediate/items/daily
  type: partitions.PartitionedDataset
items_weekly:
  dataset:
    type: pandas.ParquetDataset
  filename_suffix: .pq
  overwrite: false
  path: /data/models/02_intermediate/items/weekly
  type: partitions.PartitionedDataset
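For readers unfamiliar with dataset factories, here is a rough sketch of how a pattern such as "items_{timeframe}" could expand into the 'items_daily' and 'items_weekly' entries shown above. It is a simplified stand-in written with the standard library, not Kedro's implementation; `match_pattern`, `fill_placeholders`, and `resolve` are hypothetical names.

```python
import re
from typing import Any, Dict, Optional

# The factory entry from the reproduction steps, as a plain dict.
FACTORY_PATTERN = "items_{timeframe}"
FACTORY_CONFIG = {
    "type": "partitions.PartitionedDataset",
    "path": "/data/models/02_intermediate/items/{timeframe}",
    "filename_suffix": ".pq",
    "overwrite": False,
    "dataset": {"type": "pandas.ParquetDataset"},
}


def match_pattern(pattern: str, name: str) -> Optional[Dict[str, str]]:
    """Return placeholder values if `name` matches `pattern`, else None."""
    # re.split with a capturing group alternates literal text and field names.
    parts = re.split(r"\{(\w+)\}", pattern)
    regex = "".join(
        re.escape(part) if index % 2 == 0 else f"(?P<{part}>.+)"
        for index, part in enumerate(parts)
    )
    found = re.fullmatch(regex, name)
    return found.groupdict() if found else None


def fill_placeholders(value: Any, fields: Dict[str, str]) -> Any:
    """Substitute "{field}" in every string of a nested config."""
    if isinstance(value, dict):
        return {key: fill_placeholders(item, fields) for key, item in value.items()}
    if isinstance(value, str):
        for field, replacement in fields.items():
            value = value.replace("{" + field + "}", replacement)
    return value


def resolve(name: str) -> Optional[Dict[str, Any]]:
    """Resolve a dataset name against the single factory pattern above."""
    fields = match_pattern(FACTORY_PATTERN, name)
    if fields is None:
        return None
    return fill_placeholders(FACTORY_CONFIG, fields)


daily = resolve("items_daily")
weekly = resolve("items_weekly")
```

Note that a name like 'items' does not match the pattern in this sketch, so the 'items' entry in the expected output would have to come from an explicit catalog entry.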

Actual Result

The command throws an exception:

Traceback (most recent call last):
  File "/home/juancq/.pyenv/versions/3.11.7/envs/draco/bin/kedro", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/home/juancq/.pyenv/versions/3.11.7/envs/draco/lib/python3.11/site-packages/kedro/framework/cli/cli.py", line 198, in main
    cli_collection()
  File "/home/juancq/.pyenv/versions/3.11.7/envs/draco/lib/python3.11/site-packages/click/core.py", line 1157, in __call__
    return self.main(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/juancq/.pyenv/versions/3.11.7/envs/draco/lib/python3.11/site-packages/kedro/framework/cli/cli.py", line 127, in main
    super().main(
  File "/home/juancq/.pyenv/versions/3.11.7/envs/draco/lib/python3.11/site-packages/click/core.py", line 1078, in main
    rv = self.invoke(ctx)
         ^^^^^^^^^^^^^^^^
  File "/home/juancq/.pyenv/versions/3.11.7/envs/draco/lib/python3.11/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/juancq/.pyenv/versions/3.11.7/envs/draco/lib/python3.11/site-packages/click/core.py", line 1688, in invoke
    return _process_result(sub_ctx.command.invoke(sub_ctx))
                           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/juancq/.pyenv/versions/3.11.7/envs/draco/lib/python3.11/site-packages/click/core.py", line 1434, in invoke
    return ctx.invoke(self.callback, **ctx.params)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/juancq/.pyenv/versions/3.11.7/envs/draco/lib/python3.11/site-packages/click/core.py", line 783, in invoke
    return __callback(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/juancq/.pyenv/versions/3.11.7/envs/draco/lib/python3.11/site-packages/click/decorators.py", line 45, in new_func
    return f(get_current_context().obj, *args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/juancq/.pyenv/versions/3.11.7/envs/draco/lib/python3.11/site-packages/kedro/framework/cli/catalog.py", line 269, in resolve_patterns
    str(context.project_path) + "/", ds_config["filepath"]
                                     ~~~~~~~~~^^^^^^^^^^^^
KeyError: 'filepath'

Your Environment

  • Kedro version used (pip show kedro or kedro -V): 0.19.2
  • Python version used (python -V): 3.11.7
  • Operating system and version: Arch Linux
MosaicMan added a commit to MosaicMan/kedro that referenced this issue Jan 27, 2024
@astrojuanlu astrojuanlu added the Community Issue/PR opened by the open-source community label Jan 27, 2024
@astrojuanlu astrojuanlu added the Issue: Bug Report 🐞 Bug that needs to be fixed label Jan 27, 2024
@astrojuanlu
Member

Thanks for the detailed bug report @MosaicMan !

@MosaicMan
Contributor Author

> Thanks for the detailed bug report @MosaicMan !

Thank you for the awesome videos man.

MosaicMan added a commit to MosaicMan/kedro that referenced this issue Jan 31, 2024
- Remove filepath check from `resolve_patterns` method.
- Eliminate the associated `_trim_filepath` function.
- Update release notes.

These changes address redundant validations that were causing kedro-org#3560.

Signed-off-by: MosaicMan <34198823+MosaicMan@users.noreply.github.com>
ankatiyar added a commit that referenced this issue Feb 2, 2024
* Adding a list of "path" keys to check dataset config against.

Signed-off-by: MosaicMan <34198823+MosaicMan@users.noreply.github.com>

* Clean up `catalog resolve` CLI command to remove unnecessary checks.

- Remove filepath check from `resolve_patterns` method.
- Eliminate the associated `_trim_filepath` function.
- Update release notes.

These changes address redundant validations that were causing #3560.

Signed-off-by: MosaicMan <34198823+MosaicMan@users.noreply.github.com>

* Update release notes

Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com>

---------

Signed-off-by: MosaicMan <34198823+MosaicMan@users.noreply.github.com>
Signed-off-by: Ankita Katiyar <110245118+ankatiyar@users.noreply.github.com>
Signed-off-by: Ankita Katiyar <ankitakatiyar2401@gmail.com>
Co-authored-by: Ankita Katiyar <110245118+ankatiyar@users.noreply.github.com>
Co-authored-by: Ankita Katiyar <ankitakatiyar2401@gmail.com>
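The first bullet of the commit message above mentions checking the dataset config against a list of "path" keys, an idea the later bullets replace by removing the check altogether. A sketch of what such a check might look like is below. This is not the code from the commit: `PATH_KEYS` and `trim_project_prefix` are hypothetical names, and the prefix-trimming behaviour is an assumption based on the `str(context.project_path) + "/"` expression visible in the traceback.

```python
from typing import Any, Dict

# Hypothetical: config keys that may hold a filesystem location.
PATH_KEYS = ("filepath", "path")


def trim_project_prefix(project_path: str, ds_config: Dict[str, Any]) -> Dict[str, Any]:
    """Strip a leading project path from whichever path-like key is present.

    Configs that define none of PATH_KEYS are returned unchanged, so no
    KeyError is raised for datasets without a 'filepath'.
    """
    prefix = project_path.rstrip("/") + "/"
    trimmed = dict(ds_config)  # shallow copy; the input is left untouched
    for key in PATH_KEYS:
        value = trimmed.get(key)
        if isinstance(value, str) and value.startswith(prefix):
            trimmed[key] = value[len(prefix):]
    return trimmed


partitioned = trim_project_prefix(
    "/home/user/project",
    {"type": "partitions.PartitionedDataset", "path": "/home/user/project/data/items"},
)
no_path = trim_project_prefix("/home/user/project", {"type": "MemoryDataset"})
```

Removing the check entirely, as the merged change describes, avoids having to maintain such a list as new dataset types introduce other key names.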
@astrojuanlu
Member

Closed in #3561!
