Partition Dataset Overwrite not working as expected #601

Closed
fazilhero opened this issue Aug 2, 2023 · 4 comments
Labels
Community Issue/PR opened by the open-source community

Comments

@fazilhero

Description

I'm trying to use the PartitionedDataset with the overwrite parameter set to True, but it deletes completely different partitions.

Context

I have the following partitions in my file storage (S3):

"2023-08-01/se/1/orders"
"2023-08-01/se/3/orders"
"2023-08-01/se/2/orders"

When a function processes a single partition such as "2023-08-01/se/1/orders" and saves it back
with overwrite set to True, all the other partitions are removed.

I suspect the cause is the recursive parameter here: https://github.com/kedro-org/kedro/blob/main/kedro/io/partitioned_dataset.py#L308
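
A minimal sketch of the suspected mechanism, using only the Python standard library and a local temporary directory instead of S3. It is not Kedro's code: `save_partitions` and `list_partitions` are hypothetical helpers, and the assumption being illustrated is that `overwrite=True` recursively removes the whole dataset root before the new partitions are written.

```python
from __future__ import annotations

import shutil
import tempfile
from pathlib import Path


def save_partitions(root: Path, partitions: dict[str, str], overwrite: bool = False) -> None:
    """Hypothetical stand-in for a partitioned save (not Kedro's actual code)."""
    if overwrite and root.exists():
        # Assumed behaviour: the whole dataset root is removed recursively,
        # not only the partitions that are about to be written.
        shutil.rmtree(root)
    for partition_id, content in partitions.items():
        target = root / partition_id
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(content)


def list_partitions(root: Path) -> list[str]:
    """Return partition ids (paths relative to root), sorted."""
    return sorted(p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file())


def demo() -> tuple[list[str], list[str]]:
    """Write three partitions, then save one of them with overwrite=True."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp) / "data"
        save_partitions(root, {f"2023-08-01/se/{i}/orders": "old" for i in (1, 2, 3)})
        before = list_partitions(root)
        save_partitions(root, {"2023-08-01/se/1/orders": "new"}, overwrite=True)
        after = list_partitions(root)
    return before, after
```

Under that assumption, `demo()` returns all three partitions for `before` and only "2023-08-01/se/1/orders" for `after`, which matches the Actual Result reported in this issue.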

Steps to Reproduce

Expected Result

Before Save:
"2023-08-01/se/1/orders"
"2023-08-01/se/3/orders"
"2023-08-01/se/2/orders"

After Save:
"2023-08-01/se/1/orders"
"2023-08-01/se/3/orders"
"2023-08-01/se/2/orders"

Actual Result

Before Save:
"2023-08-01/se/1/orders"
"2023-08-01/se/3/orders"
"2023-08-01/se/2/orders"

After Save:
"2023-08-01/se/1/orders"

  • Kedro version used (pip show kedro or kedro -V): 0.18.11
  • Python version used (python -V): 3.10
  • Operating system and version: Ubuntu 18.04
@noklam
Contributor

noklam commented Aug 3, 2023

@fazilhero This is the expected behavior - https://docs.kedro.org/en/stable/kedro.io.PartitionedDataset.html#kedro.io.PartitionedDataset

overwrite – If True, any existing partitions will be removed.

Could you elaborate on your use case and what you are trying to do? We just had a discussion in kedro-org/kedro#2857 about supporting versioning of PartitionedDataset. Are you trying to overwrite partitions partially, or is versioning of PartitionedDataset what you actually want?

@fazilhero
Author

My understanding was that when I write a partition "2023-08-01/se/1/orders", it should either overwrite that partition or throw an error. I was a bit confused when it deleted different partitions such as "2023-08-01/se/2/orders". I now realize the doc says any existing partitions, so I suppose the intent really was to delete all partitions.
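
The per-partition semantics described in this comment (replace only the partition being written, or raise an error) could be sketched as follows. This is an illustration with the standard library on a local directory, not an existing Kedro option: `save_partitions_selectively` and `demo_selective` are hypothetical names.

```python
from __future__ import annotations

import tempfile
from pathlib import Path


def save_partitions_selectively(
    root: Path, partitions: dict[str, str], overwrite: bool = True
) -> None:
    """Hypothetical per-partition save: touch only the partitions passed in."""
    for partition_id, content in partitions.items():
        target = root / partition_id
        if target.exists() and not overwrite:
            # Refuse to clobber an existing partition instead of deleting anything.
            raise FileExistsError(f"Partition already exists: {partition_id}")
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(content)  # replaces this one partition only


def demo_selective() -> tuple[list[str], str, bool]:
    """Rewrite partition 1, then check that a non-overwriting save is refused."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp) / "data"
        save_partitions_selectively(root, {f"2023-08-01/se/{i}/orders": "old" for i in (1, 2, 3)})
        save_partitions_selectively(root, {"2023-08-01/se/1/orders": "new"})
        listing = sorted(p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file())
        rewritten = (root / "2023-08-01/se/1/orders").read_text()
        try:
            save_partitions_selectively(root, {"2023-08-01/se/2/orders": "x"}, overwrite=False)
            refused = False
        except FileExistsError:
            refused = True
    return listing, rewritten, refused
```

With these semantics all three partitions survive the save, only "2023-08-01/se/1/orders" carries new content, and a save with `overwrite=False` onto an existing partition raises `FileExistsError`.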

@noklam
Contributor

noklam commented Aug 3, 2023

Sorry for the confusion, I think what you say makes sense.
@stichbury

@merelcht merelcht transferred this issue from kedro-org/kedro Mar 11, 2024
@merelcht merelcht added the Community Issue/PR opened by the open-source community label Mar 11, 2024
@noklam
Contributor

noklam commented Jul 16, 2024

Closed as this is documented, expected behavior. If this is a desired feature, feel free to open a separate issue as a feature request or submit a PR.

@noklam noklam closed this as completed Jul 16, 2024