Support versioning of PartitionedDataset classes #2857

Closed
deepyaman opened this issue Jul 28, 2023 · 3 comments
Labels
Issue: Feature Request New feature or improvement to existing feature Stage: Technical Design 🎨 Ticket needs to undergo technical design before implementation TD: implementation Tech Design topic on implementation of the issue


@deepyaman
Member

deepyaman commented Jul 28, 2023

Description

Related to #521

Why is this important?
PartitionedDataSet is a very useful dataset, particularly for advanced use cases such as optimising performance with lazy loading and lazy saving. We also recently introduced #2161. However, it doesn't support the versioned flag, so users cannot version their data when they use this dataset. #521 was the most recent attempt to add this.

Read more about PartitionedDataSet: #521
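The issue does not say what a versioned PartitionedDataSet should look like. Purely as an illustration, the plain-Python sketch below assumes a layout where each save writes every partition under a fresh UTC-timestamped directory (base/version/partition_id.txt), and loading defaults to the latest version while returning one callable per partition (lazy loading). Callables passed to save are only evaluated at write time (lazy saving). VersionedPartitionStore, its methods and this directory layout are hypothetical; they are not Kedro's API and not necessarily the design adopted in kedro-org/kedro-plugins#447.

```python
import os
import tempfile
from datetime import datetime, timezone
from typing import Callable, Dict, List, Optional, Union

# Hypothetical layout (not Kedro's API): <base_path>/<version>/<partition_id>.txt
PartitionData = Union[str, Callable[[], str]]


def _read_text(path: str) -> str:
    with open(path, encoding="utf-8") as f:
        return f.read()


class VersionedPartitionStore:
    """Hypothetical sketch: every save writes all partitions under a new version directory."""

    def __init__(self, base_path: str) -> None:
        self.base_path = base_path
        os.makedirs(base_path, exist_ok=True)

    @staticmethod
    def _new_version() -> str:
        # UTC timestamp without colons; lexicographic order matches chronological order.
        return datetime.now(tz=timezone.utc).strftime("%Y-%m-%dT%H.%M.%S.%fZ")

    def save(self, partitions: Dict[str, PartitionData], version: Optional[str] = None) -> str:
        version = version or self._new_version()
        version_dir = os.path.join(self.base_path, version)
        # Raises if the version already exists, so a saved version is never overwritten.
        os.makedirs(version_dir)
        for partition_id, data in partitions.items():
            if callable(data):
                # Lazy saving: the partition is only produced right before it is written.
                data = data()
            with open(os.path.join(version_dir, f"{partition_id}.txt"), "w", encoding="utf-8") as f:
                f.write(data)
        return version

    def versions(self) -> List[str]:
        return sorted(os.listdir(self.base_path))

    def load(self, version: Optional[str] = None) -> Dict[str, Callable[[], str]]:
        # Default to the most recent version.
        version = version or self.versions()[-1]
        version_dir = os.path.join(self.base_path, version)
        loaders: Dict[str, Callable[[], str]] = {}
        for name in sorted(os.listdir(version_dir)):
            path = os.path.join(version_dir, name)
            # Lazy loading: return a callable instead of reading the file now.
            # Binding path as a default argument avoids the late-binding closure pitfall.
            loaders[name[: -len(".txt")]] = lambda path=path: _read_text(path)
        return loaders


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        store = VersionedPartitionStore(tmp)
        first = store.save({"2023-07": "july rows", "2023-08": lambda: "august rows"})
        partitions = store.load(first)  # nothing has been read from disk yet
        print(sorted(partitions))       # ['2023-07', '2023-08']
        print(partitions["2023-08"]())  # august rows
```

Refusing to overwrite an existing version directory keeps every saved version immutable, which is what makes older versions safe to reload later.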

Context

Extra Context with previous discussion

If you are not familiar with IncrementalDataSet or PartitionedDataSet, we have a milestone (https://github.com/kedro-org/kedro/milestone/40) that is a great place to start! These are among the wrapper datasets that we created a few years ago, and I think they are a powerful way to extend Kedro's datasets. We also need to improve our docs, as I think these wrapper datasets should be used more often.
If you are not familiar with lazy loading, that's great! Read our docs at https://docs.kedro.org/en/stable/data/kedro_io.html#partitioned-dataset and see whether they make sense. If not, we will have a better idea of how to improve them.
DataEngineeringOne tutorial on IncrementalDataSet - https://www.youtube.com/watch?v=v7JSSiYgqpg
IncrementalDataSet docs - https://docs.kedro.org/en/stable/kedro.io.IncrementalDataSet.html
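As we understand the partitioned dataset docs linked above, a node that consumes a PartitionedDataSet receives a dictionary mapping each partition ID to a load function, so a partition is only read when the node calls that function. Lazy saving works the other way round: the node returns callables that are only invoked when the data is written. The plain-Python sketch below imitates that contract without depending on Kedro; make_loader, sum_partitions_with_prefix and double_partitions_lazily are hypothetical names used for illustration.

```python
from typing import Any, Callable, Dict, List

# Records which partitions were actually read, to show that loading is lazy.
load_log: List[str] = []


def make_loader(partition_id: str, value: int) -> Callable[[], int]:
    """Hypothetical stand-in for a per-partition load function."""
    def _load() -> int:
        load_log.append(partition_id)
        return value
    return _load


def sum_partitions_with_prefix(partitioned_input: Dict[str, Callable[[], Any]], prefix: str) -> int:
    """Node-style function: only partitions whose ID starts with prefix are ever loaded."""
    total = 0
    for partition_id, load_func in sorted(partitioned_input.items()):
        if partition_id.startswith(prefix):
            total += load_func()
    return total


def double_partitions_lazily(partitioned_input: Dict[str, Callable[[], Any]]) -> Dict[str, Callable[[], Any]]:
    """Lazy-saving style output: each value is computed only when the writer calls it."""
    return {
        partition_id: (lambda load_func=load_func: load_func() * 2)
        for partition_id, load_func in partitioned_input.items()
    }


if __name__ == "__main__":
    partitions = {
        "2023-07-31": make_loader("2023-07-31", 1),
        "2023-08-01": make_loader("2023-08-01", 2),
        "2023-08-02": make_loader("2023-08-02", 3),
    }
    print(sum_partitions_with_prefix(partitions, "2023-08"))  # 5
    print(load_log)  # ['2023-08-01', '2023-08-02']
```

Because nothing is read until a load function is called, a node can skip irrelevant partitions entirely and memory usage stays proportional to the partitions actually touched.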

Possible Implementation

#521

Possible Alternatives

@deepyaman deepyaman added Issue: Feature Request New feature or improvement to existing feature TD: implementation Tech Design topic on implementation of the issue Stage: Technical Design 🎨 Ticket needs to undergo technical design before implementation labels Jul 28, 2023
@noklam
Contributor

noklam commented Aug 3, 2023

Excited to add versioning to PartitionedDataset! Thanks @deepyaman

2023-08-02 Technical Design
Cross-posting @deepyaman's comments

@merelcht
Member

Completed in kedro-org/kedro-plugins#447

Projects
Archived in project
Development

No branches or pull requests

3 participants