Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

#2456 Make incremental data loading lazy #2497

Closed
wants to merge 3 commits into from

Conversation

PtrBld
Copy link

@PtrBld PtrBld commented Apr 6, 2023

NOTE: Kedro datasets are moving from kedro.extras.datasets to a separate kedro-datasets package in
kedro-plugins repository. Any changes to the dataset implementations
should be done by opening a pull request in that repository.

Description

Resolves #2456

The implementation of the load function for the PartitionedDataSet and IncrementalDataSet do not match for no apparent reason (one is lazy loaded the other is eager loaded). See #2456.

Development notes

This is a breaking change. I have not yet adjusted the release file because I do not know if the implemented behavior is really the wanted behavior. If everything looks good I can prepare the release notes. Another alternative for this issue would be to introduce a boolean eager parameter to the PartitionedDataSet and let the user decide if the dataset should be eager or lazy loaded.

Checklist

  • Read the contributing guidelines
  • Opened this PR as a 'Draft Pull Request' if it is work-in-progress
  • Updated the documentation to reflect the code changes
  • Added a description of this change in the RELEASE.md file
  • Added tests to cover my changes

Signed-off-by: bludau <bludau@fortiss.org>
@noklam noklam added the Community Issue/PR opened by the open-source community label Apr 6, 2023
@PtrBld PtrBld changed the title [Draft] #2456 Make incremental data loading lazy #2456 Make incremental data loading lazy May 2, 2023
@PtrBld
Copy link
Author

PtrBld commented May 2, 2023

Does someone have an opinion on the breaking change problem?

@noklam
Copy link
Contributor

noklam commented May 2, 2023

@PtrBld From what I understand the main difference is that you need to call () when you consume the dataset. I think that's good, I don't see why it should deviated from PartitionedDataSet's behavior. However this will be a breaking change and goes in 0.19.

I will make sure this gets discussed next week with the team, putting it into the backlog discussion.

@SajidAlamQB SajidAlamQB added the Stage: Technical Design 🎨 Ticket needs to undergo technical design before implementation label Jun 20, 2023
@jmholzer
Copy link
Contributor

jmholzer commented Jun 21, 2023

Great contribution and we'd love to work on this more over at kedro-plugins. See comments under #2456.

Closing in favour of a new PR in the kedro-plugins repository that addresses:

kedro-org/kedro-plugins#249
kedro-org/kedro-plugins#250

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Community Issue/PR opened by the open-source community Stage: Technical Design 🎨 Ticket needs to undergo technical design before implementation
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants