PartitionedDataset has inconsistent lazy behavior #4037

Open
BielStela opened this issue Jul 26, 2024 · 1 comment
Labels
Community (Issue/PR opened by the open-source community), Issue: Bug Report 🐞 (Bug that needs to be fixed), support: needs more info

Comments

@BielStela

BielStela commented Jul 26, 2024

Hi! While running some experiments I found this quirk, which is more of a nuance than a bug. I realize it may not be a big deal to add a check for it in user code, but just to note it here, here's the bug report 😃🌈

Description

PartitionedDataset returns callables for its parts only when the incoming dataset is read from disk. If it comes from an in-memory dataset, the parts are already materialized, so they are not lazy and not callable.
This means the nodes have to check whether each part is callable or not.
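The check that nodes currently have to do themselves can be sketched as a small helper. `materialize_partitions` is a hypothetical name, not part of Kedro or kedro-datasets. It assumes the partitions arrive as a dict keyed by partition ID, and that materialized values are never callables themselves (if your data objects are callable, this check would misfire).

```python
from typing import Any, Callable, Dict, Union

# A partition is either a zero-argument loader (disk-backed case)
# or an already-materialized value (in-memory case).
Partition = Union[Callable[[], Any], Any]


def materialize_partitions(partitions: Dict[str, Partition]) -> Dict[str, Any]:
    """Return a dict of loaded partition values.

    Calls each value if it is callable (lazy, read from disk) and passes
    it through unchanged otherwise (already materialized in memory).
    """
    return {
        partition_id: part() if callable(part) else part
        for partition_id, part in partitions.items()
    }


# Usage: both shapes of input give the same result.
lazy_parts = {"a": lambda: [1, 2], "b": lambda: [3]}
eager_parts = {"a": [1, 2], "b": [3]}
assert materialize_partitions(lazy_parts) == materialize_partitions(eager_parts)
```

This keeps the node body agnostic, but it is still a workaround in user code; the point of this report is that the dataset itself could hand nodes a consistent shape.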

Context

Sometimes, to speed things up, I remove intermediate disk writes, but this changes the behavior of the PartitionedDataset.

Steps to Reproduce

Load a partitioned dataset from disk, then load the same data from memory instead.

Expected Result

Both should behave the same, without leaking the implementation into the node that handles the dataset parts.

Actual Result

Usually an error like:

object is not callable

or, the other way around, if I'm not expecting a function:

'function' object has no attribute 'x'
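Both failure modes can be reproduced in a minimal, Kedro-free sketch. `read_part`, `make_loader`, and the two node functions are hypothetical stand-ins; the sketch only assumes, as described above, that the disk-backed dataset yields zero-argument loaders while the in-memory hand-off yields the materialized values.

```python
from types import SimpleNamespace


def read_part(name: str) -> SimpleNamespace:
    # Hypothetical stand-in for reading one partition file from disk.
    return SimpleNamespace(name=name, x=1)


def make_loader(name: str):
    # Mimics the disk-backed case: a plain function that loads lazily.
    def load() -> SimpleNamespace:
        return read_part(name)
    return load


names = ["p1", "p2"]
from_disk = {n: make_loader(n) for n in names}  # values are functions
from_memory = {n: read_part(n) for n in names}  # values are data


def node_expecting_callables(parts):
    # Written for the disk-backed case: calls every part.
    return [load().x for load in parts.values()]


def node_expecting_values(parts):
    # Written for the in-memory case: uses every part directly.
    return [part.x for part in parts.values()]


if __name__ == "__main__":
    print(node_expecting_callables(from_disk))
    print(node_expecting_values(from_memory))
    # Swapping the inputs triggers the two errors from the report.
    for node, parts in [(node_expecting_callables, from_memory),
                        (node_expecting_values, from_disk)]:
        try:
            node(parts)
        except (TypeError, AttributeError) as exc:
            print(f"{node.__name__}: {type(exc).__name__}: {exc}")
```

Feeding the in-memory dict to the first node raises a `TypeError` ("object is not callable"), and feeding the disk-backed dict to the second raises `AttributeError: 'function' object has no attribute 'x'`.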

Your Environment

  • kedro, version 0.19.6
  • Python 3.10.12
  • Linux 6.5.0-44-generic~22.04.1-Ubuntu
@merelcht
Member

merelcht commented Aug 27, 2024

@BielStela, thanks for flagging this, and apologies for the delayed response. Can you check which kedro-datasets version you are using?

merelcht added the "support: needs more info" and "Community" (Issue/PR opened by the open-source community) labels on Nov 26, 2024