PartitionedDataset has inconsistent lazy behavior #4037
Labels
Community
Issue/PR opened by the open-source community
Issue: Bug Report 🐞
Bug that needs to be fixed
support: needs more info
Hi! While doing some experiments I found this behavior, which is more of a nuance than a bug. I get that it is probably not a big deal to add a check in user code to handle it, but just to make a note of it, here you go with the bug report 😃🌈
Description
PartitionedDataset returns a callable for each partition only when the incoming dataset is read from disk. If it comes from an in-memory dataset, the partitions are already materialized, so they are neither lazy nor callable. This means the nodes have to check whether each partition is a callable or not.
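For illustration, the kind of check a node ends up needing might look like the sketch below. It is plain Python and not part of Kedro; the name `materialize_partitions` is hypothetical, and it assumes the partitions arrive as a dict mapping a partition id to either the data itself or a zero-argument load function.

```python
from typing import Any, Callable, Dict, Union

# A partition is either already-loaded data or a zero-argument loader (assumption).
Partition = Union[Any, Callable[[], Any]]


def materialize_partitions(partitions: Dict[str, Partition]) -> Dict[str, Any]:
    """Return the data for every partition, calling lazy loaders where present."""
    return {
        key: part() if callable(part) else part
        for key, part in partitions.items()
    }
```

One caveat with this approach: if the partition data is itself callable, `callable(part)` cannot tell it apart from a loader, which is part of why pushing this check into every node feels fragile.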
Context
Sometimes, to speed things up, I remove intermediate disk writes, but this changes the behavior of the PartitionedDataset.
Steps to Reproduce
load a partitioned dataset from disk or load it from memory
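A rough way to see the difference without a full project is the simulation below. It does not use Kedro at all: the two dicts only mimic the shapes described above (load functions for the disk case, materialized values for the in-memory case), and `concat_partitions` is a hypothetical node written with the disk-backed case in mind.

```python
def concat_partitions(partitions):
    # Node written for the disk-backed case: every value is assumed
    # to be a zero-argument load function.
    rows = []
    for _, load in sorted(partitions.items()):
        rows.extend(load())
    return rows


# Shape assumed for the disk-backed case: partition id -> load function.
from_disk = {"part_a": lambda: [1, 2], "part_b": lambda: [3]}

# Shape assumed for the in-memory case: partition id -> data.
from_memory = {"part_a": [1, 2], "part_b": [3]}

disk_result = concat_partitions(from_disk)

try:
    concat_partitions(from_memory)
    memory_error = None
except TypeError as exc:
    # Calling a list fails, because the values are not load functions here.
    memory_error = exc
```

In this sketch the same node works for `from_disk` and raises a `TypeError` for `from_memory`; a node written for the in-memory shape would fail the other way around.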
Expected Result
Both should behave the same and not leak implementation details into the node that handles the dataset partitions.
Actual Result
normally an error like
or the other way around if I'm not expecting a function
Your Environment