I was not able to use the PartitionedDataSet as the AzureMLPipelineDataset #94

Open
snavyareddy opened this issue Mar 25, 2024 · 1 comment

Comments

@snavyareddy

In my use case, one node returns a PartitionedDataSet that the following node needs to access. I was able to use AzureMLPipelineDataset with a single dataframe, but not with a PartitionedDataSet.
I want to use it for parallel processing.

downloaded_station_data:
  type: PartitionedDataset
  path: data/01_raw/downloaded_station_data
  dataset: pandas.CSVDataset
  filename_suffix: .csv

I tried it this way:
downloaded_station_data:
  type: kedro_azureml.datasets.AzureMLPipelineDataset
  dataset:
    type: PartitionedDataset
    path: data/01_raw/downloaded_station_data
    dataset: pandas.CSVDataset
    filename_suffix: .csv

This needs to be passed between nodes as Azure ML pipeline data using AzureMLPipelineDataset.
I got the error [DatasetError: filepath].

If anyone has a solution, please help me...

@AlexandreOuellet

AlexandreOuellet commented Oct 16, 2024

We've had something similar. You actually need to set filepath_arg. We ended up with something like this:

test_folder:
  type: kedro_azureml.datasets.AzureMLAssetDataset
  azureml_dataset: redacted
  root_dir: data/01_raw/
  filepath_arg: "path"
  dataset:
    type: PartitionedDataSet
    filename_suffix: ".csv"
    versioned: false
    path: "./"
    dataset:
      type: pandas.CSVDataset
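
Applied to the catalog entry from the original question, the same idea might look like the sketch below. It is untested and rests on two assumptions: that AzureMLPipelineDataset accepts a filepath_arg option in the same way as AzureMLAssetDataset, and that the option defaults to "filepath". The second assumption would explain the [DatasetError: filepath] message, since a PartitionedDataset is configured with path rather than filepath. Whether root_dir also needs to be set for this case is not covered here.

```yaml
downloaded_station_data:
  type: kedro_azureml.datasets.AzureMLPipelineDataset
  # Assumption: names the argument of the wrapped dataset that holds its location.
  # PartitionedDataset uses "path" instead of "filepath".
  filepath_arg: "path"
  dataset:
    type: PartitionedDataset
    path: data/01_raw/downloaded_station_data
    filename_suffix: ".csv"
    dataset:
      type: pandas.CSVDataset
```

The only changes from the entry in the question are the added filepath_arg key and the nested form of the inner dataset, which mirrors the reply above.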
