I was not able to use the PartitionedDataSet as the AzureMLPipelineDataset #94

snavyareddy · 2024-03-25T13:25:13Z

In my use case, one node returns a PartitionedDataset that the following node needs to access. I was able to use AzureMLPipelineDataset with a single DataFrame, but not with a PartitionedDataset.
I need this because I want to use it for parallel_processing.

downloaded_station_data:
  type: PartitionedDataset
  path: data/01_raw/downloaded_station_data
  dataset: pandas.CSVDataset
  filename_suffix: .csv

I tried it this way:
downloaded_station_data:
  type: kedro_azureml.datasets.AzureMLPipelineDataset
  dataset:
    type: PartitionedDataset
    path: data/01_raw/downloaded_station_data
    dataset: pandas.CSVDataset
    filename_suffix: .csv

This needs to be used as Azure ML pipeline data through AzureMLPipelineDataset.
I got the error [DatasetError: filepath].

If anyone has a solution, please help.

AlexandreOuellet · 2024-10-16T15:37:54Z

We've had something similar. You actually need to set filepath_arg. We ended up with something like this:

test_folder:
  type: kedro_azureml.datasets.AzureMLAssetDataset
  azureml_dataset: redacted
  root_dir: data/01_raw/
  filepath_arg: "path"
  dataset:
    type: PartitionedDataSet
    filename_suffix: ".csv"
    versioned: false
    path: "./"
    dataset:
      type: pandas.CSVDataset
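
One way to read the config above: the wrapping dataset rewrites a single path key inside the nested dataset config, and filepath_arg names that key. PartitionedDataset takes its location under path rather than filepath, which would explain the [DatasetError: filepath] in the original attempt. The sketch below is a simplified illustration of that idea, not kedro-azureml's actual code. The helper inject_path is hypothetical, and the "filepath" default is an assumption about the plugin's behaviour.

```python
from pathlib import PurePosixPath


def inject_path(dataset_config: dict, root_dir: str, filepath_arg: str = "filepath") -> dict:
    """Return a copy of dataset_config whose path key is prefixed with root_dir.

    Hypothetical helper for illustration only; it is not part of kedro-azureml.
    """
    if filepath_arg not in dataset_config:
        # The wrapper looks for a key that the wrapped dataset does not define.
        raise KeyError(filepath_arg)
    updated = dict(dataset_config)
    updated[filepath_arg] = str(PurePosixPath(root_dir) / dataset_config[filepath_arg])
    return updated


# Nested config for a partitioned dataset: it uses "path", not "filepath".
partitioned = {
    "type": "PartitionedDataset",
    "path": "downloaded_station_data",
    "dataset": "pandas.CSVDataset",
    "filename_suffix": ".csv",
}

# With the assumed default key, the lookup fails on "filepath".
try:
    inject_path(partitioned, root_dir="data/01_raw")
    default_key_error = None
except KeyError as exc:
    default_key_error = exc.args[0]

# Naming the key explicitly lets the wrapper rewrite the right entry.
resolved = inject_path(partitioned, root_dir="data/01_raw", filepath_arg="path")

print(default_key_error)
print(resolved["path"])
```

If this reading is right, adding filepath_arg: "path" next to the nested dataset entry should apply to the original AzureMLPipelineDataset catalog entry as well, although the thread only confirms it for AzureMLAssetDataset.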
