Add array or regex to data #23

mmaelicke · 2024-08-20T08:04:41Z

In the data section, there may be cases in which an input dataset is associated not with a single file but with a list of files.
In these cases we can either:

allow an array, similar to the parameters
allow regular expressions as an attribute inside the data section

In both cases, the specs cannot handle an arbitrary number of datasets. For that, the developer has to fall back to specifying a directory in the parameters instead of the data section.
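
To make the two options concrete, here is a minimal Python sketch of how each kind of entry might be resolved against a set of known files. The dict keys `path` and `pattern`, the file names, and the helper `resolve_entry` are all hypothetical; nothing here is part of the current spec.

```python
import re

# Hypothetical data entries. The keys "path" and "pattern" are assumptions
# made for this sketch, not existing spec attributes.
array_entry = {"path": ["in/chunk_0.nc", "in/chunk_1.nc"]}
regex_entry = {"pattern": r"in/chunk_\d+\.nc"}


def resolve_entry(entry, available):
    """Return the files from `available` that the entry refers to."""
    if "pattern" in entry:
        # Regex option: keep every known file whose full name matches.
        regex = re.compile(entry["pattern"])
        return sorted(name for name in available if regex.fullmatch(name))
    # Array option: the entry lists each file explicitly.
    paths = entry["path"]
    if isinstance(paths, str):
        paths = [paths]
    return [p for p in paths if p in available]
```

The array option needs the number of files to be known when the entry is written, while the regex option picks up however many files match.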

An example for the multi-file case: a tool takes a netCDF dataset that is chunked into many files.

An example for the multi-dataset (multi-file) case: an aggregator or viewer tool takes a folder as input that contains data folders, similar to what the data loader creates.
I would argue that this is an edge case and that tools can usually specify the data they need.

I would prefer setting e.g. a multi=True flag on a data spec, which effectively allows wildcards in the path.
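
A rough sketch of how such a flag could behave, assuming a data spec is available as a dict with a `path` key and an optional `multi` key. The function name `resolve_data_paths` and the decision to reject wildcards when `multi` is not set are assumptions for illustration only.

```python
import glob
from pathlib import Path

WILDCARD_CHARS = "*?["


def resolve_data_paths(spec):
    """Resolve a (hypothetical) data spec dict to a list of existing files."""
    path = spec["path"]
    if spec.get("multi", False):
        # multi=True: the path is treated as a wildcard pattern.
        matches = sorted(glob.glob(path))
        if not matches:
            raise FileNotFoundError(f"no files match {path!r}")
        return matches
    # multi not set: reject wildcards so a single-file spec stays unambiguous.
    if any(ch in path for ch in WILDCARD_CHARS):
        raise ValueError(f"wildcards in {path!r} require multi=True")
    if not Path(path).is_file():
        raise FileNotFoundError(path)
    return [path]
```

Always returning a list, even for a single file, would let a tool iterate over its inputs without checking the flag itself.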

@Ash-Manoj @AlexDo1 do you have any comments on this? I am not entirely sure how to do that, so comments are welcome.


AlexDo1 · 2024-08-20T09:13:12Z

Hm, good question.

I like the multi flag, as it states quite clearly that there can be more than one data file. Always allowing wildcards could be confusing, as it would not be clear from the specification whether multiple data files are allowed.

At the moment I'm also in favor of allowing wildcards, as this lets a spec be stricter about file names (e.g. in/precipitation/precipitation_*.nc for in/precipitation/precipitation_2011.nc, in/precipitation/precipitation_2012.nc, in/precipitation/precipitation_2013.nc).
But a wildcard would also allow taking everything inside a folder as input data, even when the file names are not that structured, e.g. in/data/* for in/data/air_temperature.nc, in/data/discharge.csv, in/data/catchment.geojson (that would probably be a bad implementation, but I think it demonstrates what I mean).

So I like the flexibility of the wildcard together with the clarity of the multi flag.
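
The difference between a strict and a loose pattern can be shown without touching the file system. This sketch uses `fnmatchcase` from Python's standard library as a stand-in for whatever matching the spec would actually use; the helper `select` and the file list are only illustrative.

```python
from fnmatch import fnmatchcase

files = [
    "in/precipitation/precipitation_2011.nc",
    "in/precipitation/precipitation_2012.nc",
    "in/data/air_temperature.nc",
    "in/data/discharge.csv",
]


def select(pattern, names):
    """Return the names that match a shell-style wildcard pattern."""
    # Note: in fnmatch, "*" also matches "/", unlike in glob.glob.
    return [name for name in names if fnmatchcase(name, pattern)]


# A strict pattern pins down the naming scheme and the file type.
strict = select("in/precipitation/precipitation_*.nc", files)
# A loose pattern accepts anything below the folder.
loose = select("in/data/*", files)
```

Here `strict` contains only the two precipitation files, while `loose` contains both files under in/data regardless of their format.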

Ash-Manoj · 2024-08-20T10:07:08Z

I also like the flag idea. We could test this on the catflow generator tool, where I think multiple TIFF files have to be read in as input.
