Open
Description
How will we be handling test data in v4 of Parcels? At the moment we download it via parcels/tools/exampledata_utils.py, but perhaps it would be better to:
- Generate the data (i.e., xarray datasets) on the fly in the idealised cases
- Download the data (from some hosting provider) for realistic cases. Perhaps using pooch rather than our current custom downloading mechanism.
This will also mean that we can remove parcels/data (and any other committed data files) which I think would be good.
Let's continue this discussion in an issue :)
Originally posted by @VeckoTheGecko in #1946 (comment)
Okay. Let's flesh this out...
Summary
- For idealised cases: Data can be generated
- For real world cases: Download the data
  - For small datasets (<100 MB per file): Host on GitHub via https://github.com/OceanParcels/parcels-data
  - For large datasets: We won't have any example datasets in this category. If we need them, we can investigate using Zenodo
In all cases the return value should be an xarray Dataset object. Feedback is welcome on this point: is this suitable for unstructured grids, or would it be better to return a uxarray dataset? Would there be instances where we want a collection (e.g., a list) of xr.Dataset objects?
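To make the idealised case concrete, here is a minimal sketch of generating a dataset in memory instead of downloading it. The function name, grid sizes, dimension names and the solid-body rotation flow are all assumptions chosen for illustration; this is not the existing Parcels API or the MovingEddies field.

```python
import numpy as np
import xarray as xr


def generate_idealised_dataset(nx=50, ny=40, nt=3, omega=0.1):
    """Build a small synthetic velocity dataset entirely in memory.

    Hypothetical helper: the name, arguments and variable names are
    illustrative assumptions, not part of Parcels.
    """
    x = np.linspace(-1.0, 1.0, nx)
    y = np.linspace(-1.0, 1.0, ny)
    time = np.arange(nt, dtype=float)

    # 2D coordinate arrays of shape (ny, nx): yy[i, j] = y[i], xx[i, j] = x[j].
    yy, xx = np.meshgrid(y, x, indexing="ij")

    # Solid-body rotation about the origin, constant in time:
    # U = -omega * y, V = omega * x.
    u = np.broadcast_to(-omega * yy, (nt, ny, nx)).copy()
    v = np.broadcast_to(omega * xx, (nt, ny, nx)).copy()

    return xr.Dataset(
        data_vars={
            "U": (("time", "y", "x"), u),
            "V": (("time", "y", "x"), v),
        },
        coords={"time": time, "y": y, "x": x},
    )


ds = generate_idealised_dataset()
```

Returning a single xr.Dataset keeps the idealised and downloaded cases interchangeable for callers, which is the point of the question above about the return type.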
TODO
- Create Parcels data repo
- Migrate data downloading to use Pooch
- Update MovingEddies_data dataset to be generated
- Remove download_example_dataset in favor of get_example_dataset (which will return an xarray object)
  - This is best to do down the line once we have support for easily creating Fields etc from xarray objects, and would require updating the tutorials etc.
- Remove datasets from tests/test_data that don't need to be there (i.e., they can be generated with code)
Status
Backlog