Supporting various circulation models and Parcels' internal representation of data #2003

@VeckoTheGecko

Description

This issue is a mix of discussion on the internal representation of data and the tasks required for supporting reading data from various circulation models.

Internal representation of data

We've decided to work internally with Xarray datasets. An important matter to discuss now is what these datasets actually look like internally[^1] (i.e., dimension ordering, how depth is defined, ...). (1) Do we work with the datasets in a form close to the model output (i.e., close to the raw NetCDF files)? Or (2) do we work with datasets that match a certain internal representation defined by us?

@erikvansebille I recall from our group meetings that you were leaning towards (1), saying something along the lines of "allowing users to write interpolator methods while using the knowledge about their model" (please comment below if I don't fully understand your viewpoint or am missing something).

I am personally leaning towards (2) for the following reasons:

  • It's safer
    • At the point of initialisation we can throw an informative error message if the dataset doesn't match our internal representation. This is easier to debug than runtime errors from failed index searching.
  • It makes our code simpler and more performant
    • All our indexing methods can make assumptions according to our internal model (e.g., that the depth array is increasing)
    • We need to do less checking before we do something
  • Makes testing easier, as from FieldSet all the way through we only need to test things against our internal data model (e.g., no need to test an edge case introduced only when model X runs pset.execute())
  • It's all lazy anyway
    • Any dataset transformation done to get to this "internal state" would be done lazily via Dask
  • I feel that we can define an internal representation within Parcels that encompasses all the models we want to support. Admittedly, I'm not 100% sure whether my feeling here is well informed - I'm not intimately familiar with the model outputs we work with.
  • We already make assumptions about internal representation by assuming that the data is ordered [tdim, zdim, ydim, xdim]
  • Allows interpolation methods to be portable between Fields (and allows us to ship them in Parcels)

A downside of (2) is (as you point out) that those writing interpolation methods need to understand how data is handled internally in Parcels, as opposed to in their model. That is something we would need good documentation for, to support those writing interpolation methods.
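To make the "safer" argument for option (2) concrete, here is a minimal, hypothetical sketch of the kind of early validation we could run at Field initialisation. All names (`EXPECTED_DIMS`, `validate_internal`) are illustrative, not existing Parcels API; the assumed internal representation is `[tdim, zdim, ydim, xdim]` with increasing depth, as discussed above.

```python
import numpy as np
import xarray as xr

# Assumed internal dimension ordering [tdim, zdim, ydim, xdim].
EXPECTED_DIMS = ("time", "depth", "lat", "lon")


def validate_internal(ds: xr.Dataset) -> None:
    """Raise an informative error at init time if `ds` does not match
    the assumed internal representation (hypothetical helper)."""
    for name, var in ds.data_vars.items():
        if var.dims != EXPECTED_DIMS:
            raise ValueError(
                f"Variable {name!r} has dims {var.dims}; expected "
                f"{EXPECTED_DIMS}. Coerce the dataset (e.g. transpose) "
                "before constructing a Field."
            )
    if "depth" in ds.coords and ds["depth"].ndim == 1:
        depth = ds["depth"].values
        if not np.all(np.diff(depth) >= 0):
            raise ValueError("Internal representation assumes increasing depth.")


# A dataset already in the internal layout passes silently...
ds = xr.Dataset(
    {"U": (EXPECTED_DIMS, np.zeros((2, 3, 4, 5)))},
    coords={"depth": ("depth", [0.0, 10.0, 50.0])},
)
validate_internal(ds)

# ...while a model-native layout fails fast with a clear message,
# instead of a runtime error from failed index searching.
bad = ds.transpose("lon", "time", "depth", "lat")
try:
    validate_internal(bad)
except ValueError as err:
    print("caught:", err)
```

The point is that the error surfaces at initialisation, where the user can still act on it, rather than deep inside `pset.execute()`.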

Tasks required for supporting reading data from various circulation models

Sub-issues:

  • Example datasets for different (ocean) circulation models #2004
  • (2) Decide on an internal representation of data within parcels (or decide on how we are going to define search methods in a flexible way to work with different datasets)
    • [DOC] Document this internal representation in the v4 docs (useful for those writing interpolators etc.)
    • [^2] Define helper functions (in module parcels._datasets.(un)structured.coerce) to transform datasets from the original representation (i.e., circulation_model.py) to the Parcels internal representation. These coercion functions can either serve as documentation for users (to see which transformations they need to apply to their data to bring it in line with Parcels), or be used in public convenience methods such as FieldSet.from_...() which handle this automatically.
  • (3) What do indices correspond to?
    • Is it the f-points or the t-points?
    • How are VectorField indices handled?
    • EDIT: We have since decided that indices are with respect to the f-points.
  • (4) Define further helper functions
    • e.g., Field.from_cf_compliant(ds), which takes in a dataset with CF-compliant metadata and does all the transformations needed to bring it in line with Parcels' internal assumptions
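As a sketch of what such a coercion helper could look like, here is a hypothetical `coerce_to_internal` in the spirit of the proposed `parcels._datasets.(un)structured.coerce` module: it takes a dataset in a model-native layout and returns one matching the assumed internal representation (`[tdim, zdim, ydim, xdim]`, increasing depth). The function name, dimension names, and the internal conventions are assumptions for illustration; when the dataset is Dask-backed, both steps are lazy.

```python
import numpy as np
import xarray as xr


def coerce_to_internal(ds: xr.Dataset) -> xr.Dataset:
    """Coerce a model-native dataset to the assumed internal
    representation (hypothetical helper, not existing Parcels API)."""
    # Reorder dimensions to the internal [time, depth, lat, lon] ordering.
    ds = ds.transpose("time", "depth", "lat", "lon", missing_dims="ignore")
    # Flip the depth axis if the model stores depth decreasing.
    if "depth" in ds.coords and ds["depth"].ndim == 1:
        depth = ds["depth"].values
        if depth.size > 1 and depth[0] > depth[-1]:
            ds = ds.isel(depth=slice(None, None, -1))
    return ds


# Example: model output with lon first and depth stored deepest-first.
native = xr.Dataset(
    {"V": (("lon", "lat", "depth", "time"), np.zeros((5, 4, 3, 2)))},
    coords={"depth": ("depth", [50.0, 10.0, 0.0])},
)
internal = coerce_to_internal(native)
print(internal["V"].dims)        # ('time', 'depth', 'lat', 'lon')
print(internal["depth"].values)  # increasing depth
```

Used inside a convenience constructor such as `FieldSet.from_...()`, this kind of helper would double as executable documentation of the transformations users need to apply themselves when bringing their own data.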

Footnotes

[^1]: By "internally" here I mean at the point where the dataset is passed to the Field initialiser and stored on the data attribute (or similarly, passed to the Grid initialiser). This is the structure that the rest of Parcels can safely assume. From a user POV, they don't necessarily need to do the data transformations themselves; this can be bridged using classmethods.

Metadata

Labels: discussion, topic/input-data (Issues about hydrodynamical data), tracking (Issues used as a way of tracking other issues), v4

Status: Backlog