Skip to content

Conversation

@fluidnumerics-joe
Copy link
Contributor

@fluidnumerics-joe fluidnumerics-joe commented Mar 31, 2025

Ultimately, yes, these will replace Field and Fieldset. For the time
being, because this is a massive overhaul, keeping xfield and xfieldset
in their own files is a small step towards leveraging xarray and uxarray
natively under the hood.
Non-working draft of structured grid index search has been brought in as
well. This needs a bit of work.
This is done since the Field object holds either an xr.DataArray or
ux.UxDataArray which contain the grid, dimensions, and coordinates. With
this, the time argument is now set to be a datetime object.
The fieldset.dimrange function is meant to be a replacement for
fieldset.grid.dimrange. Functionally, it operates in the same way but
relies on the fields underneath the (u)xarray.(Ux)dataset. Currently,
this is incompatible with a fieldset constructed using the
`fieldset.add_field` function, since add_field does not update the `ds`.
Need to think on how we want to track this.

The `fieldset.gridset_size` property is set to be equivalent to the
number of fields (dataarrays) stored under the fieldset. This property
is meant to replace the `fieldset.gridset.size` property.
This commit allows us to define a fieldset using a list of
UXarray.UxDatasets and Xarray.Datasets. Currently, because
`fieldset.add_field` is used to add pointers to each field, the field
names between the variables in each list item must be unique
This limits the changes required in advection.py. Still need to check
that the VectorField.igrid value is set appropriately.
Copy link
Contributor

@VeckoTheGecko VeckoTheGecko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could we have the tests folder as tests/v4 rather than v4-tests . At the moment the tests aren't being collected (hence aren't run in CI)

pytest --collect-only | grep v4-tests

@github-project-automation github-project-automation bot moved this from Backlog to Ready in Parcels development Apr 1, 2025
@VeckoTheGecko
Copy link
Contributor

(also, would you be able to run pre-commit run --all-files so that the hooks are run? I think there is some drift from our QAQC tools in this PR)

@fluidnumerics-joe fluidnumerics-joe marked this pull request as ready for review April 9, 2025 15:23
@fluidnumerics-joe
Copy link
Contributor Author

@VeckoTheGecko - This is ready for review and for fixing the test data ingestion for the tests/v4

@VeckoTheGecko
Copy link
Contributor

fixing the test data ingestion for the tests/v4

Done

Can do a full review tomorrow

Also move the data to a pytest fixture :)
@VeckoTheGecko VeckoTheGecko force-pushed the feature/uxarray_xarray_fields branch from d3a26ae to 4bf5096 Compare April 9, 2025 16:18
Copy link
Contributor

@VeckoTheGecko VeckoTheGecko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just dropping a partial review now since a lot of thoughts have come up. Some small review comments for the timebeing.

I think we need to answer an important question moving forward:

How are lat,lon,depth handled? Previously these were numpy arrays anywhere from dimension 1-4 depending on the geometry of the grid itself. This became a maintenance burden since every part of the code that needed these values also needed to include logic to deal with it for each different type of grid. With the addition of unstructured grids if we go the same route it will just further complicate things. IMO we need to find a good abstraction that we can use to bridge the gap between datasets and interpolation, so that we can tuck this complexity into a module and provide a clean API to the rest of the codebase (I can start investigating what sort of stuff is used internally in xarray and scipy for interpolation - perhaps there is inspiration we can take from those packages). For reference, @fluidnumerics-joe what is the structure of lon,lat,depth in the case of unstructured grids?

I don't know if this PR is the right place to answer this, or whether these questions should block this PR. Another option would be to create new issues to highlight these concerns. Next week I'm also hardly in the office (conference for 2 days, and then shortened week due to easter). @fluidnumerics-joe I assume this might be a blocker on your end? Not sure what other development you're working on.

parcels/field.py Outdated
interp_method: InterpMethod = "linear",
name: str,
data: xr.DataArray | ux.UxDataArray,
grid: ux.Grid | None = None, # To do : Once parcels.Grid class is added, allow for it to be passed here
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We've been using # TODO:, # TODO v4:, # TODO Nick:... across the codebase for todo comments. I'd be keen to stick to that for searchability (i.e., all caps TODO), wouldn't want this to slip through :)

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Got it. I'll make these changes

@fluidnumerics-joe
Copy link
Contributor Author

Just dropping a partial review now since a lot of thoughts have come up. Some small review comments for the timebeing.

I think we need to answer an important question moving forward:

How are lat,lon,depth handled? Previously these were numpy arrays anywhere from dimension 1-4 depending on the geometry of the grid itself. This became a maintenance burden since every part of the code that needed these values also needed to include logic to deal with it for each different type of grid. With the addition of unstructured grids if we go the same route it will just further complicate things. IMO we need to find a good abstraction that we can use to bridge the gap between datasets and interpolation, so that we can tuck this complexity into a module and provide a clean API to the rest of the codebase (I can start investigating what sort of stuff is used internally in xarray and scipy for interpolation - perhaps there is inspiration we can take from those packages). For reference, @fluidnumerics-joe what is the structure of lon,lat,depth in the case of unstructured grids?

In an unstructured grid, we think about a mesh as a collection of vertices (often called nodes). Elements are formed by creating convex shapes by joining sets of nodes; for FESOM and ICON, elements are just extruded triangular cells. In FESOM and ICON, the mesh is unstructured in 2-D (lat,lon) but extruded in the depth direction. Because of this the unstructured mesh description is 2-D; the third depth dimension is handled similarly to structured grid models.

In this context, the top and bottom faces of each element is related to three vertices (where the vertices are defined by lat and lon only). Edges are just lines that connect two vertices; you can also relate an element to the edges that make up its boundary. Depending on the discretization, a field can be located on a face center, edge center, or on a node.

For a uxarray.UxDataArray or uxarray.UxDataset there are the following coordinates

  • edge_lat, edge_lon dimensioned (n_edge) - These give the latitude and longitude at edge centers
  • face_lat, face_lon dimensioned (n_face) - These give the latitude and longitude at face centers
  • node_lat, node_lon dimensioned (n_node) - These give the latitude and longitude on nodes/vertices
  • nz1 dimensioned (nz1) - This is a depth coordinate registered to the (3-D) element center vertically
  • nz dimensioned (nz) - This is a depth coordinate registered to the (3-D) element faces vertically

The uxarray.UxDataArray.uxgrid or uxarray.UxDataset.uxgrid attribute provides the connectivity tables that relate

  • Edges to nodes - edge_node_connectivity
  • Faces to edges - face_edge_connectivity
  • Faces to nodes - face_node_connectivity
  • Faces to faces (useful for neighbor lookups) - face_face_connectivity

There's no hope of having a single memory layout that would efficiently be used for rectinlinear structured, curvilinear structured, and unstructured lat, lon, and depth. In the end, the primary workhorse of the Field class is the eval method; under the hood this primarily houses a search method to find the nearest grid cell for local interpolation and then the interpolation method. By having the interp_method as a callable, we can maintain a suite of built-in interpolators that can be used for the various mesh types we plan to support. Although the search methods are not a callable attribute of the Field class, it's apparent we need a suite of search methods appropriate for each mesh type as well.

Keeping these bits underneath the call stack of the Field.eval method hides the complexity of evaluating a field at a point in space and time from everything in Fieldset and higher. A big difference between us and xarray and scipy is that we are trying to expose the ability to also provide your own interpolator. At this level, the end user needs to be keenly aware of the data available for the field they plan to tie their interpolator to. We can't be expected to hide that (nor should we).

I don't know if this PR is the right place to answer this, or whether these questions should block this PR. Another option would be to create new issues to highlight these concerns. Next week I'm also hardly in the office (conference for 2 days, and then shortened week due to easter). @fluidnumerics-joe I assume this might be a blocker on your end? Not sure what other development you're working on.

Not really a blocker right now. I have built-in interpolators for face registered and node registered data, which is sufficient for a FESOM and ICON mesh. The Field.eval method currently leverages a spatial hash search or nearest neighbor search (depending on the presence of an optional ei input). In terms of letting this block the PR, I'd push for getting this into v4-dev sooner so that we can all work from another common point with this level of xarray/uxarray integration.

@VeckoTheGecko
Copy link
Contributor

Thank you for the in depth response here @fluidnumerics-joe and for the overview of how unstructured grids are handled here!

A big difference between us and xarray and scipy is that we are trying to expose the ability to also provide your own interpolator.

Yes. I have seen quite some stalled discussions in the xarray repo about custom interpolation.


Ok, going to give everything a once over now before approving just to clarify some things and get familiar.

Copy link
Contributor

@VeckoTheGecko VeckoTheGecko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. Managed to look through in full now

Copy link
Contributor

@VeckoTheGecko VeckoTheGecko left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Happy to merge when you're ready @fluidnumerics-joe :)

@fluidnumerics-joe fluidnumerics-joe merged commit 4dd71fe into v4-dev Apr 16, 2025
1 of 12 checks passed
@fluidnumerics-joe fluidnumerics-joe deleted the feature/uxarray_xarray_fields branch April 16, 2025 13:20
@github-project-automation github-project-automation bot moved this from Ready to Done in Parcels development Apr 16, 2025
@fluidnumerics-joe fluidnumerics-joe mentioned this pull request Apr 22, 2025
@VeckoTheGecko VeckoTheGecko mentioned this pull request Apr 28, 2025
3 tasks
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

Status: Done

Development

Successfully merging this pull request may close these issues.

4 participants