-
Notifications
You must be signed in to change notification settings - Fork 291
Proof of concept: MeshIndexSet #6014
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Proof of concept: MeshIndexSet #6014
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #6014 +/- ##
==========================================
- Coverage 89.75% 89.35% -0.41%
==========================================
Files 90 90
Lines 22929 23102 +173
Branches 5020 5062 +42
==========================================
+ Hits 20580 20642 +62
- Misses 1618 1729 +111
Partials 731 731 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wanted to get my head into this space as I think it's a valuable and exciting addition to the API. Here are some interim comments. Ideally I could propose some actual code changes via a PR soon but I can't guarantee it.
lib/iris/experimental/ugrid/mesh.py
Outdated
return self.indices | ||
elif self.location == "edge": | ||
connectivity = self.mesh.edge_node_connectivity[self.indices] | ||
node_set = list(set(connectivity.indices.compressed())) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
compressed()
is only available for masked arrays. We currently allow non-masked arrays too. This is a similar problem to SciTools/iris-esmf-regrid#323.
lib/iris/experimental/ugrid/mesh.py
Outdated
if self.location == "node": | ||
return self.indices | ||
elif self.location == "edge": | ||
connectivity = self.mesh.edge_node_connectivity[self.indices] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What you're doing here is completely valid, but it might take some getting used to - need more developer education to avoid people like me getting confused!
A Connectivity
also has indices
. I'm used to those exclusively being used to index node_coordinates
- i.e. the Nodes, so I found it confusing that here they are instead indexing Edges, and lower down indexing Faces.
HOWEVER: that is only because we always work with edge_node_
and face_node_connectivity
. If we were more used to working with other examples such as face_face_connectivity
then there would be no confusion.
We might consider some different variable naming but that's probably overdoing it IMO.
lib/iris/experimental/ugrid/mesh.py
Outdated
connectivity_indices = np.vectorize(self.node_index_dict.get)( | ||
connectivity.indices | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I bow to your wisdom on matters like this, but I would assume it would be more efficient to store a 1D array of integers, where position in the array is equivalent to new_index
and the value is equivalent to old_index
? Pretty sure there would be a way to use that in the generation of connectivity_indices
, although it's not immediately obvious to me what that would be.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Perhaps your existing approach is easier to get working lazily as well?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So I think the problem with storing this as an array where the one set of indices is infered by the position is that the indices aren't guaranteed to be of the form [0,1,2,3,...]
but may have gaps in them. It ought to be possible to rewrite this to be of the form connectivity_indices = index_array[connectivity.indices]
, but this would require building index_array
to be about as large as the coordinates on the original mesh rather than scaling with the size of the provided indices. It's probably still worth checking how well this performs, because this may well be the more natural way to use numpy.
Some suggestions for SciTools#6014
🚀 Pull Request
Description
Implements a possible way to subset a Mesh while retaining a reference to the original Mesh. A
MeshIndexSet
would be effectively an implementation of the UGRID concept of a location index set. This particular implementation has the benefit of appearing like a regular mesh for most opperations and is capable of functioning with iris-esmf-regrid, for example. The benefit of such a set over a strictly smaller Mesh is that it opens the option for introducing sophisticated functions which can "stitch" a collection ofMeshIndexSet
s back together into their original Mesh. One use case which this may be useful in is when calling expensive functions, like in iris-esmf-regrid, which a full Mesh would be too large to process as a whole so it must be processed as a series of submeshes. It may well be that such cases are specific enough that it is better to solve each on their own terms, but if such problems are general enough, there may be a case to introduce a class like this one.