Location / Identification for grid specs outside of a file #357

ChrisBarker-NOAA · 2022-03-10T18:58:31Z

As we all know, CF data (or model results, anyway) are often associated with a particular grid specification.

This could be a simple rectangular grid, or a more complex grid, such as those defined by the UGRID and SGRID specs.

It's a common practice to store the grid definition in the same file as the data itself, which works out fine. But in some cases, the grid specification may be substantial, and so it can be stored in a separate file, so it doesn't need to be repeated.

In that case, there needs to be some way to find that other file, and, ideally, determine that it is, indeed, the correct grid. Various folks are doing this already, but not in a standard or robust way -- so it would be nice to have a standard way to do that in CF.

I'm pretty sure I recall some previous discussion about this, but was not able to find it -- thus the new issue.

But please feel free to redirect this discussion to an existing issue if there is one.

I bring this up now because there was a proposal on the UGRID spec site:

ugrid-conventions/ugrid-conventions#59

But it would really be nicer to have a way to do that for any grid type in CF itself.

I refer you to that discussion for a more fleshed-out idea. If the UGRID community wants to take this up, I recommend we move the discussion from the UGRID repo to here.

balaji-gfdl · 2022-03-10T20:17:15Z

https://cfconventions.org/Data/Trac-tickets/145

ChrisBarker-NOAA · 2022-03-10T22:17:09Z

Thanks -- looking at that issue, it resulted in this in the current standard:

"""
2.6.3. External Variables
The global external_variables attribute is a blank-separated list of the names of variables which are named by attributes in the file but which are not present in the file. These variables are to be found in other files (called "external files") but CF does not provide conventions for identifying the files concerned. The only attribute for which CF standardises the use of external variables is cell_measures.
"""

Which is clearly similar / related, but doesn't help with the issue at hand - particularly as it is restricted to cell_measures.

In a sense, this is a broader problem -- we're not talking about referring to a particular variable per se, but to an entire grid definition -- a concept that may not yet be included in CF, but will be if/when UGRID is included.

So maybe we should talk about this in: #153 ?

bnlawrence · 2022-05-31T15:07:54Z

@ChrisBarker-NOAA Has this discussion progressed anywhere else? I see the UGRID ticket is waiting on us (CF). I think it would be entirely proper to make a proposal which is a variant on the external measures option 2.6.3, but which provides a UUID as well as a variable name for the variable in another file.

davidhassell · 2022-06-17T14:51:43Z

Hi All,

Since CF-1.9, we can point to an entire grid definition simply by referring to a domain variable that contains it (https://cfconventions.org/Data/cf-conventions/cf-conventions-1.9/cf-conventions.html#domain-variables). Such a reference is, as has been pointed out, not currently allowed - but in principle it need not be more complicated than referring to a cell measure variable.

In CF-1.9 we disallowed a data variable to replace its traditional domain definition (attributes coordinates, grid_mapping, etc.) with a simple reference to a domain variable. This was because there was no actual use case for that on the table (at the time) and we wanted to avoid the burden on software providers. However (there's always one of those!), if it were allowed for domain variables to be external, then I would say that the use case for allowing them to be referenced from data variables, omitting the traditional definitions, is strong.

Now, all of this would apply also to a mesh topology variable [1], as it is just a collection of one or more connected domains. The reference required from a data variable would require two parts: the mesh topology variable name (possibly external), and the location on the mesh (e.g. "node").

I have no comment right now on what the reference encoding could/should look like - just that I don't think that there are any more high-level barriers to doing this!

Hope that helps,
David

[1] (https://github.com/cf-convention/cf-conventions/pull/353/files?short_path=3c189ab#diff-3c189abe47ef902923e4a6126a2fe909ed568bcacae933778144094935c0a9d8)

JonathanGregory · 2024-01-08T20:16:40Z

Dear @ChrisBarker-NOAA

Thanks for raising this issue. Things have moved on a bit in the last couple of years. In particular, we have made the link between CF and UGRID in the latest CF release, as you know. How would you like to proceed with this? At the outset you mentioned a use-case from UGRID. Can we address that by permitting some different types of external variable in Sect 2.6.3, as suggested earlier?

Best wishes

Jonathan

davidhassell · 2024-01-09T16:16:07Z

Hello @ChrisBarker-NOAA, @JonathanGregory and All,

I am keen that this is pursued, but only for UGRID mesh topology variables and the variables that they reference (such as connectivity variables). To that end @bnlawrence, @koldunovn and myself have been working on an encoding proposal, and are close to actually proposing it. As @JonathanGregory suggests it could be, our proposal is quite simple and includes an extension of the existing external_variables functionality.

Why only for UGRID? Apart from it being the only use case on the table (as far as I'm aware), there are complications if it is extended to arbitrary domain definitions. These complications arise from having to think about whether or not a data variable can be allowed to reference a domain variable instead of the usual collection of metadata variables via named attributes (coordinates, cell_measures, etc.). Our proposal is independent of this complication (due to the nature of the UGRID encoding), so a future extension to address it would certainly be possible.

Many thanks,
David

ChrisBarker-NOAA · 2024-05-09T14:41:11Z

Why only for UGRID?

Makes sense, though:

there is also SGRID (https://sgrid.github.io/sgrid/) which may someday make its way in.
even though "ordinary" grids like rectangular, etc, are not defined by a single variable in the way UGRID is -- it is still the case that the grid variables can be quite large, and would benefit from being able to be stored separately and referenced.

So: Yes, only for UGRID now, but let's make sure to leave the door open for extending it to other grid types in the future.

davidhassell · 2024-05-17T12:28:41Z

Hi Chris,

there is also SGRID (https://sgrid.github.io/sgrid/) which may someday make its way in.

Absolutely. If/when SGRID makes it into CF, external variables could be extended to include it. To be discussed at that time.

even though "ordinary" grids like rectangular, etc, are not defined by a single variable in the way UGRID is -- it is still the case that the grid variables can be quite large, and would benefit from being able to be stored separately and referenced.

Indeed, large "ordinary" grids could well benefit from this. I think that is better dealt with separately because a) no one is actually asking for it right now, and b) even if they were, it will involve a discussion (instigated by me :)) on whether to allow domain variables to be referenced from data variables - this discussion is not relevant to UGRID, so would only hold up the current discussion, which does have an existing use case.

I'm not expressing an opinion either way on ordinary grids - just kicking the can down the road for now!

So: Yes, only for UGRID now, but let's make sure to leave the door open for extending it to other grid types in the future.

Good. Are you happy for me to put in a PR linked to this issue? I have one ready that could implement this.

Thanks,
David

ChrisBarker-NOAA · 2024-05-17T19:14:09Z

Absolutely—go for it.

davidhassell · 2024-05-28T13:10:44Z

Thanks Chris. Here is a proposal ...

Allowing UGRID mesh variables to be external

Moderator

To be decided.

Moderator Status Review [last updated: YYYY-MM-DD]

Requirement Summary

For data defined on UGRID mesh topologies, it is already the case that datasets are being written for which the mesh variables, i.e. the mesh topology variable and all the variables that it references (e.g. the node coordinates), are stored in a different netCDF file to all of the data variable files. This is because the UGRID description of a mesh can be very large in terms of the disk space required to store it, so repeating it in every file can be expensive. For instance, depending on how many of the optional (yet useful) UGRID features are included, the specification of mesh faces can be approximately 6 to 20 times larger than a 1-d data array defined on those faces.

The mechanism for allowing the mesh variables from one file to be associated with the data variable in another file needs to be standardised so that users of the data can correctly interpret such datasets, and generic software applications can be built that can facilitate the recombination of data and metadata.

Cell measures: A bit of history

CF-1.7 introduced the concept of external variables, but only for cell measures (Section 2.6.3. External Variables). The motivation for the cell measures case was to save space, and it was already being done in a non-standardised way prior to CF-1.7. In other words, the pragmatic situation and motivation for allowing cell measures to reside in an external file are the same as for mesh topology variables. The discussion on allowing cell measures to be external (Trac ticket #145):

Acknowledged that ideally external files would not be allowed, i.e. all metadata is always in the same files as the data it describes.
Acknowledged that saving space is a valid use-case, and that petabytes of data already existed (CMIP5), and will continue to be produced (CMIP6), with non-standardised storage of cell measures in external files.
Carefully considered whether or not to allow other types of metadata that define the cell locations to be external (such as coordinate and auxiliary coordinate variables, grid mapping variables, etc.). It was pointed out that the purpose of CF (Section 1.1. Goals) is to define the necessary elements of a netCDF file such that conforming files "contain sufficient metadata that they are self-describing in the sense that ... each value can be located in space and time", and therefore any information that is needed to geolocate the data must go in the file itself. There was no real disagreement with this, and the cell measures use case did not overlap with geolocation, so it was agreed to restrict external variables to cell measures.

Why allow mesh-related variables to be external, but not arbitrary geolocation variables?

Whilst being able to put arbitrary geolocation variables in external files has been talked about for years, there are technical aspects that have not been addressed which do not apply to UGRID meshes (such as whether or not to allow data variables to reference domain variables), and it is the UGRID mesh use case that is being actively discussed here. Restricting the proposal to UGRID meshes will not directly impact on any future discussion on a use case for a further generalisation of external variables, so there is no need to complicate the current situation.

Functional summary

We propose extending the attributes for which CF standardises the use of external variables to include mesh, the attribute that references a UGRID mesh topology variable from data, domain and UGRID location index set variables. The variables referenced by the external mesh topology variable (such as the node coordinate variables) must also be listed as external variables.

In addition we propose a simple mechanism to better prevent an incorrect external variable being used by mistake. A new optional global attribute will allow an arbitrary string-valued identity to be associated with each external variable. When provided, and the same identity is also given as an attribute of the external variable in the external file, then the identities must match for the external variable to be used.

Technical Proposal Summary

Allowing meshes to be external

Extending the attributes for which CF standardises the use of external variables to include a mesh topology variable will follow the same principle as for external cell measure variables, with additional rules concerning the variables that the mesh topology variable refers to.

The first (and currently only) paragraph of Section 2.6.3 External Variables will be changed to:

An "external variable" is one which is named by an attribute in the file but which is not present in the file. These variables are to be found in other files called "external files". A variable named by an attribute of an external variable must also be an external variable. CF does not provide conventions for identifying external files, but an external variable must be in the same external file as any variables named by its attributes. The only attributes in the file for which CF standardises the use of external variables are cell_measures and mesh. External variables are provided by the global external_variables attribute, which is a blank-separated list of external variable names, none of which are allowed to coincide with the name of any variable in the file.

The following paragraph will be inserted at the end of Section 5.9 Mesh Topology Variables:

A mesh topology variable referenced by the mesh attribute is not required to be present in the file containing the data, domain or location index set variable. If the mesh topology variable is located in an external file, rather than in the file where it is referenced, it must be listed in the global external_variables attribute of the referencing file (Section 2.6.3). All variables referenced by an external mesh topology variable, such as connectivity and node coordinate variables, must also be listed in the external_variables attribute, and must be located in the same external file as the mesh topology variable.
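The listing rules in the two paragraphs above could be checked mechanically. The following is a minimal sketch only, assuming the relevant attributes have already been read into plain Python strings and dictionaries; the function name `check_external_listing` and the tuple `MESH_REFERENCE_ATTRS` are hypothetical, and the tuple is an illustrative subset of the UGRID attributes that name other variables, not an exhaustive list.

```python
# Attributes of a mesh topology variable whose values name other variables.
# Illustrative subset only (an assumption, not a complete list from UGRID).
MESH_REFERENCE_ATTRS = (
    "node_coordinates",
    "edge_node_connectivity",
    "face_node_connectivity",
    "face_edge_connectivity",
    "face_face_connectivity",
    "edge_face_connectivity",
)


def check_external_listing(external_variables, local_variable_names, mesh_attrs):
    """Return a list of problems found; an empty list means the checks passed.

    external_variables: value of the global external_variables attribute.
    local_variable_names: names of the variables present in the parent file.
    mesh_attrs: attributes of the external mesh topology variable, as a dict.
    """
    listed = set(external_variables.split())
    problems = []

    # No external variable name may coincide with a variable in the file.
    for name in sorted(listed & set(local_variable_names)):
        problems.append(f"'{name}' is both external and present in the file")

    # Every variable named by the external mesh must itself be listed.
    for attr in MESH_REFERENCE_ATTRS:
        for name in mesh_attrs.get(attr, "").split():
            if name not in listed:
                problems.append(f"'{name}' (from {attr}) is not listed as external")

    return problems
```

Note that the second check needs the attributes of the mesh topology variable, so it can only be run once the external file has been found.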

Providing identifiers for external variables

The following new paragraph will be inserted at the end of Section 2.6.3. External Variables:

A variable listed in the external_variables attribute may optionally be associated with an arbitrary string-valued "identifier" which provides a secondary mechanism, after the variable name and the names of its dimensions, for finding the correct external variable. The global external_variable_identifiers attribute defines the identifiers for any subset of the external variables. This is a string attribute comprising a list of blank-separated pairs of words of the form "variable: identifier". The "variable" is the name of an external variable, and the "identifier" is the variable's identifier. When an identifier has been provided by the external_variable_identifiers attribute, a program application that also has access to the external file should only use the corresponding external variable if it has the same identifier defined by its string-valued variable_identifier attribute. When a variable listed in the external_variables attribute does not have a corresponding external identifier in the external_variable_identifiers attribute, then any external variable with the correct name and dimensions may be used.
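To illustrate the "variable: identifier" encoding described above, here is a minimal parsing sketch. The function name `parse_identifiers` is hypothetical, and the strictness of the error handling is an assumption rather than something the proposal prescribes.

```python
def parse_identifiers(attr_value):
    """Parse 'var1: id1 var2: id2' into {'var1': 'id1', 'var2': 'id2'}."""
    words = attr_value.split()
    if len(words) % 2 != 0:
        raise ValueError("expected blank-separated 'variable: identifier' pairs")

    identifiers = {}
    # Words alternate between 'variable:' and its identifier.
    for name, identifier in zip(words[0::2], words[1::2]):
        if not name.endswith(":") or len(name) == 1:
            raise ValueError(f"expected 'variable:' but found {name!r}")
        identifiers[name[:-1]] = identifier
    return identifiers
```

For example, `parse_identifiers("cell_areas: ae45-ed12b mesh: 9ed4-83f46")` gives a dictionary mapping `cell_areas` and `mesh` to their identifiers. An external variable that is absent from the dictionary simply has no identifier to check.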

Examples

A new example for Section 2.6.3. External Variables:

Example 2.2: A file with external variables

dimensions:
  nFace = 5839 ;
  time = 12 ;
variables:
  double time(time) ;
    time:standard_name = "time" ;
    time:units = "days since 2000-01-01" ;
  double u(time, nFace) ;
    u:standard_name = "eastward_wind" ;
    u:units = "m s-1" ;
    u:cell_measures = "area: cell_areas" ;
    u:mesh = "mesh" ;
    u:location = "face" ;

// global attributes:
    :external_variables = "cell_areas
                           mesh
                           node_lon node_lat
                           face_nodes edge_nodes face_edges face_links edge_face_links" ;
    // Optional external variable identifiers
    :external_variable_identifiers = "cell_areas: ae45-ed12b mesh: 9ed4-83f46" ;

Example 2.3: The external file containing the external variables for example 2.2

dimensions:
  nNode = 3140 ;
  nEdge = 8986 ;
  nFace = 5839 ;
  n2 = 2 ;
  n3 = 3 ;
variables:
  // Cell measure variable (could be in a different file to the mesh variables)
  double cell_areas(nFace) ;
    cell_areas:variable_identifier = "ae45-ed12b" ;  // Optional variable identifier
    cell_areas:standard_name = "cell_area" ;
    cell_areas:units = "m2" ;

  // Mesh topology variable
  int mesh ;
    mesh:variable_identifier = "9ed4-83f46" ;  // Optional variable identifier
    mesh:cf_role = "mesh_topology" ;
    mesh:long_name = "Topology of 2D unstructured mesh" ;
    mesh:topology_dimension = 2 ;
    mesh:node_coordinates = "node_lon node_lat" ;
    mesh:face_node_connectivity = "face_nodes" ;
    mesh:edge_node_connectivity = "edge_nodes" ;
    mesh:face_edge_connectivity = "face_edges" ;
    mesh:face_face_connectivity = "face_links" ;
    mesh:edge_face_connectivity = "edge_face_links" ;

  // Variables referenced by the mesh topology variable (must be in the
  // same file as the mesh topology variable)
  double node_lon(nNode) ;
  double node_lat(nNode) ;
  int face_nodes(nFace, n3) ;
  int edge_nodes(nEdge, n2) ;
  int face_edges(nFace, n3) ;
  int face_links(nFace, n3) ;
  int edge_face_links(nEdge, n2) ;

Updating appendix A: Attributes

The new attributes external_variable_identifiers and variable_identifier will be included in Appendix A: Attributes, with usage "G" (global) and "CM, U" (new designations for cell measure variables and UGRID variables) respectively.

Benefits

All users of datasets with external meshes would benefit from this proposal.

Status Quo

Users of datasets with external meshes will struggle to read them.

Associated pull request

Not yet written, but all changes are detailed above.

JonathanGregory · 2024-05-29T08:30:59Z

Dear @davidhassell

Thanks for this proposal, which I think is sound and sensible. I have only one minor comment, namely that I wonder if the attribute variable_identifier could be just identifier. Since it's attached to a variable, it hardly seems necessary to state variable, and moreover there's even a small possibility of confusion, from misunderstanding variable as an adjective. It corresponds to external_variable_identifiers, which I would parse as external_variable + identifiers.

Best wishes

Jonathan

davidhassell · 2024-05-30T10:14:45Z

Thanks for looking at this, @JonathanGregory.

I agree with your suggestion on the name of the external variable identifier attribute (i.e. calling it identifier as opposed to "variable_identifier").

Given this initial support, I shall create an actual PR with the above changes, incorporating Jonathan's suggestion.

davidhassell · 2024-05-30T16:29:22Z

Whilst writing the external variables PR, I came up with some better (?) text for Section 2.6.3, which is, in full:

An "external variable" is one which is named by an attribute in the file but which is not present in the file, or is named by an attribute of another external variable. These variables are to be found in other files called "external files". An external variable must be in the same external file as any variables named by its attributes. The only attributes in the file for which CF standardizes the use of external variables are cell_measures and mesh. External variables are provided by the global external_variables attribute, which is a blank-separated list of external variable names, none of which are allowed to coincide with the name of any variable in the file.

CF does not provide conventions for finding external files, but once an external file for an external variable has been located it is expected to contain a variable with the same name; and the names of the variable's dimensions (if any) are the same as the corresponding dimensions in the parent file, and with the same sizes. An additional check that an external variable is the correct one may be optionally provided via an external variable identifier. The global external_variable_identifiers attribute defines identifiers for any subset of the external variables. This is a string-valued attribute comprising a list of blank-separated pairs of words of the form "variable: identifier", where variable is the name of an external variable, and the identifier is the external variable's identifier. When an identifier has been provided by the external_variable_identifiers attribute, a program application that also has access to the external file should only use the corresponding external variable if it has the same identifier defined by its string-valued identifier attribute. When a variable listed in the external_variables attribute does not have a corresponding identifier in the external_variable_identifiers attribute, then any external variable with the correct name and dimensions may be used. Note that a variable referenced by an attribute of an external variable does not benefit from having an identifier, because it must always be in the same external file as its parent external variable.
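The acceptance test described in this text (same name, matching dimensions, and a matching identifier when one is given) might be sketched as follows. This is an illustration under stated assumptions, not a reference implementation: the nested dictionary standing in for an opened external file and the function name `external_variable_matches` are hypothetical, and "corresponding dimensions" is interpreted here as the dimensions whose names also exist in the parent file.

```python
def external_variable_matches(name, parent_dims, expected_identifier, external_file):
    """Decide whether `external_file` supplies an acceptable variable `name`.

    parent_dims: {dimension name: size} for the parent file.
    expected_identifier: value from external_variable_identifiers, or None.
    external_file: hypothetical in-memory stand-in for an opened file, e.g.
        {"dimensions": {"nFace": 5839},
         "variables": {"cell_areas": {"dims": ("nFace",), "attrs": {}}}}
    """
    variable = external_file["variables"].get(name)
    if variable is None:
        return False

    for dim in variable["dims"]:
        size = external_file["dimensions"][dim]
        # Assumption: only dimensions shared with the parent can be compared.
        if dim in parent_dims and parent_dims[dim] != size:
            return False

    if expected_identifier is None:
        # No identifier provided: the name and dimensions alone decide.
        return True
    return variable["attrs"].get("identifier") == expected_identifier
```

Returning `False` as soon as any check fails lets a caller reject the wrong external file early, before trying to combine its contents with the data.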

ChrisBarker-NOAA · 2024-05-30T16:58:59Z

A couple comments:

An additional check that an external variable is the correct one may be optionally provided via an external variable identifier.

I presume this has been hashed out previously, but are we sure we want the identifier to be optional? That seems like a really good idea, why not require it?

(client code could ignore it of course...)

The other thought in that is whether some guidance should be provided on the identifier -- ideally it would be likely (or guaranteed to be) unique:

a UUID?
A hash of the variable?
A long and meaningful name

Maybe that's too much to require, but it could be suggested. My concern is that we'll end up with identifiers, like FVCOM-mesh or the like, or even "mesh"

Then any external variable with the correct name and dimensions may be used.

given that the mesh variable is always the same dimensions, and names like "mesh" are likely -- that does reinforce why an identifier is a really good idea!

In practice, for a UGRID, the wrong mesh definition is highly unlikely to be compatible with the other variables, but it would be better to fail early.

-CHB

JonathanGregory · 2024-05-31T07:50:07Z

Dear David

Your text is clear, thanks, except that I think this sentence could be improved

CF does not provide conventions for finding external files, but once an external file for an external variable has been located it is expected to contain a variable with the same name; and the names of the variable's dimensions (if any) are the same as the corresponding dimensions in the parent file, and with the same sizes.

I suggest the following, if it's correct

Users of data must rely on their own means to locate the external files that they need, because CF does not provide conventions for finding external files. The external file supplying an external variable of a given name must contain a variable of that name, whose dimensions (if any) in the external file must have the same names and sizes as the corresponding dimensions in the parent file.

Cheers

Jonathan

JonathanGregory · 2024-05-31T08:04:56Z

Following @ChrisBarker-NOAA's comment, I suggest we could strengthen

An additional check that an external variable is the correct one may be optionally provided via an external variable identifier.

to

To enable an optional additional check that the external variable is the correct one, it is recommended to provide the external variable with an identifier.

We should say here what format this identifier can have. Since they're contained in a blank-separated list, each identifier must be a single word not containing any whitespace. It doesn't have to be intelligible text, but if it's contained in a string I suppose it ought to consist of printable ASCII characters only i.e. hex 21-7E inclusive. Is that too restrictive? I agree with Chris that we could give some guidance about choosing the identifier, along the lines that to fulfil its purpose it should somehow distinguish among variables that have the right name and shape in the set of files which the user program might consider, so it probably won't help if the identifier is always given the same contents by a given program writing datasets of this kind.
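As a sketch of the format restriction suggested here (a single word of printable ASCII, hex 21-7E inclusive), a check could look like this. The function name `is_valid_identifier` is hypothetical, and the rule itself is only the suggestion under discussion, not agreed text.

```python
def is_valid_identifier(identifier):
    """True if `identifier` is non-empty and uses only characters 0x21-0x7E.

    The range excludes the space character (0x20), so a valid identifier is
    necessarily a single word with no whitespace.
    """
    return bool(identifier) and all(0x21 <= ord(ch) <= 0x7E for ch in identifier)
```

Under this rule both a UUID string and a name like "CMIP6-HadGEM3-N96-cell-areas" pass, while anything containing a blank or a non-ASCII character does not.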

Best wishes

Jonathan

davidhassell · 2024-05-31T08:13:33Z

Hi Chris, thanks for your feedback.

I presume this has been hashed out previously, but are we sure we want the identifier to be optional? That seems like a really good idea, why not require it?

This has not been thrashed out in public before. Off-line @bnlawrence and I have talked about it whilst preparing this, but didn't come to a consensus. I favour optional, but Bryan mandatory. My reasons for optional are:

External variables have existed for some time for cell measures, so it is preferable for the identifier to be optional because CF prefers that metadata written according to previous versions of the convention should also be compliant with and have the same interpretation under later versions.
CF allows creators to store the data that they want to and generally doesn't prescribe what they should do.

I'm not worried about people not providing identifiers simply because they don't have to - standard_names are (almost) never required, but they are used because people want to, for whatever motivation (they can see that they're useful, or their project insists on them, etc.)

We could certainly recommend that people use an identifier though, then the CF checker will warn in its absence for files from CF-1.12 onwards.

I hope that Bryan will chime in with his point of view :)

I'm happy leaving the content of an identifier unrestricted (though with @JonathanGregory's caveats) - I can see use cases for a UUID and for a nice name like "CMIP6-HadGEM3-N96-cell-areas".

bnlawrence · 2024-05-31T08:19:52Z

Ah, at last, my chance to air my opinion, and without seeding the conversation first! Thanks @ChrisBarker-NOAA !

I think we should:

a) make it not just a string identifier but a UUID string identifier, and
b) it should be compulsory.

a) would nearly guarantee uniqueness (you couldn’t get in trouble later on because someone copied a different grid into the same path and filename on your machine), and

b) would make sure that it worked as intended - without it, it's basically fragile beyond belief. We get away with areacella because only a tiny fraction of applications need it ...

larsbarring · 2024-05-31T09:11:50Z

From my outside perspective I would side with @bnlawrence and @ChrisBarker-NOAA for exactly the same arguments.

BTW: Currently the CF-Checker handles versions up to and including CF-1.8, so bringing it up to CF-1.12 would require some dedicated time and resources ...

JonathanGregory · 2024-05-31T15:43:20Z

Dear all

I agree with @davidhassell that the new identifier attribute should not be mandatory, for the reasons he states, following CF design principles 9 and 8 (sect 1.2). I think a recommendation to include it is sufficient.

I like Bryan @bnlawrence's idea of using a "universally unique identifier" UUID. For the sake of anyone who, like me, didn't know what this is, wikipedia says

A Universally Unique Identifier (UUID) is a 128-bit label used for information in computer systems. The term Globally Unique Identifier (GUID) is also used, mostly in Microsoft systems.

When generated according to the standard methods, UUIDs are, for practical purposes, unique. Their uniqueness does not depend on a central registration authority or coordination between the parties generating them, unlike most other numbering schemes. While the probability that a UUID will be duplicated is not zero, it is generally considered close enough to zero to be negligible.

Thus, anyone can create a UUID and use it to identify something with near certainty that the identifier does not duplicate one that has already been, or will be, created to identify something else. Information labeled with UUIDs by independent parties can therefore be later combined into a single database or transmitted on the same channel, with a negligible probability of duplication.

Adoption of UUIDs is widespread, with many computing platforms providing support for generating them and for parsing their textual representation.

In that case, the identifier would be a string consisting of hexdigits and a few allowed punctuation marks. Also, we could call the attribute uuid instead of identifier. I agree with David that there is an appeal in having a meaningful text string as an identifier, and that's consistent with CF providing self-explanatory metadata. However, if the identifier is required to be an opaque string, the data-writer could instead make the variable self-explanatory by, for instance, giving it a long_name to say what it's for, or a self-describing variable name. CF attaches no meaning to variable names, but humans can!

Cheers

Jonathan

taylor13 · 2024-05-31T17:05:17Z

I'm only superficially familiar with uuid, but in CMIP we require each file to include a global attribute we call "tracking_id", and it contains a prefix plus a uuid constructed as described here:

tracking_id should be of the form hdl:21.14100/<uuid> (e.g., “hdl:21.14100/02d9e6d5-9467-382e-8f9b-9300a64ac3cd”). The tracking_id should be unique for each file published in ESGF. The "uuid" should be generated using the OSSP utility which supports a number of different DCE 1.1 variant UUID options. For CMIP6, version 4 (random number based) is required. Download the software from http://www.ossp.org/pkg/lib/uuid/.

I'm not sure why the prefix is helpful. I think it has something to do with the DKRZ (Hamburg, Germany) method of referencing datasets. Does anyone know its purpose?

JonathanGregory · 2024-05-31T17:26:02Z

Dear all

On further reflection, while walking home, I've changed my mind to agree with @davidhassell that sometimes it would be better to have a meaningful identifier, rather than a UUID. It's a good idea to use a UUID when the external variable could take many different values, such as time-dependent cell metrics. On the other hand, for static information, such as David's example of CMIP6-HadGEM3-N96-cell-areas and the CMIP areacella and areacello variables, an external variable with the same content might be stored in various files, generated at different times, and any of them would be acceptable. It would be obstructive to insist that you used a particular one of them, which might perhaps not even be available to you. That would lead to the convention being ignored in practice.

This consideration suggests we should leave it to the data-writer to decide what's sensible, and give them guidelines. The identifier should not have a generic value like areacello, but if the data-writer can choose a meaningful string which they're confident would only be used for the same contents of the external variable, that's fine. If they can't have such confidence, a UUID would be a good choice.

Best wishes

Jonathan

sethmcg · 2024-05-31T20:19:21Z

Following the CMIP example, you could have the best of both worlds by using an attribute with a value like meaningful_identifier/uuid. I agree that it makes sense to leave it up to data-writers to decide what makes the most sense, but perhaps we could recommend that as the default?
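
A small sketch of the combined form, assuming a `/` separator and the example name mentioned earlier in the thread. The function names `combine` and `split` are hypothetical.

```python
import uuid


def combine(name: str) -> str:
    """Join a human-readable name and a fresh UUID as 'name/uuid'."""
    if "/" in name:
        raise ValueError("name must not contain '/'")
    return f"{name}/{uuid.uuid4()}"


def split(identifier: str) -> tuple[str, uuid.UUID]:
    """Split 'name/uuid' at the last '/' and parse the UUID part."""
    name, _, tail = identifier.rpartition("/")
    return name, uuid.UUID(tail)


ident = combine("CMIP6-HadGEM3-N96-cell-areas")
name, unique_part = split(ident)
print(name, unique_part)
```

Software could match on the UUID part alone while humans read the name, which is the "best of both worlds" suggested here.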

bnlawrence · 2024-06-01T08:27:57Z

hi Everyone

I knew that suggesting something would be mandatory would be controversial, but up until now it has been mandatory to put the coordinate information in the file. Indeed, our "primary principle" has been "Data should be self-describing, without external tables needed for interpretation." Unlike areacella, which is mostly not needed for interpretation, one will not be able to interpret these files without the external information.

The suggestion that unavailability of specific files would cause problems is, I think, confusing information with usage. I think the use of a uuid makes it possible for software to be sure that the coordinate file is the correct one - but if you haven't got access to it, software can be advised to use a different one - which puts the onus on the human advisor to get it right.

Given that these files will always be created by software, it will not be onerous to require a UUID at creation time - what one does when confronted by the lack of the appropriate file at usage time should be up to a human.

What this discussion has made me change my mind on is that it would be desirable to have both a UUID and a human readable identifier.

bnlawrence · 2024-06-01T08:29:49Z

(I should say that one might imagine the appropriate coordinate files could be distributed with standard configurations of the code, so while there is a risk of many files with the same content and different UUIDs, this is a risk that can be mitigated - and won't be a big risk in practice since the coordinate files should travel with the data. But this is the risk which is most likely to cause me to disagree with myself if someone can convince me.)

ChrisBarker-NOAA · 2024-06-01T16:35:13Z

Hmm:

On the other hand, for static information, such as David's example of CMIP6-HadGEM3-N96-cell-areas and the CMIP areacella and areacello variables, an external variable with the same content might be stored in various files, generated at different times, and any of them would be acceptable.

But IIUC, it's only interchangeable IF it's the same content, in which case it could have the same identifier, yes? The identifier is associated with the variable, not the file, correct?

When you generate a UUID it's unique, but you can make as many copies of it as you like.

bnlawrence · 2024-06-03T07:11:41Z

Agreed. So at data writing time there are two choices:

we know there is an existing unique identifier to the domain of interest - and most likely we actually have that domain coordinate file - so we use that (if we write our own copy using a pre-existing identifier I'd rather like it to be best practice to add metadata to say that has been done), or
we don't know an existing unique identifier, so we write our own to go with our own copy of the domain description, and then write the data with that.

At reading the data time, there are two choices:

We have the domain description file which matches the unique identifier and are good to go, or
We have a domain description file which we believe to be correct, but the unique identifier doesn't match, we may have reason to trust that it is appropriate to use this, so we proceed with caution.
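
The writing-time and reading-time choices above can be sketched as two small functions. This is only an illustration under stated assumptions: the function names and the returned status strings are invented here, and reading identifiers out of real files is left out.

```python
import uuid
from typing import Optional


def identifier_for_writing(known_id: Optional[str]) -> str:
    """Writer side: reuse a known identifier, otherwise mint a new one."""
    return known_id if known_id is not None else str(uuid.uuid4())


def reader_status(parent_id: str, external_id: Optional[str]) -> str:
    """Reader side: a match is good to go; anything else needs human judgement."""
    return "match" if external_id == parent_id else "proceed with caution"


grid_id = identifier_for_writing(None)        # no existing identifier known
print(reader_status(grid_id, grid_id))        # the matching domain file
print(reader_status(grid_id, "something-else"))  # a file believed to be right
```

Note that the second reader outcome is a status rather than an exception: the decision to proceed is left to a human.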

marqh · 2024-06-03T11:06:58Z

many thanks all for raising this issue's profile and exploring opportunities to progress this.
I can see the utility of such an approach.

There is an aspect I am cautious of, which I would like to explore, if I may?

Firstly, context & support:
I believe that the only facet in the current conventions is the global attribute external_variables.
This appears to provide a blank-separated list of the names of variables that are used within the file, but are not able to be referenced within the file. This is important information to provide to file parsing tools, to help them function effectively.
Extending the use of this attribute to the use case presented here seems completely sensible to me.

However, I am a bit more cautious about the limited scope of the proposed external_variable_identifiers attribute.
This appears to only give one opportunity to provide one string item of identifying information about each external variable.
Yet, there may be multiple useful pieces of information that a data provider could supply that would build robustness into the external referencing, and mitigate the risk that data is provided without access to the essential coordinate information, rendering the data unusable.

The UGRID case is particularly interesting, as well as useful, as we are likely exploring putting numerous external variables within a single file / object to describe the unstructured domain. So, should we be thinking cautiously about how the data consumers need to act in identifying the object to reference (a netCDF file, a data service endpoint, ...) and the specific variable set within that object, how to obtain that information at data read time, how to check intent, consistency?

Considerations on my mind include:

unique identification of a variable
human readable identification of a variable
unique identification of a file / object / endpoint which contains the variable
whether the variable has the same local name within the referenced file
a location which can be used directly to obtain a file / object / endpoint which contains one or more variables
any grouping of variables that are being externally referenced, e.g. in a single file
checksum of a file that contains a group of variables

(n.b. these are for context, I'm not asking that all of these be mandated; I'm hoping that they are illustrative of this being a multi-faceted information requirement)
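
To make the list of considerations above concrete, here is a sketch of a record holding more than one piece of information per external variable. Every field name is hypothetical and chosen only to mirror the list; none is a proposed CF attribute.

```python
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class ExternalReference:
    variable: str                        # name used in the parent file
    uuid: Optional[str] = None           # unique identification of the variable
    label: Optional[str] = None          # human-readable identification
    container_id: Optional[str] = None   # identifies the file / object / endpoint
    remote_name: Optional[str] = None    # name in the referenced file, if different
    location: Optional[str] = None       # where the container can be obtained
    group: Optional[str] = None          # grouping of jointly referenced variables
    checksum: Optional[str] = None       # checksum of the containing file

    def provided(self) -> list[str]:
        """Names of the optional items the data provider actually supplied."""
        return [k for k, v in asdict(self).items()
                if k != "variable" and v is not None]


ref = ExternalReference("areacella", label="CMIP6-HadGEM3-N96-cell-areas")
print(ref.provided())
```

Because every item except the variable name is optional, a structure like this stays extensible without mandating any single kind of identifier.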

I am a bit concerned if there's only one item of information one can provide per variable, and the conventions only recommend this, and only tell us it may be a string-name, or a UUID or something else.
I think that there may be an opportunity to build a bit more robustness for this external referencing, without over-engineering the problem. Extensibility may be a useful characteristic here.

Are there options that we can explore, that are still scope constrained, but give opportunity for more than one piece of information to be encoded and provided per variable when describing external variables?

davidhassell · 2024-06-03T13:52:29Z

Hi @bnlawrence, you wrote

At reading the data time, there are two choices:

We have the domain description file which matches the unique identifier and are good to go, or

We have a domain description file which we believe to be correct, but the unique identifier doesn't match, we may have reason to trust that it is appropriate to use this, so we proceed with caution.

This highlights something that hasn't been explicitly addressed, i.e. what to do when an identifier in the parent file doesn't match the identifier (if there is one) in the external file.

I would say that if the identifier is mandatory then it follows that it must be forbidden to use any external variable that doesn't have that identifier, otherwise "mandatory" has no real meaning in this context.

On the other hand, if it were OK to ignore a mandatory identifier when you don't like it, then it follows that the identifier is in fact logically optional, because ignoring it is akin to it not having been provided in the first place.

My text suggested that when an identifier had been provided then it is only recommended for software to only use an external variable with a matching identifier.

This, I think, gives us three choices:

Make identifiers mandatory and forbid using external variables that don't have matching identifiers.
Make identifiers optional and when they are set in the parent file, forbid using external variables that don't have matching identifiers.
Make identifiers optional and when they are set in the parent file, recommend only using external variables that have matching identifiers.

The "in the parent file" phrase in 2. and 3. is important - it allows the creator of the parent file to happily ignore any identifiers in the external files if that suits their purpose.
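
The three choices can be compared side by side in a short sketch. The mapping to "ok", "warning" and "error" is an assumption about how a checker might report each case, and the names `Policy` and `assess` are hypothetical.

```python
from enum import Enum
from typing import Optional


class Policy(Enum):
    MANDATORY = 1             # choice 1: identifier required, match required
    OPTIONAL_STRICT = 2       # choice 2: optional, but a set identifier must match
    OPTIONAL_RECOMMENDED = 3  # choice 3: optional, a match is only recommended


def assess(policy: Policy, parent_id: Optional[str],
           external_id: Optional[str]) -> str:
    """Return 'ok', 'warning' or 'error' for a parent/external file pair."""
    if parent_id is None:
        # Only the mandatory policy objects to a missing parent identifier.
        return "error" if policy is Policy.MANDATORY else "ok"
    if parent_id == external_id:
        return "ok"
    # A parent identifier is set but the external file does not match it.
    return "warning" if policy is Policy.OPTIONAL_RECOMMENDED else "error"


for policy in Policy:
    print(policy.name, assess(policy, "grid-A", "grid-B"))
```

In all three cases an identifier present only in the external file is ignored, which reflects the "in the parent file" wording.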

Given that identifiers not matching seems to be a real use case that we want to be able to deal with nicely, I would suggest that one of the "optional" choices make sense.

bnlawrence · 2024-06-03T14:54:33Z

I don't see how you get from "it is mandatory when writing" to "it is mandatory to use that information" - that's akin to saying if I put both U and V in the file, you must use both!

bnlawrence · 2024-06-03T14:56:11Z

It comes back to the "CF prime directive" - at the moment it is mandatory to put the domain IN the file (while reading, you could ignore that now if you wanted to). I am suggesting that if you want to break the prime directive it should be mandatory to give users of that data the best possible chance of finding that information.

davidhassell · 2024-06-03T15:30:58Z

I don't see how you get from "it is mandatory when writing" to "it is mandatory to use that information" - that's akin to saying if I put both U and V in the file, you must use both!

I don't see it quite like that (today) - e.g. you can't ignore standard names, in that you can't treat a variable as eastward_wind if it has a standard name of air_temperature, however convenient that may be for you :-). By which I mean, if metadata (not variables) are provided then you can put minimum expectation on their use, (edit:) if you want to, or set recommendations, etc.

davidhassell · 2024-06-03T15:31:44Z

(I should add that I'm not entrenched here, just finding my way like the rest of us.)

JonathanGregory · 2024-06-03T15:38:08Z

As a data-reader, I would like to comment on a couple of Bryan's points:

areacella is mostly not needed for interpretation. "Mostly" is right, but sometimes it is needed, for instance when calculating a global mean. You can't do it accurately if you don't have the cell areas. In some cases you can guess them, but not for unstructured grids. Hence I do not see a qualitative difference between the situation we presently have, in which only cell measures can be external variables, and the proposed situation, in which a mesh topology could be an external variable. The present situation already allows variables which are sometimes essential not to be in the file. The mesh topology is not always essential.
These files will always be created by software. Sometimes I have created cell measures files manually, when I could not obtain the "official" file (of areacella, areacello etc.) because of some technical nuisance, but I could deduce its contents from information from other sources. If it was mandatory in CF to use a variable with the correct identifier, and cf-python implemented that requirement, I could only use my manually created file by manually writing the correct identifier in it i.e. spoofing the system. I think that having to do this would bring the CF convention into disrepute.

For such reasons I maintain that it should be optional, but recommended, to require the identifier to be present and matching in the external file.

sethmcg · 2024-06-03T15:46:56Z

I don't see how you get from "it is mandatory when writing" to "it is mandatory to use that information" - that's akin to saying if I put both U and V in the file, you must use both!

I don't see it quite like that (today) - e.g. you can't ignore standard names, in that you can't treat a variable as eastward_wind if it has a standard name of air_temperature, however convenient that may be for you :-). By which I mean, if metadata (not variables) are provided then you can put minimum expectation on their use.

I mean, you absolutely can do that. I've done it. Generally it happens when you're calculating some derived quantity and you don't update the standard names until you're done, so there are intermediate steps with stale metadata.

All CF can do is to say that it's not correct to do that, assuming all of the entities in question have accurate and CF-compliant metadata. (Which they don't, in the case I describe above.)

Likewise, if you have a parent file and an external file that both have identifiers and they don't match, CF can tell you that it's incorrect to use them together (under the assumption that both have accurate and compliant metadata). If either or both files are lacking identifiers, you have no information about whether or not the contents of the files match (and therefore should proceed with caution), and CF can say it's not compliant to produce a pair of files intended to be linked that way without matching identifiers. But I agree with Bryan that it's weird to say it's forbidden to use files that don't have matching identifiers.

davidhassell · 2024-06-03T16:12:40Z

I see that "forbid" was a poor choice of word - sorry! I meant that we can't say that providing an identifier is mandatory without also saying that a dataset is not CF-compliant if the external file doesn't also have a matching identifier. Otherwise I could just further game the system by putting in an arbitrary identifier in the parent file knowing that I never need to abide by it. In this situation CF-compliant software should refuse to entrain an external variable without a matching identifier.

This wasn't an argument against mandatory per se (although I still favour optional), rather an observation about the consequences of making it mandatory.

This is similar to some other of the few cases of mandatory attributes. E.g. the featureType attribute is mandatory for DSGs - if it is missing then in many cases software will fail to correctly parse the dataset - there's no scope for doing something else if it hasn't been provided.

ChrisBarker-NOAA · 2024-06-03T16:53:37Z

I meant that we can't say that providing an identifier is mandatory without also saying that a dataset is not CF-compliant if the external file doesn't also have a matching identifier.

Of course, that is the idea, and true for any other requirement of CF - that's kind of the point :-)

It seems to me that CF is all about the data writer -- THIS is how you make your files properly described by metadata -- and it's guidance for data readers -- This is how you CAN interpret the metadata -- but, of course, we have no say whatsoever over what data readers actually choose to do -- if they want to ignore units, they can ignore units, or standard names, or whatever.

These files will always be created by software. Sometimes I have created cell measures files manually, when I could not obtain the "official" file (of areacella, areacello etc.) because of some technical nuisance, but I could deduce its contents from information from other sources.

Hmm -- it seems to me that in that case, your external file might (or probably would) be the same -- but it's not actually guaranteed to be -- maybe only different to the tenth significant figure, or ... maybe you used a float32, and the original used a float64, or ....

So I would argue that it is not guaranteed to be the exact same information.

(If we want to, we could use a hash of the data (not the file, as metadata could change) as a unique id of those exact variables -- but in the example above, it probably wouldn't match.)
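
A sketch of the data-hash idea, assuming SHA-256 over the packed values and ignoring all metadata. The function name `data_digest` and the sample values are made up for illustration; real arrays would come from the netCDF variable.

```python
import hashlib
import struct


def data_digest(values, fmt: str) -> str:
    """SHA-256 of the values packed little-endian as float32 ('f') or float64 ('d')."""
    packed = struct.pack(f"<{len(values)}{fmt}", *values)
    return hashlib.sha256(packed).hexdigest()


areas = [0.1, 0.2, 0.3]  # hypothetical cell areas
# The same values stored the same way give the same digest ...
same = data_digest(areas, "d") == data_digest(list(areas), "d")
# ... but storing them as float32 instead of float64 changes it.
differs = data_digest(areas, "d") != data_digest(areas, "f")
print(same, differs)
```

This illustrates the caveat above: a recreated file that differs only in precision or data type would not hash to the same id.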

If it was mandatory in CF to use a variable with the correct identifier, and cf-python implemented that requirement, I could only use my manually created file by manually writing the correct identifier in it i.e. spoofing the system.

I don't think so -- if you are creating that external file, then you create the identifier, and you point to it -- done. It may well be that there are other external files out there with the same information, but with different identifiers, but I think that's good, not bad.

I guess I'm missing something -- I can see two cases:

You have access to the external file when you create your file -- in which case you use the identifier it already has.
You don't have access to the external file, so you create it yourself. In which case you assign an identifier yourself, and use that. Which seems to me to be the right thing to do.

How does leaving out the identifier help anyone here?

I think we have some consensus here:

An identifier should be likely to be unique, either:
a) a UUID or the like
b) A nice human readable descriptive name, maybe with a version, e.g. "SMAST_FVCOM_NECOFS_4.4.6"
(real example)

(whether we should specify A or B (or a combo) is still open)

An identifier is highly recommended.

The open question is whether it should be required or not.

My thoughts -- I'm having a really hard time finding any reason NOT to require it -- from a practical perspective, I see a LOT of files out there that are not quite strictly CF compliant (way too many :-( ) -- so if someone really has a reason to not supply an identifier, they can not supply an identifier, and their file won't be fully compliant -- that's their choice.

NOTE on (1) above -- I don't think it's a deal breaker, but I'm trying to see how a compliance checker would enforce a "proper" identifier -- it can look like a UUID, but not actually be one, though maybe a warning is enough -- "this doesn't look like a unique identifier to me".
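
A sketch of the kind of shallow check a compliance checker could make, using Python's standard `uuid` module. It can only test whether a string is in canonical UUID form, not whether it is actually unique; the function names and the wording of the message are assumptions.

```python
import uuid
from typing import Optional


def looks_like_uuid(text: str) -> bool:
    """True if text is a UUID in canonical hyphenated form (case-insensitive)."""
    try:
        return str(uuid.UUID(text)) == text.lower()
    except ValueError:
        return False


def check_identifier(text: str) -> Optional[str]:
    """Return a warning message, or None if the identifier looks like a UUID."""
    if looks_like_uuid(text):
        return None
    return f"{text!r} does not look like a UUID; please check that it is unique"


print(check_identifier(str(uuid.uuid4())))
print(check_identifier("SMAST_FVCOM_NECOFS_4.4.6"))
```

A descriptive name such as the one above would draw only a warning here, which leaves room for option (b).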

HMM -- I'm thinking of a realistic use case-- not sure how that impacts this issue:

It's quite common for the original provider to create a huge pile of files that all have the grid specification in them -- maybe one for each day, or each timestep, or ....

Then a downstream distributor (or more than one) of the model results may want to aggregate them in a particular way, and create an external grid definition file. The aggregator has the grid info, and can create an external file, and give it a unique identifier. That aggregator can then keep using that same external file as it adds to the aggregation. All good.

But anyone else that is also aggregating, or is using the original files, will have no way to know what identifier anyone else is using -- so they will need to create their own, with their own identifier.

Is this a problem? Now that I've written it out, I don't think so -- and it's probably a good thing -- the only way to know for sure that your grid matches your data is if they were, in fact, created in the same way from the same source -- so this is all a good thing. And, indeed, ideally, the original source would have provided an identifier for the grid, even if it wasn't in an external file :-)

I'm coming down on the mandatory side here. In short:

If we provide no way to locate an external file, and no way to know for sure that an external file you find is the right one, how in the world can we call that dataset self-describing?

sethmcg · 2024-06-03T17:10:19Z

Ah, that makes sense. I disagree on only one point: I think CF-compliant software shouldn't automatically use external files without matching identifiers, but that it's reasonable for a user to override that manually.

The nature of external variables is that they've been separated out because they apply to multiple files, but that means there are cases where a user won't be able to get the external file and will need to either recreate it or provide something they can assert is functionally identical. And although these cases will be (hopefully) rare, they're not impossible, and as Jonathan says, we don't want to force people to spoof UUIDs when that happens.

davidhassell · 2024-06-04T09:02:32Z

If we provide no way to locate an external file, and no way to know for sure that an external file you find is the right one, how in the world can we call that dataset self-describing?

In a similar way that CF allows minimal_dataset.nc.zip as a CF-compliant dataset:

$ ncdump minimal_dataset.nc
netcdf minimal_dataset {
variables:
	double foo ;
// global attributes:
		:Conventions = "CF-1.11" ;
data:

 foo = 4 ;
}
$

It might not be very useful to some, is not very self describing, and would throw up a load of CF checker warnings (not errors), but it's OK!

CF has always provided the means for creators to describe their data, and for readers to interpret them as the creator intended. Should we make setting at least one of standard_name and long_name mandatory so that cases like the one above are not allowed? We could go even further and make other items mandatory (such as units and cell_methods, source, ...), however that's not what CF is today and I don't think we should be going down that road.

External variables have always been delivered with the expectation that the user should be careful to find the correct ones. The identifier is intended to make that choice easier in case that the data writer wants to be extra sure that appropriate variables are used, and in that case surely the writer would not want people to ignore the identifier?

If providing an identifier in the parent file were mandatory, then when the parent and external files are both passed to the CF checker, it would throw an error (not a warning) if the external file did not have the matching identifier. If we actually expect software to provide an "ignore identifier" option in the mandatory case, then we should in fact make the identifier optional so the data creator can say that they're happy for people to find/create their own, thereby preventing the need for software to provide a work-around that the data creator did not intend.

One of the reasons CF is successful is that it gives data writers freedom of expression with a well defined set of tools. We've seen that there are cases where flexibility is desired during the location of external variables, so we shouldn't force data writers to remove that flexibility when they don't want to. It's fine for CF to have an opinion (i.e. to make a recommendation), but I don't think it should remove choice when we know that that choice is sometimes needed.

Cheers,
David

bnlawrence · 2024-06-04T09:04:54Z

I guess my point is that file is self describing. Not fully self describing, but it is. What would one do with a UGRID file of data without its domain description? It's properly useless, right? It doesn't pass the minimal requirement of being usable in its own right.

davidhassell · 2024-06-04T09:07:16Z

Hi Bryan - I think "file" is misleading. We've moved into "dataset" territory. When external variables are in play, we have one dataset comprising two or three files. The dataset is self-describing, the constituent files aren't so much.

bnlawrence · 2024-06-04T09:09:17Z

Hi Bryan - I think "file" is misleading. We've moved into "dataset" territory. When external variables are in play, we have one dataset comprising two or three files. The dataset is self-describing, the constituent files aren't so much.

Quite. So I am suggesting it be mandatory to give people enough information to turn our files into a dataset. I am not suggesting it be mandatory that people use that information, but I think without it, we are putting a lot of requirements on data managers as opposed to users.

davidhassell · 2024-06-04T09:16:09Z

I don't see the conflict. In the optional case, if I was a data manager that liked the idea of the identifiers, then I simply wouldn't accept files without them (just like ESGF does not accept non-CMOR-ized CMIP files). Everyone's happy, right?

JonathanGregory · 2024-06-04T12:36:52Z

Dear all

I agree with David that CF aims to provide conventions which "give data writers freedom of expression with a well defined set of tools." That's related to principle 8 in sect 1.2, "Conventions are provided to allow data-producers to describe the data they wish to produce, rather than attempting to prescribe what data they should produce; consequently most CF conventions are optional." Hence my opinion (I think the same as David's) is still that it should be optional to provide an identifier in the parent file, and optional to require a match when using an external file, although they could be recommended by CF, and they could be required by a project that has its own stricter requirements.

In the case of the CMIP areacell[ao] variables and other external cell measures, we have managed without identifiers because there should be only one choice of these for any given AOGCM. Hence you can find the right external file by its name, in the directory for time-independent quantities. Providing the identifier in both files would be a useful extra check of the match.

CMIP is another case (distinct from my earlier example of manually creating an external file) where I think you don't want an opaque unique identifier. When the files are CMORised, they should all give the same identifier for areacella, for instance i.e. the one in the cell measures file that is supplied to ESG. The identifier will have to be hard-coded in the CMOR instructions for the dataset. Since you have to do this manually, I think it is more sensible and less error-prone to make it a meaningful identifier that the data producers are confident will be used only for this particular grid.

Best wishes

Jonathan

davidhassell · 2024-06-05T09:58:37Z

Hello.

Bryan gave us these user choices:

At reading the data time, there are two choices:
1. We have the domain description file which matches the unique identifier and are good to go, or
2. We have a domain description file which we believe to be correct, but the unique identifier doesn't match, we may have reason to trust that it is appropriate to use this, so we proceed with caution.

I think that these user choices are correct, but there is a third choice of:

We have a domain description file but the unique identifier doesn't match that from the parent file, so we don't want to use it.

I don't think we should force a writer to put a user in that position. The writer can choose to put a user who likes choice 3 in that position - perfectly fine - but there will be cases when they don't want to. A use case has been discussed here: the fact that there may sometimes be multiple identifiers in multiple valid external files, any of which might be acceptable for use. The writer should have the option of giving a user the flexibility of not being constrained by an arbitrary one of those identifiers.

There's no danger that optional identifiers will not get used by data creators who want to use them (who doesn't use at least one of standard_name and long_name, even though it's allowed to provide neither?), but there is a danger of mandatory identifiers having the unintended consequence of preventing the use of a dataset. I think that identifiers are great, that their use should be recommended by CF (so the checker will warn if none are provided in the parent file), but that it is not appropriate for them to be mandatory.

On the question of UUID/meaningful-string, I think that there are good arguments for both types, so the identifier should be unrestricted (within the permitted character set).

Cheers,
David

larsbarring · 2024-06-05T10:22:33Z

Dear all,

This issue urgently needs a moderator!

Moreover, I suggest that the moderator at the earliest convenience convenes an offline group to tease out the essentials in the different viewpoints.

/Lars

bnlawrence · 2024-06-05T10:37:40Z

Dear all,

This issue urgently needs a moderator!

Moreover, I suggest that the moderator at the earliest convenience convenes an offline group to tease out the essentials in the different viewpoints.

/Lars

Hi Lars. David and I did discuss this offline this morning. Our suggestion is that we'll wait another day, then generate a summary of the (one?) key issue(s) of disagreement (we can agree that easily enough) and then I would suggest we simply vote on which direction we pursue in further fleshing out the proposal.

In advance of that summary, I would say that there are two respectable positions on the table; neither is False in the sense that any of us would argue that others are (logically) Wrong. I believe that the proponents of both sides believe their positions are correct, and are unlikely to be further persuaded by finer detail of the arguments - so a consensus by mutual agreement seems unlikely. I also believe that no one so far believes that their position represents something upon which they MUST WIN, so a vote would be a reasonable way of achieving a way forward.

If that's an acceptable way forward, then we can worry about how to vote next. But come what may, David and I will attempt a summary of things in the next couple of days.

bnlawrence · 2024-06-05T10:49:28Z

Meanwhile, I'll add two more use cases for folks to mull over (ok, the second is a summary, but ...)

There is a model which generates a different domain filling curve for their cell descriptions for every different domain decomposition. The group with that model currently consider that a bug, and will "fix" it, but it's not immediately obvious it's a silly thing to do - if you want to ensure efficient use of the overall UGRID decomposition in a heavily parallel environment (or if you are making heavy use of chunking to the point of chunking your domain, which might vary on application). That means there is no "unique" correct order of cells for a given model and in that situation, they would be writing a domain description to accompany the data of each and every simulation - and of necessity, they will need such a mandatory identifier. Now you might say well, yes, but that's a choice, to which I'd say, yes, but this was a foreseeable situation, but one which one might ignore at first, and then later go "oh bugger". (I am allowed to say "bugger", it is perfectly acceptable English in the Antipodes and not even a little bit offensive.)
We know that folks move files. They rename files. They download files from everywhere. This is why we have strongly resisted breaking information up between files before (with the one obvious exception, which we have already discussed). They will make mistakes. If they have identifiers, such mistakes are recoverable. The cost of requiring identifiers is trivial compared to the potential cost to the consumers. After all, we write all this metadata for consumers, not producers. Why on earth would we not make it mandatory?
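
To illustrate why identifiers make such mistakes recoverable, here is a sketch that finds the right domain file by identifier rather than by filename. The `catalogue` mapping of path to identifier is hypothetical; in practice it would be built by reading an attribute from each candidate file, which is out of scope here.

```python
def find_by_identifier(wanted: str, catalogue: dict[str, str]) -> list[str]:
    """Return the paths whose recorded identifier equals `wanted`."""
    return sorted(path for path, ident in catalogue.items() if ident == wanted)


# Hypothetical identifiers read from files that have been moved and renamed.
catalogue = {
    "downloads/grid (1).nc": "SMAST_FVCOM_NECOFS_4.4.6",
    "old/mesh_final_v2.nc": "some-other-grid",
    "tmp/renamed.nc": "SMAST_FVCOM_NECOFS_4.4.6",
}
print(find_by_identifier("SMAST_FVCOM_NECOFS_4.4.6", catalogue))
```

Both copies are found despite their names, and a file with no matching identifier is simply not offered, leaving the fallback decision to a human.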

JonathanGregory · 2024-06-05T12:42:44Z

If we can't reach a consensus in this GitHub discussion, I think the right procedure would be the one we followed for (what became) units_metadata, which was proposed by David and worked well - namely, to invite anyone who would like to be involved to join an interactive discussion online by zoom or whatever, in order to formulate a recommendation to bring back here. Maybe the group will find a consensus, maybe not, but somehow it should decide what to recommend. At the time that group was formed, it was also made clear that we would expect the community to accept the group's recommendation, whatever it is (except for minor things that don't affect it substantially).

bnlawrence · 2024-06-05T13:30:28Z

That'd certainly be fine by me. I had forgotten that we'd used that mechanism, despite Lars actively suggesting it above. Sorry. Do you think it's worth trying to summarise where things are now first? My sense is that we have a lot of information on the table, and we can synthesise it as input to such a discussion.

taylor13 · 2024-06-05T14:37:04Z

A summary would certainly be helpful for me. Presumably it could provide some of the text that eventually makes it into the "recommendation" resulting from the process, reducing overall effort down the road.

marqh · 2024-06-11T09:54:39Z

hello

regarding:

namely, to invite anyone who would like to be involved to join an interactive discussion online by zoom or whatever, in order to formulate a recommendation to bring back here.
(#357 (comment))

how may one suggest themself to be involved in this activity please?

many thanks

bnlawrence · 2024-06-11T11:42:46Z

namely, to invite anyone who would like to be involved to join an interactive discussion online by zoom or whatever, in order to formulate a recommendation to bring back here.
how may one suggest themself to be involved in this activity please?

I plan to get a summary back here some time early next week, try to take the temperature of the immediate response, and then probably poll for "interested parties" on this ticket. Following that, we'll likely doodle for a suitable time. We may be in a hurry if we want to get folks before they disappear for summer.

marqh · 2024-06-11T13:18:26Z

thank you

ChrisBarker-NOAA added the "enhancement" label (Proposals to add new capabilities, improve existing ones in the conventions, improve style or format) on Mar 10, 2022

ChrisBarker-NOAA mentioned this issue Mar 10, 2022

identifying and referencing grids ugrid-conventions/ugrid-conventions#59

Open

paolap mentioned this issue May 28, 2024

Clashing convention metadata ACDguide/Governance#83

Closed
