Allow a standard name alias to have more than one entry_id
#132
Comments
Dear Martin

This looks sensible to me. Alison's comment would be useful on this one too. Changing the standard names rectifies a defect, I agree, but I think that changing the schema should be treated as a proposal for substantial change to the convention.

Best wishes

Jonathan
Dear Martin,

Sorry for not getting to this ticket sooner! I'm not sure I agree with changing the ids with "spurious spaces". The problem is that when the names were first published they did accidentally contain spaces - the aliases were introduced to correct the mistake (in the same way as we would do for a simple spelling mistake). The versions of the names containing spaces had been around for quite a long time before they were noticed. "rate_of_ hydroxyl_radical_destruction_due_to_reaction_with_nmvoc" appeared in versions 28 - 36 of the standard name table, spanning a period of 18 months in 2015-16. The other four appeared in versions 8 - 10 spanning a 7 month period in 2008. It is possible that during those periods data files were written containing the erroneous names. To avoid invalidating such files I thought it was better to use aliases rather than just quietly delete the problem! I could of course simply delete the aliases if that is generally felt to be acceptable, but that would mean treating typos involving spaces differently from any other minor error that might crop up in standard names.

Regarding the other alias that points to two current names, this again was done to avoid possibly invalidating existing data files. The original name, surface_carbon_dioxide_mole_flux, contained no indication of sign convention and this was felt not to be satisfactory. That particular name dates back to pre-version 1 of the standard name table and the aliases weren't introduced until version 15, a period of at least 2006 - 2010. Data files could have been written during that period using either upwards positive or downwards positive as a sign convention and both would have been valid CF at the time.

I support the idea of changing the schema to make this use of aliases valid - such a use case was probably not envisaged when the schema was created but the main aim should always be to preserve the original meaning of the data, not to accidentally change it by imposing a schema that is too rigid.

Best wishes,

Alison
I agree with "the main aim should always be to preserve the original meaning of the data, not to accidentally change it by imposing a schema that is too rigid", but I do not agree that the original meaning of the data has been preserved by aliasing it to two identifiers. Anyone who used the original identifier undoubtedly had one of those two identifiers in mind, but we have not clarified the intended meaning through this process. I'm sorry I missed this topic first time around, and it isn't worth getting up in arms about, but the original term has a clearly different meaning and application than either of its referenced replacements.
Dear all

Two points were made in this issue at the outset, but I believe that only one remains, so I have changed the title accordingly. The proposal is to change the standard name schema to permit an alias to have more than one entry_id.

Jonathan
I fully support the change proposed by @mattben when opening this issue. And if I understand Matthew's comment as relaying a response, @japamment also supports this (cf. the last few lines in that comment).
I support this change as well.
I also support the change that permits an alias to have more than one entry_id. Are the implications for known software that uses the schema (such as the standard names editor, I presume) easy to deal with?
The change required to the XML schema (xsd file) is really small:
<xs:element name="alias">
<xs:annotation>
<xs:documentation>The alias element contains one or more entry_id elements
with the id of the entry containing the definition. It is intended as
a mechanism for modifying standard names in a backward compatible
fashion. Typically, there is one entry_id, but in a few instances
there are two entry_id, for example if a standard name is divided
into upwards and downwards alternatives.</xs:documentation>
</xs:annotation>
<xs:complexType>
<xs:sequence>
<xs:element ref="entry_id" maxOccurs="unbounded"/>
</xs:sequence>
<xs:attribute name="id" type="xs:ID" use="required"/>
</xs:complexType>
</xs:element>
(added annotation linebreaks). The only change needed is the addition of maxOccurs="unbounded" to the entry_id element reference. However, changes are needed to other parts of the processing chain, see this comment.
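To illustrate what the relaxed schema permits, here is a minimal Python sketch that reads an alias carrying two entry_id children and collects them into a list. The XML fragment is hand-written for illustration: the `standard_name_table` wrapper and the helper name `alias_targets` are assumptions, and only the three standard names are taken from this issue.

```python
import xml.etree.ElementTree as ET

# Hand-written fragment for illustration only; the wrapper element is an assumption.
SAMPLE = """\
<standard_name_table>
  <alias id="surface_carbon_dioxide_mole_flux">
    <entry_id>surface_upward_mole_flux_of_carbon_dioxide</entry_id>
    <entry_id>surface_downward_mole_flux_of_carbon_dioxide</entry_id>
  </alias>
</standard_name_table>
"""


def alias_targets(xml_text):
    """Map each alias id to the list of entry_id values it points to."""
    root = ET.fromstring(xml_text)
    targets = {}
    for alias in root.iter("alias"):
        # An alias may now hold one or more entry_id children.
        targets[alias.get("id")] = [
            (entry.text or "").strip() for entry in alias.findall("entry_id")
        ]
    return targets


if __name__ == "__main__":
    for alias_id, entries in alias_targets(SAMPLE).items():
        print(alias_id, "->", entries)
```

Software that previously assumed exactly one entry_id per alias would need to handle a list like this instead of a single value.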
Hi @larsbarring,

I support this change as there is a clear use case for allowing one alias to map to two standard names, as demonstrated in the original proposal. If I have understood correctly, cf-convention/cf-conventions/issues/509 and the associated pull request, cf-convention/cf-conventions/pull/510 will update Appendix B to be consistent with this issue. I support those too.

Regarding the xml file, a modification to the CEDA standard names editor will be needed to allow it to output pairs of entry_id tags associated with a single alias_id. As a temporary measure until the editor is updated, we can apply a post-processing script to the xml file to achieve the same result. I will prepare a suitable script ahead of the next standard name table update, so once /issues/509 is closed I think this issue can also be closed.
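As a rough idea of what such a post-processing step might look like, the Python sketch below merges alias elements that share an id into one alias with several entry_id children. It assumes the editor writes one alias element per target name; that input shape, the sample fragment, and the function name `merge_duplicate_aliases` are all assumptions for illustration, not a description of the CEDA editor or of the script actually prepared.

```python
import xml.etree.ElementTree as ET

# Hypothetical editor output: the same alias id written twice, one entry_id each.
EDITOR_OUTPUT = """\
<standard_name_table>
  <alias id="surface_carbon_dioxide_mole_flux">
    <entry_id>surface_upward_mole_flux_of_carbon_dioxide</entry_id>
  </alias>
  <alias id="surface_carbon_dioxide_mole_flux">
    <entry_id>surface_downward_mole_flux_of_carbon_dioxide</entry_id>
  </alias>
</standard_name_table>
"""


def merge_duplicate_aliases(root):
    """Fold alias elements with a repeated id into the first one seen."""
    first_seen = {}
    # Iterate over a copy because elements are removed from root in the loop.
    for alias in list(root.findall("alias")):
        alias_id = alias.get("id")
        if alias_id not in first_seen:
            first_seen[alias_id] = alias
            continue
        keeper = first_seen[alias_id]
        # Move every entry_id of the duplicate under the first alias.
        for entry in list(alias.findall("entry_id")):
            alias.remove(entry)
            keeper.append(entry)
        root.remove(alias)
    return root


if __name__ == "__main__":
    merged = merge_duplicate_aliases(ET.fromstring(EDITOR_OUTPUT))
    print(ET.tostring(merged, encoding="unicode"))
```

The merged result has a unique id per alias element, which is what the xs:ID type in the schema requires.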
The issue with double aliases has been resolved in #509. Standard names with a spurious space have been discussed in https://github.com/orgs/cf-convention/discussions/310 with unanimous outcome, and will be resolved in cf-convention/vocabularies#7. Hence I am closing this as "change agreed" (even though the changes are actually implemented elsewhere).
[This issue was originally entitled "TRAC cf-convention/discuss#155: Invalid "id" values in CF Standard Name aliasses"]
Running an XML schema check on the CF standard name list, I found the following minor (because they relate to aliases, not the standard name definitions) issues:
There are spurious spaces in these ids:
https://cfconventions.org/Data/Trac-tickets/155.html
The standard name surface_carbon_dioxide_mole_flux has two aliases, surface_upward_mole_flux_of_carbon_dioxide and surface_downward_mole_flux_of_carbon_dioxide, which is intended (the definitions of the two newer names indicate that the deprecated name was too imprecise). The problem here is that the XSD schema does not allow for two aliases with the same id. Having unique id values for each element is useful, so I suggest we change the schema and the document to replace
with
EDIT 2024-01-19: Changed the top link to correctly point to the Trac ticket /@larsbarring
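The two problems reported above (ids containing whitespace, and the same alias id appearing more than once) can also be spotted without a full schema validator. The Python sketch below is one possible check, not the schema validation run used for this issue. The sample fragment, the `hypothetical_target_name` entries, and the function name `find_id_problems` are assumptions for illustration; only the two alias ids come from this thread.

```python
import xml.etree.ElementTree as ET
from collections import Counter

# Illustrative fragment; the entry_id targets below are placeholders, not real names.
SAMPLE = """\
<standard_name_table>
  <alias id="rate_of_ hydroxyl_radical_destruction_due_to_reaction_with_nmvoc">
    <entry_id>hypothetical_target_name</entry_id>
  </alias>
  <alias id="surface_carbon_dioxide_mole_flux">
    <entry_id>surface_upward_mole_flux_of_carbon_dioxide</entry_id>
  </alias>
  <alias id="surface_carbon_dioxide_mole_flux">
    <entry_id>surface_downward_mole_flux_of_carbon_dioxide</entry_id>
  </alias>
</standard_name_table>
"""


def find_id_problems(xml_text):
    """Return (ids containing whitespace, ids used by more than one alias)."""
    root = ET.fromstring(xml_text)
    ids = [alias.get("id", "") for alias in root.iter("alias")]
    # xs:ID values may not contain whitespace.
    with_space = [i for i in ids if any(ch.isspace() for ch in i)]
    # xs:ID values must be unique within the document.
    duplicated = sorted(i for i, count in Counter(ids).items() if count > 1)
    return with_space, duplicated


if __name__ == "__main__":
    spaced, repeated = find_id_problems(SAMPLE)
    print("ids with whitespace:", spaced)
    print("duplicated ids:", repeated)
```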