
Allow a standard name alias to have more than one entry_id #132

Closed
mattben opened this issue Jun 19, 2018 · 10 comments
Labels
change agreed Issue accepted for inclusion in the next version and closed enhancement Proposals to add new capabilities, improve existing ones in the conventions, improve style or format

Comments

@mattben
Contributor

mattben commented Jun 19, 2018

[This issue was originally entitled "TRAC cf-convention/discuss#155: Invalid "id" values in CF Standard Name aliasses"]

Running an XML schema check on the CF standard name list, I found the following issues. They are minor, because they relate to the aliases rather than to the standard name definitions:

There are spurious spaces in these ids:

  • rate_of_ hydroxyl_radical_destruction_due_to_reaction_with_nmvoc
  • mole_fraction_of_hypochlorous acid_in_air
  • mole_fraction_of_dichlorine peroxide_in_air
  • mole_fraction_of_chlorine monoxide_in_air
  • mole_fraction_of_chlorine dioxide_in_air

Original Trac ticket: https://cfconventions.org/Data/Trac-tickets/155.html

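For illustration only: this kind of id problem can be approximated with the Python standard library. The sketch below is a hypothetical helper (`invalid_alias_ids`), not part of any CF tooling. It assumes aliases appear as `<alias id="...">` elements under a root element (named `standard_name_table` here as an assumption), and it uses a simplified, ASCII-only version of the NCName rule that `xs:ID` values must follow, so it flags ids containing spaces. The corrected entry_id in the sample is also an assumption.

```python
import re
import xml.etree.ElementTree as ET

# Simplified, ASCII-only approximation of the XML NCName rule that xs:ID
# values must satisfy; the full rule also admits many non-ASCII letters.
NCNAME = re.compile(r"[A-Za-z_][A-Za-z0-9._-]*")


def invalid_alias_ids(xml_text: str) -> list[str]:
    """Return alias id values that an xs:ID check would reject."""
    root = ET.fromstring(xml_text)
    return [
        alias.get("id", "")
        for alias in root.iter("alias")
        if NCNAME.fullmatch(alias.get("id", "")) is None
    ]


# Hypothetical two-alias excerpt; the root element name is an assumption.
sample = """<standard_name_table>
  <alias id="mole_fraction_of_chlorine monoxide_in_air">
    <entry_id>mole_fraction_of_chlorine_monoxide_in_air</entry_id>
  </alias>
  <alias id="surface_carbon_dioxide_mole_flux">
    <entry_id>surface_upward_mole_flux_of_carbon_dioxide</entry_id>
  </alias>
</standard_name_table>"""

print(invalid_alias_ids(sample))  # ['mole_fraction_of_chlorine monoxide_in_air']
```

A per-value check like this cannot catch two aliases that share the same (valid) id; that needs a document-wide uniqueness check.
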
The deprecated standard name surface_carbon_dioxide_mole_flux is an alias for two current names, surface_upward_mole_flux_of_carbon_dioxide and surface_downward_mole_flux_of_carbon_dioxide. This is intended: the definitions of the two newer names indicate that the deprecated name was too imprecise. The problem is that the XSD schema does not allow two aliases with the same id. Since having unique id values for each element is useful, I suggest we change the schema and the document to replace
  <alias id="surface_carbon_dioxide_mole_flux">
    <entry_id>surface_upward_mole_flux_of_carbon_dioxide</entry_id>
  </alias>

  <alias id="surface_carbon_dioxide_mole_flux">
    <entry_id>surface_downward_mole_flux_of_carbon_dioxide</entry_id>
  </alias>

with

<alias id="surface_carbon_dioxide_mole_flux">
    <entry_id>surface_upward_mole_flux_of_carbon_dioxide</entry_id>
    <entry_id>surface_downward_mole_flux_of_carbon_dioxide</entry_id>
</alias>
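
As a rough illustration of how a consumer might read the proposed form, the Python sketch below maps each alias id to all of its entry_id values. `alias_targets` is a hypothetical helper and the root element name is an assumption. The sketch deliberately does not try to choose between targets; it only reports aliases that point at more than one name.

```python
import xml.etree.ElementTree as ET


def alias_targets(xml_text: str) -> dict[str, list[str]]:
    """Map each alias id to all of its entry_id values, in document order."""
    root = ET.fromstring(xml_text)
    return {
        alias.get("id", ""): [(e.text or "").strip() for e in alias.findall("entry_id")]
        for alias in root.iter("alias")
    }


# The proposed single-alias form; the root element name is an assumption.
proposed = """<standard_name_table>
  <alias id="surface_carbon_dioxide_mole_flux">
    <entry_id>surface_upward_mole_flux_of_carbon_dioxide</entry_id>
    <entry_id>surface_downward_mole_flux_of_carbon_dioxide</entry_id>
  </alias>
</standard_name_table>"""

targets = alias_targets(proposed)
# This sketch does not pick a target: it only collects the aliases
# that point at more than one current name.
ambiguous = {name: ids for name, ids in targets.items() if len(ids) > 1}
print(ambiguous)
```

Applied to the current form (two `<alias>` elements sharing one id), the same dictionary comprehension would silently keep only the last alias, which is one practical reason unique ids are convenient for software.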

EDIT 2024-01-19: Changed the top link to correctly point to the Trac ticket /@larsbarring

@mattben
Contributor Author

mattben commented Jun 19, 2018

Dear Martin

This looks sensible to me. Alison's comment would be useful on this one too. Changing the standard names rectifies a defect, I agree, but I think that changing the schema should be treated as a proposal for substantial change to the convention.

Best wishes

Jonathan

@mattben
Contributor Author

mattben commented Jun 19, 2018

Dear Martin,

Sorry for not getting to this ticket sooner!

I'm not sure I agree with changing the ids with "spurious spaces". The problem is that when the names were first published they did accidentally contain spaces - the aliases were introduced to correct the mistake (in the same way as we would do for a simple spelling mistake). The versions of the names containing spaces had been around for quite a long time before they were noticed. "rate_of_ hydroxyl_radical_destruction_due_to_reaction_with_nmvoc" appeared in versions 28 - 36 of the standard name table, spanning a period of 18 months in 2015-16. The other four appeared in versions 8 - 10 spanning a 7 month period in 2008. It is possible that during those periods data files were written containing the erroneous names. To avoid invalidating such files I thought it was better to use aliases rather than just quietly delete the problem! I could of course simply delete the aliases if that is generally felt to be acceptable, but that would mean treating typos involving spaces differently from any other minor error that might crop up in standard names.

Regarding the other alias that points to two current names, this again was done to avoid possibly invalidating existing data files. The original name, surface_carbon_dioxide_mole_flux, contained no indication of sign convention and this was felt not to be satisfactory. That particular name dates back to pre-version 1 of the standard name table and the aliases weren't introduced until version 15, a period of at least 2006 - 2010. Data files could have been written during that period using either upwards positive or downwards positive as a sign convention and both would have been valid CF at the time. I support the idea of changing the schema to make this use of aliases valid - such a use case was probably not envisaged when the schema was created but the main aim should always be to preserve the original meaning of the data, not to accidentally change it by imposing a schema that is too rigid.

Best wishes,

Alison

@graybeal

I agree with "the main aim should always be to preserve the original meaning of the data, not to accidentally change it by imposing a schema that is too rigid", but I do not agree that the original meaning of the data has been preserved by aliasing it to two identifiers.

Anyone who used the original identifier undoubtedly had one of those two identifiers in mind, but we have not clarified the intended meaning through this process. I'm sorry I missed this topic first time around, and it isn't worth getting up in arms about, but the original term has a clearly different meaning and application than either of its referenced replacements.

@mattben mattben changed the title #155 Invalid "id" values in CF Standard Name aliasses TRAC #155: Invalid "id" values in CF Standard Name aliasses Jun 19, 2018
@JonathanGregory JonathanGregory changed the title TRAC #155: Invalid "id" values in CF Standard Name aliasses Allow a standard name alias to have more than one entry_id Jan 4, 2022
@JonathanGregory
Contributor

Dear all

Two points were made in this issue at the outset, but I believe that only one remains, so I have changed the title accordingly. The proposal is to change the standard name schema to permit an alias to have more than one entry_id, given that there is one use case for this. Have there been any subsequent discussions elsewhere about this? Where is the standard name schema CFStandardNameTable.xsd kept?

Jonathan

@larsbarring
Contributor

I fully support the change proposed by @mattben when opening this issue. And if I understand Matthew's comment correctly as relaying a response, @japamment also supports this (cf. the last few lines of that comment).

@JonathanGregory
Contributor

I support this change as well.

@davidhassell
Contributor

I also support the change that permits an alias to have more than one entry_id.

Are the implications for known software that uses the schema (such as the standard names editor, I presume) easy to deal with?

@larsbarring
Contributor

larsbarring commented Jan 15, 2024

The change required to the xml schema (xsd file) is really small:

	<xs:element name="alias">
		<xs:annotation>
			<xs:documentation>The alias element contains one or more entry_id elements 
                               with the id of the entry containing the definition. It is intended as 
                               a mechanism for modifying standard names in a backward-compatible 
                               fashion. Typically, there is one entry_id, but in a few instances 
                               there are two entry_id elements, for example if a standard name is divided 
                               into upwards and downwards alternatives.</xs:documentation>
		</xs:annotation>
		<xs:complexType>
			<xs:sequence>
				<xs:element ref="entry_id" maxOccurs="unbounded"/>
			</xs:sequence>
			<xs:attribute name="id" type="xs:ID" use="required"/>
		</xs:complexType>
	</xs:element>

(Line breaks added to the annotation.) The only change needed is the addition of the maxOccurs="unbounded" attribute; I have also amended the annotation text.
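
To make the effect of the change concrete, the Python sketch below mimics the two relevant constraints with the standard library: `xs:ID` values must be unique across the document, and `entry_id` keeps the default minOccurs="1" while maxOccurs="unbounded" lifts the upper limit. `check_aliases` is a hypothetical helper, not an XSD validator; real validation would run the .xsd itself through a schema processor such as `lxml.etree.XMLSchema`. The root element name is an assumption.

```python
import xml.etree.ElementTree as ET
from collections import Counter


def check_aliases(xml_text: str) -> list[str]:
    """Rough stdlib stand-in for two constraints of the revised schema.

    Not an XSD validator: it only checks id uniqueness and entry_id counts.
    """
    root = ET.fromstring(xml_text)
    problems = []
    # xs:ID: every id attribute value must be unique across the whole document.
    ids = Counter(el.get("id") for el in root.iter() if el.get("id") is not None)
    problems += [f"duplicate id: {i}" for i, n in ids.items() if n > 1]
    # entry_id has the default minOccurs="1", so each alias needs at least one;
    # maxOccurs="unbounded" means any larger number is now accepted.
    for alias in root.iter("alias"):
        if not alias.findall("entry_id"):
            problems.append(f"alias without entry_id: {alias.get('id')}")
    return problems


# Hypothetical excerpts; the root element name is an assumption.
old_form = """<standard_name_table>
  <alias id="surface_carbon_dioxide_mole_flux">
    <entry_id>surface_upward_mole_flux_of_carbon_dioxide</entry_id>
  </alias>
  <alias id="surface_carbon_dioxide_mole_flux">
    <entry_id>surface_downward_mole_flux_of_carbon_dioxide</entry_id>
  </alias>
</standard_name_table>"""

new_form = """<standard_name_table>
  <alias id="surface_carbon_dioxide_mole_flux">
    <entry_id>surface_upward_mole_flux_of_carbon_dioxide</entry_id>
    <entry_id>surface_downward_mole_flux_of_carbon_dioxide</entry_id>
  </alias>
</standard_name_table>"""

print(check_aliases(old_form))  # ['duplicate id: surface_carbon_dioxide_mole_flux']
print(check_aliases(new_form))  # []
```

Under the old schema the new form would fail instead, because the implicit maxOccurs="1" on entry_id allows only one.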

However changes are needed to other parts of the processing chain, see this comment.

@larsbarring larsbarring added defect Conventions text meaning not as intended, misleading, unclear, has typos, format or language errors and removed defect Conventions text meaning not as intended, misleading, unclear, has typos, format or language errors labels Jan 18, 2024
@japamment
Member

Hi @larsbarring, I support this change as there is a clear use case for allowing one alias to map to two standard names, as demonstrated in the original proposal.

If I have understood correctly, cf-convention/cf-conventions/issues/509 and the associated pull request, cf-convention/cf-conventions/pull/510 will update Appendix B to be consistent with this issue. I support those too.

Regarding the xml file, a modification to the CEDA standard names editor will be needed to allow it to output pairs of entry_id tags associated with a single alias_id. As a temporary measure until the editor is updated, we can apply a post-processing script to the xml file to achieve the same result. I will prepare a suitable script ahead of the next standard name table update, so once /issues/509 is closed I think this issue can also be closed.
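
A post-processing step of this kind could, in principle, be quite small. The Python sketch below is not the script referred to above: `merge_duplicate_aliases` is a hypothetical helper, and it assumes the editor emits one `<alias>` element per entry_id, each a direct child of the root element (named `standard_name_table` here as an assumption). It keeps the first alias with a given id and moves the entry_id elements of later duplicates into it.

```python
import xml.etree.ElementTree as ET


def merge_duplicate_aliases(xml_text: str) -> str:
    """Fold <alias> elements that share an id into the first one with that id."""
    root = ET.fromstring(xml_text)
    first_by_id: dict[str, ET.Element] = {}
    # Assumes <alias> elements are direct children of the root element.
    for alias in list(root.findall("alias")):
        keeper = first_by_id.setdefault(alias.get("id", ""), alias)
        if keeper is not alias:
            # Move the duplicate's entry_id children into the first alias,
            # then drop the duplicate from the tree.
            keeper.extend(alias.findall("entry_id"))
            root.remove(alias)
    ET.indent(root)  # re-indent the result (Python 3.9+)
    return ET.tostring(root, encoding="unicode")


editor_output = """<standard_name_table>
  <alias id="surface_carbon_dioxide_mole_flux">
    <entry_id>surface_upward_mole_flux_of_carbon_dioxide</entry_id>
  </alias>
  <alias id="surface_carbon_dioxide_mole_flux">
    <entry_id>surface_downward_mole_flux_of_carbon_dioxide</entry_id>
  </alias>
</standard_name_table>"""

print(merge_duplicate_aliases(editor_output))
```

Keeping the first alias in place preserves the document order of the entry_id values (here, upward before downward).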

@larsbarring larsbarring added the change agreed Issue accepted for inclusion in the next version and closed label May 9, 2024
@larsbarring
Contributor

The issue with double aliases has been resolved in #509. Standard names with a spurious space have been discussed in https://github.com/orgs/cf-convention/discussions/310 with a unanimous outcome, and will be resolved in cf-convention/vocabularies#7. Hence I am closing this issue as "change agreed" (even though the changes are actually implemented elsewhere).
