
Centralize author metadata for consistency across document and zenodo.json #579

Open
cofinoa opened this issue Dec 17, 2024 · 8 comments
Labels
enhancement Proposals to add new capabilities, improve existing ones in the conventions, improve style or format

Comments

@cofinoa
Contributor

cofinoa commented Dec 17, 2024

Moderator

TBC

Moderator Status Review [last updated: YYYY-MM-DD]

Brief comment on current status, update periodically

Requirement Summary

The current workflow requires maintaining author information in multiple places, notably:

  • Document Header: The list of authors is included in the document header line.
  • About Authors Section: Authors are listed again in a dedicated section in the document.
  • zenodo.json: The authors are also listed in Zenodo metadata to facilitate citation and DOI registration.
  • CITATION.cff: The authors are listed again in the citation file used by GitHub.

This duplication of information increases the risk of inconsistencies, requires manual updates in multiple places, and increases the maintenance burden on contributors and maintainers. A single source of truth is required to ensure that author information is consistent and automatically propagated to all required locations.

Technical Proposal Summary

To resolve this, the proposal is to centralize author metadata into a single source of truth (authors.adoc), from which all other dependent files are automatically generated.

Key changes:

  1. Single Source of Truth: Store all author-related metadata in a centralized file (authors.adoc).
  2. Automation Script: A script (update_authors.py) will parse authors.adoc and generate the following files:
    • zenodo.json: Updates the creators JSON section of the file for Zenodo metadata.
    • CITATION.cff: Updates the authors YAML section of the citation file used for citing purposes on GitHub.
    • about-authors.adoc: A separate file that lists original and additional authors in a clean, structured format.
    • Document Header: The authors line in the document header will be replaced with multiple Asciidoctor attributes (e.g., :author_1:, :author_2:, etc.).

This approach ensures that the author list is always consistent across all files and eliminates manual errors.
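As an illustration of the about-authors.adoc step, here is a minimal Python sketch that renders the two author groups from already-parsed metadata. It assumes each author is a dict with last_name, first_name, middle_name, affiliation and type keys; the section headings and heading level are placeholders, not the text used in the CF document.

```python
# Placeholder headings; the real about-authors.adoc wording may differ.
SECTIONS = {"original": "Original authors", "additional": "Additional authors"}


def render_about_authors(authors: list[dict]) -> str:
    """Render about-authors.adoc with original and additional authors listed separately."""
    # Refuse to silently drop authors whose type is neither "original" nor "additional".
    unknown = [a["last_name"] for a in authors if a["type"] not in SECTIONS]
    if unknown:
        raise ValueError(f"unknown author type for: {', '.join(unknown)}")

    lines: list[str] = []
    for author_type, heading in SECTIONS.items():
        lines += [f"=== {heading}", ""]
        for a in authors:
            if a["type"] == author_type:
                # "John A." from first_name="John", middle_name="A."
                given = " ".join(p for p in (a["first_name"], a.get("middle_name")) if p)
                lines.append(f"* {given} {a['last_name']}, {a['affiliation']}")
        lines.append("")
    return "\n".join(lines)
```

Because the output is regenerated from the metadata every time, the order within each group simply follows the order of the author list.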

Benefits

  1. Consistency: Author names, affiliations, and roles are consistent across Zenodo, CITATION.cff, the documentation, and the AsciiDoc headers.
  2. Simplification: Contributors only need to update a single file (authors.adoc), and all other files are automatically updated.
  3. Error Reduction: Generating the dependent files from a single source eliminates the errors that come from copying author entries by hand.
  4. Reduced Workload: Contributors do not have to edit multiple files, which simplifies contributions.
  5. Automation: Automatic generation of zenodo.json, CITATION.cff, about-authors.adoc, and header author attributes reduces manual intervention.
  6. Enhanced Flexibility: New authors can be added in one place, and all dependent files are updated.

Status Quo

Currently, multiple author entries exist in the following places:

  1. Document Header: Author names are embedded in the document header as a comma-separated list in the :authors: attribute.
  2. About Authors Section: Authors are listed again in a separate section under "About the Authors" in the document body.
  3. zenodo.json: The authors are listed in the creators JSON section of this file for Zenodo DOI registration.
  4. CITATION.cff: The authors are listed in the authors YAML section of this file for citing purposes on GitHub.

Maintaining and updating authors in all four places is tedious, error-prone, and requires multiple manual edits. This proposal aims to eliminate this redundancy.

Associated pull request

#580

Detailed Proposal

  1. Single Source of Truth: authors.adoc
    I propose to use authors.adoc as the single source of truth for all author metadata. The file will contain metadata for each
    author using Asciidoctor attributes. The attributes will include:
    • Author Name (split into First Name, Middle Name (optional), Last Name)
    • Affiliation (required)
    • ORCID (optional)
    • Type (original or additional): classifies authors as "original" or "additional" contributors.
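The field rules above (affiliation required, ORCID optional, type limited to original or additional) could be enforced with a small record type. This is a hypothetical sketch, not part of #580; the class and function names are illustrative. The ORCID check uses the ISO 7064 MOD 11-2 check character that every ORCID iD carries in its last position.

```python
import re
from dataclasses import dataclass
from typing import Optional

AUTHOR_TYPES = {"original", "additional"}
ORCID_RE = re.compile(r"\d{4}-\d{4}-\d{4}-\d{3}[\dX]")


def orcid_is_valid(orcid: str) -> bool:
    """Check the ORCID iD layout and its ISO 7064 MOD 11-2 check character."""
    if not ORCID_RE.fullmatch(orcid):
        return False
    digits = orcid.replace("-", "")
    total = 0
    for ch in digits[:15]:
        total = (total + int(ch)) * 2
    result = (12 - total % 11) % 11
    return digits[15] == ("X" if result == 10 else str(result))


@dataclass
class Author:
    last_name: str
    first_name: str
    affiliation: str                  # required
    author_type: str                  # "original" or "additional"
    middle_name: Optional[str] = None
    orcid: Optional[str] = None       # optional

    def __post_init__(self) -> None:
        if not self.affiliation.strip():
            raise ValueError(f"{self.last_name}: affiliation is required")
        if self.author_type not in AUTHOR_TYPES:
            raise ValueError(f"{self.last_name}: unknown type {self.author_type!r}")
        if self.orcid is not None and not orcid_is_valid(self.orcid):
            raise ValueError(f"{self.last_name}: invalid ORCID {self.orcid!r}")
```

With this check, the placeholder ORCID 0000-0002-1234-5678 in the format example below would be rejected, because its check character does not validate.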

authors.adoc Format
The format will follow AsciiDoc attributes like this:

```asciidoc
:author_1: Doe{nbsp}John{nbsp}A.
:author_1_affiliation: Example University
:author_1_orcid: 0000-0002-1234-5678
:author_1_type: original

:author_2: Smith{nbsp}Jane
:author_2_affiliation: Research Institute
:author_2_type: additional
```

Key Points:

  • The {nbsp} is used to separate LastName, FirstName, and optional MiddleName.
  • The type determines whether the author is listed as original or additional in the About Authors section.
  • ORCID is optional, but if provided, it will be added to zenodo.json and CITATION.cff.
  • Author IDs (:author_1:, :author_2:, etc.) are automatically detected and processed.
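A minimal sketch of how update_authors.py might parse this format, assuming one attribute per line and only the _affiliation, _orcid and _type suffixes shown above. It groups attributes by their numeric index and splits the name on {nbsp} into last, first and optional middle name. The function name and the dict layout it returns are assumptions, not necessarily what the script in #580 does.

```python
import re
from collections import defaultdict

# Matches ":author_<N>: <name>" and ":author_<N>_<field>: <value>"
ATTR_RE = re.compile(r":author_(\d+)(?:_(affiliation|orcid|type))?:\s*(.*)")


def parse_authors_adoc(text: str) -> list[dict]:
    """Collect :author_N...: attributes, grouped and ordered by N."""
    fields: dict[int, dict[str, str]] = defaultdict(dict)
    for line in text.splitlines():
        match = ATTR_RE.fullmatch(line.strip())
        if match is None:
            continue  # blank lines, comments and unrelated attributes
        index, key, value = match.groups()
        fields[int(index)][key or "name"] = value.strip()

    authors = []
    for index in sorted(fields):  # numeric sort: author_10 follows author_9
        entry = fields[index]
        if "name" not in entry:
            raise ValueError(f"author_{index} has no :author_{index}: name line")
        parts = entry["name"].split("{nbsp}")  # LastName, FirstName, [MiddleName]
        if not 2 <= len(parts) <= 3:
            raise ValueError(f"author_{index}: expected 2 or 3 name parts, got {parts}")
        authors.append({
            "last_name": parts[0],
            "first_name": parts[1],
            "middle_name": parts[2] if len(parts) == 3 else None,
            "affiliation": entry.get("affiliation", ""),
            "orcid": entry.get("orcid"),
            "type": entry.get("type", ""),
        })
    return authors
```

Sorting by the integer index rather than the attribute string keeps the author order stable once the list grows past nine entries.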
  2. Automation Script (update_authors.py)
    A script, update_authors.py, will automate updating the zenodo.json and CITATION.cff files and rewriting about-authors.adoc.

How it Works

  1. Parses authors.adoc: Extracts author metadata (name, affiliation, ORCID, and type).
  2. Updates zenodo.json: Populates the "creators" section with each author's name, affiliation, and ORCID.
  3. Updates CITATION.cff: Populates the "authors" YAML section with each author's given name, family name, affiliation, and ORCID (if applicable).
  4. Writes about-authors.adoc: Lists original and additional authors separately in a clean AsciiDoc format; the main document includes this file.
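For steps 2 and 3, a sketch of how the parsed authors could be mapped onto the two metadata formats. Zenodo's creators entries use name (as "Family, Given"), affiliation and orcid; CITATION.cff authors use family-names, given-names, affiliation and an orcid written as a full https://orcid.org/ URL. The helper names are hypothetical, the input dicts are assumed to have last_name, first_name, middle_name, affiliation and orcid keys, and splicing the generated authors: block into an existing CITATION.cff (for example with a YAML library) is left out.

```python
import json


def _given_names(author: dict) -> str:
    # "John A." from first_name="John", middle_name="A."
    return " ".join(p for p in (author["first_name"], author.get("middle_name")) if p)


def zenodo_creators(authors: list[dict]) -> list[dict]:
    """Build Zenodo "creators" entries in "Family, Given" name format."""
    creators = []
    for a in authors:
        entry = {"name": f"{a['last_name']}, {_given_names(a)}",
                 "affiliation": a["affiliation"]}
        if a.get("orcid"):
            entry["orcid"] = a["orcid"]
        creators.append(entry)
    return creators


def update_zenodo_json(zenodo_text: str, authors: list[dict]) -> str:
    """Replace only the "creators" key, keeping all other Zenodo metadata."""
    meta = json.loads(zenodo_text)
    meta["creators"] = zenodo_creators(authors)
    return json.dumps(meta, indent=2, ensure_ascii=False) + "\n"


def cff_authors_yaml(authors: list[dict]) -> str:
    """Render the CITATION.cff "authors:" list as YAML text."""
    def q(s: str) -> str:
        # JSON strings are valid YAML double-quoted scalars.
        return json.dumps(s, ensure_ascii=False)

    lines = ["authors:"]
    for a in authors:
        lines.append(f"  - family-names: {q(a['last_name'])}")
        lines.append(f"    given-names: {q(_given_names(a))}")
        lines.append(f"    affiliation: {q(a['affiliation'])}")
        if a.get("orcid"):
            # CFF expects the ORCID as a full URL, not the bare iD.
            lines.append(f"    orcid: {q('https://orcid.org/' + a['orcid'])}")
    return "\n".join(lines) + "\n"
```

Replacing only the creators key preserves the rest of zenodo.json across runs, although the whole file is re-serialized with two-space indentation.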
@cofinoa cofinoa added the enhancement Proposals to add new capabilities, improve existing ones in the conventions, improve style or format label Dec 17, 2024
@sethmcg
Contributor

sethmcg commented Dec 17, 2024

This sounds like a good idea to me.

@JonathanGregory JonathanGregory added GitHub Improvement to how we use GitHub for this repository and removed GitHub Improvement to how we use GitHub for this repository labels Dec 17, 2024
@JonathanGregory
Contributor

I agree, that's very useful. Thanks, Antonio @cofinoa. Could you include the CITATION.cff file as well in this automatic process?

@cofinoa
Contributor Author

cofinoa commented Dec 18, 2024

@JonathanGregory,

I have added CITATION.cff to the update_authors.py script.

@cofinoa
Contributor Author

cofinoa commented Dec 18, 2024

There are inconsistencies in the author affiliations between the CF Conventions document and the Zenodo metadata (also reflected in CITATION.cff) for the following authors:

  • @larsbarring

    • CF Document: SMHI Rossby Centre, Swedish Meteorological and Hydrological Institute
    • Zenodo / CITATION.cff: SMHI
  • @czender

    • CF Document: University of California, Irvine
    • Zenodo / CITATION.cff: UC Irvine

These differences may create confusion since Zenodo metadata and CITATION.cff are often used for citations.
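Differences like these could also be caught automatically before a release. A minimal sketch, assuming the affiliations from each source have already been extracted into dicts keyed by some author identifier (GitHub handles are used here purely for illustration; the function name is hypothetical):

```python
def affiliation_mismatches(document: dict[str, str],
                           zenodo: dict[str, str]) -> dict[str, tuple[str, str]]:
    """Return {author: (document_text, zenodo_text)} for authors whose texts differ.

    Authors present in only one of the two sources are ignored here.
    """
    return {
        author: (document[author], zenodo[author])
        for author in sorted(document.keys() & zenodo.keys())
        if document[author].strip() != zenodo[author].strip()
    }
```

Run against the two cases above, it would report both larsbarring and czender while leaving matching entries alone.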

Proposed Solution

  1. Unify affiliation text: Use the same text for both CF Conventions and Zenodo metadata.
  2. Allow separate texts: I can adjust the automation scripts to support different affiliation texts for CF Conventions and Zenodo, but this adds complexity.

Since Zenodo metadata for previous releases can still be updated, but CF Conventions documents cannot, it would be simpler to unify the text for future releases.
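To give a sense of what option 2 would involve: in code it could be as small as an optional per-author override with a fallback. The :author_N_zenodo_affiliation: attribute and the function below are hypothetical; the main cost is a second affiliation field that contributors have to keep in sync.

```python
def zenodo_affiliation(attrs: dict[str, str], n: int) -> str:
    """Affiliation text for author n in zenodo.json / CITATION.cff.

    Uses a hypothetical :author_<n>_zenodo_affiliation: override when present,
    otherwise the shared :author_<n>_affiliation: value (which must exist).
    """
    return attrs.get(f"author_{n}_zenodo_affiliation", attrs[f"author_{n}_affiliation"])
```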

Would you (@larsbarring, @czender) agree to unify your affiliation text? If so, which version would you prefer (CF Conventions or Zenodo)?

Looking forward to your feedback.

@JonathanGregory
Contributor

> I have added CITATION.cff to the update_authors.py script.

Thank you, Antonio.

@czender
Copy link
Contributor

czender commented Dec 18, 2024

I would prefer the longer version used in the CF Conventions. Thanks @cofinoa.

@czender
Copy link
Contributor

czender commented Dec 18, 2024

p.s. The longer version differentiates University of California from University of Colorado and from University of Cantabria. This will be helpful to those who do not know where (the hell) Irvine is.

@larsbarring
Contributor

And I would prefer the shorter version "SMHI". Thanks for pointing this out, Antonio @cofinoa.
