Proposal to Revise Section 4.4 to Support String-Based Datetime Representations Using ISO 8601 Templates #581

Open
cofinoa opened this issue Dec 22, 2024 · 5 comments
Labels
enhancement Proposals to add new capabilities, improve existing ones in the conventions, improve style or format

Comments

@cofinoa
Contributor

cofinoa commented Dec 22, 2024

Moderator

TBC

Moderator Status Review [last updated: 2024-12-31]

Initial submission. Awaiting community feedback and discussion.

Requirement Summary

This proposal seeks to enhance Section 4.4 of the CF Metadata Conventions by introducing explicit support for string-based representations of time coordinates, inspired by ISO 8601 templates. The update aims to improve clarity, modernize the conventions for interoperability with external systems, and maintain backward compatibility with existing datasets.

Technical Proposal Summary

  • Add string-based representations (YYYY-MM-DD, YYYY-MM-DDTHH:MM:SSZ, YYYYMMDDTHHMMSSZ) as a valid alternative to relative-based representations for time coordinates.
  • Provide examples for both relative-based and string-based time coordinates.
  • Preserve backward compatibility with existing CF-compliant datasets.

Benefits

  • Interoperability: Ensures compatibility with modern data standards like JSON, REST APIs, and ISO-compliant systems.
  • Readability: Enhances dataset usability by allowing self-descriptive, human-readable time strings.
  • Sorting: String-based representations inspired by ISO 8601 templates sort lexicographically in chronological order, improving data management workflows.
  • Legalization of Previously Invalid Representations: Examples in Section 7.4 of the CF Conventions, which use textual time representations for climatologies, currently describe formats that are not permitted by the standard. This proposal will make such examples valid and consistent with the conventions.
  • Backward Compatibility: Maintains support for existing relative-based time coordinate representations.
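The sorting benefit can be checked with a short sketch. It assumes that every value uses the same fixed-width, zero-padded template and the same UTC designator; the sample values and the helper name `parse_utc` are illustrative only and not part of the proposal.

```python
from datetime import datetime, timezone

# Hypothetical sample values, all using the fixed-width template
# YYYY-MM-DDTHH:MM:SSZ and the same UTC designator.
stamps = [
    "1990-01-06T00:00:00Z",
    "1990-01-01T00:00:00Z",
    "1990-01-02T12:00:00Z",
    "1990-01-02T00:00:00Z",
]


def parse_utc(stamp: str) -> datetime:
    """Parse a YYYY-MM-DDTHH:MM:SSZ string into an aware UTC datetime."""
    parsed = datetime.strptime(stamp, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


# Plain string sort versus a sort on the parsed instants.
lexicographic = sorted(stamps)
chronological = sorted(stamps, key=parse_utc)
same_order = lexicographic == chronological

# Counter-example: without zero padding the property is lost, because
# "2000-10-1" compares lower than "2000-9-1" character by character.
unpadded_sorted = sorted(["2000-9-1", "2000-10-1"])
```

The property depends on the padding and on a single shared template. Mixing templates or time-zone offsets within one variable would break it.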

Status Quo

Currently, Section 4.4 of the CF Metadata Conventions only supports numeric offsets from a reference datetime (e.g., days since 2000-1-1). While effective, this approach lacks direct compatibility with the string-based datetime representations commonly used in APIs, REST systems, and JSON-based formats.

Past discussions relevant to this proposal include Trac ticket 14 (2007) and a CF email thread from 2010 featuring Steve Hankin's comments on the topic. These discussions raised concerns about ISO 8601 compatibility, precision, and storage requirements for string-based representations.

Associated pull request

[To be linked once submitted]

Detailed Proposal

1. Overview

This revision introduces support for string-based datetime representations inspired by ISO 8601 templates alongside existing relative-based formats. It ensures clarity and flexibility while maintaining backward compatibility with legacy datasets.

2. Proposed Changes

  • Support ISO 8601 templates in units:
    Add the ability to specify ISO 8601 datetime templates directly, with examples such as:

    • units = "CF-DATETIME:YYYY-MM-DD";
    • units = "CF-DATETIME:YYYY-MM-DDTHH:MM:SSZ";
    • units = "CF-DATETIME:YYYYMMDDTHHMMSSZ";
  • Provide Comprehensive Examples:

    • Example: Relative-based Representation

      double time(time) ;  
        time:standard_name = "time" ;  
        time:units = "days since 1990-01-01 00:00:00" ;  
      
      // Example data:
      data:
        time = 0.0, 1.0, 1.5, 5.0 ;  
      
    • Example: String-based Representation

      string time(time) ;  
        time:standard_name = "time" ;  
        time:units = "CF-DATETIME:YYYY-MM-DDTHH:MM:SSZ" ;  
      
      // Example data:
      data:
         time = "1990-01-01T00:00:00Z", "1990-01-02T00:00:00Z", "1990-01-02T12:00:00Z", "1990-01-06T00:00:00Z" ;  
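To illustrate that the two examples above encode the same instants, here is a minimal sketch using only the Python standard library. It assumes the standard Gregorian calendar with no leap seconds and treats the reference datetime as UTC. The function names `days_to_string` and `string_to_days` are hypothetical, and the `strftime` pattern is only an approximation of the proposed `YYYY-MM-DDTHH:MM:SSZ` template.

```python
from datetime import datetime, timedelta, timezone

# Reference datetime from units = "days since 1990-01-01 00:00:00",
# assumed here to be UTC.
EPOCH = datetime(1990, 1, 1, tzinfo=timezone.utc)

# strftime/strptime pattern standing in for YYYY-MM-DDTHH:MM:SSZ.
TEMPLATE = "%Y-%m-%dT%H:%M:%SZ"


def days_to_string(days: float) -> str:
    """Convert a relative-based value (days since EPOCH) to a string."""
    return (EPOCH + timedelta(days=days)).strftime(TEMPLATE)


def string_to_days(stamp: str) -> float:
    """Convert a string-based value back to days since EPOCH."""
    parsed = datetime.strptime(stamp, TEMPLATE).replace(tzinfo=timezone.utc)
    return (parsed - EPOCH) / timedelta(days=1)


numeric = [0.0, 1.0, 1.5, 5.0]
strings = [days_to_string(d) for d in numeric]
round_trip = [string_to_days(s) for s in strings]
```

Under these assumptions `strings` reproduces the values in the string-based example and `round_trip` recovers the original offsets. Other CF calendars would need a calendar-aware library rather than `datetime`.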
      

3. Backward Compatibility

The proposal maintains full backward compatibility. Datasets using relative-based representations (days since <datetime>) remain unaffected.

@cofinoa cofinoa added the enhancement label Dec 22, 2024
@cofinoa
Contributor Author

cofinoa commented Dec 24, 2024

Updated 2. Proposed Changes, replacing the optional axis attribute with the optional standard_name attribute.

@JonathanGregory
Contributor

Dear Antonio @cofinoa

Thanks for making this careful proposal, to add support in CF for ISO8601 datetime string-valued coordinates as an alternative to numeric coordinates. Although I recognise there would be benefits of convenience in some circumstances from making this change, it wouldn't enable anything which is currently not possible with CF. Therefore I think the costs and difficulties are greater than the advantages, and I don't support the change.

Because it's an attractive idea, it has been proposed before, and decided against, at least twice during the development of CF. It was proposed and debated in 2007 in trac ticket 14. It was suggested and discussed again in a thread that concluded with the subject "time as ISO strings" on the CF email list in 2010 e.g. Steve Hankin's email.

I don't think we should make this change "because all previous versions must generally continue to be supported in software for the sake of archived datasets, and in order to limit the complexity of the conventions. For these reasons, there is a strong preference against introducing any new capability to the conventions when there is already some method that can adequately serve the same purpose (even if a different method would arguably be better than the existing one)" (principle 10 in Sect 1.2).

In the previous discussions, several specific points have been made, including:

  • ISO datetime strings are not defined for non-real-world calendars.

  • ISO datetime strings raise issues of precision, because omitted elements imply imprecise datetimes, which is a concept CF doesn't have at present.

  • You can't work out time-intervals from datetime strings without converting them into numbers.

  • Datetime strings take up more storage than double-precision numbers.

  • Although numeric time coordinate values themselves aren't human-readable datetimes, software is available to convert them into strings—probably more software than when we last discussed this issue. In particular, ncdump -t converts CF-netCDF numeric time coordinates to human-readable strings (for all CF calendars except for the leap-second-aware ones we've added in CF1.12).
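The points about intervals and storage can be illustrated with a minimal sketch. It reuses the sample values from the proposal's examples and assumes the standard calendar. The byte counts cover the raw payload per value only and ignore netCDF string overhead and compression.

```python
import struct
from datetime import datetime

TEMPLATE = "%Y-%m-%dT%H:%M:%SZ"

# The same four instants in both representations.
numeric = [0.0, 1.0, 1.5, 5.0]  # days since 1990-01-01 00:00:00
strings = [
    "1990-01-01T00:00:00Z",
    "1990-01-02T00:00:00Z",
    "1990-01-02T12:00:00Z",
    "1990-01-06T00:00:00Z",
]

# Numeric coordinates: intervals are plain subtraction.
numeric_steps = [b - a for a, b in zip(numeric, numeric[1:])]

# String coordinates: each value has to be parsed into a numeric form
# before an interval can be computed.
parsed = [datetime.strptime(s, TEMPLATE) for s in strings]
string_steps = [
    (b - a).total_seconds() / 86400.0 for a, b in zip(parsed, parsed[1:])
]

# Raw payload per value: one IEEE double versus one ASCII string.
bytes_per_double = struct.calcsize("<d")
bytes_per_string = len(strings[0].encode("ascii"))
```

Both routes give the same intervals here, but the string route needs the extra parsing step, and each string in this template is 20 bytes against 8 for a double.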

Best wishes

Jonathan

@cofinoa
Contributor Author

cofinoa commented Dec 31, 2024

Dear @JonathanGregory,

Thank you for your thoughtful and detailed reply. I appreciate you taking the time to revisit this proposal and providing the historical context and references to past discussions, as well as highlighting the implications of Principle 10 in Section 1.2 of the CF Conventions.

One of my key concerns with the current numeric representation is that it transforms an absolute datetime (e.g., 1990-01-01T00:00:00Z) into a relative period measured from a reference epoch. While effective for many use cases, this approach introduces challenges when integrating CF-compliant data with modern systems, APIs, and formats that rely on absolute datetime strings for interoperability. This transformation adds a layer of abstraction that can lead to potential pitfalls, particularly when human readability and self-descriptive formats are priorities.

Additionally, the numeric representation assumes regular time intervals, which works well for most calendars but introduces complications with UTC, as described in Section 4.4.3 of the CF Conventions. UTC includes leap seconds, which are adjustments made to account for irregularities in the Earth's rotation. These adjustments mean that intervals between times are not always uniform, and a single numeric time value cannot directly represent a leap second (e.g., 23:59:60) without additional metadata. This makes conversions between numeric representations and calendar datetimes more complex and potentially error-prone, particularly for datasets requiring precise temporal alignment.
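As a small illustration of the leap-second issue, the sketch below uses Python's standard `datetime` module, which, like many libraries, ignores leap seconds. The leap second at the end of 2016-12-31 is a historical fact. The constant `LEAP_SECONDS_IN_SPAN` and the helper name are hypothetical, and the correction is applied by hand purely for illustration.

```python
from datetime import datetime, timezone

before = datetime(2016, 12, 31, 0, 0, 0, tzinfo=timezone.utc)
after = datetime(2017, 1, 1, 0, 0, 0, tzinfo=timezone.utc)

# Leap-second-unaware arithmetic: every day is exactly 86400 s long.
naive_seconds = (after - before).total_seconds()

# One leap second (2016-12-31T23:59:60Z) was inserted in this span, so
# the elapsed SI time is one second longer. Hard-coded for illustration.
LEAP_SECONDS_IN_SPAN = 1
elapsed_si_seconds = naive_seconds + LEAP_SECONDS_IN_SPAN


def datetime_accepts_leap_second() -> bool:
    """Report whether datetime can hold the instant 23:59:60."""
    try:
        datetime(2016, 12, 31, 23, 59, 60, tzinfo=timezone.utc)
    except ValueError:
        return False
    return True


leap_second_supported = datetime_accepts_leap_second()
```

A string such as "2016-12-31T23:59:60Z" can be written down without this knowledge, but software still needs a leap-second table to turn it into a correct interval.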

I also understand that using ISO 8601 for string-based time representation may implicitly tie it to the Gregorian calendar, potentially introducing confusion when applied to non-real-world calendars like noleap or 360_day. My intention is not to enforce ISO 8601’s calendar semantics but to leverage its string format templates to represent date and time values in these calendars. If this approach creates confusion or misalignment with the standard, I would be open to exploring alternative naming for the representation (e.g. CF-DATETIME) rather than strictly calling it "ISO 8601".
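A minimal sketch of the distinction between template and calendar semantics, assuming the 360_day calendar (twelve months of 30 days each): the string "1990-02-30" fits the YYYY-MM-DD template and is a valid 360_day date, yet a Gregorian parser rejects it. All function names are hypothetical, and negative years are not handled.

```python
from datetime import date


def parse_360_day(stamp: str) -> tuple[int, int, int]:
    """Parse YYYY-MM-DD as a 360_day date (12 months of 30 days)."""
    year_s, month_s, day_s = stamp.split("-")
    year, month, day = int(year_s), int(month_s), int(day_s)
    if not (1 <= month <= 12 and 1 <= day <= 30):
        raise ValueError(f"not a valid 360_day date: {stamp}")
    return year, month, day


def days_since_360(stamp: str, epoch: str) -> int:
    """Whole days between two 360_day dates given as YYYY-MM-DD."""
    y, m, d = parse_360_day(stamp)
    ey, em, ed = parse_360_day(epoch)
    return (y - ey) * 360 + (m - em) * 30 + (d - ed)


def gregorian_accepts(stamp: str) -> bool:
    """Check whether the standard-library Gregorian parser accepts stamp."""
    try:
        date.fromisoformat(stamp)
    except ValueError:
        return False
    return True


offset = days_since_360("1990-02-30", "1990-01-01")
accepted = gregorian_accepts("1990-02-30")
```

This suggests that generic ISO 8601 parsers could not be relied on for such calendars, so readers of string-valued coordinates would still need calendar-aware code.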

I understand your concerns regarding Principle 10, which emphasizes the importance of limiting new capabilities when existing methods already address the same needs. While numeric representations can adequately serve many use cases, I believe that introducing string-based formats as an optional alternative addresses significant gaps in modern interoperability without disrupting existing practices.

This proposal helps to improve CF while aligning with several principles of the CF Conventions:

  • Principle 1: Self-Description
    String-based formats naturally support self-description by encoding complete datetime information directly within the time coordinate, eliminating the need for external tools or additional metadata for interpretation.

  • Principle 2: Addressing Clear Needs
    This proposal addresses a clear need for modern interoperability while ensuring that it remains optional and minimally disruptive to existing practices. By leveraging widely used string-based formats (like ISO 8601), it provides a practical solution for emerging use cases.

  • Principle 4: Practicality
    By considering string-based formats, the proposal ensures practicality for both data producers and users. Producers benefit from a widely accepted standard, while users gain access to intuitive, human-readable time representations.

  • Principle 5: Human and Machine Readability
    String formats based on ISO 8601 templates are both human-readable and easily parsed by programs, enhancing CF’s usability in modern workflows.

  • Principle 7: Minimizing Mistakes
    Leap second handling within the UTC calendar exemplifies a scenario where numeric representations may lead to errors. String-based representations such as ISO 8601 explicitly support leap second semantics, ensuring precise and unambiguous temporal metadata and reducing the possibility of mistakes.

  • Principle 8: Flexibility and Optionality
    This proposal aligns with Principle 8 by providing string-based formats as an optional alternative to numeric representations. It empowers data producers to describe their data in ways that best suit their workflows and interoperability requirements, while maintaining support for existing practices.

With this in mind, I will take your feedback to refine my proposal further. I plan to review the past discussions and decisions you mentioned, including trac ticket 14 (2007) and the CF email thread from 2010, to better understand the concerns raised at the time and how they were addressed.

Thank you again for your thoughtful response. I look forward to continuing this discussion and refining the proposal to best serve CF’s goals and community needs.

@cofinoa cofinoa changed the title from "Proposal to Revise Section 4.4 to Support ISO 8601 in Time Coordinate Representation" to "Proposal to Revise Section 4.4 to Support String-Based Datetime Representations Using ISO 8601 Templates" Dec 31, 2024
@cofinoa
Contributor Author

cofinoa commented Dec 31, 2024

Comment on Updates made on 2024-12-31

Based on Jonathan's comment, the issue description and title have been updated to clarify the proposal's intent and scope. The title and description now emphasize support for string-based datetime representations using ISO 8601 templates. These changes aim to enhance clarity and precision for community discussion.

@JonathanGregory
Contributor

JonathanGregory commented Dec 31, 2024

Dear Antonio

Thanks for considering my objections. Here are some further comments.

  • I acknowledge that for some purposes string representations of datetimes are necessary or convenient. But that, in my view, is not sufficient reason for adding an alternative representation of time to CF. Doing so would require all software that aims to comply with CF to implement a new set of functions, as well as the existing facilities. Until this was completed, data which used the new string convention would not be interpretable by all software i.e. the new convention would break existing programs. Thus it would harm interoperability. "All software" doesn't just mean professionally maintained packages, but includes ad-hoc software written by data-users to process or analyse data. Given the huge importance of time coordinates, I don't think we should consider causing such disruption unless there is a really essential use-case which cannot be dealt with by the existing conventions.

  • Since the time coordinate variable includes the units and the calendar attributes, I think the numeric convention for time coordinates is nearly self-describing, in the sense that you don't need to refer to external resources. I agree that you have to know what the calendars mean, and in the case of utc you have to know when the leap-seconds occurred. The former is fairly common knowledge or self-explanatory, but the latter is something most people have to look up. Although, as you say, this knowledge is not needed for writing and reading formatted datetime strings, you can't escape the need for it eventually, which instead occurs when you want to work out the interval of time between two instants. With numeric time coordinates we have the difficulty that an instrument may not know about leap seconds and hence might produce the wrong number to represent the UTC of an observation. In those circumstances the same ignorant instrument will produce an incorrect datetime string instead.

  • You're right of course that times expressed as strings are easily human-readable whereas they are not as floating-point numbers. But netCDF is a binary format, which a human can't read in any case. As a human, to read a netCDF file, you have to use a program of some sort. You can convert it to ASCII using ncdump, which can translate CF times into udunits-like strings, or you can use software libraries in Python and other languages which know how to convert numeric time coordinates to and from strings in various formats.

Similar things have been said in the previous discussions.

Let's see what others think. Happy 2025-01-01 00:00:00.

Jonathan

2 participants