Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Property definitions #376

Merged
merged 37 commits into from
Jun 7, 2022
Merged

Conversation

rartino
Copy link
Contributor

@rartino rartino commented Jul 9, 2021

This PR is spawned from our lengthy room 3: Specification extensions discussions at the OPTIMADE 2021 workshop. Many participants in these discussions have impacted the design; I specifically recall being in discussions with @sauliusg, @dwinston, @fekad, @ltalirz, @ml-evs, @CasperWA, @dsdavide, @markus1978, @blokhin, @shyamd, @gjc29, and Yukari Katsura (but the list is likely not complete.)

  • The PR adds Appendix 8.3 that defines a precise standardized computer-readable dictionary-based format for property definitions to express OPTIMADE properties to the same level of detail as we presently do in human-readable form with our standard properties in Section 7: Entry list.
  • The Property Description, when expressed in JSON, is a subset of JSON Schema, meaning that it should be possible to insert these property definitions as-is into a schema for the full response and validate responses against it.
  • A subsection to the Appendix carefully deals with units. I expect some pushback regarding the decision to follow the example of OpenKIM and use the GNU units database of units as the primary 'default' standard + a homebrew scheme for compound units that attempts some level of normalization. However, after looking at the alternatives, I find the GNU units database far more complete in terms of units relevant to our use case, and thus far closer to the goal of satisfying users with the mindset "I don't care about units, just let me say that the value is expressed in angstroms" (or hartree, or rydberg, or nanoacre, or donkeypower,...).
  • The section Entry Listing Info Endpoints (5.5.2) is now updated to say that properties should use this format, and the example is updated to match.

I haven't carefully considered backwards compatibility yet. Lets iterate on the design until we are happy, and at that point decide if it is possible to introduce these changes in a non backwards-breaking way or not.

This PR is part of a sequence of improvements aimed at standardizing property and calculation data in OPTIMADE. At present, my vision for this development is something like this:


  1. This PR creates a standard format for Property Definitions that can be used to define properties in your implementations info endpoint for the corresponding entry types.
  2. Once merged, we express everything in Section 7 using this format in a separate file of the specification repo (and probably remove Section 7).
    • We create a /measurables endpoint for physical properties (providing values inferred from calculations and/or experiments) according to a design discussed in room 3 and use the Property Definition format to standardize a number of physical properties, making them queryable.
    • We extend /calculations according to a design discussed in room 3 and use the Property Definition format to standardize a number of output_<something> outcomes of calculations, making them queryable.

At this point it is possible via a couple of queries to answer things like:

  • "give me the output_ks_band_gap for all calculations done with VASP and the HSE functional for all structures with titanium".
  • "give me your best possible estimate of the (real) value of the physical optical absorption band gap for this specific TiO2 structure."

Note: should not be merged before #348, since it uses TRUE and FALSE.

@fekad
Copy link
Contributor

fekad commented Jul 19, 2021

@rartino Thank you very much for putting this PR together. Overall it looks very good but I think we should put more stress on property definition rather than putting it into the appendix. The data without "some" definition is meaningless and I think one of the main goals is that this way everybody can add definition to any "database-specific" data. I think this is almost as important as the definition of the API itself... (API: how? property definition: what?)

optimade.rst Outdated Show resolved Hide resolved
optimade.rst Outdated Show resolved Hide resolved
optimade.rst Outdated Show resolved Hide resolved
optimade.rst Outdated Show resolved Hide resolved
optimade.rst Outdated Show resolved Hide resolved
optimade.rst Outdated Show resolved Hide resolved
optimade.rst Outdated Show resolved Hide resolved
Copy link
Member

@ml-evs ml-evs left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for this great write-up @rartino, this looks to combine other outstanding issues/PRs around boolean values (#348, #345), more explicit feature sets/support levels (#91) and types (#269).

I have left some specific comments below.

optimade.rst Outdated Show resolved Hide resolved
optimade.rst Outdated Show resolved Hide resolved
optimade.rst Outdated Show resolved Hide resolved
optimade.rst Outdated Show resolved Hide resolved
optimade.rst Outdated Show resolved Hide resolved
optimade.rst Outdated Show resolved Hide resolved
optimade.rst Outdated Show resolved Hide resolved
optimade.rst Outdated Show resolved Hide resolved
optimade.rst Outdated Show resolved Hide resolved
optimade.rst Outdated Show resolved Hide resolved
optimade.rst Outdated Show resolved Hide resolved
optimade.rst Outdated Show resolved Hide resolved
optimade.rst Outdated Show resolved Hide resolved
optimade.rst Outdated Show resolved Hide resolved
optimade.rst Outdated Show resolved Hide resolved
Copy link
Contributor

@fekad fekad left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(Note: The suggestion has been updated to avoid any confusion!)

It was not possible to add the suggestions one-by-one so you can find here everything together. The text is 99% the same but it has been slightly restructured. The main goal is to move as much description from the appendix into the main document to represent its importance.

Changes in short:

  • separate property definition (x-optimade-property ) and query information (x-optimade-support)
  • keeping only the very specific descriptions in the appendix
  • moving unit definition into x-optimade-property
  • get rid of non-Inner vs. Inner property separation
  • removing version requirement
  • the definition of unit and property fields follows the JSON Schema design philosophy
Entry Listing Info Endpoints
~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Entry listing info endpoints are accessed under the versioned or unversioned base URL serving the API as :endpoint:`/info/<entry_type>` (e.g., http://example.com/optimade/v1/info/structures or http://example.com/optimade/info/structures).
The response for these endpoints MUST include the following information in the :field:`data` field:


- **description**: Description of the entry.
- **properties**: A dictionary describing properties for this entry type, where each key is a property name and the value is an OPTIMADE Property Definition described in detail below `Property Definitions`_.
- **formats**: List of output formats available for this type of entry.
- **output\_fields\_by\_format**: Dictionary of available output fields for this entry type, where the keys are the values of the :field:`formats` list and the values are the keys of the :field:`properties` dictionary.


Example (note: the description string has been wrapped for readability only):

.. code:: jsonc

    {
      "data": {
        "description": "a structures entry",
        "properties": {
          "nelements": {
            "type": ["integer", "null"],
            "title": "Number of elements",
            "description": "Number of different elements in the structure as an integer. \n
            \n
              -  Note: queries on this property can equivalently be formulated using `elements LENGTH`.\n
              -  A filter that matches structures that have exactly 4 elements: `nelements=4`.\n
              -  A filter that matches structures that have between 2 and 7 elements: `nelements>=2 AND nelements<=7`.",
            "examples": [
              3
            ],
            "x-optimade-support": {
              "sortable": true,
              "support": "should",
              "queryable": "full"
            }
          },
          "lattice_vectors": {
            "title": "Unit cell lattice vectors",
            "description": "The three lattice vectors in Cartesian coordinates, in ångström (Å).\n
            ...",
            "x-optimade-property": {
              "unit": {
                symbol: "angstrom"
              }
            },
            "x-optimade-support": {
              "sortable": false,
              "support": "should",
              "queryable": "full"
            },
            "type": "array",

            // ... <further fields defining lattice_vectors>
          }
          // ... <other property descriptions>
        },
        "formats": ["json", "xml"],
        "output_fields_by_format": {
          "json": [
            "nelements",
            "lattice_vectors",
            // ...
          ],
          "xml": ["nelements"]
        }
      }
      // ...
    }


Property Definitions
********************

The property definition uses a dictionary-based construct that strictly follows a subset of the `JSON Schema Validation Draft 2020-12 <https://json-schema.org/draft/2020-12/json-schema-validation.html>`__ and `JSON Schema Core Draft 2020-12 <https://json-schema.org/draft/2020-12/json-schema-core.html>`__ standards.
The motivation for this design decision is that it makes the JSON representation of a Property Definition a fully valid standard JSON Schema meanwhile these restrictions are intended to reduce the complexity of possible data types that implementations have to handle in different formats and database backends.
While these standards are JSON-based, the dictionary construct can be realized in any format capable of representing the basic OPTIMADE data types. 
Hence, the field definitions below are given in terms of OPTIMADE data types, and the implementation maps these to the corresponding data types for the output format in which the property definition is expressed, as described in detail in the appendix `Property Definitions`_.
Nevertheless, for consistency across formats, the JSON type names MUST be used regardless of the standard type names of the output format. 

An OPTIMADE *Property Definition* defines a specific property, which will be referred to as *the defined property*.


A Property Definition is a dictionary that has the following format:

**REQUIRED keys:**

- :field:`type`: String or List.
  The string or list specifies the type of the defined property. The list where the first item MUST be one of the strings below, and the second item MUST be the string :val:`"null"`.
  The strings are JSON type names encoded as strings, but they technically refer to the corresponding OPTIMADE data types as described in detail in the appendix `Property Definitions`_.

  For specific types addition keys are REQUIRED:

  - REQUIRED keys for Property Definitions where the type field includes "object":

    - :field:`properties`: Dictionary.
      Each key in the given dictionary provides a nested inner Property Definition.
      The value of each corresponding key in the defined property MUST adhere to this definition.
      The defined property MUST only contain keys present in this dictionary.

  - REQUIRED keys for Property Definitions where the type field includes "array":

    - :field:`items`: List.
      A nested inner Property Definition.
      The defined property is an array where all items MUST match the given Property Definition.

  The type specific OPTIONAL keys are described in detail in the appendix `Property Definitions`_.

**OPTIONAL keys:**

- :field:`title`: String.
  A short single-line human-readable name for the defined property appropriate to show as part of a user interface.

- :field:`description`: String.
  A human-readable multi-line description that explains the purpose, requirements, and conventions of the defined property.
  The format SHOULD be a one-line description, followed by a new paragraph (two newlines), followed by a more detailed description of all the requirements and conventions of the defined property.
  Formatting in the text SHOULD use Markdown in the `CommonMark v0.3 format <https://spec.commonmark.org/0.30/>`__.

- :field:`examples`: List.
  A list of example values that the defined property can have.
  These examples MUST all be of a data type that matches the :field:`type` field and otherwise adhere to the rest of the Property Description.

- :field:`x-optimade-property`: Dictionary.
  A dictionary of OPTIMADE-specific information about the defined property (i.e., the JSON Schema standard does not define these fields).
  The dictionary could contain the following OPTIONAL keys:

  **OPTIONAL keys:**

  - :field:`uri`: String.
    A URI that SHOULD resolve to a JSON response that contains the present Property Definition.
    The format of the response MUST be that of an `OPTIMADE Entry Listing Info Endpoint <Entry Listing Info Endpoints>`_, except the only mandatory keys are :field:`data` with the subfield :field:`properties`, but the response MAY contain other fields.
    This format makes it possible for a Property Definition in an Entry Listing Info endpoint to set :field:`x-optimade-property-uri` as a link back to the same Info endpoint.
    However, the field MAY also link to a resource external from the implementation to clarify to clients that a property used in several databases represents the same thing.

  - :field:`unit`: Dictionary.
    If the :field:`unit` field is omitted the value is interpreted as unitless e.g., a string representing a chemical formula or an integer counting the number of atoms in the unit cell.
    The :field:`unit` field is a dictionary that has the following format:

    **REQUIRED keys:**

    - :field:`symbol`: String.
      The string is a (compound) symbol for the physical unit in which the value of the defined property is given, if it uses a unit.
      The default set of unit symbols for OPTIMADE are taken from version 3.09 unit database `definition.units` from `GNU units software <https://www.gnu.org/software/units/>`__.
      If the unit is available in this database, or if it can be expressed as a compound unit expression using these units, the value of :field:`unit` SHOULD be set to the corresponding (compound) string symbol and no further definition be given.
      See subsection `Compound Units Expression in Property Definitions`_ for the details on how compound units are represented in OPTIMADE Property Definitions and the precise format of this string.

      If a (compound) symbol based on the GNU units database is not used, the string in :field:`unit` MUST be defined in the :field:`unit-definition` field inside the :field:`x-optimade-property` field of the Property Definition.

    **OPTIONAL keys:**

    - :field:`uri`: String.
      This field is a (unique) URI that can be used as a reference to the unit definition.

    - :field:`description`: String.
      A human-readable multiple-line description of the unit.

    - :field:`standard`: Dictionary.
      This field is used to define the unit symbol using a preexisting standard.
      The dictionary has the following format:

      **REQUIRED keys:**

      - :field:`name`: String.
        The abbreviated name of the standard being referenced.
        One of the following:

        - :val:`"gnu"`: the symbol comes from unit database `definition.units` from `GNU units software <https://www.gnu.org/software/units/>`__.
        - :val:`"ucum"`: the symbol comes from `The Unified Code for Units of Measure <https://unitsofmeasure.org/ucum.html>`__ (UCUM) standard.
        - :val:`"qudt"`: the symbol comes from the `QUDT <http://qudt.org/>`__ standard.
          Not only symbols strictly defined within the QUDT standard are allowed, but also other symbols constructed by following the scheme for how new unit symbols are formed in this standard.

      - :field:`version`: String.
        The version string of the referenced standard.

    - :field:`resources`: List.
      A list of dictionaries that references remote resources that describe the unit.
      The format of each dictionary is:

      **REQUIRED keys:**

      - :field:`relation`: String.
        A human-readable description of the relationship between the unit and the remote resource, e.g., a "natural language description".

      - :field:`uri`: String.
        A URI of the external resource.

  - :field:`resources`: List.
    A list of dictionaries that references remote resources that describe the property.
    The format of each dictionary is:

    **REQUIRED keys:**

    - :field:`relation`: String.
      A human-readable description the relationship between the property and the remote resource, e.g., "natural language description".

    - :field:`uri`: String.
      A URI of the external resource.

- :field:`x-optimade-support`: Dictionary.
  A dictionary of OPTIMADE-specific information about the defined property (i.e., the JSON Schema standard does not define these fields).
  The dictionary has the following format:

  - :field:`support`: String
    The string MUST be one of the following:

    - "must": the defined property MUST be recognized by the implementation (e.g., in filter strings) and MUST not be :val:`null`.
    - "should", the defined property MUST be recognized by the implementation (e.g., in filter strings) and SHOULD not be :val:`null`.
    - "may": the defined property MAY not be recognized by the implementation and MAY be equal to :val:`null`.

    Note: the specification in this field of whether the defined property can be :val:`null` or not MUST match the value of the :field:`type` field outside of :field:`x-optimade-property`.
    If :val:`null` values are allowed, that field must be a list where the string :val:`"null"` is the second element.

  - :field:`deprecated`: Boolean.
    If :val:`TRUE`, implementations SHOULD not use the defined property, and it MAY be removed in the future.
    If :val:`FALSE`, the defined property is not deprecated.
    The field not being present means :val:`FALSE`.

  - :field:`queryable`: String.
    The string MUST be one of the following:

    - :val:`"full"`: the defined property MUST be queryable using the OPTIMADE filter language with support for all mandatory filter features.
    - :val:`"partial:only full string"`: the defined property is a string that MUST be queryable using the OPTIMADE filter language. However, support for the partial string matching operators are OPTIONAL (i.e., the operators BEGINS WITH, ENDS WITH, and CONTAINS).
    - :val:`"partial"`: the defined property MUST be queryable, but the support MAY be restricted to any subset of the filter language operators and features.
      The level of support is described in the field :field:`query-description`.
    - :val:`"none"`: the defined property MAY not be queryable at all.

  - :field:`query-description`: String.
    This string is a human-readable multi-line text that describes the limitations of support of the OPTIMADE filter language.
    It SHOULD only be given if the field :field:`queryable` is set to the string :val:`"partial"`.
    Formatting in the text SHOULD use Markdown in the `CommonMark v3.0 format <https://spec.commonmark.org/0.30/>`__.

  - :field:`sortable`: Boolean.
    If :val:`TRUE`, specifies that results can be sorted on this property (see `Entry Listing URL Query Parameters`_ for more information on this field).
    If :val:`FALSE`, specifies that results cannot be sorted on this property.
    The field not being included means the same as :val:`FALSE`.

The part which should stay in the appendix:

Property Definitions
--------------------

Mapping between OPTIMADE data types and Property Definitions data types
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

  - **dictionary**: :val:`"object"`
  - **list**: :val:`"array"`
  - **string**: :val:`"string"`
  - **boolean**: :val:`"boolean"`
  - **integer**: :val:`"integer"`
  - **float**: :val:`"number"`
  - **timestamp**: :val:`"string"` with the "date-time" value in its :field:`format` field.


The OPTIONAL keys for specific data types of Property Definitions 
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

- "object":

  - :field:`required`: List.
    The list MUST only contain strings.
    The defined property MUST have keys that match all the strings in this list.
    Other keys present in the :field:`properties` field are OPTIONAL in the defined property.
    If not present or empty, all keys in :field:`properties` are regarded as OPTIONAL.

  - :field:`maxProperties`: Integer.
    The defined property is a dictionary where the number of keys MUST be less than or equal to the number given.

  - :field:`minProperties`: Integer.
    The defined property is a dictionary where the number of keys MUST be greater than or equal to the number given.

  - :field:`dependentRequired`: Dictionary.
    The dictionary keys are strings and the values are lists of unique strings.
    If the defined property has a key that is equal to a key in the given dictionary, the defined property MUST also have keys that matches each of the corresponding values.
    No restriction is inferred from this field for keys in the defined property that do not match any key in the given dictionary.

- "array":

  - :field:`maxItems`: Integer.
    A non-negative integer.
    The defined property is an array that MUST contain a number of items that is less than or equal to the given integer.

  - :field:`minItems`: Integer.
    A non-negative integer.
    The defined property is an array that MUST contain a number of items that is greater than or equal to the given integer.

  - :field:`uniqueItems`: Boolean.
    If :val:`TRUE`, the defined property is an array that MUST only contain unique items.
    If :val:`FALSE`, this field sets no limitation on the defined property.

- "integer":

  - :field:`multipleOf`: Integer.
    An integer is strictly greater than 0.
    The defined property MUST have an integer value that when divided by the given integer results in an integer (i.e., it must be even divisible by this integer without a fractional part).

  - :field:`maximum`: Integer.
    The defined property is an integer that MUST be less than or equal to this number.

  - :field:`exclusiveMaximum`: Integer.
    The defined property is an integer that MUST be strictly less than this number; it cannot equal the number.

  - :field:`minimum`: Integer.
    The defined property is an integer that MUST be greater than or equal to this number.

  - :field:`exclusiveMinimum`: Integer.
    The defined property is an integer that MUST be strictly greater than this number; it cannot equal the number.

- "number":

  - :field:`maximum`: Float.
    The defined property is a float that MUST be less than or equal to this number.

  - :field:`exclusiveMaximum`: Float.
    The defined property is a float that MUST be strictly less than this number; it cannot equal the number.

  - :field:`minimum`: Float.
    The defined property is a float that MUST be greater than or equal to this number.

  - :field:`exclusiveMinimum`: Float.
    The defined property is a float that MUST be strictly greater than this number; it cannot equal the number.

- "string":

  - :field:`maxLength`: Integer.
    A non-negative integer.
    The defined property is a string that MUST have a length that is less than or equal to the given integer.
    The length of the string is defined according to :RFC:`8259`.

  - :field:`minLength`: Integer.
    A non-negative integer.
    The defined property is a string that MUST have a length that is less than or equal to the given integer.
    The length of the string is defined according to :RFC:`8259`.

  - :field:`enum`: List.
    The defined property MUST take one of the values given in the provided list.
    The items in the list MUST all be of a data type that matches the :field:`type` field and otherwise adhere to the rest of the Property Description.
    If this key is given, for :val:`null` to be a valid value of the defined property, the list MUST contain a `null` value and the :field:`type` MUST be a list where the second value is the string "null".

  - :field:`pattern`: String.
    The defined property MUST adhere to the format as specified by the given string when interpreted as a regular expression according to the ECMA-262 regular expression dialect.

  - :field:`format`: String.
    Choose one of the following values to indicate that the defined property is a string that MUST adhere to the specified format:

    - "date-time": the date-time production in :RFC:`3339` section 5.6.
    - "date": the full-date production in :RFC:`3339` section 5.6.
    - "time": the full-time production in :RFC:`3339` section 5.6.
    - "duration": the duration production in :RFC:`3339` Appendix A.
    - "email": the "Mailbox" ABNF rule in :RFC:`5321` section 4.1.2.
    - "uri": the extended "Mailbox" ABNF rule in :RFC:`6531`, section 3.3.


Compound Units Expression in Property Definitions
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Compound unit expressions are created by combining unit symbols defined in the GNU units database, optionally prefixed by one or more of the prefixes defined in the database (indicated thereby trailing `-`), multiplication `*`, division `/`, and power `^` directly followed by a positive integer.
Furthermore:

- No whitespace, parenthesis, or other symbols than specified above are permitted.
- If multiple string representations of the same unit exist in `definition.units`, the *first one* in that file consisting of only lowercase letter and underscore characters SHOULD be used.
- The unit symbols MUST appear in alphabetical order.
- Consecutive divisions, e.g., :val:`a/b/c` are interpreted separately, i.e., :val:`b` and :val:`c` are both interpreted to be to the power of -1.

optimade.rst Outdated Show resolved Hide resolved
optimade.rst Outdated Show resolved Hide resolved
optimade.rst Outdated Show resolved Hide resolved
@rartino
Copy link
Contributor Author

rartino commented Jul 19, 2021

@fekad, @ml-evs Great to see feedback on this, many thanks for the careful reviews. I'll comment/resolve above. I may occasionally use a trick I learned from @ml-evs and edit your comments to adjust suggested code changes before merging (so this is a warning that others may need to look at comment edit history to get the full picture.)

It may take me a few days to get through this. Of course feel free to comment as I go along, but I'll re-request reviews when I'm fully done with your batches of comments.

There are a few things from @fekad that affects multiple comments, so I'll take them here.

  • Regarding including or omitting deprecated, support, query, and sortable

    Note that we intend to use these property definitions a bit beyond what schemas are typically used for. We'd like them to specify for future implementations what MUST, SHOULD, and MAY be supported for an implementation to be conformant, just as we presently do in human-readable text in section 7. Hence, it is not possible to say that one can "test" the behavior of an implementation to find out what applies. The fields are there for us (and others) to inform future implementations about how they must behave in relation to these properties. That said, maybe there is an issue with the use of language here? Should we call this thing a property specification rather than a property definition? Should we even divide the information into two separate parts, a property definition and a specification, and move these fields out to the latter?

  • Regarding 'non-Inner property definitions' and x-optimade-unit

    The language can probably be improved to make this distinction more clear; but the intent is that x-optimade-defintion only appears in the outermost layer of the property definition. x-optimade-unit, however, needs to be possible to specify on any level, which is why it separated out from the rest. This distinction between an outermost definition with a single x-optimade-definition that defines the units used, and a separate field x-optimade-unit that can appear on any level to specify the unit for that quantity seemed the most straightforward design in the end. Feel free to suggest alternatives, but perhaps preferably as a main conversation comment that explains how to address all aspects of this design issue.

  • Position of the "Property definition" section

    The placement inside an Appendix wasn't so much about importance, but more an attempt to make it clear that the property definition format is fairly free-standing. Nevertheless, I suppose its role in the standard is comparable with the query language, which has its own main section - so maybe this should be a new main section between 5 and 6? I disagree making it a subsection of the present section 5 API Endpoints, since it has broader relevance.

    Note: I see now that while I wrote this up you have posted a suggestion in dividing up the section between 5.2 and an Appendix. It isn't clear to me that this division improves things over keeping the full definition of the format together, but I'll think it over.

@rartino
Copy link
Contributor Author

rartino commented Jul 19, 2021

@fekad A few more comments on your latest compound edit suggestion:

  • If I read your version correctly, neither title nor description are now required on the outermost level of the property definition. Shouldn't they be mandatory? It is going to be messy on some client UI implementations to deal with type-only definitions.

  • Likewise, x-optimade-property and x-optimade-definitions can now appear on any level, and from an implementer's perspective I'm skeptical about that. It means that I need to handle, e.g., new unit definitions (e.g., the meaning of "angstrom") on any level of the property definition, possibly re-defining units defined differently elsewhere in the same property...

  • The separation into x-optimade-property and x-optimade-definitions seems to be more or less the separation into "specification" vs "definition" I also landed in from your other comments, although with reverse naming "definition" for what I consider "specification". One could consider separating them even further - the outermost level of a property specification could be a dictionary where the "definition" key is the JSON-schema-compatible part, and the other info goes under separate keys. However, that would destroy the fact that the properties dictionary itself is more or less on OpenAPI/JSON-schema form.

    In any case we should be careful with the naming so the categorization is clear. If we stick without another layer of dictionary, how about x-optimade-definitions -> x-optimade-support, and then rename the support subfield to level? Also, can we have a deprecated field here?

Co-authored-by: Matthew Evans <7916000+ml-evs@users.noreply.github.com>
merkys
merkys previously approved these changes Jun 6, 2022
Copy link
Member

@merkys merkys left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am fine with the PR as it is now. Surely language and formatting may be further polished, but I do not think these minor tweaks have to block the acceptance.

vaitkus
vaitkus previously approved these changes Jun 6, 2022
Copy link
Contributor

@vaitkus vaitkus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good. The string length definition still reads a bit ambiguous to me, but surely this should not be viewed as a blocking issue.

optimade.rst Outdated Show resolved Hide resolved
optimade.rst Outdated Show resolved Hide resolved
optimade.rst Outdated

- :field:`maxLength`: Integer.
A non-negative integer.
The defined property is a string that MUST have a length that is less than or equal to the given integer. (The length of the string is the number of individual characters it is composed from.)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is the new explanation of the string length unambiguous when used with Unicode? It really depends on how we define a character and on which Unicode normalization form in used (e.g., does "ṻ" which consists of the base character "u", a macron and a diaeresis count as 1, 2 or 3 characters?).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suppose that is what the link in JSON Schema to RFC8259 was meant to clarify, but quickly flipping through that RFC, I'm not sure there is more of a clarification than "number of unicode characters". But I think that means your symbol counts as three unicode characters (?). Hmm... I think we should also add "unicode" to our text.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using google for clarification, I note that we are not the only ones confused how to actually read the JSON schema specification: https://stackoverflow.com/a/20791787/1558655 - note particularly the final comment.

However, the JSON API specification says "the number of Unicode characters" and, referencing Wikipedia: https://en.wikipedia.org/wiki/List_of_Unicode_characters it seems Diaeresis and Macron modifiers count as "characters".

In any case, in this ambiguity we probably should stay as close as possible to the JSON Schema phrasing so that it is clear that we mean the same thing they mean, whatever that is.

optimade.rst Show resolved Hide resolved
optimade.rst Outdated Show resolved Hide resolved
optimade.rst Show resolved Hide resolved
…view

Co-authored-by: Antanas Vaitkus <antanas.vaitkus90@gmail.com>
@rartino rartino dismissed stale reviews from vaitkus and merkys via c5d943c June 6, 2022 20:59
optimade.rst Outdated Show resolved Hide resolved
optimade.rst Outdated Show resolved Hide resolved
optimade.rst Outdated Show resolved Hide resolved
@rartino rartino requested review from vaitkus and merkys June 6, 2022 21:53
Copy link
Contributor

@vaitkus vaitkus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I approve the changes in case Unicode normalization is not something that we want to deal right now.

- :field:`maxLength`: Integer.
A non-negative integer.
The defined property is a string that MUST have a length that is less than or equal to the given integer.
(The length of the string is the number of individual Unicode characters it is composed from.)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Individual Unicode characters is probably also not sufficient, since what is considered an individual character depends on the Unicode normalization form that is used to represent the string. For example, letter "ž" (as in "Gražulis"), can be represented both as a single codepoint U+017E ("Latin Small Letter Z with Caron") under the NFC normalization and as a combination of two codepoints U+007A, U+030C ("Latin Small Letter Z" and "Combining Caron") under the NFD normalization.

RFC 4329 that covers the JavaScript and ECMAScript media types contains the following lines that show preference to the NFC normalization:

Source text is expected to be in Unicode Normalization Form C. Scripts and implementations MUST consider security implications of unnormalized source text and data.

However, neither RFC 8259 (JSON), nor the JSON Schema specification are explicit about this, although these seems to be at least one similar string-length related issue raise in the JSON Schema repo (json-schema-org/json-schema-spec#215). Also, note that Unicode string normalization is relevant not only when determining the length of the string, but also when comparing two strings for equality, etc. so it might be useful to eventually define this more explicitly in other contexts as well. For now, maybe something like this could work?

The length of the string is the number of individual Unicode characters the string consists of when presented in the Unicode Normalization Form C.

I am not necessarily advocating for Normalization Form C, but rather using it as a reasonable placeholder.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now, maybe something like this could work?

The length of the string is the number of individual Unicode characters the string consists of when presented in the Unicode Normalization Form C.

I am not necessarily advocating for Normalization Form C, but rather using it as a reasonable placeholder.

I don't see how it is an option for us to be less ambiguous than the standard we aim to be compatible with. If we clarify in this direction and JSON Schema ends up going in a different direction, the compatibility is broken. I think the right course of action at this point is to adopt the unclear formulation from the current JSON Schema draft standard and - if someone of us is prepared to do it - contact them and ask them to clarify.

Copy link
Contributor Author

@rartino rartino Jun 7, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(@vaitkus if you are ok with the present situation, please do an actual review ("Files changed" tab -> top right "Review changes" and "Approve", so we get the approvals to merge; thanks.)

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vaitkus might not be among the "owners" of the repository, thus his approval might not be included in the count.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see how it is an option for us to be less ambiguous than the standard we aim to be compatible with. If we clarify in this direction and JSON Schema ends up going in a different direction, the compatibility is broken. I think the right course of action at this point is to adopt the unclear formulation from the current JSON Schema draft standard and - if someone of us is prepared to do it - contact them and ask them to clarify.

I do not think that being less ambiguous makes us incompatible -- schema representation remains the same, but we simply explicitly clarify how we interpret "string length" in the scope of the OPTIMADE project. Without a uniform definition, different implementation of the validator are open to return different (conflicting) results.

Also, I think that eventual incompatibility with the JSON Schema is somewhat inevitable. Property definitions are currently based on a fixed JSON Schema version and once a new JSON Schema version is released (currently only ~30% complete), a conscious decision will have to be made to migrate to it taking into account all the potential incompatibilities.

As for the actual review, I though that I actually went through the approval steps that you described. It might be the ownership issue that @merkys mentioned, but I will try again just in case.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@vaitkus

Also, I think that eventual incompatibility with the JSON Schema is somewhat inevitable. Property definitions are currently based on a fixed JSON Schema version and once a new JSON Schema version is released (currently only ~30% complete), a conscious decision will have to be made to migrate to it taking into account all the potential incompatibilities.

Is it naive to hope that future revisions of JSON Schema will keep our rather small subset intact to remain backwards compatible? But you have a very good point here: how can we avoid breaking clients if JSON Schema makes a backwards-incompatible change. We probably should add a machine-readable reference to the JSON Schema version somewhere in this. Perhaps in x-optiomade-property in the outermost property definition? Can we be sure at least that much will remain intact?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That would be a useful property. The JSON Schema people seem to properly version their releases and even provide in-depth descriptions on how the new version differs from the previous one (e.g., https://json-schema.org/draft/2020-12/release-notes.html).

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll try to draft a PR for this.

Copy link
Contributor

@vaitkus vaitkus left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Trying again to approve the PR.

@rartino rartino merged commit be528af into Materials-Consortia:develop Jun 7, 2022
@merkys
Copy link
Member

merkys commented Jun 7, 2022

Glad to see this finally merged. Thanks @rartino et al. for putting it together!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
PR/ready-for-review Add this flag if you are the author of the PR and you want it to be reviewed. Remove it when editing
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants