Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Issues with the specification “overview” #283

Closed
gouttegd opened this issue Jun 10, 2023 · 6 comments
Closed

Issues with the specification “overview” #283

gouttegd opened this issue Jun 10, 2023 · 6 comments

Comments

@gouttegd
Copy link
Contributor

gouttegd commented Jun 10, 2023

The Overview of the SSSOM specification suffers from some issues.

  1. It allows to overload the value of the curie_map parameter (in the YAML metadata block, when serialising in TSV) so that it can be “either dictionary of CURIE->URLPREFIX pairs or a link to a valid curie map of the same shape”. There is agreement on Slack that such overloading is not a good idea.

The document should be amended so that curie_map can only be a dictionary of {CURIE, URLPREFIX} pairs. If the ability to reference an external curie map is still desired, a new parameter should be introduced (for example, curie_map_reference, curie_map_link, or something like that).

(For what it’s worth, I am slightly against allowing a reference to an external curie map, but I don’t have a strong opinion on that issue. I do think allowing such references would make things more complicated that they need to be, but I can deal with that.)

  1. The document is inconsistent regarding the filename convention for the external metadata file. It first recommends .sssom.yml

External yaml metadata files should have the extension .sssom.yml, for example mp-hp-exact-0.0.1.sssom.tsv

(of note, there’s already an inconsistency here, as the given example has a .sssom.tsv extension), then it recommends -meta.yml

In external mode, the mapping set metadata is supplied by a separate YAML file having the same base-name of the mapping file, with the extension -meta.yml.

The real recommended filename should be .sssom.yml.

  1. The sample TSV file violates the stated requirement on column orders. The description of the TSV format says:

The columns MUST be sorted according to the order as they appear in the SSSOM metadata. For example, the first columns of a mapping set TSV should always be, in that order: subject_id, predicate_id, object_id, mapping_justification, if labels are not included; if they are included, the order should be: subject_id, subject_label, predicate_id, predicate_label, object_id, object_label, mapping_justification.

The example file uses the following ordering: subject_id, predicate_id, object_id, mapping_justification, subject_label, object_label.

This begs the question: is column ordering really important? If it is, then we should not provide an example that does not follow it. If it is not (e.g., if it’s only a recommendation, as row ordering seems to be), then the MUST should be relaxed at least into a SHOULD.

And speaking of MUST and SHOULD:

  1. Overall, the document would benefit from more abundant use of RFC 2119-style keywords (MAY, SHOULD, MUST, etc.), to clearly (and in a way that people reading specifications are familiar with) communicate which parts of the specification are recommendations, important requirements, critically important requirements, and so on.

For example:

  • “It will be required to supply a valid curie map that allows the unambiguous interpretation of CURIEs” -> “The YAML metadata block MUST contain a curie map that allows the unambiguous interpretation of CURIES”.
  • “In external mode, the mapping set metadata is supplied by a separate YAML file having the same base-name of the mapping file, with the extension .sssom.yml -> “In external mode, the mapping set metadata MUST be supplied in a separate YAML file; that file SHOULD have the same base name as the mapping file, with the extension .sssom.yml”.

etc.

@jamesamcl
Copy link
Member

Significant column ordering sounds dangerous as it enables/encourages consumers to ignore the headings!

@gouttegd
Copy link
Contributor Author

gouttegd commented Jun 10, 2023

Agree. And of note, sssom-py has seemingly no trouble manipulating the provided example file despite the “wrong” column ordering, suggesting this requirement is not warranted.

I think it should be downgraded to a SHOULD at least.

Of note, even if that requirement becomes a recommendation, the example file we provide should still follow it. Recommending a strict column order and then showing an example that does not respect it can give a bad first impression to anyone looking at the spec and trying to decide whether they want to support it or not.

@gouttegd
Copy link
Contributor Author

Another issue I noticed just now:

it will be required to supply a valid curie map that allows the unambiguous interpretation of CURIE

But the example file has an incomplete curie map that does not allow to resolve the skos: and semapv: prefixes, and yet sssom-py can process that file without even a warning.

This means sssom-py has built-in knowledge of (at least) skos: and semapv:, and allows those prefixes to be omitted.

If it is indeed the intention that people should be dispensed from having to declare those two prefixes when they create TSV files, this should be explicitly said in the specification.

(For what it’s worth I think an implementation should not rely on built-in knowledge of some prefixes.)

@gouttegd
Copy link
Contributor Author

gouttegd commented Jun 11, 2023

Another under-specified behaviour:

When serialising in OWL, sssom-py renders mappings that use a skos:(exact|narrow|broad|related|close)Match predicate as if the predicate was an annotation property. This is in agreement with how SEMAPV (re-)defines them.

But it is not obvious that those predicates should be treated as annotation properties. The OWL API for example considers that they are object properties, which is in agreement with how the SKOS specification explicitly defines them (“skos:mappingRelation, skos:closeMatch, skos:exactMatch, skos:broadMatch, skos:narrowMatch and skos:relatedMatch are each instances of owl:ObjectProperty”).

How the SKOS mapping properties are treated (as AP or as OP) matters very much in the context of SSSOM, since it decides how the mappings are translated into OWL axioms when serialising into OWL.

If the behaviour of sssom-py and of SEMAPV (treating the skos mapping properties as if they were annotation properties) is the intended one, then it should be explicitly stated in the specification:

For the purpose of serialising mappings into reified OWL axioms, mapping predicates from the http://www.w3.org/2004/02/skos/core# namespace MUST be considered to be OWL annotation properties, notwithstanding section 10.3 of the SKOS reference document.

Or, to allow for more instances of similar “reuse existing properties while changing their definition” behaviours beyond the case of SKOS:

For the purpose of serialising mappings into reified OWL axioms, mapping predicates that are present in the Semantic Mapping Vocabulary MUST be considered to be of the type defined in that vocabulary, even if the predicates are originally defined in another specification with a different type.

@matentzn
Copy link
Collaborator

@gouttegd I was going to address every point you made here in a separate comment, but it turns out I agree with all you say and hopefully apart from two things, have addressed everything in #285:

  1. The OWL question is a LinkML one I moved to Schema does not clearly prescribe how to interpret SSSOM in OWL #286 (smaller issues are easier for communication)
  2. I didn't proactively search for more opportunities to apply SHOULD/MUST language in the text (too open ended for my current schedule)

I applied canonical sort order (sssom sort set.sssom.tsv) to all example files, so that is that.

I don't like that the comment ends up on top of the curie_map but that's another issue :D

hrshdhgd added a commit that referenced this issue Jun 20, 2023
Partially adresses #283 

- [X] `docs/` have been added/updated if necessary
- [X]
[CHANGELOG.md](https://github.com/mapping-commons/sssom/blob/master/CHANGELOG.md)
has been updated.

- Applies canonical sort ordering to all examples files
- Downgrades the necessity of applying the canonical column order from
MUST to SHOULD
- Documents clearly that built-in prefixes MUST NOT be redefined

---------

Co-authored-by: Harshad Hegde <hegdehb@gmail.com>
@gouttegd
Copy link
Contributor Author

No longer relevant since #368.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants