Files managed in OBO format have replaced by values as CURIE strings #642

Closed
matentzn opened this issue Aug 4, 2022 · 11 comments

@matentzn
Contributor

matentzn commented Aug 4, 2022

We should try and make sure they are correctly rewritten to URIs before moving out. Maybe use the new intermediate call #639

see monarch-initiative/mondo#2636

@gouttegd
Contributor

gouttegd commented Aug 4, 2022

Any chance we could fix the OBO parser to recognise a CURIE value in a replaced_by tag, and replace it by a proper IRI value in the parsed ontology?

OBO files would then still use CURIEs, but they would automatically become IRIs upon transformation into any other format, without requiring any special processing at the ODK level.
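
As a rough illustration of the expansion proposed above (a Python sketch, not the OWLAPI parser's actual code), a CURIE found in a replaced_by tag could be turned into an IRI using the standard OBO PURL pattern. The function name `expand_obo_curie` is hypothetical, and the sketch assumes the prefix belongs to the OBO namespace:

```python
OBO_BASE = "http://purl.obolibrary.org/obo/"


def expand_obo_curie(value: str) -> str:
    """Expand an OBO-style CURIE such as 'MONDO:0000001' into an OBO PURL.

    Assumption: every prefix is an OBO namespace. Values that already
    look like full IRIs are returned unchanged.
    """
    if value.startswith(("http://", "https://")):
        return value  # already a full IRI, nothing to do
    prefix, sep, local_id = value.partition(":")
    if not sep or not prefix or not local_id:
        raise ValueError(f"not a CURIE: {value!r}")
    # OBO PURLs join the prefix and local ID with an underscore.
    return f"{OBO_BASE}{prefix}_{local_id}"


print(expand_obo_curie("MONDO:0000001"))
```

With this in place the OBO file keeps its CURIEs, and only the parsed ontology carries the IRI form.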

@balhoff
Member

balhoff commented Aug 4, 2022

I've been keeping some notes on things we could clean up in the OBO parser. I would love it if there were a syntactic difference between CURIEs and full IRIs so that either could be used, and a way to define prefixes. Maybe some interested folks could join a working group to create a revision.

@gouttegd
Contributor

gouttegd commented Aug 4, 2022

@balhoff Not sure I understand why we would need a “syntactic difference between CURIEs and full IRIs”?

Doesn’t the loadOboToIRI method in owlapi-oboformat already do everything that we need here? It recognises a “full IRI” as well as a “Prefixed ID” (i.e. a CURIE) and resolves the latter into a proper IRI. What else would we need?

@balhoff
Member

balhoff commented Aug 4, 2022

I need to go back and look at some of the specific cases, but the OBO parser makes some wrong assumptions in various situations. In every other OWL format a full IRI is surrounded by <...> and a CURIE is un-bracketed with a possibly empty prefix string followed by a colon followed by a possibly empty local ID. There shouldn't be a need for the parser to guess.
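
To make the convention described above concrete, here is a small Python sketch of a tokenizer that never has to guess: a value in angle brackets is a full IRI, anything else must be a CURIE with a (possibly empty) prefix, a colon, and a (possibly empty) local ID. The function name `parse_identifier` and the returned tuple shape are illustrative assumptions, not part of any existing parser:

```python
from typing import Tuple


def parse_identifier(token: str) -> Tuple[str, str, str]:
    """Classify a token as a bracketed full IRI or an un-bracketed CURIE.

    Returns ('iri', iri, '') or ('curie', prefix, local_id).
    """
    if token.startswith("<") and token.endswith(">"):
        return ("iri", token[1:-1], "")
    prefix, sep, local_id = token.partition(":")
    if not sep:
        raise ValueError(f"neither <IRI> nor CURIE: {token!r}")
    # Both the prefix and the local ID are allowed to be empty.
    return ("curie", prefix, local_id)


print(parse_identifier("<http://purl.obolibrary.org/obo/MONDO_0000001>"))
print(parse_identifier("MONDO:0000001"))
```

Because the brackets carry the distinction, a string such as `http://example.org/x` without brackets would be read as a CURIE with prefix `http`, which is exactly the ambiguity the bracket syntax removes.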

@balhoff
Member

balhoff commented Aug 4, 2022

Also—there is no way to define non-OBO namespace CURIEs, which is pretty limiting.

@matentzn
Contributor Author

matentzn commented Aug 5, 2022

I think in the interest of the future, it would be great to collect all obo-format problems together in some issue and organise a hackathon to fix the parsers, but in the interim, I would still like to implement a method to hack this. As I will now introduce formal contexts to each ontology repo (which contain prefix maps), my idea was to add a python script that reads the context and the intermediate ontology, replaces all prefixes that are known according to the context, and drops the rest with a warning.
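
A minimal sketch of what such a script might look like, under stated assumptions: the context is taken to be a JSON document with a top-level `@context` object mapping prefixes to namespace strings, and the replaced_by values are passed in as a plain list rather than read from the intermediate ontology. The function names `load_prefix_map` and `expand_with_context` are hypothetical:

```python
import json
import warnings


def load_prefix_map(context_json: str) -> dict:
    """Read a prefix map from a JSON context (assumed '@context' layout)."""
    context = json.loads(context_json)["@context"]
    # Keep only simple prefix -> namespace string entries.
    return {p: ns for p, ns in context.items() if isinstance(ns, str)}


def expand_with_context(values, prefix_map):
    """Expand CURIEs with known prefixes; drop the rest with a warning."""
    expanded = []
    for value in values:
        if value.startswith(("http://", "https://")):
            expanded.append(value)  # already a full IRI
            continue
        prefix, sep, local_id = value.partition(":")
        if sep and prefix in prefix_map:
            expanded.append(prefix_map[prefix] + local_id)
        else:
            warnings.warn(f"dropping value with unknown prefix: {value!r}")
    return expanded


context = '{"@context": {"MONDO": "http://purl.obolibrary.org/obo/MONDO_"}}'
prefix_map = load_prefix_map(context)
print(expand_with_context(["MONDO:0000001", "FOO:123"], prefix_map))
```

In this example `FOO:123` is dropped with a warning because `FOO` is not declared in the context.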

@matentzn matentzn added this to the 1.3.2 milestone Aug 5, 2022
@gouttegd
Contributor

gouttegd commented Aug 5, 2022

> it would be great to collect all obo-format problems together in some issue and organise a hackathon to fix the parsers

Do we want to fix the parsers or the format?

To be clear, I was talking about the parser, but I read @balhoff’s comments above as if he would like a new version of the OBO format.

@matentzn
Contributor Author

matentzn commented Aug 5, 2022

I think it's mostly about fixing the parser, but I think you need to add prefix maps to be able to curate non-OBO CURIEs, for example in the replaced_by section. Not sure if the OBO format has that.

@gouttegd
Contributor

gouttegd commented Aug 5, 2022

> Not sure of OBO format has that..

I believe it does: the OBO Flat File Format specification describes an idspace header tag to map a prefix to a base URL.

The owlapi’s OBO parser recognises that tag but doesn’t seem to do anything with it.

I suspect that if we used the mappings declared in idspace clauses to populate the idSpaceMap hashmap (which is used here when converting a CURIE to an IRI), we may have all we need for this particular issue.
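
To illustrate the idea, here is a Python stand-in (not the owlapi-oboformat Java code) that reads idspace header clauses into a map and consults that map before falling back to the default OBO PURL expansion. The names `read_idspaces` and `curie_to_iri` are hypothetical, and the sketch assumes each clause has the form `idspace: PREFIX URI "optional description"`:

```python
OBO_BASE = "http://purl.obolibrary.org/obo/"


def read_idspaces(obo_header: str) -> dict:
    """Collect prefix -> base URI mappings from 'idspace:' header clauses."""
    id_space_map = {}
    for line in obo_header.splitlines():
        if not line.startswith("idspace:"):
            continue
        # Fields: prefix, base URI, then an optional quoted description.
        fields = line[len("idspace:"):].split(None, 2)
        if len(fields) >= 2:
            id_space_map[fields[0]] = fields[1]
    return id_space_map


def curie_to_iri(curie: str, id_space_map: dict) -> str:
    """Expand a CURIE, preferring a declared idspace over the OBO default."""
    prefix, _, local_id = curie.partition(":")
    base = id_space_map.get(prefix, f"{OBO_BASE}{prefix}_")
    return base + local_id


header = 'format-version: 1.4\nidspace: EX http://example.org/ex_ "example"\n'
id_spaces = read_idspaces(header)
print(curie_to_iri("EX:001", id_spaces))
print(curie_to_iri("MONDO:0000001", id_spaces))
```

Prefixes without an idspace declaration keep the existing OBO behaviour, so declared mappings only extend what the parser can resolve.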

@gouttegd
Contributor

I don’t think there is anything left for the ODK to do here?

The OWLAPI OBO parser does expand CURIEs found in replaced_by tags since version 4.5.29, and therefore so does ROBOT since version 1.9.6.

@matentzn Anything else you wanted to be done as part of this issue?

@matentzn
Contributor Author

Nope, this is all we needed! Thanks for cleaning up!
