
generate domain/range validation report for SEMMEDDB #1292

Closed
sierra-moxon opened this issue Apr 25, 2023 · 4 comments

Comments

@sierra-moxon
Member

Using the files: https://lhncbc.nlm.nih.gov/ii/tools/MetaMap/documentation/SemanticTypesAndGroups.html
and
https://files.slack.com/files-pri/TSCGQ3XGB-F054B54G9TP/download/semmedver43_2023_r_predication.116080.metaedges.txt?origin_team=TSCGQ3XGB

produce an output file with the following header:
"SEMMEDDB_Subject", "SEMMEDDB_Predicate", "SEMMEDDB_Object", "BiolinkSubject", "BiolinkPredicate", "BiolinkObject", "Valid_Edge?", "Component Missing"

The subject, predicate, and object should each be mapped to Biolink independently, followed by a second check of whether the resulting combination is a valid Biolink edge.
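A minimal Python sketch of writing the report in the requested column order, assuming tab-separated output (the later attachments are named `.tsv`). The `write_report` helper and the dict-per-row shape are hypothetical and not taken from the predicates-analysis repo. An unmapped Biolink component is passed as `None`, which Python's `csv` module writes as an empty cell.

```python
import csv
import io
from typing import Dict, Iterable, TextIO

# Column order as requested in the issue.
HEADER = [
    "SEMMEDDB_Subject", "SEMMEDDB_Predicate", "SEMMEDDB_Object",
    "BiolinkSubject", "BiolinkPredicate", "BiolinkObject",
    "Valid_Edge?", "Component Missing",
]


def write_report(rows: Iterable[Dict[str, object]], out: TextIO) -> None:
    """Write one tab-separated line per row (hypothetical helper)."""
    writer = csv.DictWriter(out, fieldnames=HEADER, delimiter="\t",
                            lineterminator="\n")
    writer.writeheader()
    for row in rows:
        # csv writes None as an empty string, so unmapped cells stay blank;
        # booleans are written via str(), e.g. "True"/"False".
        writer.writerow(row)


buf = io.StringIO()
write_report([
    {"SEMMEDDB_Subject": "cell", "SEMMEDDB_Predicate": "LOCATION_OF",
     "SEMMEDDB_Object": "aapp", "BiolinkSubject": "biolink:Cell",
     "BiolinkPredicate": "biolink:location_of",
     "BiolinkObject": "biolink:Polypeptide",
     "Valid_Edge?": True, "Component Missing": "None"},
    {"SEMMEDDB_Subject": "dsyn", "SEMMEDDB_Predicate": "PROCESS_OF",
     "SEMMEDDB_Object": "humn", "BiolinkSubject": "biolink:Disease",
     "BiolinkPredicate": "biolink:occurs_in", "BiolinkObject": None,
     "Valid_Edge?": False, "Component Missing": "humn"},
], buf)
print(buf.getvalue(), end="")
```

The two sample rows are copied from the example output in this issue; the `humn` row shows how a missing object mapping ends up as a blank `BiolinkObject` cell.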

A Biolink edge is valid when the code finds valid Biolink classes for the subject and object and a valid Biolink association slot for the predicate (one that is also a descendant of related_to, i.e. a Biolink predicate), AND the domain and range constraints on that predicate encompass the subject and object, respectively.
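A minimal Python sketch of this validity rule, assuming the mapped classes' ancestor sets (via is_a and mixins) are already available. The `ANCESTORS` and `PREDICATES` tables below are a hypothetical, heavily simplified excerpt rather than the real Biolink model: the `treats` domain and range come from this issue, while the unconstrained `location_of` entry is an assumption for illustration only.

```python
from typing import Dict, Optional, Set, Tuple

# Hypothetical, simplified Biolink excerpt. Each class maps to its ancestors
# *including itself*, following both is_a and mixin links.
ANCESTORS: Dict[str, Set[str]] = {
    "biolink:Cell": {"biolink:Cell", "biolink:NamedThing"},
    "biolink:Polypeptide": {"biolink:Polypeptide", "biolink:NamedThing"},
    "biolink:Procedure": {"biolink:Procedure", "biolink:NamedThing"},
    "biolink:Cohort": {"biolink:Cohort", "biolink:NamedThing"},
    "biolink:Disease": {"biolink:Disease",
                        "biolink:DiseaseOrPhenotypicFeature",
                        "biolink:NamedThing"},
}

# Slots assumed to descend from biolink:related_to, mapped to (domain, range).
# None means "no constraint" (an assumption for location_of).
PREDICATES: Dict[str, Tuple[Optional[str], Optional[str]]] = {
    "biolink:location_of": (None, None),
    "biolink:treats": ("biolink:ChemicalOrDrugOrTreatment",
                       "biolink:DiseaseOrPhenotypicFeature"),
}


def satisfies(cls: str, constraint: Optional[str]) -> bool:
    """True if there is no constraint, or it is cls or one of its ancestors."""
    return constraint is None or constraint in ANCESTORS.get(cls, {cls})


def is_valid_edge(subj: Optional[str], pred: Optional[str],
                  obj: Optional[str]) -> bool:
    # 1. Every component must have mapped to a Biolink element.
    if subj is None or pred is None or obj is None:
        return False
    # 2. The predicate must descend from related_to (i.e. be a Biolink predicate).
    if pred not in PREDICATES:
        return False
    # 3. The domain must encompass the subject, the range the object.
    domain, range_ = PREDICATES[pred]
    return satisfies(subj, domain) and satisfies(obj, range_)


print(is_valid_edge("biolink:Cell", "biolink:location_of", "biolink:Polypeptide"))
print(is_valid_edge("biolink:Procedure", "biolink:treats", "biolink:Cohort"))
```

With this toy data, `is_valid_edge("biolink:Procedure", "biolink:treats", "biolink:Disease")` is also `False` because the subject fails the domain check, which is consistent with the `topp TREATS dsyn` row in the sample output.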

For example:

SEMMEDDB_Subject  SEMMEDDB_Predicate  SEMMEDDB_Object  BiolinkSubject                      BiolinkPredicate     BiolinkObject        Valid_Edge?  Component Missing
dsyn              PROCESS_OF          humn             biolink:Disease                     biolink:occurs_in                         False        humn
fndg              PROCESS_OF          humn             biolink:DiseaseOrPhenotypicFeature  biolink:occurs_in                         False        humn
neop              PROCESS_OF          humn             biolink:Disease                     biolink:occurs_in                         False        humn
topp              TREATS              podg             biolink:Procedure                   biolink:treats       biolink:Cohort       False        None
patf              PROCESS_OF          humn             biolink:PathologicalProcess         biolink:occurs_in                         False        humn
topp              USES                phsu             biolink:Procedure                   biolink:has_input    biolink:Drug         False        None
topp              TREATS              dsyn             biolink:Procedure                   biolink:treats       biolink:Disease      False        None
cell              LOCATION_OF         aapp             biolink:Cell                        biolink:location_of  biolink:Polypeptide  True         None
mobd              PROCESS_OF          humn             biolink:Disease                     biolink:occurs_in                         False        humn
bpoc              PART_OF             mamm             biolink:GrossAnatomicalStructure    biolink:part_of                           False        mamm
sosy              PROCESS_OF          humn             biolink:PhenotypicFeature           biolink:occurs_in                         False        humn

For the first row, the output says this isn't a valid Biolink edge because there is no Biolink mapping for STY:Txxx humn (I imagine this is "human").
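A minimal Python sketch of how the "Component Missing" column could be filled: each SemMedDB code is looked up on its own, and any code with no Biolink match is reported. The lookup entries are copied from rows of the sample output above; `map_triple` is a hypothetical helper, and comma-joining several misses is an assumption, since the sample only ever shows one.

```python
from typing import Dict, Optional, Tuple

# Lookups copied from the sample output; "humn" is deliberately absent so it
# shows up as a missing component. Real tables would be much larger.
STY_TO_BIOLINK: Dict[str, str] = {
    "dsyn": "biolink:Disease",
    "topp": "biolink:Procedure",
    "podg": "biolink:Cohort",
}
PREDICATE_TO_BIOLINK: Dict[str, str] = {
    "PROCESS_OF": "biolink:occurs_in",
    "TREATS": "biolink:treats",
}


def map_triple(subj: str, pred: str, obj: str
               ) -> Tuple[Optional[str], Optional[str], Optional[str], str]:
    """Map each part independently and report the SemMedDB codes left unmapped."""
    b_subj = STY_TO_BIOLINK.get(subj)
    b_pred = PREDICATE_TO_BIOLINK.get(pred)
    b_obj = STY_TO_BIOLINK.get(obj)
    missing = [code for code, mapped in
               ((subj, b_subj), (pred, b_pred), (obj, b_obj)) if mapped is None]
    # Assumption: several misses are comma-joined; "None" when nothing is missing.
    return b_subj, b_pred, b_obj, ",".join(missing) if missing else "None"


print(map_triple("dsyn", "PROCESS_OF", "humn"))
print(map_triple("topp", "TREATS", "podg"))
```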

For this row:

topp    TREATS  podg    biolink:Procedure       biolink:treats  biolink:Cohort  False   None

we're saying that we can match everything in the row (there are no missing components), but it's not a valid edge in Biolink. For the treats predicate, the domain and range are as follows:

domain: chemical or drug or treatment
range: disease or phenotypic feature
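Because a constraint like "chemical or drug or treatment" may be satisfied through a mixin rather than an is_a parent, the "encompasses" check should walk both kinds of links. A minimal Python sketch, assuming a hypothetical, simplified `PARENTS` table with intermediate classes omitted (a real run would read these links from the Biolink model). It shows why `biolink:Cohort` falls outside the `treats` range.

```python
from collections import deque
from typing import Dict, List, Set

# Hypothetical, simplified parent links (is_a plus mixins).
PARENTS: Dict[str, List[str]] = {
    "biolink:Disease": ["biolink:DiseaseOrPhenotypicFeature"],
    "biolink:PhenotypicFeature": ["biolink:DiseaseOrPhenotypicFeature"],
    "biolink:DiseaseOrPhenotypicFeature": ["biolink:NamedThing"],
    "biolink:Drug": ["biolink:NamedThing", "biolink:ChemicalOrDrugOrTreatment"],
    "biolink:Cohort": ["biolink:NamedThing"],
}


def ancestors(cls: str) -> Set[str]:
    """Breadth-first walk up is_a/mixin links; the result includes cls itself."""
    seen = {cls}
    queue = deque([cls])
    while queue:
        for parent in PARENTS.get(queue.popleft(), []):
            if parent not in seen:
                seen.add(parent)
                queue.append(parent)
    return seen


def encompasses(constraint: str, cls: str) -> bool:
    """True if the domain/range constraint is cls or one of its ancestors."""
    return constraint in ancestors(cls)


TREATS_DOMAIN = "biolink:ChemicalOrDrugOrTreatment"  # from the issue text
TREATS_RANGE = "biolink:DiseaseOrPhenotypicFeature"   # from the issue text
print(encompasses(TREATS_DOMAIN, "biolink:Drug"))
print(encompasses(TREATS_RANGE, "biolink:Disease"))
print(encompasses(TREATS_RANGE, "biolink:Cohort"))
```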
sierra-moxon self-assigned this Apr 25, 2023
@sierra-moxon
Member Author

sierra-moxon commented Apr 25, 2023

Code to produce this report is here: github.com/sierra-moxon/predicates-analysis/

The first draft of the output is here:
semmeddb_biolink_triples.tar.gz

@sierra-moxon
Member Author

Per feedback from Andrew, added the counts back in and reordered the columns:
semmeddb_biolink_triples.tar.gz

@sierra-moxon
Member Author

sierra-moxon commented Apr 27, 2023

Next iteration, per feedback from Matt:
semmeddb_biolink_triples.tsv.tar.gz

@sierra-moxon
Member Author

The report was presented and accepted; a relay session will follow, but for now this work is done. Closing.
