Tiffany J. Callahan edited this page Jan 13, 2018 · 19 revisions

SemRepRDF

This wiki documents our process of transforming the National Library of Medicine's Semantic Knowledge Representation predications into an open linked data resource. This work was developed and presented at the 4th Annual Biomedical Linked Annotation Hackathon (BLAH4), held in Kashiwa, Japan (January 2018).

Proposal presentation

Motivation and Background

Sources of “big” biomedical data like electronic health records (EHRs), high-throughput experiments, and Internet of Things devices provide researchers and clinicians with unprecedented opportunities for scientific advancement (Piai et al., 2013). Unfortunately, to fully utilize these data, researchers must face the formidable challenge of synthesizing relevant information from an exponentially expanding body of scientific literature (Sinoara et al., 2017; Simmons et al., 2017). To help solve this problem, the natural language processing and biomedical research communities have developed rigorous algorithms that have produced impressive collections of annotated text corpora. While the breadth of concept annotations in existing corpora is extensive, large-scale annotation of relations between annotated concepts is often limited or incomplete (Neves et al., 2014). With this in mind, we propose to extend the coverage of existing annotations in PubAnnotation by transforming the National Library of Medicine’s Semantic Representation (SemRep) predications into semantically-linked annotations.

Proposed Work

Given its size (~91 million subject-predicate-object predications) and coverage (26.7 million citations), SemRep has great potential to become a valuable community resource once it is mapped to ontologies and integrated with existing projects in PubAnnotation (especially those that include concepts not represented in the UMLS). To that end, we propose the following tasks:

  • Refine, extend, and implement the schema for representing SemRep predications (including the representation of annotation source provenance and/or metadata). This representation will be developed to ensure compatibility with existing PubAnnotation projects. See Figure 1 for modified schema (01/13/18).
  • Review UMLS licensing and Terms of Use. Generate a plan to create an open version of the SemRep annotations, without violating the UMLS license agreement.
  • Identify open resources and ontologies to map to existing annotations.
  • Document the transformation process, including all discussions on GitHub.
  • Create an RDF version of the transformed SemRep annotations that can be made publicly available for download.

Figure 1. Draft of the proposed schema for making SemRep predications compatible with PubAnnotation

Prior Work

There have been a few prior efforts aimed at converting SemRep predications into linked data, but most of these efforts were intentionally designed to convert only small subsets of the full database (OMOP RDF; Zhang et al., 2014; Zhang et al., 2013). To the best of our knowledge, no existing efforts have converted the full set of predications with the intention of integrating the resulting annotations into an existing public repository.

SemRep Project Details

Specific information regarding SemRep is detailed below:

  • The SemRep program is managed under the Semantic Knowledge Representation (SKR) project and is maintained by research staff at the National Library of Medicine (NLM).
  • SemRep predications are generated using MetaMap. When building the predications, subject and object concepts are taken from the UMLS Metathesaurus, while relations are taken from the UMLS Semantic Network.
  • We will create annotations using a downloaded version of the Semantic MEDLINE Database (SemMedDB). Currently, the database contains over 91 million predications generated from 26.7 million PubMed citations.
  • An example predication (taken from the online description of SemRep) is shown below:
    • Sentence: We used hemofiltration to treat a patient with digoxin overdose that was complicated by refractory hyperkalemia
    • SemRep Generated Predications:
      • Hemofiltration-TREATS-Patients 
      • Digoxin overdose-PROCESS_OF-Patients 
      • hyperkalemia-COMPLICATES-Digoxin overdose 
      • Hemofiltration-TREATS(INFER)-Digoxin overdose
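
To make the subject-predicate-object structure concrete, the example predications above can be split into their parts and written out as RDF triples. The following is only a minimal sketch, not the project's conversion pipeline: the `http://example.org/semrep/` namespace, the `Predication` type, and both helper functions are hypothetical, and concept labels stand in for the identifiers a real conversion would presumably take from SemMedDB.

```python
import re
from typing import NamedTuple
from urllib.parse import quote

# Hypothetical namespace, used here only for illustration.
BASE = "http://example.org/semrep/"


class Predication(NamedTuple):
    subject: str
    predicate: str
    obj: str
    inferred: bool  # True when the predicate carries an (INFER) suffix


# SUBJECT-PREDICATE-OBJECT, where the predicate is upper case (with
# underscores) and may be followed by "(INFER)".
PATTERN = re.compile(
    r"^(?P<subj>.+?)-(?P<pred>[A-Z_]+)(?P<infer>\(INFER\))?-(?P<obj>.+)$"
)


def parse_predication(text: str) -> Predication:
    """Split a predication string such as 'A-TREATS-B' into its parts."""
    match = PATTERN.match(text.strip())
    if match is None:
        raise ValueError(f"not a predication: {text!r}")
    return Predication(
        subject=match["subj"],
        predicate=match["pred"],
        obj=match["obj"],
        inferred=match["infer"] is not None,
    )


def to_ntriple(p: Predication) -> str:
    """Serialize one predication as a single N-Triples line.

    The inferred flag is not encoded here; it would belong with the
    provenance metadata rather than in the core triple.
    """
    def iri(kind: str, label: str) -> str:
        return f"<{BASE}{kind}/{quote(label)}>"

    return (
        f"{iri('concept', p.subject)} "
        f"{iri('relation', p.predicate)} "
        f"{iri('concept', p.obj)} ."
    )


examples = [
    "Hemofiltration-TREATS-Patients",
    "Digoxin overdose-PROCESS_OF-Patients",
    "hyperkalemia-COMPLICATES-Digoxin overdose",
    "Hemofiltration-TREATS(INFER)-Digoxin overdose",
]

for line in examples:
    print(to_ntriple(parse_predication(line)))
```

Keeping the inferred flag as a separate field rather than folding it into the predicate name means the same relation IRI can be reused for asserted and inferred predications, with the distinction recorded alongside the other provenance information.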