Tiffany J. Callahan edited this page Nov 6, 2020 · 17 revisions

Release: v1.0 (first official release)


Release Updates


New Data Sources:

  • The original set of ontologies has been extended, see the Ontologies section for more information
  • The National Library of Medicine's Unified Medical Language System (UMLS) MRCONSO and MRSTY. Using these data requires an NLM UMLS license agreement

Featured Functionality:

Release Data:

  • All data used for this release can be downloaded directly from Zenodo (here)




Jupyter Notebooks



Ontologies


Downloaded Resource Information:


The specific ontologies used in this release of OMOP2OBO, including class and axiom counts, are shown in the table below. All ontologies were downloaded and processed on 09/14/20.

Ontology | Classes | Definitions | Labels | Synonyms | DbXRefs
Cell Line Ontology (CL) | 2,238 | 1,859 | 2,238 | 2,124 | 1,376
Chemical Entities of Biological Interest (CHEBI) | 126,169 | 48,824 | 126,169 | 269,798 | 231,247
Human Phenotype Ontology (HPO) | 15,247 | 12,468 | 15,247 | 19,860 | 19,569
Mondo Disease Ontology (MONDO) | 22,288 | 15,271 | 22,288 | 98,181 | 159,918
NCBITaxon Organism Taxonomy (NCBITaxon) | 2,241,110 | 0 | 2,241,110 | 263,571 | 18,426
Protein Ontology (PRO) | 215,624 | 215,598 | 215,624 | 590,190 | 195,671
Uber-Anatomy Ontology (UBERON) | 13,898 | 11,026 | 13,898 | 36,771 | 51,322
Vaccine Ontology (VO) | 5,783 | 1,231 | 5,789 | 6 | 0

Ontology Metadata

A chi-square test of independence was run to determine whether the amount of metadata available differed by ontology. First, an omnibus test was run to determine whether there was a significant relationship between the metadata and ontologies. Results from this test (with Yates' correction) revealed a significant association between the ontology metadata and ontology type (χ²(14)=2,664,853.817, p<0.0001). To better understand these findings, post-hoc tests were run using a Bonferroni adjustment to correct for multiple comparisons. These tests confirmed that all ontologies had significantly different distributions of metadata (ps<0.0001).
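As a rough illustration of the omnibus test, the sketch below computes a Pearson chi-square statistic from the counts in the ontology table. It assumes that the tested metadata are the Definitions, Synonyms, and DbXRefs columns; this is a guess that is consistent with the reported 14 degrees of freedom (8 ontologies by 3 metadata types), not something the release notes state. The function name is hypothetical, and no continuity correction is applied, so the result should not be expected to match the reported statistic exactly.

```python
# Counts copied from the ontology table: (definitions, synonyms, dbxrefs).
# Assumption: these three columns are the "metadata" that was tested.
observed = {
    "CL": (1859, 2124, 1376),
    "CHEBI": (48824, 269798, 231247),
    "HPO": (12468, 19860, 19569),
    "MONDO": (15271, 98181, 159918),
    "NCBITaxon": (0, 263571, 18426),
    "PRO": (215598, 590190, 195671),
    "UBERON": (11026, 36771, 51322),
    "VO": (1231, 6, 0),
}


def chi_square_independence(table):
    """Return (statistic, degrees of freedom) for a rows-by-columns count table."""
    rows = list(table.values())
    row_totals = [sum(row) for row in rows]
    col_totals = [sum(col) for col in zip(*rows)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for r, row in enumerate(rows):
        for c, obs in enumerate(row):
            # Expected count under independence of ontology and metadata type.
            expected = row_totals[r] * col_totals[c] / grand_total
            statistic += (obs - expected) ** 2 / expected

    dof = (len(rows) - 1) * (len(col_totals) - 1)
    return statistic, dof


statistic, dof = chi_square_independence(observed)
print(f"chi2({dof}) = {statistic:,.3f}")
```

Post-hoc pairwise comparisons could reuse the same function on two-ontology subsets of the table, dividing the significance threshold by the number of comparisons for the Bonferroni adjustment.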




Clinical Data


This section provides an overview of the clinical data available for mapping. To create the mappings, clinical data was pulled in two waves from an OMOP (v5.0) PEDSNet (v3.0)-normalized instance of Children's Hospital of Colorado data (#15-0445).


Conditions

Wiki Page: Conditions

SQL Queries

Data was pulled in two waves. The first wave returned all condition concept ids used at least once in practice (n=29,129). The second wave returned all standard SNOMED-CT concept ids (n=109,719). Once the 29,129 concepts used in practice were removed, 80,590 standard SNOMED-CT concepts remained that had not been used in practice.

CONCEPT LEVEL | CODES | LABELS | SYNONYMS | VOCABULARIES
Concepts Used In Practice
Concept | 29,129 | 29,129 | 86,630 | SNOMED-CT
Ancestor | 1,421,104 | 1,389,525 | N/A | SNOMED-CT, Cohort, MedDRA
Standard SNOMED-CT Concepts Not Used In Practice
Concept | 80,590 | 80,590 | 194,264 | SNOMED-CT
Ancestor | 3,458,072 | 3,393,343 | N/A | SNOMED-CT, Cohort, MedDRA




Drug Exposure Ingredients

Wiki Page: Drug Exposure Ingredients

SQL Queries

  • Drug Exposure Ingredients Used in Practice (GitHub Gist)
  • Standard RxNorm Drug Ingredients Concepts (GitHub Gist)

Data was pulled in two waves. A total of 56,200 drug-ingredient concepts were eligible for mapping (51,941 drugs; 11,807 ingredients). The first wave returned all drug concept ids used at least once in practice (9,175 drugs; 1,697 ingredients). The second wave returned all standard drug concept ids from RxNorm (42,766 drugs; 10,110 ingredients).

DATA TYPE | CONCEPT LEVEL | CODES | LABELS | SYNONYMS | VOCABULARIES
Concepts Used In Practice
Drugs | Concept | 9,175 | 9,154 | 19,496 | RxNorm
Drugs | Ancestor | 140,937 | 77,135 | N/A | SPL, Cohort, ATC, NDFRT, RxNorm, VA Class, CVX
Ingredients | Concept | 1,697 | 1,696 | 1,868 | RxNorm, SPL
Ingredients | Ancestor | 1,697 | 1,696 | N/A | RxNorm, SPL
Standard RxNorm Concepts Not Used In Practice
Drugs | Concept | 42,766 | 42,640 | 52,688 | RxNorm
Drugs | Ancestor | 68,343 | 64,212 | N/A | SPL, Cohort, ATC, NDFRT, RxNorm, VA Class, CVX
Ingredients | Concept | 10,110 | 10,110 | 11,235 | RxNorm
Ingredients | Ancestor | 10,578 | 10,578 | N/A | RxNorm




Measurements

Wiki Page: Measurements

SQL Queries

  • CHCO Measurements Used in Practice (GitHub Gist)
  • Standard LOINC2HPO Concepts Not Used In Practice (GitHub Gist)

Data was pulled in two waves. The first wave was pulled from CHCO and, as with the condition and drug exposure ingredient domains, contains only those concepts that were used at least once in clinical practice. This set contained a total of 1,606 LOINC concepts or 4,425 lab test results (more information on how lab test results were identified is given below). The initial set of CHCO data was supplemented with the latest LOINC2HPO annotations. The current annotation set (annotations.tsv; last updated 06/07/2020) was downloaded from the develop branch of the LOINC2HPO GitHub repository on 08/12/2020. Of the 3,119 unique codes obtained from LOINC2HPO (7,421 unique results), 631 overlapped with the OMOP measurement terms retrieved from CHCO and were excluded. An additional 11 concepts were excluded because they were deprecated. The remaining terms were further processed to remove terms with duplicate result types (n=19 concepts). The final set of processed terms included 2,477 unique LOINC concepts or 6,844 lab test results.
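The overlap and deprecation exclusions described above amount to two set differences. The sketch below shows that logic on toy codes; the function name and the example codes are hypothetical and are not taken from the OMOP2OBO code base.

```python
def filter_supplemental_codes(supplemental, used_in_practice, deprecated):
    """Drop codes already covered by the clinical pull, then drop deprecated codes.

    Returns the remaining codes plus the number removed at each step.
    """
    overlapping = supplemental & used_in_practice
    remaining = supplemental - used_in_practice
    deprecated_removed = remaining & deprecated
    remaining = remaining - deprecated
    return remaining, len(overlapping), len(deprecated_removed)


# Toy LOINC-style codes, for illustration only.
loinc2hpo_codes = {"1000-9", "2000-8", "3000-7", "4000-6"}
chco_codes = {"2000-8", "9999-9"}
deprecated_codes = {"4000-6"}

kept, n_overlap, n_deprecated = filter_supplemental_codes(
    loinc2hpo_codes, chco_codes, deprecated_codes
)
print(sorted(kept), n_overlap, n_deprecated)
```

Applied to the counts reported above, the same two steps give 3,119 − 631 − 11 = 2,477 concepts.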

Identifying LOINC Scale and Result Type
All lab test scale types (i.e., ordinal, nominal, quantitative, qualitative, narrative, doc, and panel) were initially eligible to be mapped. The scale type of each lab test was identified by parsing the free text in the concept synonym field for the presence of any of the scale types listed above. Result type was identified using a two-step approach. First, we analyzed the reference ranges available in the patient data. If at least one numeric result was reported, the result type was recorded as Normal/Low/High; if a positive or negative result was reported, it was recorded as Positive/Negative. Then, for all lab tests without a reference range in the data, the result type was obtained by parsing the free text in the concept synonym field. For all tests with an ordinal scale type, if the keywords presence or screen were identified, the result type was reported as Positive/Negative. All tests with a quantitative scale type were given the result type Normal/Low/High. All other scale types were annotated with Unknown Result Type.
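The two-step rule above can be sketched as follows. This is an illustrative reading of the description, not the project's implementation: the function names, the whole-word keyword matching, checking numeric results before positive/negative ones, and treating ordinal tests without a keyword as Unknown Result Type are all assumptions.

```python
import re
from typing import Iterable, Optional

SCALE_TYPES = ("ordinal", "nominal", "quantitative", "qualitative",
               "narrative", "doc", "panel")


def find_scale_type(synonyms: str) -> Optional[str]:
    """Return the first scale type found as a whole word in the synonym text."""
    text = synonyms.lower()
    for scale in SCALE_TYPES:
        if re.search(rf"\b{scale}\b", text):
            return scale
    return None


def _is_numeric(value: str) -> bool:
    try:
        float(value)
        return True
    except ValueError:
        return False


def find_result_type(synonyms: str, observed_results: Iterable[str] = ()) -> str:
    """Assign a result type from patient results first, then from synonym text."""
    results = [r.strip().lower() for r in observed_results]

    # Step 1: use results reported in the patient data, when there are any.
    if any(_is_numeric(r) for r in results):
        return "Normal/Low/High"
    if any(r in ("positive", "negative") for r in results):
        return "Positive/Negative"

    # Step 2: fall back to the scale type and keywords in the synonym field.
    text = synonyms.lower()
    scale = find_scale_type(text)
    if scale == "ordinal" and re.search(r"\b(presence|screen)\b", text):
        return "Positive/Negative"
    if scale == "quantitative":
        return "Normal/Low/High"
    return "Unknown Result Type"


print(find_result_type("HIV 1 Ab screen; Ordinal"))
print(find_result_type("Sodium; Quantitative", ["138.0"]))
```

Whole-word matching is used so that a short keyword such as doc does not match inside a longer word.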

CONCEPT LEVEL | CODES | LABELS | SYNONYMS | VOCABULARIES
Concepts Used In Practice
Concept | 1,606 | 1,606 | 41,981 | LOINC, PEDSnet
Ancestor | 20,781 | 21,191 | N/A | LOINC
LOINC2HPO Concepts Not Used In Practice
Concept | 2,477 | 2,477 | 73,612 | LOINC
Ancestor | 23,457 | 24,306 | N/A | LOINC

OMOP2OBO Mapping Sets


Required Mapping Data:

Conditions

Drug Exposure Ingredients

Measurements


OMOP2OBO Mapping Validation


Accuracy
Validation work was performed to demonstrate the accuracy of the OMOP2OBO mappings. This work was specifically designed to verify the accuracy of manually constructed mappings (i.e., mappings that were not created from automatic alignment of existing database cross-references or exact string mappings). A subset of the most difficult manual and manual constructor mappings was randomly selected and verified by members of the clinical team shown below. Please see the Accuracy Wiki for additional information.

Consistency
Validation work was performed to demonstrate the logical consistency of the OMOP2OBO mappings. For additional information on how we create semantic representations of the OMOP2OBO mappings, see this wiki page. The experiment described below was designed to assess the semantic representation of the mappings. Please see the Consistency Wiki for additional information.

Generalizability
Validation work was performed to evaluate and characterize the generalizability, or coverage, of the OMOP vocabulary terms included in the OMOP2OBO mapping set relative to the OMOP vocabulary terms used at the Observational Health Data Sciences and Informatics (OHDSI) Concept Prevalence study sites. Please see the Generalizability Wiki for additional information.







This project is licensed under MIT - see the LICENSE.md file for details. If you intend to use any of the information on this Wiki, please provide the appropriate attribution by citing this repository:

@misc{callahan_tj_2020_4247939,
  author       = {Callahan, TJ},
  title        = {OMOP2OBO},
  month        = jun,
  year         = 2021,
  doi          = {10.5281/zenodo.4247939},
  url          = {https://doi.org/10.5281/zenodo.4247939}
}