V1.0

Release: v1.0 (first official release)

Release Updates

New Data Sources:

The original set of ontologies has been extended; see the Ontologies section for more information
The National Library of Medicine's Unified Medical Language System (UMLS) MRCONSO and MRSTY files. Using these data requires an NLM UMLS license agreement

Featured Functionality:

To improve our mapping pipeline, we have created a Python-based version of Juan Banda's OHDSI Ananke

Release Data:

All data used for this release can be downloaded directly from Zenodo (here)

Jupyter Notebooks

Ontologies

Downloaded Resource Information:

The specific ontologies used in this release of OMOP2OBO, including class and axiom counts, are shown in the table below. All ontologies were downloaded and processed on 09/14/20.

Ontology	Classes	Definitions	Labels	Synonyms	DbXRefs
Cell Ontology (CL)	2,238	1,859	2,238	2,124	1,376
Chemical Entities of Biological Interest (ChEBI)	126,169	48,824	126,169	269,798	231,247
Human Phenotype Ontology (HPO)	15,247	12,468	15,247	19,860	19,569
Mondo Disease Ontology (MONDO)	22,288	15,271	22,288	98,181	159,918
NCBITaxon Organism Taxonomy (NCBITaxon)	2,241,110	0	2,241,110	263,571	18,426
Protein Ontology (PRO)	215,624	215,598	215,624	590,190	195,671
Uber-Anatomy Ontology (UBERON)	13,898	11,026	13,898	36,771	51,322
Vaccine Ontology (VO)	5,783	1,231	5,789	6	0

Ontology Metadata

A chi-square test of independence was run to determine whether the amount of available metadata differed by ontology. First, an omnibus test was run to determine whether there was a significant relationship between the metadata and the ontologies. Results from this test (with Yates' correction) revealed a significant association between the ontology metadata and ontology type (χ²(14) = 2,664,853.817, p < 0.0001). To better understand these findings, post-hoc tests were run using a Bonferroni adjustment to correct for multiple comparisons. These tests confirmed that all ontologies had significantly different distributions of metadata (all ps < 0.0001).
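The omnibus and post-hoc tests described above can be sketched in plain Python. This is an illustration, not the analysis code behind the wiki. It assumes the contingency table crosses the eight ontologies with three metadata types (definitions, synonyms, and DbXRefs), which is one arrangement that yields the reported 14 degrees of freedom, since (8 - 1) x (3 - 1) = 14. The wiki does not state which columns were tested, so the statistic computed here need not match the reported value. The helper `chi_square` is hypothetical.

```python
from itertools import combinations

# Counts copied from the ontology table above: (definitions, synonyms, DbXRefs).
# ASSUMPTION: the wiki does not say which metadata columns entered the test.
COUNTS = {
    "CL": (1859, 2124, 1376),
    "ChEBI": (48824, 269798, 231247),
    "HPO": (12468, 19860, 19569),
    "MONDO": (15271, 98181, 159918),
    "NCBITaxon": (0, 263571, 18426),
    "PRO": (215598, 590190, 195671),
    "UBERON": (11026, 36771, 51322),
    "VO": (1231, 6, 0),
}


def chi_square(table):
    """Pearson chi-square statistic (no continuity correction) and dof for an r x c table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand_total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(col_totals) - 1)
    return stat, dof


# Omnibus test across all eight ontologies.
omnibus_stat, omnibus_dof = chi_square(list(COUNTS.values()))

# Post-hoc pairwise tests judged against a Bonferroni-adjusted threshold.
pairs = list(combinations(COUNTS, 2))
alpha = 0.05 / len(pairs)  # 28 pairwise comparisons -> alpha ~= 0.0018
pairwise = {(a, b): chi_square([COUNTS[a], COUNTS[b]]) for a, b in pairs}

if __name__ == "__main__":
    print(f"omnibus: chi2({omnibus_dof}) = {omnibus_stat:,.3f}")
    print(f"Bonferroni alpha for {len(pairs)} comparisons: {alpha:.5f}")
```

P-values can be obtained with `scipy.stats.chi2.sf(stat, dof)`. Note that `scipy.stats.chi2_contingency` applies Yates' correction only when the table has a single degree of freedom.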


Clinical Data

This section provides an overview of the clinical data available for mapping. To create the mappings, clinical data was pulled in two waves from an OMOP (v5.0), PEDSnet (v3.0)-normalized instance of Children's Hospital Colorado (CHCO) data (#15-0445).

Conditions

Wiki Page: Conditions

SQL Queries

Condition Concepts Used in Practice (GitHub Gist)
Standard SNOMED-CT Condition Concepts (GitHub Gist)

Data was pulled in two waves. The first wave returned all condition concept IDs used at least once in practice (n=29,129). The second wave returned all standard SNOMED-CT concept IDs (n=109,719). Once the 29,129 concepts used in practice were removed, 80,590 standard SNOMED-CT concepts remained that had not been used in practice.
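The second set is the standard SNOMED-CT concepts minus the concepts already seen in practice, so it is a simple set difference. The sketch below uses made-up concept IDs in place of the output of the two SQL queries. It assumes every concept used in practice is also a standard SNOMED-CT concept, which is what makes the reported counts subtract exactly. The helper `split_waves` is hypothetical.

```python
# Hypothetical toy concept IDs; the real sets come from the two SQL queries (GitHub Gists) above.
WAVE_1_USED = {101, 102, 103}                # condition concepts seen at least once in patient data
WAVE_2_STANDARD = {101, 102, 103, 201, 202}  # all standard SNOMED-CT condition concepts


def split_waves(used_ids, standard_ids):
    """Return the standard concepts that were never used in practice.

    ASSUMPTION: every concept used in practice is also a standard SNOMED-CT
    concept, so removing wave 1 from wave 2 shrinks it by exactly len(wave 1).
    """
    missing = used_ids - standard_ids
    if missing:
        raise ValueError(f"used concepts absent from the standard set: {sorted(missing)}")
    return standard_ids - used_ids


not_used = split_waves(WAVE_1_USED, WAVE_2_STANDARD)
# With the counts reported on this page: 109,719 - 29,129 = 80,590.
```

Raising an error when a used concept is missing from the standard set makes the subset assumption explicit instead of letting the counts silently disagree.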

CONCEPT LEVEL	CODES	LABELS	SYNONYMS	VOCABULARIES
Concepts Used In Practice
Concept	29,129	29,129	86,630	SNOMED-CT
Ancestor	1,421,104	1,389,525	N/A	SNOMED-CT, Cohort, MedDRA
Standard SNOMED-CT Concepts Not Used In Practice
Concept	80,590	80,590	194,264	SNOMED-CT
Ancestor	3,458,072	3,393,343	N/A	SNOMED-CT, Cohort, MedDRA

Drug Exposure Ingredients

Wiki Page: Drug Exposure Ingredients

SQL Queries

Drug Exposure Ingredients Used in Practice (GitHub Gist)
Standard RxNorm Drug Ingredients Concepts (GitHub Gist)

Data was pulled in two waves. A total of 56,200 drug-ingredient concepts were eligible for mapping (51,941 drugs; 11,807 ingredients). The first wave returned all drug concept IDs used at least once in practice (9,175 drugs; 1,697 ingredients). The second wave returned all standard RxNorm drug concept IDs not used in practice (42,766 drugs; 10,110 ingredients).

DATA TYPE	CONCEPT LEVEL	CODES	LABELS	SYNONYMS	VOCABULARIES
Concepts Used In Practice
Drugs	Concept	9,175	9,154	19,496	RxNorm
Drugs	Ancestor	140,937	77,135	N/A	SPL, Cohort, ATC, NDFRT, RxNorm, VA Class, CVX
Ingredients	Concept	1,697	1,696	1,868	RxNorm, SPL
Ingredients	Ancestor	1,697	1,696	N/A	RxNorm, SPL
Standard RxNorm Concepts Not Used In Practice
Drugs	Concept	42,766	42,640	52,688	RxNorm
Drugs	Ancestor	68,343	64,212	N/A	SPL, Cohort, ATC, NDFRT, RxNorm, VA Class, CVX
Ingredients	Concept	10,110	10,110	11,235	RxNorm
Ingredients	Ancestor	10,578	10,578	N/A	RxNorm

Measurements

Wiki Page: Measurements

SQL Queries

CHCO Measurements Used in Practice (GitHub Gist)
Standard LOINC2HPO Concepts Not Used In Practice (GitHub Gist)

Data was pulled in two waves. The first wave was pulled from CHCO and, as with the condition and drug exposure domains, contains only those concepts that were used at least once in clinical practice. This set contained a total of 1,606 LOINC concepts, or 4,425 lab test results (see below for how lab test results were identified). The initial set of CHCO data was supplemented with the latest LOINC2HPO annotations. The current annotation set (annotations.tsv; last updated 06/07/2020) was downloaded from the develop branch of the LOINC2HPO GitHub repository on 08/12/2020. Of the 3,119 unique codes obtained from LOINC2HPO (7,421 unique results), 631 overlapped with the OMOP measurement concepts retrieved from CHCO and were excluded. An additional 11 concepts were excluded because they were deprecated. The remaining terms were further processed to remove terms with duplicate result types (n=19 concepts). The final processed set included 2,477 unique LOINC concepts, or 6,844 lab test results.
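The exclusion steps applied to the LOINC2HPO annotations can be sketched as a single filter pass. This is a hypothetical illustration. The row layout `(loinc_code, result_type, hpo_term)`, the toy codes, and the helper `filter_loinc2hpo` are assumptions and do not describe the structure of annotations.tsv. The wiki also does not say whether the duplicate-result-type step dropped whole concepts or individual annotation rows; this sketch drops repeated rows.

```python
def filter_loinc2hpo(rows, chco_codes, deprecated_codes):
    """Apply the exclusion steps to (loinc_code, result_type, hpo_term) rows.

    ASSUMPTION: the duplicate-result-type step keeps the first row seen for
    each (loinc_code, result_type) pair and drops later repeats.
    """
    kept, seen = [], set()
    for code, result, hpo_term in rows:
        if code in chco_codes:        # step 1: already present in the CHCO data
            continue
        if code in deprecated_codes:  # step 2: deprecated LOINC code
            continue
        if (code, result) in seen:    # step 3: repeated result type for this code
            continue
        seen.add((code, result))
        kept.append((code, result, hpo_term))
    return kept


# Toy rows with made-up LOINC codes and HPO placeholders.
ROWS = [
    ("1000-1", "L", "HPO_A"),
    ("1000-1", "H", "HPO_B"),
    ("1000-1", "H", "HPO_C"),    # repeats result type "H" for 1000-1 -> dropped
    ("2000-2", "POS", "HPO_D"),  # overlaps with CHCO -> dropped
    ("3000-3", "N", "HPO_E"),    # deprecated -> dropped
    ("4000-4", "NEG", "HPO_F"),
]
kept = filter_loinc2hpo(ROWS, chco_codes={"2000-2"}, deprecated_codes={"3000-3"})
unique_codes = {code for code, _, _ in kept}
```

Counting `unique_codes` and `kept` separately mirrors the distinction this page draws between unique LOINC concepts and lab test results.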

Identifying LOINC Scale and Result Type
All lab test scale types (i.e., ordinal, nominal, quantitative, qualitative, narrative, doc, and panel) were initially eligible for mapping. The scale type of each lab test was identified by parsing the free text in the concept synonym field for any of the scale types listed above. Result type was identified using a two-step approach. First, we analyzed the reference ranges available in the patient data: if at least one numeric result was reported, the result type was recorded as Normal/Low/High, and if a positive or negative result was reported, it was recorded as Positive/Negative. Second, for all lab tests without a reference range in the data, the result type was obtained by parsing the free text in the concept synonym field. Tests with an ordinal scale type whose synonyms contained the keyword "presence" or "screen" were assigned Positive/Negative, and all tests with a quantitative scale type were assigned Normal/Low/High. All other scale types were annotated with Unknown Result Type.
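The two-step scale and result type rules above can be expressed as a few small functions. This is a minimal sketch, not the project's code. It assumes that scale types appear as whole words in the synonym text, that numeric results take precedence when a test reports both numeric and positive/negative values, and that an ordinal test without either keyword falls into Unknown Result Type. The function names and the example synonym strings are hypothetical.

```python
import re

# Scale types named on this page; matched as whole words in the synonym text.
SCALE_TYPES = ("ordinal", "nominal", "quantitative", "qualitative", "narrative", "doc", "panel")


def scale_type(synonym_text):
    """Return the first scale type (in SCALE_TYPES order) found in the synonym text, or None."""
    text = synonym_text.lower()
    for scale in SCALE_TYPES:
        if re.search(rf"\b{scale}\b", text):
            return scale
    return None


def _is_number(value):
    try:
        float(value)
        return True
    except (TypeError, ValueError):
        return False


def result_type(synonym_text, reference_results=()):
    """Assign a result type using patient reference-range data first, then synonym text."""
    # Step 1: results reported with a reference range in the patient data.
    # ASSUMPTION: numeric results win if both kinds of result are present.
    if any(_is_number(v) for v in reference_results):
        return "Normal/Low/High"
    if any(str(v).strip().lower() in {"positive", "negative"} for v in reference_results):
        return "Positive/Negative"

    # Step 2: no usable reference range, so fall back to the synonym text.
    text = synonym_text.lower()
    scale = scale_type(text)
    if scale == "ordinal" and ("presence" in text or "screen" in text):
        return "Positive/Negative"
    if scale == "quantitative":
        return "Normal/Low/High"
    return "Unknown Result Type"
```

For example, an invented synonym such as "Hepatitis B surface Ag presence ordinal" with no patient results falls through to step 2 and is assigned Positive/Negative.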

CONCEPT LEVEL	CODES	LABELS	SYNONYMS	VOCABULARIES
Concepts Used In Practice
Concept	1,606	1,606	41,981	LOINC, PEDSnet
Ancestor	20,781	21,191	N/A	LOINC
LOINC2HPO Concepts Not Used In Practice
Concept	2,477	2,477	73,612	LOINC
Ancestor	23,457	24,306	N/A	LOINC

OMOP2OBO Mapping Sets

Required Mapping Data:

source_code_vocab_map.csv

Conditions

Drug Exposure Ingredients

Measurements

OMOP2OBO Mapping Validation

Accuracy
Validation work was performed to demonstrate the accuracy of the OMOP2OBO mappings. This work was specifically designed to verify the accuracy of manually constructed mappings (i.e., mappings that were not created by automatically aligning existing database cross-references or exact string matches). A subset of the most difficult manual and manual constructor mappings was randomly selected and verified by members of the clinical team. Please see the Accuracy Wiki for additional information.
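Random selection for review can be made reproducible by fixing the random seed, so the same subset can be regenerated later. The sketch below assumes mapping records are dictionaries with hypothetical `omop_id` and `category` keys and hypothetical category labels. It does not model how the "most difficult" mappings were identified; it only draws a seeded random sample from the manually created ones.

```python
import random

# Hypothetical mapping-category labels; automatic mappings are excluded from review.
MANUAL_CATEGORIES = {"manual", "manual constructor"}


def sample_for_review(mappings, k, seed=0):
    """Draw a reproducible random sample of up to k manually created mappings."""
    manual = [m for m in mappings if m["category"] in MANUAL_CATEGORIES]
    rng = random.Random(seed)  # fixed seed so the same sample can be regenerated
    return rng.sample(manual, min(k, len(manual)))


# Toy records standing in for the real mapping set.
MAPPINGS = [
    {"omop_id": i, "category": c}
    for i, c in enumerate(["automatic", "manual", "manual constructor", "manual", "automatic"])
]
picked = sample_for_review(MAPPINGS, 2, seed=7)
```

Using a dedicated `random.Random` instance keeps the sample independent of any other use of the global random state.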

Consistency
Validation work was performed to demonstrate the logical consistency of the OMOP2OBO mappings. For additional information on how we create semantic representations of the OMOP2OBO mappings, see this wiki page. The consistency experiment was designed to assess the semantic representation of the mappings. Please see the Consistency Wiki for additional information.

Generalizability
Validation work aimed at evaluating and characterizing the generalizability (i.e., coverage) of the OMOP2OBO mapping set by comparing the OMOP vocabulary terms it includes with the OMOP vocabulary terms used at the Observational Health Data Sciences and Informatics (OHDSI) Concept Prevalence study sites. Please see the Generalizability Wiki for additional information.
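Coverage in this sense can be expressed as the fraction of a site's OMOP concept IDs that also appear in the mapping set. The sketch below shows one simple, unweighted way to compute it. It assumes per-site lists of concept IDs, and the site names and IDs are made up. The Generalizability Wiki describes the actual analysis.

```python
def coverage(mapped_ids, site_ids):
    """Fraction of a site's OMOP concept IDs that appear in the mapping set."""
    site_ids = set(site_ids)
    if not site_ids:
        return 0.0
    return len(site_ids & set(mapped_ids)) / len(site_ids)


# Hypothetical inputs; real ones would be the OMOP2OBO mapping set and the
# per-site concept lists from the OHDSI Concept Prevalence study.
MAPPED = {1, 2, 3, 4}
SITES = {"site_a": [1, 2, 5, 6], "site_b": [3, 4]}

per_site = {site: coverage(MAPPED, ids) for site, ids in SITES.items()}
```

Converting both inputs to sets means duplicate concept IDs in a site list do not inflate the denominator.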


This project is licensed under the MIT License; see the LICENSE.md file for details. If you intend to use any of the information on this wiki, please provide appropriate attribution by citing this repository:

@misc{callahan_tj_2020_4247939,
  author       = {Callahan, TJ},
  title        = {OMOP2OBO},
  month        = jun,
  year         = 2021,
  doi          = {10.5281/zenodo.4247939},
  url          = {https://doi.org/10.5281/zenodo.4247939}
}
