V1.0
New Data Sources:
- The original set of ontologies has been extended, see the Ontologies section for more information
- The National Library of Medicine's Unified Medical Language System (UMLS) MRCONSO and MRSTY. Using these data requires an NLM UMLS license agreement
Featured Functionality:
- To improve our mapping pipeline, we have created a Python-based version of Juan Banda's OHDSI Ananke
Release Data:
- All data used for this release can be downloaded directly from Zenodo (here)
Downloaded Resource Information:
The specific ontologies used in this release of OMOP2OBO, including class and axiom counts, are shown in the table below. All ontologies were downloaded and processed on 09/14/20.
Ontology | Classes | Definitions | Labels | Synonyms | DbXRefs |
---|---|---|---|---|---|
Cell Line Ontology (CL) | 2,238 | 1,859 | 2,238 | 2,124 | 1,376 |
Chemical Entities of Biological Interest (CHEBI) | 126,169 | 48,824 | 126,169 | 269,798 | 231,247 |
Human Phenotype Ontology (HPO) | 15,247 | 12,468 | 15,247 | 19,860 | 19,569 |
Mondo Disease Ontology (MONDO) | 22,288 | 15,271 | 22,288 | 98,181 | 159,918 |
NCBITaxon Organism Taxonomy (NCBITaxon) | 2,241,110 | 0 | 2,241,110 | 263,571 | 18,426 |
Protein Ontology (PRO) | 215,624 | 215,598 | 215,624 | 590,190 | 195,671 |
Uber-Anatomy Ontology (UBERON) | 13,898 | 11,026 | 13,898 | 36,771 | 51,322 |
Vaccine Ontology (VO) | 5,783 | 1,231 | 5,789 | 6 | 0 |
A Chi-Square test of independence was run to determine whether the amount of metadata available differed by ontology. First, an omnibus test was run to determine whether there was a significant relationship between the metadata and ontologies. Results from this test (with Yates' correction) revealed a significant association between the ontology metadata and ontology type (χ²(14)=2,664,853.817, p<0.0001). To better understand these findings, post-hoc tests were run using a Bonferroni adjustment to correct for multiple comparisons. These tests confirmed that all ontologies had significantly different distributions of metadata (ps<0.0001).
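As a rough illustration of the test described above, the sketch below computes a plain Pearson chi-square statistic (without Yates' correction) and its degrees of freedom for a table of counts, plus a Bonferroni-adjusted significance threshold. The function names and the toy table are hypothetical and are not taken from the OMOP2OBO code. The release notes do not say which metadata columns entered the test; an 8 x 3 table (eight ontologies, three metadata types) would give (8-1)x(3-1)=14 degrees of freedom, which matches the reported value, but that reading is an assumption.

```python
import math


def chi_square(table):
    """Pearson chi-square statistic and degrees of freedom for an r x c table of counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of rows and columns.
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return stat, dof


def bonferroni_alpha(alpha, n_comparisons):
    """Per-comparison significance threshold after a Bonferroni adjustment."""
    return alpha / n_comparisons


# Toy 2 x 2 table (made-up counts, for illustration only).
stat, dof = chi_square([[10, 20], [20, 10]])
print(round(stat, 3), dof)

# Assumption: post-hoc tests compare every pair of the 8 ontologies.
n_pairs = math.comb(8, 2)
print(n_pairs, bonferroni_alpha(0.05, n_pairs))
```

In practice a statistics library would also supply the p-value; the sketch stops at the statistic so that it stays dependency-free.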
This section provides an overview of the clinical data available for mapping. To create the mappings, clinical data was pulled in two waves from an OMOP (v5.0) PEDSnet (v3.0)-normalized instance of Children's Hospital of Colorado data (#15-0445).
Wiki Page: Conditions
SQL Queries
- Condition Concepts Used in Practice (GitHub Gist)
- Standard SNOMED-CT Condition Concepts (GitHub Gist)
Data was pulled in two waves. The first wave returned all condition concept ids used at least once in practice (n=29,129). The second wave returned all standard SNOMED-CT condition concept ids (n=109,719). Once the 29,129 concepts used in practice were removed, 80,590 standard SNOMED-CT concepts remained that had not been used in practice.
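The counts above are consistent with a simple set difference between the two waves. The sketch below shows that step on toy concept ids; the function name and the ids are hypothetical, and the real ids come from the SQL queries linked above.

```python
def unused_standard_concepts(standard_ids, used_ids):
    """Standard concept ids that never appear in the used-in-practice set."""
    return set(standard_ids) - set(used_ids)


# Toy ids, for illustration only.
used_in_practice = {101, 102, 103}
all_standard = {101, 102, 103, 104, 105}
print(sorted(unused_standard_concepts(all_standard, used_in_practice)))  # [104, 105]

# Count check against the figures reported above, assuming every concept
# used in practice is also present in the standard SNOMED-CT set.
assert 109_719 - 29_129 == 80_590
```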
CONCEPT LEVEL | CODES | LABELS | SYNONYMS | VOCABULARIES |
---|---|---|---|---|
Concepts Used In Practice | ||||
Concept | 29,129 | 29,129 | 86,630 | SNOMED-CT |
Ancestor | 1,421,104 | 1,389,525 | N/A | SNOMED-CT, Cohort, MedDRA |
Standard SNOMED-CT Concepts Not Used In Practice | ||||
Concept | 80,590 | 80,590 | 194,264 | SNOMED-CT |
Ancestor | 3,458,072 | 3,393,343 | N/A | SNOMED-CT, Cohort, MedDRA |
Wiki Page: Drug Exposure Ingredients
SQL Queries
- Drug Exposure Ingredients Used in Practice (GitHub Gist)
- Standard RxNorm Drug Ingredients Concepts (GitHub Gist)
Data was pulled in two waves. A total of 56,200 drug-ingredient concepts were eligible for mapping (51,941 drugs; 11,807 ingredients). The first wave returned all drug concept ids used at least once in practice (9,175 drugs; 1,697 ingredients). The second wave returned all standard drug concept ids from RxNorm (42,766 drugs; 10,110 ingredients).
DATA TYPE | CONCEPT LEVEL | CODES | LABELS | SYNONYMS | VOCABULARIES |
---|---|---|---|---|---|
Concepts Used In Practice | |||||
Drugs | Concept | 9,175 | 9,154 | 19,496 | RxNorm |
Drugs | Ancestor | 140,937 | 77,135 | N/A | SPL, Cohort, ATC, NDFRT, RxNorm, VA Class, CVX |
Ingredients | Concept | 1,697 | 1,696 | 1,868 | RxNorm, SPL |
Ingredients | Ancestor | 1,697 | 1,696 | N/A | RxNorm, SPL |
Standard RxNorm Concepts Not Used In Practice | |||||
Drugs | Concept | 42,766 | 42,640 | 52,688 | RxNorm |
Drugs | Ancestor | 68,343 | 64,212 | N/A | SPL, Cohort, ATC, NDFRT, RxNorm, VA Class, CVX |
Ingredients | Concept | 10,110 | 10,110 | 11,235 | RxNorm |
Ingredients | Ancestor | 10,578 | 10,578 | N/A | RxNorm |
Wiki Page: Measurements
SQL Queries
- CHCO Measurements Used in Practice (GitHub Gist)
- Standard LOINC2HPO Concepts Not Used In Practice (GitHub Gist)
Data was pulled in two waves. The first wave of data was pulled from CHCO and, as with the condition and drug domains, contains only those concepts that were used at least once in clinical practice. This set contained a total of 1,606 LOINC concepts or 4,425 lab test results (more information on how lab test results were identified is given below). The initial set of CHCO data was supplemented by adding the latest LOINC2HPO annotations. The current annotation set (annotations.tsv; last updated 06/07/2020) was downloaded from the develop branch of the LOINC2HPO GitHub repository on 08/12/2020. Of the 3,119 unique codes obtained from LOINC2HPO (7,421 unique results), 631 overlapped with the OMOP measurement terms retrieved from CHCO and were excluded. An additional 11 concepts were excluded because they were deprecated. This set of terms was further processed to remove terms with duplicate result types (n=19 concepts). The final set of processed terms included 2,477 unique LOINC concepts or 6,844 lab test results.
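The exclusion steps described above can be sketched as two set operations. The function name and the placeholder codes below are hypothetical. How the 19 concepts with duplicate result types affect the totals is not stated, so that step is left out; the reported final concept count already matches the two exclusions alone.

```python
def filter_loinc2hpo(loinc2hpo_codes, chco_codes, deprecated_codes):
    """Drop codes already covered by the CHCO pull, then drop deprecated codes."""
    overlap = loinc2hpo_codes & chco_codes
    remaining = loinc2hpo_codes - overlap
    deprecated = remaining & deprecated_codes
    return remaining - deprecated, len(overlap), len(deprecated)


# Placeholder codes, for illustration only (not real LOINC codes).
kept, n_overlap, n_deprecated = filter_loinc2hpo(
    loinc2hpo_codes={"A", "B", "C", "D", "E"},
    chco_codes={"A", "B"},
    deprecated_codes={"E"},
)
print(sorted(kept), n_overlap, n_deprecated)  # ['C', 'D'] 2 1

# Count check against the figures reported above.
assert 3_119 - 631 - 11 == 2_477
```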
Identifying LOINC Scale and Result Type
All lab test scale types (i.e., ordinal, nominal, quantitative, qualitative, narrative, doc, and panel) were initially eligible to be mapped. The scale type of each lab test was identified by parsing the free text in the concept synonym field for the presence of any of the scale types listed above. Result type was identified using a two-step approach. First, we analyzed the reference ranges available in the patient data. If at least one numeric result was reported, the result type was recorded as Normal/Low/High, and if a positive or negative result was reported, it was recorded as Positive/Negative. Then, for all lab tests without a reference range in the data, the result type was obtained by parsing the free text in the concept synonym field. For all tests with an ordinal scale type, if the keywords presence or screen were identified, the result type was reported as Positive/Negative. All tests with a quantitative scale type were given the result type Normal/Low/High. All other scale types were annotated with Unknown Result Type.
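The two-step rules above can be sketched as follows. This is a minimal illustration and not the project's implementation: the function names, the shape of the inputs (a list of observed result values and a list of synonym strings), the whole-word matching, and the order in which the numeric and positive/negative checks run are all assumptions. An ordinal test without either keyword is treated here as Unknown Result Type, which the text does not state explicitly.

```python
import re

SCALE_TYPES = ("ordinal", "nominal", "quantitative", "qualitative",
               "narrative", "doc", "panel")


def _words(synonyms):
    """Lower-cased set of alphabetic words found in the synonym strings."""
    return set(re.findall(r"[a-z]+", " ".join(synonyms).lower()))


def find_scale_type(synonyms):
    """First scale type (in SCALE_TYPES order) mentioned in the synonyms, else None."""
    words = _words(synonyms)
    for scale in SCALE_TYPES:
        if scale in words:
            return scale
    return None


def find_result_type(observed_results, synonyms):
    """Apply the patient-data rules first, then fall back to the synonym text."""
    # Step 1: results reported in the patient data (assumed input shape).
    if any(isinstance(r, (int, float)) for r in observed_results):
        return "Normal/Low/High"
    if any(str(r).lower() in ("positive", "negative") for r in observed_results):
        return "Positive/Negative"
    # Step 2: no usable patient data, so parse the synonym free text.
    scale = find_scale_type(synonyms)
    if scale == "ordinal" and _words(synonyms) & {"presence", "screen"}:
        return "Positive/Negative"
    if scale == "quantitative":
        return "Normal/Low/High"
    return "Unknown Result Type"


print(find_result_type([5.2], []))                         # Normal/Low/High
print(find_result_type([], ["Ordinal", "Drug screen"]))    # Positive/Negative
print(find_result_type([], ["Narrative"]))                 # Unknown Result Type
```

Matching whole words rather than substrings avoids false hits such as "doc" inside a longer word, at the cost of missing inflected forms like "screening".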
CONCEPT LEVEL | CODES | LABELS | SYNONYMS | VOCABULARIES |
---|---|---|---|---|
Concepts Used In Practice | ||||
Concept | 1,606 | 1,606 | 41,981 | LOINC, PEDSnet |
Ancestor | 20,781 | 21,191 | N/A | LOINC |
LOINC2HPO Concepts Not Used In Practice | ||||
Concept | 2,477 | 2,477 | 73,612 | LOINC |
Ancestor | 23,457 | 24,306 | N/A | LOINC |
Required Mapping Data:
Accuracy
Validation work was performed to demonstrate the accuracy of the OMOP2OBO mappings. This work was specifically designed to verify the accuracy of manually constructed mappings (i.e., mappings that were not created from automatic alignment of existing database cross-references or exact string mappings). A subset of the most difficult manual and manual constructor mappings was randomly selected and verified by members of the clinical team shown below. Please see the Accuracy Wiki for additional information.
Consistency
Validation work was performed to demonstrate the logical consistency of the OMOP2OBO mappings. For additional information on how we create semantic representations of the OMOP2OBO mappings, see this wiki page. The experiment described below was designed to assess the semantic representation of the mappings. Please see the Consistency Wiki for additional information.
Generalizability
Validation work was aimed at evaluating and characterizing the generalizability, or coverage, of the OMOP vocabulary terms included in the OMOP2OBO mapping set relative to the OMOP vocabulary terms utilized in the Observational Health Data Sciences and Informatics (OHDSI) Concept Prevalence study sites. Please see the Generalizability Wiki for additional information.
This project is licensed under MIT - see the LICENSE.md file for details. If you intend to use any of the information on this Wiki, please provide the appropriate attribution by citing this repository:
@misc{callahan_tj_2020_4247939,
author = {Callahan, TJ},
title = {OMOP2OBO},
month = jun,
year = 2021,
doi = {10.5281/zenodo.4247939},
url = {https://doi.org/10.5281/zenodo.4247939}
}