
Laboratory Tests

Tiffany J. Callahan edited this page Jul 3, 2020 · 26 revisions

Collaborators


Documentation





Background

The goal of this project was to map measurement results drawn from the Observational Medical Outcomes Partnership (OMOP) common data model to the Open Biomedical Ontologies (OBO). Specifically, we aimed to annotate every unique LOINC laboratory test code assigned to at least one patient (n=902 codes; 2,706 test results) with a concept from an OBO ontology.

“The Open Biological and Biomedical Ontology (OBO) Foundry is a collective of ontology developers that are committed to collaboration and adherence to shared principles. The mission of the OBO Foundry is to develop a family of interoperable ontologies that are both logically well-formed and scientifically accurate.” -OBO Foundry

Currently, very few annotations (i.e. mappings that connect similar concepts from different sources) exist between clinical terminologies and the OBO ontologies. Creating these mappings enables a transition to a reproducible research framework in which clinical observations can be viewed within the context of their underlying molecular mechanism(s).

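This task used the following ontologies: the Human Phenotype Ontology (HPO), the Uber-Anatomy Ontology (UBERON), Chemical Entities of Biological Interest (ChEBI), the National Center for Biotechnology Information Taxonomy (NCBITaxon), the Protein Ontology (PRO), and the Cell Ontology (CL):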

Human Phenotype Ontology

The Human Phenotype Ontology (HPO) provides a standardized vocabulary of phenotypic abnormalities encountered in human disease. Each term in the HPO describes a phenotypic abnormality, such as Atrial septal defect. - HPO

The Uber-Anatomy Ontology

The Uber-Anatomy Ontology (UBERON) represents anatomy (i.e. body parts, organs and tissues) for multiple species. - UBERON

Chemical Entities of Biological Interest

The Chemical Entities of Biological Interest (ChEBI) represents molecular entities, specifically, small chemical compounds. - ChEBI

National Center for Biotechnology Information Taxonomy

The National Center for Biotechnology Information Taxonomy (NCBITaxon) ontology is an automatic translation of the NCBI taxonomy database into obo/owl. - NCBITaxon

Protein Ontology

The Protein Ontology (PRO) provides an ontological representation of protein-related entities by explicitly defining them and showing the relationships between them. Each PRO term represents a distinct class of entities (including specific modified forms, orthologous isoforms, and protein complexes) ranging from the taxon-neutral to the taxon-specific (e.g. the entity representing all protein products of the human SMAD2 gene is described in PR:Q15796; one particular human SMAD2 protein form, phosphorylated on the last two serines of a conserved C-terminal SSxS motif is defined by PR:000025934). - PRO

Cell Ontology

The Cell Ontology (CL) is designed as a structured controlled vocabulary for cell types. This ontology was constructed for use by the model organism and other bioinformatics databases, where there is a need for a controlled vocabulary of cell types. This ontology is not organism specific; it covers cell types from prokaryotes to mammals. However, it excludes plant cell types, which are covered by PO. - CL



Concept Annotation

Our goal was to map all unique LOINC laboratory test results (i.e. low, normal, or high) assigned to at least one pediatric patient to the HPO. For laboratory tests, each result is considered independently in order to find the best possible mapping to an ontology concept.

Examples

| LOINC | Result | HPO |
| --- | --- | --- |
| LOINC_28606-2 : 1-Methylhistidine/Creatinine [Ratio] in Urine | Low | Decreased urinary 1-methylhistidine (HP_0410314) |
| LOINC_28606-2 : 1-Methylhistidine/Creatinine [Ratio] in Urine | Normal | NOT(Abnormal urinary 1-methylhistidine concentration) (HP_0410313) |
| LOINC_28606-2 : 1-Methylhistidine/Creatinine [Ratio] in Urine | High | Increased urinary 1-methylhistidine (HP_0410315) |
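The three example rows above can be represented as one record per test result. The following is a minimal sketch only: the `ResultMapping` class and its field names are hypothetical and are not part of the project's code. It shows how a Normal result can be stored as the negation of the parent "abnormal" HPO term.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ResultMapping:
    """One LOINC test result and its HPO annotation (hypothetical structure)."""
    loinc_code: str
    loinc_label: str
    result: str                 # "Low", "Normal", or "High"
    hpo_id: Optional[str]       # None when the result is unmapped
    hpo_label: Optional[str]
    negated: bool = False       # True when the result asserts absence of the phenotype

    def render(self) -> str:
        """Format the annotation the way the example table displays it."""
        if self.hpo_id is None:
            return "UnMapped"
        label = f"NOT({self.hpo_label})" if self.negated else self.hpo_label
        return f"{label} ({self.hpo_id})"


LABEL = "1-Methylhistidine/Creatinine [Ratio] in Urine"
examples = [
    ResultMapping("LOINC_28606-2", LABEL, "Low", "HP_0410314",
                  "Decreased urinary 1-methylhistidine"),
    ResultMapping("LOINC_28606-2", LABEL, "Normal", "HP_0410313",
                  "Abnormal urinary 1-methylhistidine concentration", negated=True),
    ResultMapping("LOINC_28606-2", LABEL, "High", "HP_0410315",
                  "Increased urinary 1-methylhistidine"),
]

for m in examples:
    print(f"{m.loinc_code} | {m.result} | {m.render()}")
```

Keeping the negation as a separate flag, rather than baking it into the label, leaves the HPO identifier usable for lookups.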

The following tasks were performed to map LOINC laboratory test results to the HPO:

Export LOINC Laboratory Test Results

  • Export each LOINC id and its ancestors from a pediatric (CHCO) instance of the OMOP common data model (data exported October 2018). The SQL code that was used to retrieve these codes is stored as a GitHub Gist and can be found here. For convenience, the queries are also shown below:
WITH 
  measurement_concepts
  AS (SELECT
        m.measurement_concept_id AS CONCEPT_ID,
        c.concept_code AS CONCEPT_SOURCE_CODE, 
        c.concept_name AS CONCEPT_LABEL,
        c.vocabulary_id AS CONCEPT_VOCAB,
        v.vocabulary_version AS CONCEPT_VOCAB_VERSION
      FROM 
        CHCO_DeID_Oct2018.measurement m 
        JOIN CHCO_DeID_Oct2018.concept c ON m.measurement_concept_id = c.concept_id
        JOIN CHCO_DeID_Oct2018.vocabulary v ON c.vocabulary_id = v.vocabulary_id
      WHERE 
        c.concept_name != "No matching concept" 
        AND c.domain_id = "Measurement"
      GROUP BY CONCEPT_ID, CONCEPT_SOURCE_CODE, CONCEPT_LABEL, CONCEPT_VOCAB, CONCEPT_VOCAB_VERSION),
  
  measurement_ancestors
  AS (SELECT
        ca.descendant_concept_id AS CONCEPT_ID,
        STRING_AGG(DISTINCT(CAST(c1.concept_id as STRING)), " | ") AS ANCESTOR_CONCEPT_ID,
        STRING_AGG(DISTINCT(c1.concept_code), " | ") AS ANCESTOR_SOURCE_CODE, 
        STRING_AGG(DISTINCT(c1.concept_name), " | ") AS ANCESTOR_LABEL,
        STRING_AGG(DISTINCT(c1.vocabulary_id), " | ") AS ANCESTOR_VOCAB,
        STRING_AGG(DISTINCT(v.vocabulary_version), " | ") AS ANCESTOR_VOCAB_VERSION
      FROM 
        CHCO_DeID_Oct2018.concept_ancestor ca
        JOIN CHCO_DeID_Oct2018.concept c1 ON ca.ancestor_concept_id = c1.concept_id
        JOIN CHCO_DeID_Oct2018.vocabulary v ON c1.vocabulary_id = v.vocabulary_id
      WHERE 
        ca.descendant_concept_id IN (SELECT CONCEPT_ID FROM measurement_concepts)
        AND c1.concept_name != "No matching concept"
        AND c1.concept_id IS NOT NULL
        AND c1.domain_id = "Measurement"
      GROUP BY CONCEPT_ID),
  
  measurement_results
  AS (SELECT 
        measurement_concept_id AS CONCEPT_ID,
        CASE WHEN REGEXP_CONTAINS(STRING_AGG(range_low_source_value, ""), r'(?i)(positive|negative)') IS TRUE THEN "Negative/Positive" 
             WHEN REGEXP_CONTAINS(STRING_AGG(range_high_source_value, ""), r'(?i)(positive|negative)') IS TRUE THEN "Negative/Positive"         
             WHEN REGEXP_CONTAINS(STRING_AGG(range_low_source_value, ""), r'[[:digit:]]') IS TRUE THEN "Normal/Low/High"
             WHEN REGEXP_CONTAINS(STRING_AGG(range_high_source_value, ""), r'[[:digit:]]') IS TRUE THEN "Normal/Low/High"
             ELSE NULL END AS RESULT_TYPE
      FROM CHCO_DeID_Oct2018.measurement
      WHERE measurement_concept_id in (SELECT CONCEPT_ID FROM measurement_concepts)
      GROUP BY CONCEPT_ID),
  
  measurement_scale
  AS (SELECT 
        s.concept_id AS CONCEPT_ID,
        STRING_AGG(DISTINCT(s.concept_synonym_name), " | ") AS CONCEPT_SYNONYM,
        STRING_AGG(s.concept_synonym_name, ""),
        CASE WHEN REGEXP_CONTAINS(STRING_AGG(s.concept_synonym_name, ""), r'(?i)ordinal') IS TRUE THEN "ORD"
             WHEN REGEXP_CONTAINS(STRING_AGG(s.concept_synonym_name, ""), r'(?i)nominal') IS TRUE THEN "NOM"
             WHEN REGEXP_CONTAINS(STRING_AGG(s.concept_synonym_name, ""), r'(?i)quantitative') IS TRUE THEN "QUANT"
             WHEN REGEXP_CONTAINS(STRING_AGG(s.concept_synonym_name, ""), r'(?i)qualitative') IS TRUE THEN "QUAL"
             WHEN REGEXP_CONTAINS(STRING_AGG(s.concept_synonym_name, ""), r'(?i)narrative') IS TRUE THEN "NAR"
             WHEN REGEXP_CONTAINS(STRING_AGG(s.concept_synonym_name, ""), r'(?i)doc') IS TRUE THEN "DOC"
             WHEN REGEXP_CONTAINS(STRING_AGG(s.concept_synonym_name, ""), r'(?i)(panel|pnl|panl)') IS TRUE THEN "PNL"
             ELSE "Unmapped Scale Type" END AS SCALE
        FROM CHCO_DeID_Oct2018.concept_synonym s 
        WHERE s.concept_id in (SELECT CONCEPT_ID FROM measurement_concepts)
        GROUP BY CONCEPT_ID),
  
  measurement_metadata_update
  AS (SELECT
        r.CONCEPT_ID,
        CASE WHEN (r.RESULT_TYPE IS NULL AND s.SCALE = "ORD") AND REGEXP_CONTAINS(s.CONCEPT_SYNONYM, r'(?i)screen') IS TRUE THEN "Negative/Positive"
             WHEN (r.RESULT_TYPE IS NULL AND s.SCALE = "ORD") AND REGEXP_CONTAINS(s.CONCEPT_SYNONYM, r'(?i)presence') IS TRUE THEN "Negative/Positive"
             WHEN r.RESULT_TYPE IS NULL AND s.SCALE = "QUANT" THEN "Normal/Low/High"
             WHEN r.RESULT_TYPE IS NOT NULL THEN r.RESULT_TYPE
             ELSE "Unknown Result Type" END AS RESULT_TYPE,
        CASE WHEN s.SCALE IS NULL THEN "Other"  # for non-LOINC scale types
             ELSE s.SCALE END AS SCALE
        FROM
          (SELECT * FROM measurement_results) r
          FULL JOIN (SELECT * FROM measurement_scale) s ON r.CONCEPT_ID = s.CONCEPT_ID)

SELECT
  m.CONCEPT_ID,
  m.CONCEPT_SOURCE_CODE,
  m.CONCEPT_LABEL,
  m.CONCEPT_VOCAB,
  m.CONCEPT_VOCAB_VERSION,
  s.CONCEPT_SYNONYM,
  a.ANCESTOR_CONCEPT_ID,
  a.ANCESTOR_SOURCE_CODE, 
  a.ANCESTOR_LABEL,
  a.ANCESTOR_VOCAB,
  a.ANCESTOR_VOCAB_VERSION,
  u.SCALE,
  u.RESULT_TYPE
  
FROM measurement_concepts m
  FULL JOIN measurement_ancestors a ON m.CONCEPT_ID = a.CONCEPT_ID
  FULL JOIN measurement_scale s ON m.CONCEPT_ID = s.CONCEPT_ID
  FULL JOIN measurement_metadata_update u ON m.CONCEPT_ID = u.CONCEPT_ID;
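The result-type logic in the `measurement_results` and `measurement_metadata_update` steps above can be mirrored outside the database. The following is a rough Python sketch of those two `CASE` expressions, not the project's own code: the function names `infer_result_type` and `finalize_result_type` are hypothetical, and the inputs are assumed to be plain strings (or `None` for SQL `NULL`).

```python
import re
from typing import Iterable, Optional

_POS_NEG = re.compile(r"positive|negative", re.IGNORECASE)
_DIGIT = re.compile(r"[0-9]")


def infer_result_type(range_low: Iterable[Optional[str]],
                      range_high: Iterable[Optional[str]]) -> Optional[str]:
    """Mirror of the measurement_results CASE: inspect the reference-range text."""
    # STRING_AGG skips NULLs, so None values are dropped before concatenating.
    low = "".join(v for v in range_low if v is not None)
    high = "".join(v for v in range_high if v is not None)
    if _POS_NEG.search(low) or _POS_NEG.search(high):
        return "Negative/Positive"
    if _DIGIT.search(low) or _DIGIT.search(high):
        return "Normal/Low/High"
    return None


def finalize_result_type(result_type: Optional[str],
                         scale: Optional[str],
                         synonyms: Optional[str]) -> str:
    """Mirror of the measurement_metadata_update CASE: fall back on the LOINC scale."""
    if result_type is None and scale == "ORD" and re.search(
            r"screen|presence", synonyms or "", re.IGNORECASE):
        return "Negative/Positive"
    if result_type is None and scale == "QUANT":
        return "Normal/Low/High"
    if result_type is not None:
        return result_type
    return "Unknown Result Type"


# Usage with made-up range values (illustrative only, not project data).
print(infer_result_type(["Negative"], [None]))
print(finalize_result_type(None, "QUANT", None))
```

As in the query, textual positive/negative ranges take priority over numeric ones, and the scale type is consulted only when the reference ranges give no signal.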


Annotation Verification

Two verification approaches were applied: the first was survey-based, and the second involved manual mapping verification by a professional biocurator.

Annotation Survey

A subset (n=270) of pediatric-specific laboratory test result mappings was independently validated by five domain experts (three pediatric clinicians, a PhD-level molecular biologist, and a master’s-level epidemiologist). The study was approved by the Colorado Multiple Institutional Review Board (15-0445).

To perform this validation, a Qualtrics survey (see QR code) was designed so that each question featured a laboratory test description and set of reasonable HPO concepts.

The survey was completed by all experts between October and December 2018. After completion, any laboratory test mapping that was not agreed on by at least one clinician and by both the biologist and the epidemiologist was re-evaluated with one clinician until consensus was reached (n=58 lab results). These terms were additionally vetted on the loinc2hpoAnnotation GitHub tracker by the entire team of HPO biocurators.
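The re-evaluation rule described above can be written as a small predicate. This is one possible reading of that rule, sketched under assumptions: the function name, the rater identifiers, and the idea that each rater selects exactly one HPO term per question are all hypothetical.

```python
from typing import Dict, List


def needs_reevaluation(choices: Dict[str, str],
                       clinicians: List[str],
                       others: List[str]) -> bool:
    """Return True when a mapping lacks the required agreement.

    choices maps a rater id to the HPO term id that rater selected.
    A mapping passes when the non-clinician raters (biologist and
    epidemiologist) all chose the same term and at least one clinician
    chose that term too.
    """
    other_terms = {choices[r] for r in others}
    if len(other_terms) != 1:
        return True  # biologist and epidemiologist disagree
    agreed_term = next(iter(other_terms))
    return not any(choices[c] == agreed_term for c in clinicians)


# Made-up survey answers for one question (illustrative only).
answers = {
    "clinician_1": "HP_0410314",
    "clinician_2": "HP_0410313",
    "clinician_3": "HP_0410314",
    "biologist": "HP_0410314",
    "epidemiologist": "HP_0410314",
}
print(needs_reevaluation(
    answers,
    clinicians=["clinician_1", "clinician_2", "clinician_3"],
    others=["biologist", "epidemiologist"],
))
```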

Results. Agreement on mapping was 95.9% between the clinicians, 79.3% between the epidemiologist and biologist, and 90.7% between the clinicians and the biologist and epidemiologist. The best mapping across all experts was in 92% agreement with existing LOINC2HPO mappings.
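The page does not state which agreement statistic was used. As an illustration only, the sketch below computes simple percent agreement, assuming that "agreement" means every rater in a group selected the same term for a question. The function name and the toy ratings are hypothetical and are not survey data.

```python
from typing import Dict, List


def percent_agreement(ratings: List[Dict[str, str]], raters: List[str]) -> float:
    """Percentage of questions on which all listed raters chose the same term."""
    if not ratings:
        raise ValueError("ratings must contain at least one question")
    unanimous = sum(
        1 for question in ratings
        if len({question[r] for r in raters}) == 1
    )
    return 100.0 * unanimous / len(ratings)


# Toy data: four questions answered by two raters (not survey results).
toy_ratings = [
    {"biologist": "HP_A", "epidemiologist": "HP_A"},
    {"biologist": "HP_B", "epidemiologist": "HP_B"},
    {"biologist": "HP_C", "epidemiologist": "HP_D"},
    {"biologist": "HP_E", "epidemiologist": "HP_E"},
]
print(percent_agreement(toy_ratings, ["biologist", "epidemiologist"]))
```

Percent agreement does not correct for chance; a chance-corrected statistic could be substituted if that is what the study used.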

Biocurator Verification

A subset of 691 randomly selected LOINC codes was verified by a professional biocurator. A screenshot of the verification table is shown below. Additional information on this mapping process, including the new terms we requested in order to complete this mapping, can be found in the Human Phenotype Ontology GitHub tracker.

Resources

To verify or search the ontologies for alternative terms, the biocurator was asked to use the following resource:

Verification Instructions

  • Verify each of the mappings row by row, considering each LOINC lab code result within the context of the ontology mappings that have been provided.
  • The goal is to find the best mapping between a single ontology term and a LOINC laboratory test result.


Annotation Results

Timeline

Mapping (10/2018-11/2019); Clinician verification (survey) (10/2018-12/2018); Biocurator verification (01/2019-03/2019); Mapping finalized (10/13/2019); Results update (05/08/2020).

Mapping Categories

We completed the mapping of 902 unique measurements and 2,706 unique measurement results.

| Mapping Type | Count |
| --- | --- |
| Manually Mapped | 2616 |
| UnMapped | 90 |
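The counts above are consistent with three results (low, normal, high) per test. A quick arithmetic check, with variable names chosen only for this sketch:

```python
unique_tests = 902
results_per_test = 3            # low, normal, high
total_results = unique_tests * results_per_test

manually_mapped = 2616
unmapped = 90

# The two mapping categories should account for every test result.
assert manually_mapped + unmapped == total_results == 2706

coverage = 100 * manually_mapped / total_results
print(f"{coverage:.1f}% of test results were manually mapped")  # 96.7%
```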

Results Update: There were 1,605 unique tests and 4,815 test results that could be mapped.

Manually Mapped Manually Mapped - Constructor UnMapped - None UnMapped - Not Mapped Test Type UnMapped - Unspecified Sample N/A
HPO 1380 4 54 93 74
UBERON 946 411 54 93 74
CL 157 2 54 93 74
CHEBI 673 8 54 93 74
NCBITaxon 279 1 54 93 74
PRO 180 2 54 93 74