Skip to content

Latest commit

 

History

History
207 lines (143 loc) · 7.54 KB

README.md

File metadata and controls

207 lines (143 loc) · 7.54 KB

HPO

HPO, the Human Phenotype Ontology is a standard vocabulary of phenotypic abnormalities in human diseases. It is an Ontology, so all terms are connected to each other, similar to a directed graph.

This library provides convenient APIs to work with the ontology. The main goals are to compare terms - or sets of terms - to each other and run statistics for enrichment analysis.

This library is basically a Rust implementation of PyHPO, but contains some additional features as well.

Features

  • 👫 Identify patient cohorts based on clinical features
  • 👨‍👧‍👦 Cluster patients or other clinical information for GWAS
  • 🩻→🧬 Phenotype to Genotype studies
  • 🍎🍊 HPO similarity analysis
  • 🕸️ Graph based analysis of phenotypes, genes and diseases
  • 🔬 Enrichment analysis of genes and diseases in sets of HPO terms
  • Completely written in Rust, so it's 🚀blazingly fast🚀TM (Benchmarks)

What is the current state?

The library is pretty much feature-complete, at least for my use-cases. If you have any feature-requests, please open an Issue or get in touch. I'm very much interested in getting feedback and new ideas what to improve.

The API is mostly stable, but I might refactor some parts a bit for easier use and performance gain.

If you find this project interesting and want to contribute, please get in touch, I could definitely need some help.

Documentation

The public API is fully documented on docs.rs

The main structs used in hpo are:

  • The Ontology is the main struct and entrypoint in hpo.
  • HpoTerm represents a single HPO term and contains plenty of functionality around them.
  • HpoSet is a collection of HpoTerms, like a patient's clinical information.
  • Gene represents a single gene, including information about associated HpoTerms.
  • OmimDisease represents a single OMIM-diseases, including information about associated HpoTerms.
  • OrphaDisease represents a single ORPHA-diseases, including information about associated HpoTerms.

The most relevant modules are:

  • annotations contains the Gene, OmimDisease and OrphaDisease structs, and some related important types.
  • similarity contains structs and helper functions for similarity comparisons for HpoTerm and HpoSet.
  • stats contains functions to calculate the hypergeometric enrichment score of genes or diseases.

Examples

Some (more or less random) examples are included in the examples folder.

Ontology

use hpo::{Ontology, HpoTermId};
use hpo::annotations::{GeneId, OmimDiseaseId, OrphaDiseaseId};

fn example() {
    let ontology = Ontology::from_binary("tests/ontology.hpo").unwrap();

    // iterate HPO terms
    for term in &ontology {
        // do something with term
    }

    // iterate Genes
    for gene in ontology.genes() {
        // do something with gene
    }

    // iterate omim diseases
    for disease in ontology.omim_diseases() {
        // do something with disease
    }

    // iterate orpha diseases
    for disease in ontology.orpha_diseases() {
        // do something with disease
    }

    // get a single HPO term using HPO ID
    let hpo_id = HpoTermId::try_from("HP:0000123").unwrap();
    let term = ontology.hpo(hpo_id);

    // get a single HPO term using `u32` part of HPO ID
    let term = ontology.hpo(123u32);

    // get a single Omim disease
    let disease_id = OmimDiseaseId::from(12345u32);
    let disease = ontology.omim_disease(&disease_id);

    // get a single Orpha disease
    let disease_id = OrphaDiseaseId::from(12345u32);
    let disease = ontology.orpha_disease(&disease_id);

    // get a single Gene
    let hgnc_id = GeneId::from(12345u32);
    let gene = ontology.gene(&hgnc_id);

    // get a single Gene by its symbol
    let gene = ontology.gene_by_name("GBA");

}

HPO term

use hpo::Ontology;

fn example() {
    let ontology = Ontology::from_binary("tests/ontology.hpo").unwrap();

    let term = ontology.hpo(123u32).unwrap();

    assert_eq!("Abnormality of the nervous system", term.name());
    assert_eq!("HP:000123".to_string(), term.id().to_string());

    // iterate all parents
    for p in term.parents() {
        println!("{}", p.name())
    }

    // iterate all children
    for p in term.children() {
        println!("{}", p.name())
    }

    let term2 = ontology.hpo(1u32).unwrap();

    assert!(term2.parent_of(&term));
    assert!(term.child_of(&term2));
}

Similarity

use hpo::Ontology;
use hpo::similarity::GraphIc;
use hpo::term::InformationContentKind;

fn example() {
    let ontology = Ontology::from_binary("tests/ontology.hpo").unwrap();
    let term1 = ontology.hpo(123u32).unwrap();
    let term2 = ontology.hpo(1u32).unwrap();

    let ic = GraphIc::new(InformationContentKind::Omim);
    let similarity = term1.similarity_score(&term2, &ic);
}

Enrichment

Identify which genes (or diseases) are enriched in a set of HpoTerms, e.g. in the clinical information of a patient or patient cohort

use hpo::Ontology;
use hpo::{HpoSet, term::HpoGroup};
use hpo::stats::hypergeom::gene_enrichment;

fn example() {
    let ontology = Ontology::from_binary("tests/ontology.hpo").unwrap();

    let mut hpos = HpoGroup::new();
    hpos.insert(2943u32);
    hpos.insert(8458u32);
    hpos.insert(100884u32);
    hpos.insert(2944u32);
    hpos.insert(2751u32);
    let patient_ci = HpoSet::new(&ontology, hpos);

    let mut enrichments = gene_enrichment(&ontology, &patient_ci);

    // the results are not sorted by default
    enrichments.sort_by(|a, b| {
        a.pvalue().partial_cmp(&b.pvalue()).unwrap()
    });

    for gene in enrichments {
        println!("{}\t{}\t({})", gene.id(), gene.pvalue(), gene.enrichment());
    }
}

Benchmarks

As the saying goes: "Make it work, make it good, make it fast". The work and good parts are realized in PyHPO. And even though I tried my best to make it fast, I was still hungry for more. So I started developing the hpo Rust library in December 2022. Even without micro-benchmarking and tuning performance as much as I did for PyHPO, hpo is indeed much much faster already now.

The below benchmarks were run non scientificially and your mileage may vary. I used a MacBook Air M1, rustc 1.68.0, Python 3.9 and /usr/bin/time for timing.

Benchmark PyHPO hpo (single-threaded) hpo (multi-threaded)
Read and Parse Ontology 6.4 s 0.22 s 0.22 s
Similarity of 17,245 x 1,000 terms 98.5 s 4.6 s 1.0 s
Similarity of GBA1 to all Diseases 380 s 15.8 s 3.0 s
Disease enrichment in all Genes 11.8 s 0.4 s 0.3 s
Common ancestors of 17,245 x 10,000 terms 225.2 s 10.5 2.1

Technical design

There is some info about the plans for the implementation in the Technical Design document