phenopy

phenopy is a Python package for scoring phenotype similarity using semantic similarity measures. It is a lightweight, highly optimized command-line tool and library that efficiently scores generic entities annotated with phenotype terms from the Human Phenotype Ontology (HPO).

Installation

GitHub

Install from GitHub:

```
git clone https://github.com/GeneDx/phenopy.git
cd phenopy
python setup.py install
```

Command Line Usage

Initial setup

phenopy is designed to run with minimal setup from the user. To run phenopy with default parameters (recommended), skip ahead to the Commands overview.

This section provides details about where phenopy stores data resources and config files. The following occurs when you run phenopy for the first time.

phenopy creates a .phenopy/ directory in your home folder and downloads external resources from HPO into the $HOME/.phenopy/data/ directory.
phenopy stores a binary version of the HPO as a networkx graph object here: $HOME/.phenopy/data/hpo_network.pickle.
phenopy creates a $HOME/.phenopy/phenopy.ini config file where users can set variables for phenopy to use at runtime.
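
To see whether these resources already exist on a machine, the paths above can be checked from Python. This is a minimal sketch that relies only on the locations named in this section; the helper name `phenopy_paths` is hypothetical and is not part of phenopy.

```python
from pathlib import Path


def phenopy_paths(home=None):
    """Return the locations this section describes (illustrative helper, not phenopy API)."""
    base = Path(home) if home is not None else Path.home()
    root = base / ".phenopy"
    return {
        "root": root,
        "data": root / "data",
        "hpo_network": root / "data" / "hpo_network.pickle",
        "config": root / "phenopy.ini",
    }


if __name__ == "__main__":
    # Report which of the expected files and directories are present.
    for name, path in phenopy_paths().items():
        status = "found" if path.exists() else "missing"
        print(f"{name:12} {path} ({status})")
```

If everything is reported as missing, phenopy has most likely not been run yet for the current user.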

Commands overview

phenopy is primarily used as a command line tool. An entity, as described here, is presented as a sample, gene, or disease, but could be any concept that warrants annotation of phenotype terms.

Score similarity of an entity defined by the HPO terms from an input file against all the genes in .phenopy/data/phenotype_to_genes.txt. We provide a test input file in the repo.
```
phenopy score tests/data/test.score.txt
```
Output:
```
#query	gene	score
SAMPLE	NCBI:10000[AKT3]	0.0252
SAMPLE	NCBI:10002[NR2E3]	0.0148
SAMPLE	NCBI:100033413[SNORD116-1]	0.0283
...
```
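
Because the output is tab-separated, it is straightforward to post-process. The sketch below ranks genes by score using the three sample rows shown above. `parse_scores` is a hypothetical helper, and the sketch assumes every data row has exactly three tab-separated columns and that the header starts with `#`.

```python
def parse_scores(text):
    """Parse `phenopy score` output into (query, target, score) tuples."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines, the '#'-prefixed header, and a literal '...' marker.
        if not line or line.startswith("#") or line == "...":
            continue
        query, target, score = line.split("\t")
        rows.append((query, target, float(score)))
    return rows


sample = (
    "#query\tgene\tscore\n"
    "SAMPLE\tNCBI:10000[AKT3]\t0.0252\n"
    "SAMPLE\tNCBI:10002[NR2E3]\t0.0148\n"
    "SAMPLE\tNCBI:100033413[SNORD116-1]\t0.0283\n"
)

# Highest-scoring genes first.
ranked = sorted(parse_scores(sample), key=lambda row: row[2], reverse=True)
for query, gene, score in ranked:
    print(f"{gene}\t{score:.4f}")
```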

Score similarity of an entity defined by the HPO terms from an input file against a custom list of entities with HPO annotations, referred to as the --records-file.

```
phenopy score tests/data/test.score.txt --records-file tests/data/test.score-product.txt
```
Output:
```
#query	entity_id	score
SAMPLE	118200	0.0584
SAMPLE	118210	0.057
SAMPLE	118220	0.0563
...
```
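
A custom records file can also be generated programmatically. The sketch below follows the format given in the help text for the records file option (one record per line, tab-delimited, a unique identifier followed by a pipe-separated list of HPO identifiers). The helper name, the record identifiers, and the output location are illustrative assumptions; it is worth comparing the result against tests/data/test.score-product.txt before relying on it.

```python
import os
import tempfile


def write_records(path, records):
    """Write {record_id: [hpo_ids]} as 'record_id<TAB>HP:...|HP:...' lines."""
    with open(path, "w") as handle:
        for record_id, terms in records.items():
            joined = "|".join(terms)
            handle.write(f"{record_id}\t{joined}\n")


# Hypothetical records; the HPO identifiers are taken from the Library Usage example.
records = {
    "record_1": ["HP:0001882", "HP:0011839"],
    "record_2": ["HP:0001263", "HP:0000252"],
}

out_path = os.path.join(tempfile.gettempdir(), "my_records.txt")
write_records(out_path, records)
print(f"wrote {len(records)} records to {out_path}")
```

The resulting file could then be supplied as the records file in place of tests/data/test.score-product.txt.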

Score pairwise similarity of entities defined in the --records-file.

```
phenopy score-product tests/data/test.score-product.txt --threads 4
```
Output:
```
118200	118200	0.7692
118200	118300	0.5345
118200	300905	0.2647
...
```
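
The pairwise output can be loaded into a lookup table for downstream analysis. The sketch below uses the three sample rows above and finds the record most similar to 118200. `load_pairwise` is a hypothetical helper, and treating each pair as unordered (so that the score for A versus B also answers B versus A) is an assumption of this sketch.

```python
from collections import defaultdict


def load_pairwise(text):
    """Build a nested dict where sim[a][b] and sim[b][a] hold the same score."""
    sim = defaultdict(dict)
    for line in text.splitlines():
        line = line.strip()
        parts = line.split("\t")
        # Skip blank lines, any '#'-prefixed header, and a literal '...' marker.
        if line.startswith("#") or len(parts) != 3:
            continue
        a, b, score = parts
        sim[a][b] = sim[b][a] = float(score)
    return sim


sample = (
    "118200\t118200\t0.7692\n"
    "118200\t118300\t0.5345\n"
    "118200\t300905\t0.2647\n"
)
sim = load_pairwise(sample)

# Most similar record to 118200, excluding the self-comparison.
others = {key: value for key, value in sim["118200"].items() if key != "118200"}
best = max(others, key=others.get)
print(best, others[best])
```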

Parameters

For a full list of command arguments use phenopy [subcommand] --help:

```
phenopy score --help
```

Output:

    --records_file=RECORDS_FILE
        One record per line, tab delimited. First column: unique record identifier. Second column: pipe-separated list of HPO identifiers (e.g. HP:0000001).
    --query_name=QUERY_NAME
        Unique identifier for the query file.
    --obo_file=OBO_FILE
        OBO file from https://hpo.jax.org/app/download/ontology.
    --pheno2genes_file=PHENO2GENES_FILE
        Phenotypes to genes from https://hpo.jax.org/app/download/annotation.
    --threads=THREADS
        Number of parallel processes to use.
    --agg_score=AGG_SCORE
        The aggregation method to use for summarizing the similarity matrix between two term sets. Must be one of {'BMA', 'maximum'}.
    --no_parents=NO_PARENTS
        If provided, scoring is done by only using the most informative nodes. All parent nodes are removed.
    --hpo_network_file=HPO_NETWORK_FILE
        If provided, phenopy will try to load a cached hpo_network object from file.
    --custom_annotations_file=CUSTOM_ANNOTATIONS_FILE
        A comma-separated list of custom annotation files in the same format as tests/data/test.score-product.txt.
    --output_file=OUTPUT_FILE
        File path where the results are stored.
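
The agg_score option collapses a term-by-term similarity matrix into a single number. As a rough illustration, the sketch below implements one common formulation of best-match average (take the best match for every row and every column, pool them, and average) next to the plain maximum. Whether phenopy computes BMA in exactly this way is an assumption, the function name `aggregate` is hypothetical, and the matrix values are made up.

```python
def aggregate(matrix, method="BMA"):
    """Summarize a similarity matrix (rows: terms of set A, columns: terms of set B)."""
    row_best = [max(row) for row in matrix]        # best match for each term in A
    col_best = [max(col) for col in zip(*matrix)]  # best match for each term in B
    if method == "maximum":
        return max(row_best)
    if method == "BMA":
        pooled = row_best + col_best
        return sum(pooled) / len(pooled)
    raise ValueError("method must be 'BMA' or 'maximum'")


# Made-up similarities between three terms of one entity and two of another.
similarity = [
    [0.2, 0.8],
    [0.5, 0.1],
    [0.3, 0.4],
]

print("BMA:", round(aggregate(similarity, "BMA"), 4))
print("maximum:", aggregate(similarity, "maximum"))
```

In this formulation the maximum rewards a single strong term match, while BMA also accounts for terms that have no good counterpart in the other set.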

Library Usage

The phenopy library can be used as a Python module, allowing more control for advanced users.

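```python
import os
from phenopy import config
from phenopy.obo import restore
from phenopy.score import Scorer

# Path to the cached HPO network that phenopy stores in its data directory
network_file = os.path.join(config.data_directory, 'hpo_network.pickle')

# Load the HPO graph and build a scorer on top of it
hpo = restore(network_file)
scorer = Scorer(hpo)

# Two sets of HPO terms to compare
terms_a = ['HP:0001882', 'HP:0011839']
terms_b = ['HP:0001263', 'HP:0000252']

print(scorer.score(terms_a, terms_b))
```
Output:
```
0.0005
```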
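
The same call can be repeated to rank several records against one query. The sketch below assumes only that the scorer object exposes the `score(terms_a, terms_b)` method used above. `rank_records` is a hypothetical helper, and `OverlapScorer` is a stand-in that lets the snippet run without phenopy installed; it is not phenopy's algorithm.

```python
def rank_records(scorer, query_terms, records):
    """Score query_terms against each record; return (record_id, score), best first."""
    scored = [
        (record_id, scorer.score(query_terms, terms))
        for record_id, terms in records.items()
    ]
    return sorted(scored, key=lambda item: item[1], reverse=True)


class OverlapScorer:
    """Stand-in scorer (NOT phenopy's algorithm): Jaccard overlap of two term lists."""

    def score(self, terms_a, terms_b):
        a, b = set(terms_a), set(terms_b)
        union = a | b
        return len(a & b) / len(union) if union else 0.0


# Hypothetical records keyed by identifier.
records = {
    "record_1": ["HP:0001263", "HP:0000252"],
    "record_2": ["HP:0001882"],
    "record_3": ["HP:0001882", "HP:0011839"],
}
query = ["HP:0001882", "HP:0011839"]

for record_id, score in rank_records(OverlapScorer(), query, records):
    print(record_id, score)
```

With phenopy installed, the `Scorer(hpo)` object from the example above could be passed in place of `OverlapScorer()`.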

Another example uses the library to prune parent phenotypes from the phenotype_to_genes.txt file:

```python
import os
from phenopy import config
from phenopy.obo import restore
from phenopy.util import export_pheno2genes_with_no_parents

# Input and output locations inside the phenopy data directory
network_file = os.path.join(config.data_directory, 'hpo_network.pickle')
phenotype_to_genes_file = os.path.join(config.data_directory, 'phenotype_to_genes.txt')
phenotype_to_genes_no_parents_file = os.path.join(config.data_directory, 'phenotype_to_genes_no_parents.txt')

# Load the cached HPO network, then write the pruned annotation file
hpo = restore(network_file)
export_pheno2genes_with_no_parents(phenotype_to_genes_file, phenotype_to_genes_no_parents_file, hpo)
```
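
To make the idea of pruning parents concrete, the sketch below removes every term that is an ancestor of another term in the same set, so only the most specific terms remain. It works on a toy hierarchy with made-up identifiers rather than the real HPO, the function names are hypothetical, and it is not a description of how `export_pheno2genes_with_no_parents` is implemented.

```python
def ancestors(term, parents):
    """Return every ancestor of `term` given a child -> list-of-parents mapping."""
    seen = set()
    stack = list(parents.get(term, ()))
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(parents.get(node, ()))
    return seen


def drop_parents(terms, parents):
    """Keep only the terms that are not an ancestor of another term in the set."""
    terms = set(terms)
    redundant = set()
    for term in terms:
        redundant |= ancestors(term, parents) & terms
    return terms - redundant


# Toy hierarchy with made-up identifiers: T:3 is_a T:2 is_a T:1, and T:4 is_a T:1.
toy_parents = {"T:2": ["T:1"], "T:3": ["T:2"], "T:4": ["T:1"]}

print(sorted(drop_parents(["T:1", "T:2", "T:3", "T:4"], toy_parents)))
```

Here T:1 and T:2 are dropped because they are ancestors of T:3, leaving T:3 and T:4.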

Config

We recommend the default settings for most users, but the config file at $HOME/.phenopy/phenopy.ini can be modified if needed.

IMPORTANT NOTE:
If the config variable hpo_network_file is defined, phenopy will try to load this stored version of the HPO and ignore the following command-line arguments: obo_file and custom_annotations_file.

To run phenopy with a different obo_file or custom_annotations_file, first rename or move the HPO network file: mv $HOME/.phenopy/data/hpo_network.pickle $HOME/.phenopy/data/hpo_network.old.pickle

To run phenopy with a previously stored version of the HPO network, simply set hpo_network_file = /path/to/hpo_network.pickle.
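
Before changing obo_file or custom_annotations_file, it can help to check whether hpo_network_file is set in the config. The sketch below uses the standard-library `configparser` and searches every section, because the section layout of phenopy.ini is not described here. The helper name `find_option` and the `[hpo]` section in the example text are assumptions.

```python
import configparser


def find_option(ini_text, option):
    """Return (section, value) for the first section defining `option`, else None."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    for section in parser.sections():
        if parser.has_option(section, option):
            return section, parser.get(section, option)
    return None


# Hypothetical file contents; the section name "hpo" is an assumption.
example_ini = """
[hpo]
hpo_network_file = /path/to/hpo_network.pickle
"""

found = find_option(example_ini, "hpo_network_file")
if found is not None:
    section, value = found
    print(f"[{section}] hpo_network_file = {value}")
    print("obo_file and custom_annotations_file arguments would be ignored")
else:
    print("hpo_network_file is not set")
```

To inspect the real file, its contents can be read from $HOME/.phenopy/phenopy.ini and passed to the same function.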

Contributing

We welcome contributions from the community. Please follow these steps to set up a local development environment.

```
pipenv install --dev
```

To run tests locally:

```
pipenv shell
coverage run --source=. -m unittest discover --start-directory tests/
coverage report -m
```

References

The underlying algorithm which determines the semantic similarity for any two HPO terms is based on an implementation of HRSS, published here.
