Merge pull request #411 from monarch-initiative/rename-mtc-filter-to-…
…if-hpo-filter

Rename `HpoMtcFilter` to `IfHpoFilter`
ielis authored Jan 31, 2025
2 parents 7cec169 + a9346dd commit d44e718
Showing 8 changed files with 151 additions and 74 deletions.
4 changes: 2 additions & 2 deletions docs/tutorial.rst
@@ -314,7 +314,7 @@ For general use, we recommend using a combination
of a *phenotype MT filter* (:class:`~gpsea.analysis.mtc_filter.PhenotypeMtcFilter`) with a *multiple testing correction*.
Phenotype MT filter chooses the HPO terms to test according to several heuristics, which
reduce the multiple testing burden and focus the analysis
on the most interesting terms (see :ref:`HPO MT filter <hpo-mt-filter>` for more info).
on the most interesting terms (see :ref:`Independent filtering for HPO <hpo-if-filter>` for more info).
Then the multiple testing correction, such as Bonferroni or Benjamini-Hochberg,
is used to control the family-wise error rate or the false discovery rate.
See :ref:`mtc` for more information.
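
Reviewer sketch for the paragraph above: a minimal pure-Python illustration of the two corrections it names, Bonferroni (family-wise error rate) and Benjamini-Hochberg (false discovery rate). The function names `bonferroni` and `benjamini_hochberg` are hypothetical and are not part of GPSEA; the sketch only shows the arithmetic and says nothing about how GPSEA computes the correction internally.

```python
def bonferroni(pvals):
    """Bonferroni adjustment: multiply each p value by the number of tests, cap at 1."""
    m = len(pvals)
    return [min(1.0, p * m) for p in pvals]


def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p values, returned in the input order."""
    m = len(pvals)
    # Indices of the p values sorted from the smallest to the largest.
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p value down so that adjusted values stay monotone.
    for rank in range(m, 0, -1):
        idx = order[rank - 1]
        running_min = min(running_min, pvals[idx] * m / rank)
        adjusted[idx] = running_min
    return adjusted


nominal = [0.01, 0.04, 0.03, 0.005]
print(bonferroni(nominal))
print(benjamini_hochberg(nominal))
```

Testing fewer HPO terms shrinks `m`, which is why filtering before correction leaves more power for the terms that remain.
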
@@ -323,7 +323,7 @@ See :ref:`mtc` for more information.
>>> analysis = configure_hpo_term_analysis(hpo)

:func:`~gpsea.analysis.pcats.configure_hpo_term_analysis` configures the analysis
that uses HPO MTC filter (:class:`~gpsea.analysis.mtc_filter.HpoMtcFilter`) for selecting HPO terms of interest,
that uses Independent filtering for HPO (:class:`~gpsea.analysis.mtc_filter.IfHpoFilter`) for selecting HPO terms of interest,
Fisher Exact test for computing nominal p values, and Benjamini-Hochberg for multiple testing correction.


36 changes: 21 additions & 15 deletions docs/user-guide/analyses/mtc.rst
@@ -171,31 +171,37 @@ we pass an iterable (e.g. a tuple) with these two terms as an argument:
2


.. _hpo-mt-filter:
.. _hpo-if-filter:

HPO MT filter
-------------
Independent filtering for HPO
-----------------------------

Independent filtering for HPO involves making several domain judgments
and taking advantage of the HPO structure
in order to reduce the number of HPO terms for testing.
The filter's logic is made up of 8 individual heuristics
to skip testing the terms that are unlikely to yield significant or interesting results (see below).

The HPO MT filter involves making several domain judgments and takes advantage of the HPO structure.
The strategy needs access to HPO:
Some of the heuristics need access to the HPO hierarchy,
so let's load HPO

>>> import hpotk
>>> store = hpotk.configure_ontology_store()
>>> hpo = store.load_minimal_hpo(release='v2024-07-01')

and it is implemented in the :class:`~gpsea.analysis.mtc_filter.HpoMtcFilter` class:
and let's create an instance of :class:`~gpsea.analysis.mtc_filter.IfHpoFilter`
using the static constructor
:func:`~gpsea.analysis.mtc_filter.IfHpoFilter.default_filter`:

>>> from gpsea.analysis.mtc_filter import IfHpoFilter
>>> hpo_mtc = IfHpoFilter.default_filter(hpo=hpo)

>>> from gpsea.analysis.mtc_filter import HpoMtcFilter
>>> hpo_mtc = HpoMtcFilter.default_filter(hpo=hpo)

The constructor takes HPO and two optional thresholds.
See the API documentation and the explanations below for more details.

We use static constructor :func:`~gpsea.analysis.mtc_filter.HpoMtcFilter.default_filter`
for creating :class:`~gpsea.analysis.mtc_filter.HpoMtcFilter`.
The constructor takes a ``term_frequency_threshold`` option (40% by default)
and the method's logic is made up of 8 individual heuristics
designed to skip testing the HPO terms that are unlikely to yield significant or interesting results.

.. contents:: HPO MT filters
.. contents:: Independent filtering for HPO
:depth: 1
:local:

@@ -296,6 +302,6 @@ and we have explicit observed observations for 20 and excluded for 10 individual
then the annotation frequency is `0.3`.

The threshold is set as ``annotation_frequency_threshold`` option
of the :func:`~gpsea.analysis.mtc_filter.HpoMtcFilter.default_filter` constructor,
of the :func:`~gpsea.analysis.mtc_filter.IfHpoFilter.default_filter` constructor,
with the default value of `0.4` (40%).
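
Reviewer sketch of the annotation frequency arithmetic described in this hunk (100 individuals, 20 observed, 10 excluded, frequency `0.3`). The helper names are hypothetical, and the `>=` comparison against the threshold is an assumption based on the word "minimum" in the docstring, not a statement about the filter's actual code.

```python
def annotation_frequency(n_observed: int, n_excluded: int, cohort_size: int) -> float:
    """Fraction of the cohort with an explicit (observed or excluded) annotation."""
    if cohort_size <= 0:
        raise ValueError("cohort_size must be positive")
    return (n_observed + n_excluded) / cohort_size


def passes_annotation_threshold(
    n_observed: int,
    n_excluded: int,
    cohort_size: int,
    threshold: float = 0.4,
) -> bool:
    """Assumption: a term is kept when its annotation frequency is >= threshold."""
    return annotation_frequency(n_observed, n_excluded, cohort_size) >= threshold


# The example from the documentation: 20 observed + 10 excluded out of 100.
print(annotation_frequency(20, 10, 100))
print(passes_annotation_threshold(20, 10, 100))
```

Under this assumption, the documented example falls below the default threshold of `0.4`, so the term would be skipped.
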

12 changes: 6 additions & 6 deletions docs/user-guide/analyses/phenotype-classes.rst
@@ -207,7 +207,7 @@ a phenotype multiple testing (MT) filter and multiple testing correction (MTC).

Phenotype MT filter selects a (sub)set of HPO terms for testing,
for instance only the user-selected terms (see :class:`~gpsea.analysis.mtc_filter.SpecifiedTermsMtcFilter`)
or the terms selected by :class:`~gpsea.analysis.mtc_filter.HpoMtcFilter`.
or the terms selected by :class:`~gpsea.analysis.mtc_filter.IfHpoFilter`.

MTC then adjusts the nominal p values for the increased risk
of false positive G/P associations.
@@ -221,8 +221,8 @@ We must choose a phenotype MT filter as well as a MTC procedure to perform genot
Default analysis
^^^^^^^^^^^^^^^^

We recommend using HPO MT filter (:class:`~gpsea.analysis.mtc_filter.HpoMtcFilter`) as a phenotype MT filter
and Benjamini-Hochberg for MTC.
We recommend using Independent filtering for HPO (:class:`~gpsea.analysis.mtc_filter.IfHpoFilter`)
and Benjamini-Hochberg MT correction.
The default analysis can be configured with :func:`~gpsea.analysis.pcats.configure_hpo_term_analysis` convenience method.

>>> from gpsea.analysis.pcats import configure_hpo_term_analysis
@@ -240,10 +240,10 @@ Custom analysis
If the default selection of phenotype MT filter and multiple testing correction is not an option,
we can configure the analysis manually.

First, we choose a phenotype MT filter (e.g. :class:`~gpsea.analysis.mtc_filter.HpoMtcFilter`):
First, we choose a phenotype MT filter (e.g. :class:`~gpsea.analysis.mtc_filter.IfHpoFilter`):

>>> from gpsea.analysis.mtc_filter import HpoMtcFilter
>>> mtc_filter = HpoMtcFilter.default_filter(hpo, term_frequency_threshold=.2)
>>> from gpsea.analysis.mtc_filter import IfHpoFilter
>>> mtc_filter = IfHpoFilter.default_filter(hpo, term_frequency_threshold=.2)
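
Reviewer sketch of what `term_frequency_threshold=.2` means, following the `default_filter` docstring in this commit: a term qualifies when it reaches the threshold in at least one genotype group. The function name is hypothetical and the `>=` comparison is an assumption. The real filter derives the frequencies from contingency matrices, which this sketch skips.

```python
def passes_term_frequency_threshold(group_frequencies, threshold=0.4):
    """Return True if the term reaches `threshold` in at least one genotype group.

    `group_frequencies` maps a genotype group label to the fraction of its
    members annotated with the HPO term.
    """
    if not 0.0 < threshold <= 1.0:
        raise ValueError("threshold must be in the range (0, 1]")
    if not group_frequencies:
        raise ValueError("at least one genotype group is required")
    return max(group_frequencies.values()) >= threshold


# The two cases from the docstring, evaluated at a threshold of 0.2.
print(passes_term_frequency_threshold({"missense": 0.22, "nonsense": 0.03}, 0.2))
print(passes_term_frequency_threshold({"missense": 0.13, "nonsense": 0.10}, 0.2))
```
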

.. note::

12 changes: 9 additions & 3 deletions src/gpsea/analysis/mtc_filter/__init__.py
@@ -6,9 +6,15 @@
"""

from ._impl import PhenotypeMtcFilter, PhenotypeMtcResult, PhenotypeMtcIssue
from ._impl import UseAllTermsMtcFilter, SpecifiedTermsMtcFilter, HpoMtcFilter
from ._impl import UseAllTermsMtcFilter, SpecifiedTermsMtcFilter, IfHpoFilter
from ._impl import HpoMtcFilter

__all__ = [
'PhenotypeMtcFilter', 'PhenotypeMtcResult', 'PhenotypeMtcIssue',
'UseAllTermsMtcFilter', 'SpecifiedTermsMtcFilter', 'HpoMtcFilter',
"PhenotypeMtcFilter",
"PhenotypeMtcResult",
"PhenotypeMtcIssue",
"UseAllTermsMtcFilter",
"SpecifiedTermsMtcFilter",
"IfHpoFilter",
"HpoMtcFilter",
]
115 changes: 90 additions & 25 deletions src/gpsea/analysis/mtc_filter/_impl.py
@@ -3,6 +3,7 @@
import typing

from collections import deque
import warnings

import hpotk
import pandas as pd
@@ -252,14 +253,14 @@ def verify_term_id(val: typing.Union[str, hpotk.TermId]) -> hpotk.TermId:
raise ValueError(f"{val} is neither `str` nor `hpotk.TermId`")


class HpoMtcFilter(PhenotypeMtcFilter[hpotk.TermId]):
class IfHpoFilter(PhenotypeMtcFilter[hpotk.TermId]):
"""
`HpoMtcFilter` decides which phenotypes should be tested and which phenotypes are not worth testing.
`IfHpoFilter` decides which phenotypes should be tested and which phenotypes are not worth testing.
The class leverages a number of heuristics and domain decisions.
See :ref:`hpo-mt-filter` section for more info.
See :ref:`hpo-if-filter` section for more info.
We recommend creating an instance using the :func:`default_filter` static factory method.
We recommend creating an instance using the :func:`~gpsea.analysis.mtc_filter.IfHpoFilter.default_filter` static factory method.
"""

NO_GENOTYPE_HAS_MORE_THAN_ONE_HPO = PhenotypeMtcResult.fail(
@@ -340,7 +341,7 @@ def default_filter(
general_hpo_term_set.update(second_level_terms)
general_hpo_term_set.update(third_level_terms)

return HpoMtcFilter(
return IfHpoFilter(
hpo=hpo,
term_frequency_threshold=term_frequency_threshold,
annotation_frequency_threshold=annotation_frequency_threshold,
@@ -355,13 +356,15 @@ def __init__(
general_hpo_terms: typing.Iterable[hpotk.TermId],
):
self._hpo = hpo
assert isinstance(term_frequency_threshold, (int, float)) \
and 0. < term_frequency_threshold <= 1., \
"The term_frequency_threshold must be in the range (0, 1]"
assert (
isinstance(term_frequency_threshold, (int, float))
and 0.0 < term_frequency_threshold <= 1.0
), "The term_frequency_threshold must be in the range (0, 1]"
self._hpo_term_frequency_filter = term_frequency_threshold
assert isinstance(annotation_frequency_threshold, (int, float)) \
and 0. < annotation_frequency_threshold <= 1., \
"The annotation_frequency_threshold must be in the range (0, 1]"
assert (
isinstance(annotation_frequency_threshold, (int, float))
and 0.0 < annotation_frequency_threshold <= 1.0
), "The annotation_frequency_threshold must be in the range (0, 1]"
self._hpo_annotation_frequency_threshold = annotation_frequency_threshold

self._general_hpo_terms = set(general_hpo_terms)
@@ -429,17 +432,17 @@ def filter(
continue

if term_id in self._general_hpo_terms:
results[idx] = HpoMtcFilter.SKIPPING_GENERAL_TERM
results[idx] = IfHpoFilter.SKIPPING_GENERAL_TERM
continue

if not self._hpo.graph.is_ancestor_of(PHENOTYPIC_ABNORMALITY, term_id):
results[idx] = HpoMtcFilter.SKIPPING_NON_PHENOTYPE_TERM
results[idx] = IfHpoFilter.SKIPPING_NON_PHENOTYPE_TERM
continue

ph_clf = pheno_clfs[idx]
contingency_matrix = counts[idx]

max_freq = HpoMtcFilter.get_maximum_group_observed_HPO_frequency(
max_freq = IfHpoFilter.get_maximum_group_observed_HPO_frequency(
contingency_matrix,
ph_clf=ph_clf,
)
@@ -465,19 +468,19 @@ def filter(
results[idx] = self._not_powered_for_2_by_3
continue

if not HpoMtcFilter.some_cell_has_greater_than_one_count(
if not IfHpoFilter.some_cell_has_greater_than_one_count(
counts=contingency_matrix,
ph_clf=ph_clf,
):
results[idx] = HpoMtcFilter.NO_GENOTYPE_HAS_MORE_THAN_ONE_HPO
results[idx] = IfHpoFilter.NO_GENOTYPE_HAS_MORE_THAN_ONE_HPO
continue

elif HpoMtcFilter.one_genotype_has_zero_hpo_observations(
elif IfHpoFilter.one_genotype_has_zero_hpo_observations(
counts=contingency_matrix,
gt_clf=gt_clf,
):
results[idx] = (
HpoMtcFilter.SKIPPING_SINCE_ONE_GENOTYPE_HAD_ZERO_OBSERVATIONS
IfHpoFilter.SKIPPING_SINCE_ONE_GENOTYPE_HAD_ZERO_OBSERVATIONS
)
continue

@@ -501,7 +504,7 @@ def filter(
axis=None
) < 1:
# Do not test if the count is exactly the same to the counts in the only child term.
results[idx] = HpoMtcFilter.SAME_COUNT_AS_THE_ONLY_CHILD
results[idx] = IfHpoFilter.SAME_COUNT_AS_THE_ONLY_CHILD
continue

# ##
@@ -526,18 +529,18 @@ def possible_results(self) -> typing.Collection[PhenotypeMtcResult]:
return (
PhenotypeMtcFilter.OK,
self._below_frequency_threshold, # HMF01
HpoMtcFilter.NO_GENOTYPE_HAS_MORE_THAN_ONE_HPO, # HMF02
HpoMtcFilter.SAME_COUNT_AS_THE_ONLY_CHILD, # HMF03
HpoMtcFilter.SKIPPING_SINCE_ONE_GENOTYPE_HAD_ZERO_OBSERVATIONS, # HMF05
IfHpoFilter.NO_GENOTYPE_HAS_MORE_THAN_ONE_HPO, # HMF02
IfHpoFilter.SAME_COUNT_AS_THE_ONLY_CHILD, # HMF03
IfHpoFilter.SKIPPING_SINCE_ONE_GENOTYPE_HAD_ZERO_OBSERVATIONS, # HMF05
self._not_powered_for_2_by_2, # HMF06
self._not_powered_for_2_by_3, # HMF06
HpoMtcFilter.SKIPPING_NON_PHENOTYPE_TERM, # HMF07
HpoMtcFilter.SKIPPING_GENERAL_TERM, # HMF08
IfHpoFilter.SKIPPING_NON_PHENOTYPE_TERM, # HMF07
IfHpoFilter.SKIPPING_GENERAL_TERM, # HMF08
self._below_annotation_frequency_threshold, # HMF09
)

def filter_method_name(self) -> str:
return "HPO MTC filter"
return "Independent filtering HPO filter"

@staticmethod
def get_number_of_observed_hpo_observations(
@@ -629,3 +632,65 @@ def _get_ordered_terms(

# now, ordered_term_list is ordered from leaves to root
return ordered_term_list


class HpoMtcFilter(IfHpoFilter):
"""
`HpoMtcFilter` is deprecated and will be removed in `1.0.0`.
Use :class:`gpsea.analysis.mtc_filter.IfHpoFilter` instead.
"""

@staticmethod
def default_filter(
hpo: hpotk.MinimalOntology,
term_frequency_threshold: float = 0.4,
annotation_frequency_threshold: float = 0.4,
phenotypic_abnormality: hpotk.TermId = PHENOTYPIC_ABNORMALITY,
):
"""
Args:
hpo: HPO
term_frequency_threshold: a `float` in range :math:`(0, 1]` with the minimum frequency
for an HPO term to have in at least one of the genotype groups
(e.g., 22% in missense and 3% in nonsense genotypes would be OK,
but not 13% missense and 10% nonsense genotypes if the threshold is 0.2).
The default threshold is `0.4` (40%).
annotation_frequency_threshold: a `float` in range :math:`(0, 1]` with the minimum frequency of
annotation in the cohort. For instance, if the cohort consists of 100 individuals, and
we have explicit observed observations for 20 and excluded for 10 individuals, then the
annotation frequency is `0.3`. The purpose of this threshold is to omit terms for which
we simply do not have much data overall. By default, we set a threshold to `0.4` (40%).
phenotypic_abnormality: a :class:`~hpotk.TermId` corresponding to the root of HPO phenotype hierarchy.
Specifying this option should be necessary very rarely, if ever.
"""
warnings.warn(
"HpoMtcFilter has been deprecated and will be removed in 1.0.0. Use `IfHpoFilter` instead.",
DeprecationWarning,
stacklevel=2,
)
return IfHpoFilter.default_filter(
hpo=hpo,
term_frequency_threshold=term_frequency_threshold,
annotation_frequency_threshold=annotation_frequency_threshold,
phenotypic_abnormality=phenotypic_abnormality,
)

def __init__(
self,
hpo: hpotk.MinimalOntology,
term_frequency_threshold: float,
annotation_frequency_threshold: float,
general_hpo_terms: typing.Iterable[hpotk.TermId],
):
super().__init__(
hpo,
term_frequency_threshold,
annotation_frequency_threshold,
general_hpo_terms,
)
warnings.warn(
"HpoMtcFilter has been deprecated and will be removed in 1.0.0. Use `IfHpoFilter` instead.",
DeprecationWarning,
stacklevel=2,
)
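
Reviewer sketch of the deprecation-shim pattern used by the class above, with hypothetical stand-in classes `NewFilter` and `OldFilter` (these are not GPSEA classes). It shows one way to check that the deprecated factory both emits a `DeprecationWarning` and hands back a usable instance. A static factory on a shim has to `return` the delegate's result, otherwise callers receive `None`.

```python
import warnings


class NewFilter:
    """Hypothetical stand-in for the renamed class."""

    def __init__(self, threshold: float):
        self.threshold = threshold

    @staticmethod
    def default_filter(threshold: float = 0.4) -> "NewFilter":
        return NewFilter(threshold)


class OldFilter(NewFilter):
    """Hypothetical deprecated alias kept for backwards compatibility."""

    @staticmethod
    def default_filter(threshold: float = 0.4) -> NewFilter:
        warnings.warn(
            "OldFilter has been deprecated. Use `NewFilter` instead.",
            DeprecationWarning,
            stacklevel=2,
        )
        # Return the delegate's result so that callers get a filter, not None.
        return NewFilter.default_filter(threshold)


def collect_deprecations(func):
    """Call `func` and return its result with the DeprecationWarnings it raised."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")
        result = func()
    deprecations = [w for w in caught if issubclass(w.category, DeprecationWarning)]
    return result, deprecations


result, deprecations = collect_deprecations(OldFilter.default_filter)
print(type(result).__name__, len(deprecations))
```

A check along these lines in the test suite would cover the backwards-compatible path in addition to the renamed class.
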
6 changes: 3 additions & 3 deletions src/gpsea/analysis/pcats/_config.py
@@ -2,7 +2,7 @@

import hpotk

from ..mtc_filter import HpoMtcFilter
from ..mtc_filter import IfHpoFilter
from ._impl import HpoTermAnalysis
from .stats import CountStatistic, FisherExactTest

@@ -16,13 +16,13 @@ def configure_hpo_term_analysis(
"""
Configure HPO term analysis with default parameters.
The default analysis will pre-filter HPO terms with :class:`~gpsea.analysis.mtc_filter.HpoMtcFilter`,
The default analysis will pre-filter HPO terms with :class:`~gpsea.analysis.mtc_filter.IfHpoFilter`,
then compute nominal p values using `count_statistic` (default Fisher exact test),
and apply multiple testing correction (default Benjamini/Hochberg (`fdr_bh`))
with target `mtc_alpha` (default `0.05`).
"""
return HpoTermAnalysis(
mtc_filter=HpoMtcFilter.default_filter(hpo),
mtc_filter=IfHpoFilter.default_filter(hpo),
count_statistic=count_statistic,
mtc_correction=mtc_correction,
mtc_alpha=mtc_alpha,
4 changes: 2 additions & 2 deletions tests/analysis/pcats/test_hpo_term_analysis.py
@@ -6,7 +6,7 @@

from gpsea.model import Cohort

from gpsea.analysis.mtc_filter import PhenotypeMtcFilter, HpoMtcFilter
from gpsea.analysis.mtc_filter import PhenotypeMtcFilter, IfHpoFilter
from gpsea.analysis.pcats import HpoTermAnalysis
from gpsea.analysis.pcats.stats import CountStatistic, FisherExactTest
from gpsea.analysis.clf import GenotypeClassifier, PhenotypeClassifier
@@ -22,7 +22,7 @@ def phenotype_mtc_filter(
self,
hpo: hpotk.MinimalOntology,
) -> PhenotypeMtcFilter:
return HpoMtcFilter.default_filter(
return IfHpoFilter.default_filter(
hpo=hpo,
term_frequency_threshold=0.2,
annotation_frequency_threshold=0.25,