quickstart and notebook refresh and updates (#179)
==================================================
jason-bentley authored Jan 8, 2021
1 parent b99a13b commit 9d65b16
Showing 13 changed files with 8,401 additions and 2,120 deletions.
154 changes: 124 additions & 30 deletions README.rst
Enhanced Machine Learning Workflow
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

To demonstrate the model inspection capability of FACET, we first create a
pipeline to fit a learner. In this simple example using the Boston housing
data, we will train a Random Forest regressor using 10 repeated 5-fold CV
to predict median house price. With *sklearndf* we can create a *pandas*
DataFrame-compatible workflow. FACET adds further enhancements: it keeps
track of our feature matrix and target vector in a sample object (`Sample`)
and makes it easy to compare hyperparameter configurations, and even
multiple learners, with the `LearnerRanker`.

.. code-block:: Python

    # standard imports
    # (data loading, pipeline, and grid setup are elided in this view;
    # see the sketch below for one possible version)

    # create repeated k-fold CV iterator
    rkf_cv = RepeatedKFold(n_splits=5, n_repeats=10, random_state=42)

    # rank your candidate models by performance (default is mean CV score - 2*SD)
    ranker = LearnerRanker(
        grids=rnd_forest_grid, cv=rkf_cv, n_jobs=-3
    ).fit(sample=boston_sample)
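
The elided setup can be reconstructed in outline. A minimal sketch, assuming
the Boston housing data and the `Sample`, `LearnerGrid`, and *sklearndf*
pipeline APIs from the FACET quickstart (treat the module paths, parameter
names, and grid values here as assumptions):

.. code-block:: Python

    # standard imports
    import pandas as pd
    from sklearn.datasets import load_boston
    from sklearn.model_selection import RepeatedKFold

    # sklearndf wrappers provide DataFrame-compatible learners and pipelines
    from sklearndf.pipeline import RegressorPipelineDF
    from sklearndf.regression import RandomForestRegressorDF

    # FACET imports (module paths assumed)
    from facet.data import Sample
    from facet.selection import LearnerGrid, LearnerRanker

    # load the Boston housing data into a DataFrame
    boston = load_boston()
    df = pd.DataFrame(boston.data, columns=boston.feature_names)
    df["MEDV"] = boston.target

    # keep track of the feature matrix and target vector in a Sample object
    boston_sample = Sample(observations=df, target_name="MEDV")

    # create a pipeline for a random forest regressor
    rnd_forest_reg = RegressorPipelineDF(
        regressor=RandomForestRegressorDF(random_state=42)
    )

    # define the candidate hyperparameter configurations to rank
    rnd_forest_grid = [
        LearnerGrid(
            pipeline=rnd_forest_reg,
            learner_parameters={"min_samples_leaf": [8, 11, 15]},
        )
    ]

A summary of the fitted ranker (for example via `ranker.summary_report()`)
then yields the ranking shown below.
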
.. image:: sphinx/source/_static/ranker_summary.png
:width: 600

Based on this minimal workflow, a value of 8 for the minimum samples per leaf
performed best of the three candidate values. The approach extends readily to
multiple hyperparameters per learner, and to multiple learners, as sketched
below.
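
A sketch of that extension, assuming the same `LearnerGrid` and
`LearnerRanker` APIs as above (the linear regression pipeline and the
parameter choices are illustrative assumptions):

.. code-block:: Python

    # rank several learners and hyperparameter configurations in one go
    from sklearndf.regression import LinearRegressionDF

    multi_grid = [
        LearnerGrid(
            pipeline=rnd_forest_reg,
            learner_parameters={
                "min_samples_leaf": [8, 11, 15],
                "max_depth": [4, 6, 8],
            },
        ),
        LearnerGrid(
            pipeline=RegressorPipelineDF(regressor=LinearRegressionDF()),
            learner_parameters={"fit_intercept": [True, False]},
        ),
    ]

    ranker_multi = LearnerRanker(
        grids=multi_grid, cv=rkf_cv, n_jobs=-3
    ).fit(sample=boston_sample)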

Model Inspection
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

FACET implements several model inspection methods for
`scikit-learn <https://scikit-learn.org/stable/index.html>`__ estimators.

FACET enhances model inspection by providing global metrics that complement
the local perspective of SHAP. The key global metrics for each pair of
features in a model are:

- **Synergy**

The degree to which the model combines information from one feature with
another to predict the target. For example, let's assume we are predicting
cardiovascular health using age and gender and the fitted model includes
a complex interaction between them. This means these two features are
synergistic for predicting cardiovascular health. Further, both features
are important to the model and removing either one would significantly
impact performance. Let's assume age brings more information to the joint
contribution than gender. This asymmetric contribution means the synergy for
(age, gender) is less than the synergy for (gender, age). To think about it another
way, imagine the prediction is a coordinate you are trying to reach.
From your starting point, age gets you much closer to this point than
gender, however, you need both to get there. Synergy reflects the fact
that gender gets more help from age (higher synergy from the perspective
of gender) than age does from gender (lower synergy from the perspective of
age) to reach the prediction. *This leads to an important point: synergy
is a naturally asymmetric property of the global information two interacting
features contribute to the model predictions.* Synergy is expressed as a
percentage ranging from 0% (full autonomy) to 100% (full synergy).


- **Redundancy**

The degree to which a feature in a model duplicates the information of a
second feature to predict the target. For example, let's assume we had
house size and number of bedrooms for predicting house price. These
features capture similar information as the more bedrooms the larger
the house and likely a higher price on average. The redundancy for
(number of bedrooms, house size) will be greater than the redundancy
for (house size, number of bedrooms). This is because house size
"knows" more of what number of bedrooms does for predicting house price
than vice-versa. Hence, there is greater redundancy from the perspective
of number of bedrooms. Another way to think about it is removing house
size will be more detrimental to model performance than removing number
of bedrooms, as house size can better compensate for the absence of
number of bedrooms. This also implies that house size would be a more
important feature than number of bedrooms in the model. *The important
point here is that like synergy, redundancy is a naturally asymmetric
property of the global information feature pairs have for predicting
an outcome.* Redundancy is expressed as a percentage ranging from 0%
(full uniqueness) to 100% (full redundancy).
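
Both definitions can be made concrete with a toy dataset. In the sketch below
(which assumes the `Sample` class used in the quickstart), the target contains
a pure interaction term, so `x1` and `x2` should appear synergistic, while
`x4` is a noisy copy of `x3` and should appear largely redundant with it:

.. code-block:: Python

    # synthetic data illustrating synergy and redundancy
    import numpy as np
    import pandas as pd
    from facet.data import Sample  # module path assumed

    rng = np.random.RandomState(42)
    df_toy = pd.DataFrame(
        rng.uniform(-1, 1, size=(500, 4)), columns=["x1", "x2", "x3", "x4"]
    )
    # x4 duplicates x3 up to a little noise -> high redundancy
    df_toy["x4"] = df_toy["x3"] + rng.normal(scale=0.05, size=500)
    # the target depends on x1 and x2 only through their product -> high synergy
    df_toy["y"] = df_toy["x1"] * df_toy["x2"] + df_toy["x3"]

    toy_sample = Sample(observations=df_toy, target_name="y")

Running the same ranker/inspector workflow on `toy_sample` should recover
these patterns in the synergy and redundancy matrices introduced below.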

.. code-block:: Python

    # fit the model inspector
    from facet.inspection import LearnerInspector
    inspector = LearnerInspector()
    inspector.fit(crossfit=ranker.best_model_crossfit_)

**Synergy**

.. code-block:: Python

    # visualise synergy as a matrix
    from pytools.viz.matrix import MatrixDrawer
    synergy_matrix = inspector.feature_synergy_matrix()
    MatrixDrawer(style="matplot%").draw(synergy_matrix, title="Synergy Matrix")

.. image:: sphinx/source/_static/synergy_matrix.png
:width: 600

In the matrix, each row represents the "perspective from" feature in the pair.
Looking across the row for `LSTAT` there is relatively minimal synergy (≤14%)
with other features in the model. However, looking down the column for `LSTAT`
(i.e., the perspective of other features in a pair with `LSTAT`) we find that
many features (the rows) are synergistic (12% to 47%) with `LSTAT`. We can
conclude that:

- `LSTAT` is a strongly autonomous feature, displaying minimal synergy with other
features for predicting median house price.
- The contribution of other features to predicting median house price is partly
enabled by the strong contribution from `LSTAT`.

High synergy features must be considered carefully when investigating business
impact, as they work together to predict the outcome. It would not make much
sense to consider `ZN` (proportion of residential land zoned for lots over
25,000 sq.ft) without `LSTAT` given the 47% synergy of `ZN` with `LSTAT` for
predicting median house price.
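
Since the synergy matrix is returned as a *pandas* DataFrame, individual
pairwise values can be read off directly. A small sketch, assuming the fitted
`inspector` from above (the values in the comments are the approximate figures
quoted in the text):

.. code-block:: Python

    # synergy is asymmetric: the row is the "perspective from" feature,
    # so swapping the lookup order generally gives a different value
    print(synergy_matrix.loc["ZN", "LSTAT"])   # perspective of ZN: ~47%
    print(synergy_matrix.loc["LSTAT", "ZN"])   # perspective of LSTAT: much lower

The same lookup pattern applies to the redundancy matrix shown next.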

**Redundancy**

.. code-block:: Python

    # visualise redundancy as a matrix
    redundancy_matrix = inspector.feature_redundancy_matrix()
    MatrixDrawer(style="matplot%").draw(redundancy_matrix, title="Redundancy Matrix")

.. image:: sphinx/source/_static/redundancy_matrix.png
:width: 600

For any feature pair (A, B), the first feature (A) is the row, and the second
feature (B) the column. For example, if we look at the feature pair (`LSTAT`, `RM`)
from the perspective of `LSTAT` (percentage of lower status of the population),
then we look up the row for `LSTAT` and the column for `RM` (average number of
rooms per dwelling) and find 39% redundancy. This means that 39% of the
information in `LSTAT` is duplicated with `RM` to predict median house price.
We can also see looking across the row for `LSTAT` that apart from the 39%
redundancy with `RM`, `LSTAT` has minimal redundancy (<5%) with any of the
other features included in the model.

**Clustering redundancy**

As detailed above, redundancy and synergy for a feature pair are from the
"perspective" of one of the features in the pair, and so yield two distinct
values. However, a symmetric version can also be computed that provides not
only a simplified perspective but also allows the use of (1 - metric) as a
feature distance. With this distance, hierarchical single-linkage clustering
is applied to create a dendrogram visualization. This helps to identify
groups of low-distance features which activate "in tandem" to predict the
outcome. Such information can then be used to either reduce clusters of
highly redundant features to a subset, or to highlight clusters of highly
synergistic features that should always be considered together.

Let's look at the example for redundancy.

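The dendrogram code itself is elided in this view. A minimal sketch, assuming
the `feature_redundancy_linkage` method and the `DendrogramDrawer` from
*pytools* (both names are assumptions based on the FACET 1.0 API):

.. code-block:: Python

    # visualise redundancy as a dendrogram
    from pytools.viz.dendrogram import DendrogramDrawer
    redundancy_linkage = inspector.feature_redundancy_linkage()
    DendrogramDrawer().draw(data=redundancy_linkage, title="Redundancy Dendrogram")
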
.. image:: sphinx/source/_static/redundancy_dendrogram.png
:width: 600

Based on the dendrogram we can see that the feature pairs (`LSTAT`, `RM`)
and (`CRIM`: per capita crime rate by town, `NOX`: nitric oxides concentration
in parts per 10 million) each represent a cluster in the dendrogram, and
that `LSTAT` and `RM` have high importance. As a next step we could
remove `RM` (and perhaps `NOX`) to further simplify the model and obtain a
set of independent features.

Please see the
`API reference <https://bcg-gamma.github.io/facet/apidoc/facet.html>`__
for more detail.

Model Simulation
~~~~~~~~~~~~~~~~

Using the best model from the ranker, FACET can simulate the impact of
changes to a single feature on predictions of the target, and can
quantify the uncertainty by using bootstrap confidence intervals.
.. code-block:: Python

    # create and fit a bootstrap CV crossfit for the best model; the
    # construction is elided in this view -- see the sketch below
    boot_crossfit = (
        ...
    ).fit(sample=boston_sample)

    SIM_FEAT = "LSTAT"
    simulator = UnivariateUpliftSimulator(crossfit=boot_crossfit, n_jobs=-3)

    # split the simulation range into equal sized partitions
    partitioner = ContinuousRangePartitioner()
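
A sketch of the elided pieces, assuming FACET's `LearnerCrossfit`,
`BootstrapCV`, `simulate_feature`, and `SimulationDrawer` (all module paths,
names, and signatures here are assumptions based on the FACET 1.0 API):

.. code-block:: Python

    # one possible construction of the bootstrap crossfit
    from facet.crossfit import LearnerCrossfit
    from facet.validation import BootstrapCV

    boot_crossfit = LearnerCrossfit(
        pipeline=ranker.best_model_,
        cv=BootstrapCV(n_splits=1000, random_state=42),
        n_jobs=-3,
    ).fit(sample=boston_sample)

    # run the simulation over the partitioned feature range and draw it
    from facet.simulation.viz import SimulationDrawer

    simulation = simulator.simulate_feature(
        feature_name=SIM_FEAT, partitioner=partitioner
    )
    SimulationDrawer().draw(data=simulation, title=SIM_FEAT)
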
.. image:: sphinx/source/_static/simulation_output.png

We would conclude from the figure that lower values of `LSTAT` are associated
with an increase in median house price, and that `LSTAT` values of 8% or less
result in a significant uplift in median house price.


Contributing
---------------------------