Documentation improvements (#187)
* replace _ with ` in a docstring

* add missing "[see superclass]"

* add a summary sentence to two method docstrings (above :return:)

* create properties in UnivariateSimulationResult for better sphinx docs

* remove badges

* update article link

* update article link

* refactor ShapPlotData to base NamedTuple for clearer API & documentation

* fix dollar value rendering

* update doc_url

* update summary description of crossfit package

* upgrade environment.yml so latest anaconda can solve environment

* updates for pypi and conda package pages

* bump initial release version

* update docstring

Co-authored-by: Jason Bentley <Bentley.Jason@bcg.com>
Co-authored-by: Jan Ittner <jan@MacBook-Pro.local>
3 people authored Jan 11, 2021
1 parent 27453dc commit c52b3c6
Showing 14 changed files with 179 additions and 54 deletions.
4 changes: 2 additions & 2 deletions README.rst
@@ -45,7 +45,7 @@ FACET is composed of the following key components:

.. Begin-Badges
|pypi| |conda| |azure_build| |azure_code_cov|
|pypi| |conda|
|python_versions| |code_style| |made_with_sphinx_doc| |License_badge|

.. End-Badges
@@ -430,7 +430,7 @@ or have a look at
:target: https://github.com/psf/black

.. |made_with_sphinx_doc| image:: https://img.shields.io/badge/Made%20with-Sphinx-1f425f.svg
:target: https://www.sphinx-doc.org/
:target: https://bcg-gamma.github.io/facet/index.html

.. |license_badge| image:: https://img.shields.io/badge/License-Apache%202.0-olivegreen.svg
:target: https://opensource.org/licenses/Apache-2.0
4 changes: 2 additions & 2 deletions condabuild/meta.yaml
@@ -50,10 +50,10 @@ about:
home: https://github.com/BCG-Gamma/facet
license: Apache Software License v2.0
license_file: LICENSE
summary: |
description: |
FACET is an open source library for human-explainable AI. It combines sophisticated
model inspection and model-based simulation to enable better explanations of
your supervised machine learning models.
dev_url: https://github.com/BCG-Gamma/facet
doc_url: https://pypi.org/project/facet/ # TODO - replace with docs
doc_url: https://bcg-gamma.github.io/facet/
doc_source_url: https://github.com/BCG-Gamma/facet/blob/develop/README.rst
21 changes: 10 additions & 11 deletions environment.yml
@@ -4,38 +4,37 @@ channels:
dependencies:
- black = 20.8b1
- boruta_py = 0.3.*
- conda-build
- conda-verify
- docutils
- flit = 3.0
- conda-build = 3.*
- conda-verify = 3.*
- docutils = 0.16.*
- flake8 = 3.8.*
- flake8-comprehensions = 3.2.*
- flit = 3.0.*
- isort = 5.5.*
- joblib = 0.16.*
- joblib = 1.0.*
- jupyter >= 1.0
- lightgbm = 3.0.*
- lightgbm = 3.*
- m2r = 0.2.*
- matplotlib = 3.3.*
- nbsphinx = 0.7.*
- numpy = 1.19.*
- pandas = 1.1.*
- pip = 20.*
- pluggy = 0.13.*
- pre-commit = 2.7.*
- pydata-sphinx-theme = 0.4.*
- pytest = 5.2.*
- pytest-cov = 2.8.*
- python = 3.8.*
- pyyaml = 5.1.*
- pyyaml = 5.3.*
- scikit-learn >=0.23.1,<0.24
- scipy = 1.5.*
- seaborn = 0.11.*
- shap = 0.35.*
- sphinx = 3.2.*
- sphinx = 3.4.*
- sphinx-autodoc-typehints = 1.11.*
- tableone = 0.7.*
- typing_inspect = 0.6.*
- toml = 0.10.*
- tox = 3.20.*
- typing_inspect = 0.6.*
- xlrd = 1.2.*
- yaml = 0.1.*
- yaml = 0.2.*
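The dependency list above uses conda's `name = version` pins, and this commit pins, bumps, and relaxes several of them (for example `lightgbm = 3.0.*` becomes `lightgbm = 3.*`) so that a current Anaconda can solve the environment. The following is a simplified sketch, not conda's actual matcher, of how a trailing `.*` pin admits or rejects a candidate version. `parse_pins` and `matches` are hypothetical helpers, and patterns without `.*` are treated as exact versions, which is stricter than conda itself.

```python
from typing import Dict, List


def parse_pins(lines: List[str]) -> Dict[str, str]:
    """Collect 'name = pattern' pins from environment.yml dependency lines.

    Hypothetical helper: entries written with other operators, such as
    'jupyter >= 1.0' or 'scikit-learn >=0.23.1,<0.24', are skipped.
    """
    pins: Dict[str, str] = {}
    for line in lines:
        entry = line.strip().lstrip("-").strip()  # drop the YAML list marker
        name, separator, pattern = entry.partition(" = ")
        if separator:
            pins[name.strip()] = pattern.strip()
    return pins


def matches(version: str, pattern: str) -> bool:
    """Simplified pin check: '3.*' admits '3' and '3.<anything>'; other patterns match exactly."""
    if pattern.endswith(".*"):
        prefix = pattern[:-2]
        return version == prefix or version.startswith(prefix + ".")
    return version == pattern


old_pins = parse_pins(["- lightgbm = 3.0.*", "- joblib = 0.16.*"])
new_pins = parse_pins(["- lightgbm = 3.*", "- joblib = 1.0.*"])
print(matches("3.1.0", old_pins["lightgbm"]))  # False
print(matches("3.1.0", new_pins["lightgbm"]))  # True
print(matches("1.0.0", new_pins["joblib"]))  # True
```

Widening `3.0.*` to `3.*` enlarges the set of builds the solver may pick from, which is the usual way to resolve an "unsatisfiable" environment without dropping a dependency.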
76 changes: 76 additions & 0 deletions pypi_description.rst
@@ -0,0 +1,76 @@
FACET is an open source library for human-explainable AI.
It combines sophisticated model inspection and model-based simulation to enable better
explanations of your supervised machine learning models.

FACET is composed of the following key components:


**Model Inspection**

FACET introduces a new algorithm to quantify dependencies and
interactions between features in ML models.
This new tool for human-explainable AI adds a new, global
perspective to the observation-level explanations provided by the
popular `SHAP <https://shap.readthedocs.io/en/stable/>`__ approach.
To learn more about FACET’s model inspection capabilities, see the
getting started example below.


**Model Simulation**

FACET’s model simulation algorithms use ML models for
*virtual experiments* to help identify scenarios that optimise
predicted outcomes.
To quantify the uncertainty in simulations, FACET utilises a range
of bootstrapping algorithms including stationary and stratified
bootstraps.
For an example of FACET’s bootstrap simulations, see the
quickstart example below.


**Enhanced Machine Learning Workflow**

FACET offers an efficient and transparent machine learning
workflow, enhancing
`scikit-learn <https://scikit-learn.org/stable/index.html>`__'s
tried and tested pipelining paradigm with new capabilities for model
selection, inspection, and simulation.
FACET also introduces
`sklearndf <https://github.com/BCG-Gamma/sklearndf>`__, an augmented
version of *scikit-learn* with enhanced support for *pandas* data
frames that ensures end-to-end traceability of features.


.. Begin-Badges
|pypi| |conda| |python_versions| |code_style| |made_with_sphinx_doc| |License_badge|

.. End-Badges
License
---------------------------

FACET is licensed under Apache 2.0 as described in the
`LICENSE <https://github.com/BCG-Gamma/facet/blob/develop/LICENSE>`_ file.

.. Begin-Badges
.. |conda| image:: https://anaconda.org/bcg_gamma/gamma-facet/badges/version.svg
:target: https://anaconda.org/BCG_Gamma/gamma-facet

.. |pypi| image:: https://badge.fury.io/py/gamma-facet.svg
:target: https://pypi.org/project/gamma-facet/

.. |python_versions| image:: https://img.shields.io/badge/python-3.6|3.7|3.8-blue.svg
:target: https://www.python.org/downloads/release/python-380/

.. |code_style| image:: https://img.shields.io/badge/code%20style-black-000000.svg
:target: https://github.com/psf/black

.. |made_with_sphinx_doc| image:: https://img.shields.io/badge/Made%20with-Sphinx-1f425f.svg
:target: https://bcg-gamma.github.io/facet/index.html

.. |license_badge| image:: https://img.shields.io/badge/License-Apache%202.0-olivegreen.svg
:target: https://opensource.org/licenses/Apache-2.0

.. End-Badges
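The badge line in `pypi_description.rst` consists of reStructuredText substitution references (`|pypi|`, `|conda|`, ...), and each needs a matching `.. |name| image::` definition to render. A minimal stdlib sketch of a consistency check, our own regex and not part of FACET's tooling, that lists references with no definition. It compares names case-insensitively, assuming docutils' documented fallback to case-insensitive matching (the file references `|License_badge|` but defines `|license_badge|`).

```python
import re
from typing import List

# |name| substitution references, e.g. "|pypi| |conda|"
REFERENCE = re.compile(r"\|([A-Za-z0-9_]+)\|")
# ".. |name| image:: <url>" substitution definitions
DEFINITION = re.compile(r"^\.\.\s+\|([A-Za-z0-9_]+)\|\s+image::", re.MULTILINE)


def undefined_badges(rst: str) -> List[str]:
    """Return badge references that have no substitution definition."""
    defined = {name.lower() for name in DEFINITION.findall(rst)}
    referenced: List[str] = []
    for line in rst.splitlines():
        if line.lstrip().startswith(".."):
            continue  # skip comments and definition lines themselves
        referenced.extend(REFERENCE.findall(line))
    return [name for name in referenced if name.lower() not in defined]


sample_rst = """\
|pypi| |conda| |License_badge|

.. |pypi| image:: https://badge.fury.io/py/gamma-facet.svg
.. |license_badge| image:: https://img.shields.io/badge/License-Apache%202.0-olivegreen.svg
"""
print(undefined_badges(sample_rst))  # ['conda']
```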
6 changes: 3 additions & 3 deletions pyproject.toml
@@ -9,7 +9,7 @@ exclude = [".idea", "tmp", "dist", ".tox", ".pytest_cache"]
module = "facet"
author = "Boston Consulting Group (BCG)"
home-page = "https://github.com/BCG-Gamma/facet"
description-file = "README.rst"
description-file = "pypi_description.rst"
dist-name = "gamma-facet"
license = "Apache Software License v2.0"

@@ -24,8 +24,8 @@ requires = [
"scipy >=1.2,<1.6",
"pyyaml >=5.0",
"joblib >=0.13,<1.17",
"gamma-pytools >=1.0.0",
"sklearndf >=1.0.0",
"gamma-pytools >=1.0.1",
"sklearndf >=1.0.1",
]

requires-python = ">=3.6,<4"
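The `requires` list above combines lower and upper bounds in single comma-separated specifiers, and the commit raises the floors of `gamma-pytools` and `sklearndf` to 1.0.1. A simplified sketch of how such a specifier is evaluated, assuming purely numeric dotted versions. `satisfies` is a hypothetical helper; real installers implement the full PEP 440 rules (pre-releases, epochs, `~=`, and more), for example through the `packaging` library.

```python
import operator
import re
from typing import Callable, Dict, Tuple

Version = Tuple[int, ...]

# comparison operators understood by this simplified sketch
OPERATORS: Dict[str, Callable[[Version, Version], bool]] = {
    ">=": operator.ge,
    "<=": operator.le,
    "==": operator.eq,
    ">": operator.gt,
    "<": operator.lt,
}
CLAUSE = re.compile(r"\s*(>=|<=|==|>|<)\s*(\d+(?:\.\d+)*)\s*")


def as_version(text: str, width: int) -> Version:
    """'1.2' -> (1, 2, 0) for width 3; zero-padding makes 1.2 equal to 1.2.0."""
    parts = [int(part) for part in text.split(".")]
    return tuple(parts + [0] * (width - len(parts)))


def satisfies(requirement: str, version: str) -> bool:
    """Check a version against a requirement such as 'scipy >=1.2,<1.6'."""
    _name, _, specifier = requirement.strip().partition(" ")
    for clause in specifier.split(","):
        match = CLAUSE.fullmatch(clause)
        if match is None:
            raise ValueError(f"unsupported clause: {clause!r}")
        op, bound = match.groups()
        width = max(len(version.split(".")), len(bound.split(".")))
        if not OPERATORS[op](as_version(version, width), as_version(bound, width)):
            return False
    return True


print(satisfies("sklearndf >=1.0.1", "1.0.0"))  # False
print(satisfies("scipy >=1.2,<1.6", "1.5.4"))  # True
```

Every clause must hold, so a comma acts as a logical AND between bounds.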
4 changes: 2 additions & 2 deletions sphinx/source/faqs.rst
@@ -21,7 +21,7 @@ on `stackoverflow <https://stackoverflow.com/>`_.

Please keep an eye out for our scientific publication coming soon. In the meantime
please feel free to explore the
`GAMMAscope article <TO DO ADD LINK>`__
`GAMMAscope article <https://medium.com/bcggamma/gamma-facet-a-new-approach-for-universal-explanations-of-machine-learning-models-b566877e7812>`__
to get an introduction to using the algorithm.

3. **How can I contribute?**
@@ -157,5 +157,5 @@ Bibtex entry::
title={FACET},
author={FACET Team at BCG GAMMA},
year={2021},
note={Python package version 1.0.0)
note={Python package version 1.0.1)
}
2 changes: 1 addition & 1 deletion sphinx/source/tutorial/Model_simulation_deep_dive.ipynb
@@ -39,7 +39,7 @@
"\n",
"This tutorial aims to provide a step by step explanation about the simulation capabilities of FACET and is based on the Water Drilling Tutorial. If you would like further background to this tutorial we recommend reviewing Water Drilling tutorial first, however, as a brief re-cap:\n",
"\n",
"Drilling a water well is very dangerous and costly. The costs of such drilling are driven by the time it takes to finalize a well in order to start pumping water from it. To reduce those costs, drillers are usually incentivized to drill at a faster pace—measured as the Rate of Penetration (ROP). Depending on soil characteristics, day rates can range from $\\$$30,000 to $\\$$250,000. But there is a trade-off: Drilling faster increases the risk of incidents, such as a formation collapse or a gas infiltration. We will therefore built a machine-learning model to understand the impact of drilling speed on the incident risk, in the context of other risk factors. \n",
"Drilling a water well is very dangerous and costly. The costs of such drilling are driven by the time it takes to finalize a well in order to start pumping water from it. To reduce those costs, drillers are usually incentivized to drill at a faster pace—measured as the Rate of Penetration (ROP). Depending on soil characteristics, day rates can range from `$30,000` to `$250,000`. But there is a trade-off: Drilling faster increases the risk of incidents, such as a formation collapse or a gas infiltration. We will therefore built a machine-learning model to understand the impact of drilling speed on the incident risk, in the context of other risk factors. \n",
"\n",
"For the sake of clarity, we use a simplified dataset for this example. The dataset contains 500 observations, with each row representing a drilling operation of the past, along with a binary indicator of whether or not a well-drilling incident happened in the operation. \n",
"\n",
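The edited notebook line replaces the `$\\$$30,000` workaround with code spans. Jupyter Markdown treats text between two `$` characters as inline math, so a sentence with two bare dollar amounts renders everything between them as a formula. A sketch of the same fix applied mechanically, using a hypothetical `quote_dollar_amounts` helper that wraps each amount in backticks and leaves amounts already preceded by a backtick alone.

```python
import re

# a dollar amount such as $30,000 or $2.50 that is not immediately preceded by a backtick
DOLLAR_AMOUNT = re.compile(r"(?<!`)\$\d+(?:,\d{3})*(?:\.\d+)?")


def quote_dollar_amounts(markdown: str) -> str:
    """Wrap each dollar amount in backticks so Markdown renders it as code, not math."""
    return DOLLAR_AMOUNT.sub(lambda match: f"`{match.group(0)}`", markdown)


text = "day rates can range from $30,000 to $250,000."
print(quote_dollar_amounts(text))
# day rates can range from `$30,000` to `$250,000`.
```

Because quoted amounts are skipped, running the helper twice gives the same result as running it once.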
6 changes: 4 additions & 2 deletions sphinx/source/tutorials.rst
@@ -20,7 +20,8 @@ Introduce yourself to the FACET workflow! With this tutorial you will:
impact incident risk, with a particular focus on rate of penetration.

Start exploring the tutorial below by clicking on the section links, deepen your
understanding by reading the associated `GAMMAscope article <TO DO ADD LINK>`__, and
understanding by reading the associated
`GAMMAscope article <https://medium.com/bcggamma/gamma-facet-a-new-approach-for-universal-explanations-of-machine-learning-models-b566877e7812>`__, and
download the notebook for yourself
:download:`here <tutorial/Water_Drilling_Incident_Classification_with_Facet.ipynb>`.

@@ -84,7 +85,8 @@ a step by step explanation about the simulation capabilities of FACET and
is based on the introductory Water Drilling Tutorial above.

Start exploring the tutorial below by clicking on the section links, deepen your
understanding by reading the associated `GAMMAscope article <TO DO ADD LINK>`__, and
understanding by reading the associated
`GAMMAscope article <https://medium.com/bcggamma/gamma-facet-a-new-approach-for-universal-explanations-of-machine-learning-models-b566877e7812>`__, and
download the notebook for yourself
:download:`here <tutorial/Model_simulation_deep_dive.ipynb>`.

4 changes: 3 additions & 1 deletion src/facet/__init__.py
@@ -1,10 +1,12 @@
"""
Human-explainable AI.
This is the class and function reference of FACET for advanced model selection,
inspection, and simulation.
"""


__version__ = "1.0.0"
__version__ = "1.0.1"

__logo__ = (
r"""
7 changes: 4 additions & 3 deletions src/facet/crossfit/__init__.py
@@ -3,9 +3,10 @@
split; used as the basis for learner selection and inspection.
:class:`.LearnerCrossfit` encapsulates a fully trained pipeline.
It contains a :class:`~sklearndf.PipelineDF` (preprocessing and estimator),
a dataset given by a :class:`.Sample` object and a
cross-validator. The pipeline is fitted accordingly.
It contains a :class:`~.sklearndf.LearnerPipelineDF` (preprocessing and learner),
a dataset given by a :class:`.Sample` object, and a
cross-validator.
The pipeline is fitted accordingly.
"""

from ._crossfit import *
4 changes: 4 additions & 0 deletions src/facet/crossfit/_crossfit.py
@@ -298,6 +298,8 @@ def resize(self: T_Self, n_splits: int) -> T_Self:

def splits(self) -> Iterator[Tuple[Sequence[int], Sequence[int]]]:
"""
Get an iterator of all train/test splits used by this crossfit.
:return: an iterator of all train/test splits used by this crossfit
"""
self._ensure_fitted()
@@ -308,6 +310,8 @@ def splits(self) -> Iterator[Tuple[Sequence[int], Sequence[int]]]:

def models(self) -> Iterator[T_LearnerPipelineDF]:
"""
Get an iterator of all models fitted on the cross-validation train splits.
:return: an iterator of all models fitted on the cross-validation train splits
"""
self._ensure_fitted()
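The two docstrings above describe the crossfit contract: `splits()` yields one `(train, test)` index pair per cross-validation split, and `models()` yields one model fitted on each train split, both only after fitting. A minimal stdlib sketch of that contract, using a hypothetical `MiniCrossfit` class with contiguous k-fold splits and a trivial "model" (the mean of the training targets) in place of a `LearnerPipelineDF`. FACET's crossfit is configured with a cross-validator instead of this hand-rolled splitting. The docstrings follow the commit's convention of a summary sentence above `:return:`.

```python
from typing import Iterator, List, Sequence, Tuple


class MiniCrossfit:
    """Hypothetical stand-in for a crossfit: k contiguous folds, mean 'models'."""

    def __init__(self, target: Sequence[float], n_splits: int) -> None:
        if not 2 <= n_splits <= len(target):
            raise ValueError("n_splits must be between 2 and the number of observations")
        self._target = list(target)
        self._n_splits = n_splits
        self._splits: List[Tuple[List[int], List[int]]] = []
        self._models: List[float] = []

    def fit(self) -> "MiniCrossfit":
        n = len(self._target)
        self._splits = []
        for k in range(self._n_splits):
            start, stop = k * n // self._n_splits, (k + 1) * n // self._n_splits
            test = list(range(start, stop))
            train = [i for i in range(n) if not start <= i < stop]
            self._splits.append((train, test))
        # "fit" one model per train split: here simply the mean of the train targets
        self._models = [
            sum(self._target[i] for i in train) / len(train) for train, _ in self._splits
        ]
        return self

    def _ensure_fitted(self) -> None:
        if not self._splits:
            raise RuntimeError("crossfit is not fitted")

    def splits(self) -> Iterator[Tuple[Sequence[int], Sequence[int]]]:
        """
        Get an iterator of all train/test splits used by this crossfit.

        :return: an iterator of all train/test splits used by this crossfit
        """
        self._ensure_fitted()
        return iter(self._splits)

    def models(self) -> Iterator[float]:
        """
        Get an iterator of all models fitted on the cross-validation train splits.

        :return: an iterator of all models fitted on the cross-validation train splits
        """
        self._ensure_fitted()
        return iter(self._models)


crossfit = MiniCrossfit([1.0, 2.0, 3.0, 4.0], n_splits=2).fit()
print(list(crossfit.splits()))  # [([2, 3], [0, 1]), ([0, 1], [2, 3])]
print(list(crossfit.models()))  # [3.5, 1.5]
```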
60 changes: 42 additions & 18 deletions src/facet/inspection/_inspection.py
@@ -3,7 +3,7 @@
"""

import logging
from typing import Generic, List, NamedTuple, Optional, TypeVar, Union, cast
from typing import Generic, List, Optional, TypeVar, Union, cast

import numpy as np
import pandas as pd
@@ -57,23 +57,46 @@
#


class ShapPlotData(NamedTuple):
class ShapPlotData:
"""
Data for use in SHAP plots provided by the
`shap <https://shap.readthedocs.io/en/stable/>`__ package.
"""

#: Matrix of SHAP values (number of observations by number of features)
#: or list of shap value matrices for multi-output models.
shap_values: Union[np.ndarray, List[np.ndarray]]
def __init__(
self, shap_values: Union[np.ndarray, List[np.ndarray]], sample: Sample
):
"""
:param shap_values: the shap values for all observations and outputs
:param sample: (sub)sample of all observations for which SHAP values are
available; aligned with param ``shap_values``
"""
self._shap_values = shap_values
self._sample = sample

@property
def shap_values(self) -> Union[np.ndarray, List[np.ndarray]]:
"""
Matrix of SHAP values (number of observations by number of features)
or list of shap value matrices for multi-output models.
"""
return self._shap_values

#: Matrix of feature values (number of observations by number of features).
features: pd.DataFrame
@property
def features(self) -> pd.DataFrame:
"""
Matrix of feature values (number of observations by number of features).
"""
return self._sample.features

#: Series of target values (number of observations)
#: or matrix of target values for multi-output models
#: (number of observations by number of outputs).
target: Union[pd.Series, pd.DataFrame]
@property
def target(self) -> Union[pd.Series, pd.DataFrame]:
"""
Series of target values (number of observations)
or matrix of target values for multi-output models
(number of observations by number of outputs).
"""
return self._sample.target


@inheritdoc(match="[see superclass]")
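The refactor above replaces a three-field `NamedTuple` with a class that stores the SHAP values plus the originating `Sample` and exposes `shap_values`, `features`, and `target` as read-only properties, each with its own docstring for Sphinx. A stdlib-only sketch of the same pattern, where `MiniSample` is a hypothetical stand-in for FACET's `Sample` and nested lists stand in for numpy arrays and pandas objects. The row-count check is an addition for this sketch; the diff does not show one.

```python
from typing import List, NamedTuple

Matrix = List[List[float]]


class MiniSample(NamedTuple):
    """Hypothetical stand-in for facet's Sample: features plus target values."""

    features: Matrix
    target: List[float]


class MiniShapPlotData:
    """SHAP values together with the (sub)sample they were computed for."""

    def __init__(self, shap_values: Matrix, sample: MiniSample) -> None:
        # alignment check added for this sketch: one SHAP row per observation
        if len(shap_values) != len(sample.features):
            raise ValueError("shap_values and sample must have the same number of rows")
        self._shap_values = shap_values
        self._sample = sample

    @property
    def shap_values(self) -> Matrix:
        """Matrix of SHAP values (number of observations by number of features)."""
        return self._shap_values

    @property
    def features(self) -> Matrix:
        """Matrix of feature values, read from the underlying sample."""
        return self._sample.features

    @property
    def target(self) -> List[float]:
        """Target values, read from the underlying sample."""
        return self._sample.target


sample = MiniSample(features=[[1.0, 2.0], [3.0, 4.0]], target=[0.0, 1.0])
data = MiniShapPlotData(shap_values=[[0.1, -0.2], [0.3, 0.0]], sample=sample)
print(data.features is sample.features)  # True
```

Since the class is no longer a tuple, code that unpacked the old `NamedTuple` (for example `shap_values, features, target = data`) would need to switch to attribute access.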
@@ -745,12 +768,12 @@ def shap_plot_data(self) -> ShapPlotData:
for use in SHAP plots offered by the
`shap <https://shap.readthedocs.io/en/stable/>`__ package.
The _shap_ package provides functions for creating various SHAP plots.
The `shap` package provides functions for creating various SHAP plots.
Most of these functions require
- one or more SHAP value matrices as a single _numpy_ array,
or a list of _numpy_ arrays of shape _(n_observations, n_features)_
- a feature matrix of shape _(n_observations, n_features)_, which can be
- one or more SHAP value matrices as a single `numpy` array,
or a list of `numpy` arrays of shape `(n_observations, n_features)`
- a feature matrix of shape `(n_observations, n_features)`, which can be
provided as a data frame to preserve feature names
This method provides this data inside a :class:`.ShapPlotData` object, plus
@@ -774,7 +797,9 @@ def shap_plot_data(self) -> ShapPlotData:
consolidate="mean"
)

output_names = self.output_names_
output_names: List[str] = self.output_names_
shap_values_numpy: Union[np.ndarray, List[np.ndarray]]
included_observations: pd.Index

if len(output_names) > 1:
shap_values: List[pd.DataFrame]
@@ -789,8 +814,7 @@ def shap_plot_data(self) -> ShapPlotData:

return ShapPlotData(
shap_values=shap_values_numpy,
features=sample.features,
target=sample.target,
sample=sample,
)

def __feature_matrix_to_df(
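The `shap_plot_data` docstring in this file states what most `shap` plotting functions expect: a SHAP value matrix of shape `(n_observations, n_features)`, or a list of such matrices for multi-output models, plus a feature matrix of the same shape. A stdlib sketch of a hypothetical `check_shap_inputs` helper that verifies this alignment before plotting, with nested lists standing in for numpy arrays. In practice the `shap_values` and `features` of a `ShapPlotData` object would then be passed to a shap plotting function such as `shap.summary_plot(shap_values, features)`.

```python
from typing import List, Sequence, Tuple, Union

Matrix = Sequence[Sequence[float]]


def matrix_shape(matrix: Matrix) -> Tuple[int, int]:
    """Return (n_rows, n_columns), rejecting ragged nested lists."""
    widths = {len(row) for row in matrix}
    if len(widths) > 1:
        raise ValueError("ragged matrix")
    return len(matrix), (widths.pop() if widths else 0)


def check_shap_inputs(
    shap_values: Union[Matrix, List[Matrix]], features: Matrix, multi_output: bool
) -> Tuple[int, int]:
    """Check that each SHAP matrix has shape (n_observations, n_features)."""
    expected = matrix_shape(features)
    # multi-output models provide one SHAP matrix per output
    matrices = list(shap_values) if multi_output else [shap_values]
    for index, matrix in enumerate(matrices):
        actual = matrix_shape(matrix)
        if actual != expected:
            raise ValueError(f"SHAP matrix {index} has shape {actual}, expected {expected}")
    return expected


features = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
print(check_shap_inputs([[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]], features, multi_output=False))
# (3, 2)
```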