Commit 10f1d35

j-ittner, jason-bentley, and sithankanna authored Nov 4, 2020
Update the overview section in facet's README.rst; plus minor fixes and enhancements (#142)
* update the overview section in facet's README.rst
* use emphasized `facet` as the uniform spelling for facet
* minor edits and fixes to README.rst
* use '*' instead of '`' to indicate emphasis
* Update RST formatting
* fine-tune column widths in summary table
* add a spacer to increase icon sizes
* Update the getting started notebook with the changes from the `facet` README section (#143)

Co-authored-by: Jason Bentley <Bentley.Jason@bcg.com>
Co-authored-by: sithankanna <sithankanna@gmail.com>
1 parent c648bd9 commit 10f1d35

2 files changed, +115 -89 lines changed


README.rst

+97 -66
@@ -2,46 +2,54 @@
 
 |
 
-Facet is an open source library for human-explainable AI. It combines sophisticated
-model inspection and model-based simulation to enable better explanations of your
-supervised machine learning models. Facet is composed of the following key components:
-
-+-------------------+---------------------------------------------------------------------------+
-| |pipe|            | **Enhanced Machine Learning Workflow**                                    |
-|                   |                                                                           |
-|                   | Facet delivers a robust and fail-safe pipelining workflow which allows you|
-|                   | to easily impute and select your features as well as ranking a grid of    |
-|                   | different models "competing" against each other. Facet introduces         |
-|                   | `sklearndf <https://github.com/BCG-Gamma/sklearndf>`_, an augmented       |
-|                   | version of `scikit-learn <https://scikit-learn.org/stable/index.html>`_   |
-|                   | with enhanced support for `pandas <https://pandas.pydata.org/>`_          |
-|                   | dataframes and pipelining.                                                |
-|                   |                                                                           |
-+-------------------+---------------------------------------------------------------------------+
-| |inspect|         | **Model Inspection**                                                      |
-|                   |                                                                           |
-|                   | Local explanations of features and their interactions make up a key       |
-|                   | component of understanding feature importance as well as feature          |
-|                   | interactions. This is based on a novel method which decomposes            |
-|                   | `SHAP values <https://shap.readthedocs.io/en/latest/>`_ into              |
-|                   | two vectors representing **synergy** and **redundancy**.                  |
-|                   |                                                                           |
-+-------------------+---------------------------------------------------------------------------+
-| |sim|             | **Model Simulation**                                                      |
-|                   |                                                                           |
-|                   | Use your trained model and the insights from the model inspection to      |
-|                   | conduct a historical univariate simulation of any feature on your target  |
-|                   | in order to identify local optima.                                        |
-+-------------------+---------------------------------------------------------------------------+
-
+*facet* is an open source library for human-explainable AI.
+It combines sophisticated model inspection and model-based simulation to enable better
+explanations of your supervised machine learning models.
+
+*facet* is composed of the following key components:
+
++----------------+---------------------------------------------------------------------+
+| |inspect|      | **Model Inspection**                                                |
+|                |                                                                     |
+|                | *facet* introduces a new algorithm to quantify dependencies and     |
+|                | interactions between features in ML models.                         |
+|                | This new tool for human-explainable AI adds a new, global           |
+|                | perspective to the observation-level explanations provided by the   |
+|                | popular `SHAP <https://shap.readthedocs.io/en/latest/>`_ approach.  |
+|                | To learn more about *facet*’s model inspection capabilities, see the|
+|                | getting started example below.                                      |
++----------------+---------------------------------------------------------------------+
+| |sim|          | **Model Simulation**                                                |
+|                |                                                                     |
+|                | *facet*’s model simulation algorithms use ML models for             |
+|                | *virtual experiments* to help identify scenarios that optimise      |
+|                | predicted outcomes.                                                 |
+|                | To quantify the uncertainty in simulations, *facet* utilises a range|
+|                | of bootstrapping algorithms including stationary and stratified     |
+|                | bootstraps.                                                         |
+|                | For an example of *facet*’s bootstrap simulations, see the getting  |
+|                | started example below.                                              |
++----------------+---------------------------------------------------------------------+
+| |pipe|         | **Enhanced Machine Learning Workflow**                              |
+| |spacer|       |                                                                     |
+|                | *facet* offers an efficient and transparent machine learning        |
+|                | workflow, enhancing                                                 |
+|                | `scikit-learn <https://scikit-learn.org/stable/index.html>`_'s      |
+|                | tried and tested pipelining paradigm with new capabilities for model|
+|                | selection, inspection, and simulation.                              |
+|                | *facet* also introduces                                             |
+|                | `sklearndf <https://github.com/BCG-Gamma/sklearndf>`_, an augmented |
+|                | version of *scikit-learn* with enhanced support for *pandas* data   |
+|                | frames that ensures end-to-end traceability of features.            |
++----------------+---------------------------------------------------------------------+
 
 |azure_pypi| |azure_conda| |azure_devops_master_ci| |code_cov|
 |python_versions| |code_style| |made_with_sphinx_doc| |License_badge|
 
 Installation
 ---------------------
 
-Facet supports both PyPI and Anaconda.
+*facet* supports both PyPI and Anaconda.
 
 Anaconda
 ~~~~~~~~~~~~~~~~~~~~~
@@ -61,9 +69,9 @@ Quickstart
 ----------------------
 
 The following quickstart guide provides a minimal example workflow to get up and running
-with Facet.
+with *facet*.
 
-Enhanced machine learning workflow
+Enhanced Machine Learning Workflow
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
 .. code-block:: Python
@@ -83,45 +91,52 @@ Enhanced machine learning workflow
 
     # load Boston housing dataset
     boston = load_boston()
-    df = pd.DataFrame(data=boston.data, columns=boston.feature_names).assign(
+    boston_df = pd.DataFrame(data=boston.data, columns=boston.feature_names).assign(
         MEDIAN_HOUSE_PRICE=boston.target
     )
 
     # create FACET sample object
-    boston_obs = Sample(observations=df, target_name="MEDIAN_HOUSE_PRICE")
+    boston_sample = Sample(observations=boston_df, target_name="MEDIAN_HOUSE_PRICE")
 
-    # create pipeline for random forest regressor
-    rforest_reg = RegressorPipelineDF(regressor=RandomForestRegressorDF(random_state=42))
+    # create a (trivial) pipeline for a random forest regressor
+    rnd_forest_reg = RegressorPipelineDF(
+        regressor=RandomForestRegressorDF(random_state=42)
+    )
 
     # define grid of models which are "competing" against each other
-    rforest_grid = [
+    rnd_forest_grid = [
         LearnerGrid(
-            pipeline=rforest_reg, learner_parameters={"min_samples_leaf": [8, 11, 15]}
-        )
+            pipeline=rnd_forest_reg,
+            learner_parameters={
+                "min_samples_leaf": [8, 11, 15]
+            }
+        ),
     ]
 
     # create repeated k-fold CV iterator
     rkf_cv = RepeatedKFold(n_splits=5, n_repeats=10, random_state=42)
 
     # rank your models by performance (default is variance explained)
-    ranker = LearnerRanker(grids=rforest_grid, cv=rkf_cv, n_jobs=-3).fit(sample=boston_obs)
+    ranker = LearnerRanker(
+        grids=rnd_forest_grid, cv=rkf_cv, n_jobs=-3
+    ).fit(sample=boston_sample)
 
     # get summary report
     ranker.summary_report()
 
 .. image:: _static/ranker_summary.png
-    :width: 600
+    :width: 600
 
 Model Inspection
 ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
 
-Facet implements several model inspection methods for
+*facet* implements several model inspection methods for
 `scikit-learn <https://scikit-learn.org/stable/index.html>`_ estimators.
 Fundamentally, facet enables post-hoc model inspection by breaking down the interaction
 effects of the features used for model training:
 
 - **Redundancy**
-  represents how much information is shared between two features contributions to
+  represents how much information is shared between two features' contributions to
   the model predictions. For example, temperature and pressure in a pressure cooker are
   redundant features for predicting cooking time since pressure will rise relative to
   the temperature, and vice versa. Therefore, knowing just one of either temperature or
@@ -214,46 +229,47 @@ Model Simulation
 
 .. image:: _static/simulation_output.png
 
-Download the getting started tutorial and explore Facet for yourself here: |binder|
+Download the getting started tutorial and explore *facet* for yourself here: |binder|
 
 Contributing
 ---------------------------
 
-Facet is stable and is being supported long-term.
+*facet* is stable and is being supported long-term.
 
-Contributions to Facet are welcome and appreciated.
+Contributions to *facet* are welcome and appreciated.
 For any bug reports or feature requests/enhancements please use the appropriate
 `GitHub form <https://github.com/BCG-Gamma/facet/issues>`_, and if you wish to do so,
 please open a PR addressing the issue.
 
 We do ask that for any major changes please discuss these with us first via an issue or
-at our team email: FacetTeam <at> bcg <dot> com.
+using our team email: FacetTeam <at> bcg <dot> com.
 
 For further information on contributing please see our :ref:`contribution-guide`.
 
 License
 ---------------------------
 
-Facet is licensed under Apache 2.0 as described in the
+*facet* is licensed under Apache 2.0 as described in the
 `LICENSE <https://github.com/BCG-Gamma/facet/LICENSE>`_ file.
 
 Acknowledgements
 ---------------------------
 
-Facet is built on top of two popular packages for Machine Learning:
+*facet* is built on top of two popular packages for Machine Learning:
 
 The `scikit-learn <https://github.com/scikit-learn/scikit-learn>`_ learners and
 pipelining make up implementation of the underlying algorithms. Moreover, we tried
 to design the facet API to align with the scikit-learn API.
 
 The `shap <https://github.com/slundberg/shap>`_ implementation is used to estimate the
-shapley vectors which are being decomposed into the synergy, redundancy, and
-independence vectors.
+shapley vectors which *facet* then decomposes into synergy, redundancy, and independence
+vectors.
 
 BCG GAMMA
 ---------------------------
 
-If you would like to know more about the team behind Facet please see our :ref:`about_us` page.
+If you would like to know more about the team behind *facet* please see our
+:ref:`about_us` page.
 
 We are always on the lookout for passionate and talented data scientists to join the
 BCG GAMMA team. If you would like to know more you can find out about BCG GAMMA
@@ -262,27 +278,42 @@ or have a look at
 `career opportunities <https://www.bcg.com/en-gb/beyond-consulting/bcg-gamma/careers>`_.
 
 .. |pipe| image:: _static/icons/pipe_icon.png
-    :class: facet_icon
+    :width: 64px
+    :class: facet_icon
+
 .. |inspect| image:: _static/icons/inspect_icon.png
-    :class: facet_icon
+    :width: 64px
+    :class: facet_icon
+
 .. |sim| image:: _static/icons/sim_icon.png
+    :width: 64px
     :class: facet_icon
 
+.. |spacer| unicode:: 0x2028 0x2003 0x2003 0x2003 0x2003 0x2003 0x2003
+
 .. |azure_conda| image:: https://
-    :target: https://
+    :target: https://
+
 .. |azure_pypi| image:: https://
-    :target: https://
+    :target: https://
+
 .. |azure_devops_master_ci| image:: https://
-    :target: https://
+    :target: https://
+
 .. |code_cov| image:: https://
-    :target: https://
+    :target: https://
+
 .. |python_versions| image:: https://img.shields.io/badge/python-3.7|3.8-blue.svg
-    :target: https://www.python.org/downloads/release/python-380/
+    :target: https://www.python.org/downloads/release/python-380/
+
 .. |code_style| image:: https://img.shields.io/badge/code%20style-black-000000.svg
-    :target: https://github.com/psf/black
+    :target: https://github.com/psf/black
+
 .. |made_with_sphinx_doc| image:: https://img.shields.io/badge/Made%20with-Sphinx-1f425f.svg
-    :target: https://www.sphinx-doc.org/
+    :target: https://www.sphinx-doc.org/
+
 .. |license_badge| image:: https://img.shields.io/badge/License-Apache%202.0-olivegreen.svg
-    :target: https://opensource.org/licenses/Apache-2.0
+    :target: https://opensource.org/licenses/Apache-2.0
+
 .. |binder| image:: https://mybinder.org/badge_logo.svg
-    :target: https://mybinder.org/
+    :target: https://mybinder.org/

sphinx/auxiliary/Boston_getting_started_example.ipynb

+18 -23
@@ -64,26 +64,21 @@
 "cell_type": "markdown",
 "metadata": {},
 "source": [
-"FACET is composed of the following key components:\n",
-"\n",
-"- **Enhanced machine learning workflow**:\n",
-" Facet delivers a robust and fail-safe pipelining\n",
-" workflow which allows you to easily impute and select your features as well as\n",
-" ranking a grid of different models \"competing\" against each other. Facet introduces\n",
-" **sklearndf**, an augmented version of scikit-learn with enhanced support for pandas\n",
-" data frames and pipelining.\n",
+"`facet` is composed of the following key components:\n",
 "\n",
 "- **Model Inspection**:\n",
-" Local explanations of features and their interactions make up a key\n",
-" component of understanding feature importance as well as feature interactions.\n",
-" This is based on a novel method which decomposes\n",
-" [SHAP values](<https://shap.readthedocs.io/en/latest/>) into\n",
-" two vectors representing **synergy** and **redundancy**.\n",
-"\n",
-"- **Model Simulation**:\n",
-" Use your trained model and the insights from the model inspection\n",
-" to conduct a historical simulation of any feature on your target in order to\n",
-" identify local optima."
+"\n",
+" `facet` introduces a new algorithm to quantify dependencies and interactions between features in ML models. This new tool for human-explainable AI adds a new, global perspective to the observation-level explanations provided by the popular [SHAP](https://shap.readthedocs.io/en/latest/) approach. To learn more about facet’s model inspection capabilities, see the getting started example below.\n",
+"\n",
+"\n",
+"- **Model Simulation**\n",
+"\n",
+" `facet`’s model simulation algorithms use ML models for `virtual experiments` to help identify scenarios that optimise predicted outcomes. To quantify the uncertainty in simulations, `facet` utilises a range of bootstrapping algorithms including stationary and stratified bootstraps. For an example of `facet`’s bootstrap simulations, see the getting started example below. \n",
+" \n",
+" \n",
+"- **Enhanced Machine Learning Workflow**: \n",
+"\n",
+" `facet` offers an efficient and transparent machine learning workflow, enhancing [`scikit-learn`]( https://scikit-learn.org/stable/index.html)'s tried and tested pipelining paradigm with new capabilities for model selection, inspection, and simulation. `facet` also introduces [`sklearndf`](https://github.com/BCG-Gamma/sklearndf), an augmented version of scikit-learn with enhanced support for pandas dataframes that ensures end-to-end traceability of features. "
 ]
 },
 {
{
@@ -292,18 +287,18 @@
 "\n",
 "# load Boston housing dataset\n",
 "boston = load_boston()\n",
-"df = pd.DataFrame(data=boston.data, columns=boston.feature_names).assign(\n",
+"boston_df = pd.DataFrame(data=boston.data, columns=boston.feature_names).assign(\n",
 "    MEDIAN_HOUSE_PRICE=boston.target\n",
 ")\n",
 "\n",
 "# create FACET sample object\n",
-"boston_obs = Sample(observations=df, target_name=\"MEDIAN_HOUSE_PRICE\")\n",
+"boston_sample = Sample(observations=boston_df, target_name=\"MEDIAN_HOUSE_PRICE\")\n",
 "\n",
 "# create pipeline for random forest regressor\n",
 "rforest_reg = RegressorPipelineDF(regressor=RandomForestRegressorDF(random_state=42))\n",
 "\n",
 "# define grid of models which are \"competing\" against each other\n",
-"rforest_grid = [\n",
+"rnd_forest_grid = [\n",
 "    LearnerGrid(\n",
 "        pipeline=rforest_reg, learner_parameters={\"min_samples_leaf\": [8, 11, 15]}\n",
 "    )\n",
@@ -313,7 +308,7 @@
 "rkf_cv = RepeatedKFold(n_splits=5, n_repeats=10, random_state=42)\n",
 "\n",
 "# rank your models by performance (default is variance explained)\n",
-"ranker = LearnerRanker(grids=rforest_grid, cv=rkf_cv, n_jobs=-3).fit(sample=boston_obs)\n",
+"ranker = LearnerRanker(grids=rnd_forest_grid, cv=rkf_cv, n_jobs=-3).fit(sample=boston_sample)\n",
 "\n",
 "# get summary report\n",
 "ranker.summary_report()"
@@ -586,7 +581,7 @@
 "    cv=bscv,\n",
 "    n_jobs=-3,\n",
 "    verbose=False,\n",
-").fit(sample=boston_obs)\n",
+").fit(sample=boston_sample)\n",
 "\n",
 "SIM_FEAT = \"LSTAT\"\n",
 "simulator = UnivariateUpliftSimulator(crossfit=boot_crossfit, n_jobs=3)\n",

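Note on the new overview text: it says *facet* quantifies simulation uncertainty with bootstrapping, including stratified bootstraps. As a rough illustration of the general idea only, here is a minimal standard-library sketch. It is not facet's implementation: the function names `stratified_bootstrap` and `bootstrap_interval` and the toy data are made up for this note. A stratified bootstrap resamples with replacement separately inside each stratum, so every resample keeps the original group sizes.

```python
import random
from collections import defaultdict
from statistics import mean


def stratified_bootstrap(values, strata, rng):
    """Resample `values` with replacement, separately within each stratum."""
    by_stratum = defaultdict(list)
    for value, stratum in zip(values, strata):
        by_stratum[stratum].append(value)
    resampled = []
    for stratum in sorted(by_stratum):
        group = by_stratum[stratum]
        # each stratum keeps its original size in the resample
        resampled.extend(rng.choices(group, k=len(group)))
    return resampled


def bootstrap_interval(values, strata, n_resamples=1000, seed=42):
    """Approximate 95% percentile interval for the mean across resamples."""
    rng = random.Random(seed)
    means = sorted(
        mean(stratified_bootstrap(values, strata, rng))
        for _ in range(n_resamples)
    )
    lower = means[int(0.025 * n_resamples)]
    upper = means[int(0.975 * n_resamples) - 1]
    return lower, upper


# hypothetical toy data: two strata of three observations each
values = [21.0, 22.5, 19.8, 35.1, 33.4, 36.0]
strata = ["low", "low", "low", "high", "high", "high"]
low, high = bootstrap_interval(values, strata)
print(f"95% interval for the mean: [{low:.2f}, {high:.2f}]")
```

The width of the printed interval is what a simulation plot would show as an uncertainty band. The stationary bootstrap mentioned in the same README paragraph targets time-ordered data and is not covered by this sketch.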