API: Rename ModelSelector to LearnerSelector (#348)
* API: Rename ModelSelector to LearnerSelector

* TEST: fix parameter space definitions in regressor_parameters()

* API: accept multiple parameter spaces as the parameter_space parameter of LearnerInspector

* TEST: adjust test for an updated exception message

* TEST: minor tweaks

* DOC: update classification tutorial (random search, LearnerSelector API)

* DOC: tweak classification tutorial

* BUILD: update package dependencies

* DOC: update classification tutorial (random search, LearnerSelector API)

* DOC: update tutorial notebooks in preparation for the FACET 2.0 release

* DOC: tweak a headline

* BUILD: update package dependencies

* API: improve column names & sequence of LearnerSelector.summary_report()

* DOC: documentation tweaks

* API: clarify 'candidate name' terminology

* DOC: tweak release notes

* DOC: add missing intersphinx mappings

* DOC: fix link to catboost package

* DOC: address sphinx error messages

* DOC: move images from _static/ to _images/

Co-authored-by: Jan Ittner <ittner.jan@bcg.com>
mtsokol and j-ittner authored Sep 19, 2022
1 parent dba2d2c commit 1bdc49b
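The rename from `ModelSelector` to `LearnerSelector` is mechanical, but the workflow behind the name is worth making concrete. The sketch below is a minimal, illustrative stand-in — not FACET's actual implementation — for what a learner selector does: fit each candidate learner across cross-validation folds and rank candidates by mean score. All class and function names here are hypothetical.

```python
from statistics import mean
from typing import Dict, List, Sequence, Tuple


def k_fold_indices(n: int, k: int) -> List[Tuple[List[int], List[int]]]:
    """Split range(n) into k (train_indices, test_indices) pairs."""
    folds = [list(range(i, n, k)) for i in range(k)]
    return [
        ([j for f in folds[:i] + folds[i + 1:] for j in f], folds[i])
        for i in range(k)
    ]


class MeanRegressor:
    """Toy learner: always predicts the training mean."""

    def fit(self, X: Sequence, y: Sequence) -> "MeanRegressor":
        self.mean_ = mean(y)
        return self

    def predict(self, X: Sequence) -> List[float]:
        return [self.mean_ for _ in X]

    def score(self, X: Sequence, y: Sequence) -> float:
        # negative mean squared error: higher is better
        return -mean((p - t) ** 2 for p, t in zip(self.predict(X), y))


class ZeroRegressor(MeanRegressor):
    """Toy learner: always predicts zero."""

    def fit(self, X: Sequence, y: Sequence) -> "ZeroRegressor":
        self.mean_ = 0.0
        return self


def rank_candidates(
    candidates: Dict[str, type], X: Sequence, y: Sequence, k: int = 5
) -> List[Tuple[str, float]]:
    """Rank candidate learners by mean cross-validation score, best first."""
    results = {}
    for name, make_learner in candidates.items():
        fold_scores = [
            make_learner()
            .fit([X[i] for i in tr], [y[i] for i in tr])
            .score([X[i] for i in te], [y[i] for i in te])
            for tr, te in k_fold_indices(len(X), k)
        ]
        results[name] = mean(fold_scores)
    return sorted(results.items(), key=lambda kv: -kv[1])


# the mean predictor should beat the zero predictor on a constant target
X = list(range(20))
y = [5.0] * 20
ranking = rank_candidates({"mean": MeanRegressor, "zero": ZeroRegressor}, X, y)
```

The real `LearnerSelector` delegates the search itself to a scikit-learn CV searcher; this toy version only mirrors the fit-score-rank shape of the task.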
Showing 48 changed files with 1,833 additions and 2,096 deletions.
24 changes: 12 additions & 12 deletions README.rst
Original file line number Diff line number Diff line change
@@ -1,4 +1,4 @@
.. image:: sphinx/source/_static/Gamma_Facet_Logo_RGB_LB.svg
.. image:: sphinx/source/_images/Gamma_Facet_Logo_RGB_LB.svg

|
@@ -103,7 +103,7 @@ In this quickstart we will train a Random Forest regressor using 10 repeated
*sklearndf* we can create a *pandas* DataFrame compatible workflow. However,
FACET provides additional enhancements to keep track of our feature matrix
and target vector using a sample object (`Sample`) and easily compare
hyperparameter configurations and even multiple learners with the `ModelSelector`.
hyperparameter configurations and even multiple learners with the `LearnerSelector`.

.. code-block:: Python
@@ -117,7 +117,7 @@ hyperparameter configurations and even multiple learners with the `ModelSelector
# relevant FACET imports
from facet.data import Sample
from facet.selection import ModelSelector, ParameterSpace
from facet.selection import LearnerSelector, ParameterSpace
# declaring url with data
data_url = 'https://web.stanford.edu/~hastie/Papers/LARS/diabetes.data'
@@ -153,7 +153,7 @@ hyperparameter configurations and even multiple learners with the `ModelSelector
rkf_cv = RepeatedKFold(n_splits=5, n_repeats=10, random_state=42)
# rank your candidate models by performance
selector = ModelSelector(
selector = LearnerSelector(
searcher_type=GridSearchCV,
parameter_space=rnd_forest_ps,
cv=rkf_cv,
@@ -164,7 +164,7 @@
# get summary report
selector.summary_report()
.. image:: sphinx/source/_static/ranker_summary.png
.. image:: sphinx/source/_images/ranker_summary.png
:width: 600

We can see based on this minimal workflow that a value of 11 for minimum
@@ -245,7 +245,7 @@ The key global metrics for each pair of features in a model are:
synergy_matrix = inspector.feature_synergy_matrix()
MatrixDrawer(style="matplot%").draw(synergy_matrix, title="Synergy Matrix")
.. image:: sphinx/source/_static/synergy_matrix.png
.. image:: sphinx/source/_images/synergy_matrix.png
:width: 600

For any feature pair (A, B), the first feature (A) is the row, and the second
@@ -273,7 +273,7 @@ to 27% synergy of `LDL` with `LTG` for predicting progression after one year.
redundancy_matrix = inspector.feature_redundancy_matrix()
MatrixDrawer(style="matplot%").draw(redundancy_matrix, title="Redundancy Matrix")
.. image:: sphinx/source/_static/redundancy_matrix.png
.. image:: sphinx/source/_images/redundancy_matrix.png
:width: 600


@@ -312,7 +312,7 @@ Let's look at the example for redundancy.
redundancy = inspector.feature_redundancy_linkage()
DendrogramDrawer().draw(data=redundancy, title="Redundancy Dendrogram")
.. image:: sphinx/source/_static/redundancy_dendrogram.png
.. image:: sphinx/source/_images/redundancy_dendrogram.png
:width: 600

Based on the dendrogram we can see that the feature pairs (`LDL`, `TC`)
@@ -371,7 +371,7 @@ we do the following for the simulation:
# visualise results
SimulationDrawer().draw(data=simulation, title=SIM_FEAT)
.. image:: sphinx/source/_static/simulation_output.png
.. image:: sphinx/source/_images/simulation_output.png

We would conclude from the figure that higher values of `BMI` are associated with
an increase in disease progression after one year, and that for a `BMI` of 28
@@ -427,15 +427,15 @@ BCG GAMMA team. If you would like to know more you can find out about
or have a look at
`career opportunities <https://www.bcg.com/en-gb/beyond-consulting/bcg-gamma/careers>`_.

.. |pipe| image:: sphinx/source/_static/icons/pipe_icon.png
.. |pipe| image:: sphinx/source/_images/icons/pipe_icon.png
:width: 100px
:class: facet_icon

.. |inspect| image:: sphinx/source/_static/icons/inspect_icon.png
.. |inspect| image:: sphinx/source/_images/icons/inspect_icon.png
:width: 100px
:class: facet_icon

.. |sim| image:: sphinx/source/_static/icons/sim_icon.png
.. |sim| image:: sphinx/source/_images/icons/sim_icon.png
:width: 100px
:class: facet_icon

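For readers following the README hunks above without FACET installed: `LearnerSelector` wraps a scikit-learn CV searcher rather than replacing it, so the selection step can be approximated with scikit-learn alone. The sketch below is a rough analogue on synthetic data — the FACET-specific `Sample` and `ParameterSpace` wrappers, and the diabetes dataset itself, are deliberately omitted.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV, RepeatedKFold

# synthetic stand-in for the diabetes features/target used in the quickstart
rng = np.random.RandomState(42)
X = rng.rand(80, 4)
y = 3.0 * X[:, 0] + 0.1 * rng.rand(80)

# same CV scheme as the README (fewer repeats here, for speed)
rkf_cv = RepeatedKFold(n_splits=5, n_repeats=2, random_state=42)

# plain scikit-learn analogue of the LearnerSelector(searcher_type=GridSearchCV, ...) call
search = GridSearchCV(
    estimator=RandomForestRegressor(n_estimators=20, random_state=42),
    param_grid={"min_samples_leaf": [2, 4, 8]},
    cv=rkf_cv,
)
search.fit(X, y)
best_leaf = search.best_params_["min_samples_leaf"]
```

What `LearnerSelector.summary_report()` adds on top of this is the tabulated, ranked view over the searcher's `cv_results_`, including comparison across multiple learner types.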
83 changes: 46 additions & 37 deletions RELEASE_NOTES.rst
@@ -2,12 +2,15 @@ Release Notes
=============

.. |mypy| replace:: :external+mypy:doc:`mypy <index>`
.. |shap| replace:: :external+shap:doc:`shap <index>`
.. |nbsp| unicode:: 0xA0
:trim:

FACET 2.0
---------

FACET 2.0 brings numerous API enhancements and improvements, accelerates model
inspection by factor 50 in many practical settings, makes major improvements to
FACET |nbsp| 2.0 brings numerous API enhancements and improvements, accelerates model
inspection by factor |nbsp| 50 in many practical settings, makes major improvements to
visualizations, and is now fully type-checked by |mypy|.


@@ -28,28 +31,30 @@ visualizations, and is now fully type-checked by |mypy|.

- API: :class:`.LearnerInspector` no longer uses learner crossfits and instead inspects
models using a single pass of SHAP calculations, usually leading to performance gains
of up to a factor of 50
- API: return :class:`.LearnerInspector` matrix outputs as :class:`.Matrix` instances
of up to a factor of |nbsp| 50
- API: return :class:`.LearnerInspector` matrix outputs as :class:`~pytools.data.Matrix`
instances
- API: diagonals of feature synergy, redundancy, and association matrices are now
``nan`` instead of 1.0
- API: the leaf order of :class:`.LinkageTree` objects generated by
``nan`` instead of |nbsp| 1.0
- API: the leaf order of :class:`~pytools.data.LinkageTree` objects generated by
``feature_…_linkage`` methods of :class:`.LearnerInspector` is now the same as the
row and column order of :class:`.Matrix` objects returned by the corresponding
``feature_…_matrix`` methods of :class:`.LearnerInspector`, minimizing the distance
between adjacent leaves
The old sorting behaviour of FACET 1 can be restored using method
:meth:`.LinkageTree.sort_by_weight`
row and column order of :class:`~pytools.data.Matrix` objects returned by the
corresponding ``feature_…_matrix`` methods of :class:`.LearnerInspector`, minimizing
the distance between adjacent leaves.
The old sorting behaviour of FACET |nbsp| 1.x can be restored using method
:meth:`~pytools.data.LinkageTree.sort_by_weight`

``facet.selection``
^^^^^^^^^^^^^^^^^^^

- API: :class:`.ModelSelector` replaces FACET 1 class ``LearnerRanker``, and now
supports any CV searcher that supports `scikit-learn`'s CV search API, including
`scikit-learn`'s native searchers such as :class:`.GridSearchCV` or
:class:`.RandomizedSearchCV`
- API: new classes :class:`.ParameterSpace` and :class:`MultiParameterSpace` offer an
a more convenient and robust mechanism for declaring options or distributions for
hyperparameter tuning
- API: :class:`.LearnerSelector` replaces FACET |nbsp| 1.x class ``LearnerRanker``, and
now supports any CV searcher that supports `scikit-learn`'s CV search API, including
`scikit-learn`'s native searchers such as
:class:`~sklearn.model_selection.GridSearchCV` or
:class:`~sklearn.model_selection.RandomizedSearchCV`
- API: new classes :class:`.ParameterSpace` and :class:`.MultiEstimatorParameterSpace`
offer a more convenient and robust mechanism for declaring options or distributions
for hyperparameter tuning

``facet.simulation``
^^^^^^^^^^^^^^^^^^^^
@@ -63,18 +68,19 @@ visualizations, and is now fully type-checked by |mypy|.
``facet.validation``
^^^^^^^^^^^^^^^^^^^^

- API: remove class ``FullSampleValidator``
- API: removed class ``FullSampleValidator``

Other
^^^^^

- API: class ``LearnerCrossfit`` is no longer needed in FACET 2.0 and has been removed
- API: class ``LearnerCrossfit`` is no longer needed in FACET |nbsp| 2.0 and has been
removed


FACET 1.2
---------

FACET 1.2 adds support for *sklearndf* 1.2 and *scikit-learn* 0.24.
FACET |nbsp| 1.2 adds support for *sklearndf* |nbsp| 1.2 and *scikit-learn* |nbsp| 0.24.
It also introduces the ability to run simulations on a subsample of the data used to
fit the underlying crossfit.
One example where this can be useful is to use only a recent period of a time series as
@@ -84,21 +90,21 @@ the baseline of a simulation.
1.2.2
~~~~~

- catch up with FACET 1.1.2
- catch up with FACET |nbsp| 1.1.2


1.2.1
~~~~~

- FIX: fix a bug in :class:`.UnivariateProbabilitySimulator` that was introduced in
FACET 1.2.0
- catch up with FACET 1.1.1
FACET |nbsp| 1.2.0
- catch up with FACET |nbsp| 1.1.1


1.2.0
~~~~~

- BUILD: added support for *sklearndf* 1.2 and *scikit-learn* 0.24
- BUILD: added support for *sklearndf* |nbsp| 1.2 and *scikit-learn* |nbsp| 0.24
- API: new optional parameter ``subsample`` in method
:meth:`.BaseUnivariateSimulator.simulate_feature` can be used to specify a subsample
to be used in the simulation (but simulating using a crossfit based on the full
@@ -108,18 +114,20 @@ the baseline of a simulation.
FACET 1.1
---------

FACET 1.1 refines and enhances the association/synergy/redundancy calculations provided
by the :class:`.LearnerInspector`.
FACET |nbsp| 1.1 refines and enhances the association/synergy/redundancy calculations
provided by the :class:`.LearnerInspector`.


1.1.2
~~~~~

- DOC: use a downloadable dataset in the `getting started` notebook
- FIX: import :mod:`catboost` if present, else create a local module mockup
- FIX: import `catboost <https://catboost.ai/en/docs/>`_ if present, else create a local
module mockup
- FIX: correctly identify if ``sample_weights`` is undefined when re-fitting a model
on the full dataset in a :class:`.LearnerCrossfit`
- BUILD: relax package dependencies to support any `numpy` version 1.`x` from 1.16
on the full dataset in a ``LearnerCrossfit``
- BUILD: relax package dependencies to support any `numpy` version |nbsp| 1.`x` from
|nbsp| 1.16


1.1.1
@@ -143,9 +151,9 @@ by the :class:`.LearnerInspector`.
across matrices as an indication of confidence for each calculated value.
- API: Method :meth:`.LearnerInspector.shap_plot_data` now returns SHAP values for the
positive class of binary classifiers.
- API: Increase efficiency of :class:`.ModelSelector` parallelization by adopting the
- API: Increase efficiency of ``ModelSelector`` parallelization by adopting the
new :class:`pytools.parallelization.JobRunner` API provided by :mod:`pytools`
- BUILD: add support for :mod:`shap` 0.38 and 0.39
- BUILD: add support for :mod:`shap` |nbsp| 0.38 and |nbsp| 0.39


FACET 1.0
@@ -154,19 +162,20 @@ FACET 1.0
1.0.3
~~~~~

- FIX: restrict package requirements to *gamma-pytools* 1.0.* and *sklearndf* 1.0.x,
since FACET 1.0 is not compatible with *gamma-pytools* 1.1.*
- FIX: restrict package requirements to *gamma-pytools* |nbsp| 1.0.* and
*sklearndf* |nbsp| 1.0.x, since FACET |nbsp| 1.0 is not compatible with
*gamma-pytools* |nbsp| 1.1.*

1.0.2
~~~~~

This is a maintenance release focusing on enhancements to the CI/CD pipeline and bug
fixes.

- API: add support for :mod:`shap` 0.36 and 0.37 via a new :class:`.BaseExplainer`
stub class
- API: add support for |shap| |nbsp| 0.36 and |nbsp| 0.37 via a new
:class:`.BaseExplainer` stub class
- FIX: apply color scheme to the histogram section in :class:`.SimulationMatplotStyle`
- BUILD: add support for :mod:`numpy` 1.20
- BUILD: add support for :mod:`numpy` |nbsp| 1.20
- BUILD: updates and changes to the CI/CD pipeline


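The `|nbsp|` substitution that appears throughout the release-notes hunks above is a standard reStructuredText construct, not FACET-specific. It is defined once at the top of the file and then keeps a name and its version number from being broken across lines; the `:trim:` option removes the whitespace around the substitution so the spacing in the source does not leak into the output:

```rst
.. |nbsp| unicode:: 0xA0
   :trim:

FACET |nbsp| 2.0 renders as "FACET 2.0", joined by a non-breaking space.
```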
4 changes: 3 additions & 1 deletion environment.yml
@@ -12,7 +12,7 @@ dependencies:
- numpy ~= 1.22
- pandas ~= 1.4
- python ~= 3.9
- scikit-learn ~= 1.0.2
- scikit-learn ~= 1.1
- scipy ~= 1.8
- shap ~= 0.41
- sklearndf ~= 2.0
@@ -38,6 +38,8 @@ dependencies:
- sphinx-autodoc-typehints ~= 1.19
- pydata-sphinx-theme ~= 0.8.1
# notebooks
- ipywidgets ~= 8.0
- jupyterlab ~= 3.2
- openpyxl ~= 3.0
- seaborn ~= 0.11
- tableone ~= 0.7
8 changes: 4 additions & 4 deletions pyproject.toml
@@ -74,15 +74,15 @@ no-binary.min = ["matplotlib", "shap"]

[build.matrix.min]
# direct requirements of gamma-facet
gamma-pytools = "~=2.0.2"
gamma-pytools = "~=2.0.4"
matplotlib = "~=3.0.3"
numpy = "==1.21.6" # cannot use ~= due to conda bug
packaging = "~=20.9"
pandas = "~=1.0.5"
python = ">=3.7.12,<3.8a" # cannot use ~= due to conda bug
scipy = "~=1.4.1"
shap = "~=0.34.0"
sklearndf = "~=2.0.0"
sklearndf = "~=2.0.1"
# additional minimum requirements of sklearndf
boruta = "~=0.3.0"
lightgbm = "~=3.0.0"
@@ -105,11 +105,11 @@ pandas = "~=1.4"
python = ">=3.9,<4a" # cannot use ~= due to conda bug
scipy = "~=1.8"
shap = "~=0.41"
sklearndf = "~=2.0"
sklearndf = "~=2.1"
# additional maximum requirements of sklearndf
boruta = "~=0.3"
lightgbm = "~=3.3"
scikit-learn = "~=1.0.2"
scikit-learn = "~=1.1"
xgboost = "~=1.5"
# additional maximum requirements of gamma-pytools
joblib = "~=1.1"
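The dependency bumps above all use the compatible-release operator `~=` from PEP 440: `~=1.1` allows any `1.x` release from 1.1 on, while `~=2.0.1` stays within the `2.0.x` series. This can be checked directly with the `packaging` library (assumed installed; it ships with most Python toolchains):

```python
from packaging.specifiers import SpecifierSet

# "~=1.1" means ">=1.1, ==1.*": any 1.x release from 1.1 upward
sklearn_pin = SpecifierSet("~=1.1")

# "~=2.0.1" means ">=2.0.1, ==2.0.*": pinned to the 2.0.x series
sklearndf_min_pin = SpecifierSet("~=2.0.1")

ok_11 = "1.1.3" in sklearn_pin          # in the 1.x series, >= 1.1
ok_20 = "2.0" in sklearn_pin            # 2.x is excluded
ok_204 = "2.0.4" in sklearndf_min_pin   # still in 2.0.x
ok_210 = "2.1.0" in sklearndf_min_pin   # 2.1 leaves the 2.0.x series
```

This is why the commit can move `sklearndf = "~=2.0.0"` to `"~=2.0.1"` in the minimum matrix without widening the allowed major or minor version.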
14 changes: 7 additions & 7 deletions sphinx/auxiliary/Diabetes_getting_started_example.ipynb
@@ -4,7 +4,7 @@
"cell_type": "markdown",
"metadata": {},
"source": [
"<img src=\"../source/_static/Gamma_Facet_Logo_RGB_LB.svg\" width=\"500\" style=\"padding-bottom: 70px; padding-top: 70px; margin: auto; display: block\">"
"<img src=\"../source/_images/Gamma_Facet_Logo_RGB_LB.svg\" width=\"500\" style=\"padding-bottom: 70px; padding-top: 70px; margin: auto; display: block\">"
]
},
{
@@ -71,7 +71,7 @@
"To demonstrate the model inspection capability of FACET, we first create a pipeline to fit a learner. In this simple example we use the [diabetes dataset](https://web.stanford.edu/~hastie/Papers/LARS/diabetes.data) which contains age, sex, BMI and blood pressure along with 6 blood serum measurements as features. This dataset was used in this\n",
"[publication](https://statweb.stanford.edu/~tibs/ftp/lars.pdf). A transformed version of this dataset is also available on scikit-learn [here](https://scikit-learn.org/stable/datasets/toy_dataset.html#diabetes-dataset).\n",
"\n",
"In this quickstart we will train a Random Forest regressor using 10 repeated 5-fold CV to predict disease progression after one year. With the use of *sklearndf* we can create a *pandas* DataFrame compatible workflow. However, FACET provides additional enhancements to keep track of our feature matrix and target vector using a sample object (`Sample`) and easily compare hyperparameter configurations and even multiple learners with the `ModelSelector`."
"In this quickstart we will train a Random Forest regressor using 10 repeated 5-fold CV to predict disease progression after one year. With the use of *sklearndf* we can create a *pandas* DataFrame compatible workflow. However, FACET provides additional enhancements to keep track of our feature matrix and target vector using a sample object (`Sample`) and easily compare hyperparameter configurations and even multiple learners with the `LearnerSelector`."
]
},
{
@@ -274,7 +274,7 @@
],
"source": [
"# rank your candidate models by performance\n",
"selector = ModelSelector(\n",
"selector = LearnerSelector(\n",
" searcher_type=GridSearchCV,\n",
" parameter_space=rnd_forest_ps, \n",
" cv=rkf_cv, \n",
@@ -399,7 +399,7 @@
"# save copy of plot to _static directory for documentation\n",
"MatrixDrawer(style=\"matplot%\").draw(synergy_matrix, title=\"Synergy Matrix\")\n",
"plt.savefig(\n",
" \"../source/_static/synergy_matrix.png\", bbox_inches=\"tight\", pad_inches=0\n",
" \"../source/_images/synergy_matrix.png\", bbox_inches=\"tight\", pad_inches=0\n",
")"
]
},
@@ -456,7 +456,7 @@
"# save copy of plot to _static directory for documentation\n",
"MatrixDrawer(style=\"matplot%\").draw(redundancy_matrix, title=\"Redundancy Matrix\")\n",
"plt.savefig(\n",
" \"../source/_static/redundancy_matrix.png\",\n",
" \"../source/_images/redundancy_matrix.png\",\n",
" bbox_inches=\"tight\",\n",
" pad_inches=0,\n",
")"
@@ -525,7 +525,7 @@
"\n",
"# save copy of plot to _static directories for documentation\n",
"plt.savefig(\n",
" \"../source/_static/redundancy_dendrogram.png\",\n",
" \"../source/_images/redundancy_dendrogram.png\",\n",
" bbox_inches=\"tight\",\n",
" pad_inches=0,\n",
")"
@@ -608,7 +608,7 @@
"\n",
"# save copy of plot to _static directory for documentation\n",
"plt.savefig(\n",
" \"../source/_static/simulation_output.png\",\n",
" \"../source/_images/simulation_output.png\",\n",
" bbox_inches=\"tight\",\n",
" pad_inches=0,\n",
")"