BUILD: release FACET 2.0rc0 #349

Merged 37 commits on Sep 19, 2022
Commits
182261f
Revert version to 1.2.2, yet to be released
j-ittner Apr 25, 2022
978bfe4
BUILD: pin click at version 8.0.*
j-ittner Apr 25, 2022
f61ed19
Merge branch '1.1.x' into 1.2.x
j-ittner Apr 25, 2022
6cae388
BUILD: pin typing-extensions to <4.2
j-ittner Apr 25, 2022
8e64e0b
BUILD: pin typing-extensions to <4.2
j-ittner Apr 25, 2022
ef2c918
Merge pull request #338 from BCG-Gamma/dev/1.2.2
j-ittner Apr 25, 2022
14f53f6
BUILD: update version to 1.2.3
j-ittner Apr 25, 2022
63a0ea5
Merge branch '1.2.x' into 2.0.x
j-ittner Apr 25, 2022
cbdef1b
clean up after merge
j-ittner Apr 25, 2022
0182c8d
Merge remote-tracking branch 'origin/release/2.0.dev1' into 2.0.x
j-ittner Apr 25, 2022
5c832f8
BUILD: update version to 2.0.dev2
j-ittner Apr 26, 2022
812e09e
BUILD: update flake8 and mypy to latest versions in pre-commit config…
j-ittner May 23, 2022
b1b9844
BUILD: use pytools 2.0
j-ittner Jun 14, 2022
52e80c4
IDE: ignore .mypy_cache folder
j-ittner Jun 14, 2022
68fc041
TEST: tidy up unit tests (#340)
j-ittner Jun 23, 2022
8357c0b
BUILD: update build config to support updates to pytools make.py
j-ittner Jun 29, 2022
dcaed31
BUILD: update azure-pipelines.yml in line with pytools pipeline
j-ittner Jun 29, 2022
b19ebed
BUILD: update pytest to ~=7.1
j-ittner Jul 15, 2022
99c3df4
BUILD: use stable pytools ~=2.0,>=2.0.1
j-ittner Jul 15, 2022
15488ec
BUILD: update package versions for code linters
j-ittner Jul 20, 2022
487b13d
BUILD: update package dependencies
j-ittner Aug 22, 2022
c9f4dd7
REFACTOR: support mypy checks in 'strict' mode (#346)
j-ittner Aug 28, 2022
e2e5cf5
API: raise exception if the name of any Sample column is not a string…
j-ittner Aug 28, 2022
f14982a
validation (#347)
j-ittner Aug 29, 2022
690134a
DOC: enable local sphinx builds (#345)
j-ittner Sep 8, 2022
819abe7
BUILD: relax python dependency to ~=3.8
j-ittner Sep 8, 2022
a140cd4
BUILD: add pre-commit ~=2.20 to environment.yml
j-ittner Sep 8, 2022
bf747bc
BUILD: require python ~=3.9 in 'max' matrix builds
j-ittner Sep 8, 2022
eeaf70d
BUILD: require python ~=3.9 in environment.yml
j-ittner Sep 8, 2022
cd53881
BUILD: use python 3.9 with the mypy pre-commit hook
j-ittner Sep 8, 2022
7ad28b9
DOC: update docs and tutorials to 2.0 (#321)
mtsokol Sep 13, 2022
f675646
BUILD: update code quality checker versions
j-ittner Sep 14, 2022
5c687eb
FIX: re-add required dependencies for mypy check
j-ittner Sep 14, 2022
dba2d2c
BUILD: update version to 2.0rc0
j-ittner Sep 19, 2022
1bdc49b
API: Rename `ModelSelector` to `LearnerSelector` (#348)
mtsokol Sep 19, 2022
9cc2881
BUILD: update azire pipeline
j-ittner Sep 19, 2022
fd016f6
DOC: tweak docstrings
j-ittner Sep 19, 2022
1 change: 1 addition & 0 deletions .gitignore
@@ -414,3 +414,4 @@ TSWLatexianTemp*

# exclude notebooks directory: this is generated during build
/notebooks/
/sphinx/base/
4 changes: 4 additions & 0 deletions .idea/facet.iml

Some generated files are not rendered by default.

30 changes: 19 additions & 11 deletions .pre-commit-config.yaml
@@ -1,38 +1,46 @@
repos:
- repo: https://github.com/PyCQA/isort
rev: 5.5.4
rev: 5.10.1
hooks:
- id: isort

- repo: https://github.com/psf/black
rev: 22.3.0
rev: 22.8.0
hooks:
- id: black
language_version: python3
language: python_venv
language_version: python39

- repo: https://gitlab.com/pycqa/flake8
rev: 3.9.0
rev: 5.0.4
hooks:
- id: flake8
name: flake8
entry: flake8 --config tox.ini
language: python_venv
additional_dependencies: [ flake8-comprehensions, flake8-import-order ]
language_version: python39
additional_dependencies:
- flake8-comprehensions ~= 3.10
types: [ python ]

- repo: https://github.com/pre-commit/pre-commit-hooks
rev: v3.2.0
rev: v4.3.0
hooks:
- id: check-added-large-files
- id: check-json
- id: check-xml
- id: check-yaml
language: python_venv
exclude: condabuild/meta.yaml

- repo: https://github.com/pre-commit/mirrors-mypy
rev: v0.931
rev: v0.971
hooks:
- id: mypy
files: src/
files: src|sphinx|test
language: python_venv
language_version: python39
additional_dependencies:
- numpy>=1.22
- gamma-pytools>=2.0.dev8,<3a
- sklearndf>=2.0.dev3,<3a
- numpy~=1.22
- gamma-pytools~=2.0,!=2.0.0
- sklearndf~=2.0
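
The updated hook dependencies switch from `>=` ranges to compatible-release specifiers such as `gamma-pytools~=2.0,!=2.0.0`. As a rough illustration of what that specifier accepts, here is a minimal standard-library sketch. It handles plain dotted release numbers only (no pre-release, dev, or post segments), and the helper names `parse`, `pad`, `compatible`, `not_equal`, and `pytools_ok` are hypothetical, not part of pip or pre-commit.

```python
def parse(version):
    """Parse a plain dotted release such as '2.0.1' into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


def pad(a, b):
    """Right-pad two release tuples with zeros so that 2.0 compares equal to 2.0.0."""
    width = max(len(a), len(b))
    return a + (0,) * (width - len(a)), b + (0,) * (width - len(b))


def compatible(version, spec):
    """Simplified '~=' check: same leading components as spec (all but the last),
    and not older than spec. For example, '~=2.0' behaves like '>=2.0, ==2.*'."""
    v, s = parse(version), parse(spec)
    prefix = s[:-1]
    v_pad, s_pad = pad(v, s)
    return v_pad[: len(prefix)] == prefix and v_pad >= s_pad


def not_equal(version, spec):
    """Simplified '!=' check on zero-padded release tuples."""
    v_pad, s_pad = pad(parse(version), parse(spec))
    return v_pad != s_pad


def pytools_ok(version):
    """Mirror of 'gamma-pytools~=2.0,!=2.0.0' under the simplifications above."""
    return compatible(version, "2.0") and not_equal(version, "2.0.0")


for candidate in ["1.9", "2.0.0", "2.0.1", "2.3", "3.0"]:
    print(candidate, pytools_ok(candidate))
```

Under these assumptions any 2.x release other than 2.0.0 passes, while 1.x and 3.x releases are rejected, which is consistent with the commit "BUILD: use stable pytools ~=2.0,>=2.0.1".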
81 changes: 36 additions & 45 deletions README.rst
@@ -1,4 +1,4 @@
.. image:: sphinx/source/_static/Gamma_Facet_Logo_RGB_LB.svg
.. image:: sphinx/source/_images/Gamma_Facet_Logo_RGB_LB.svg

|

@@ -103,21 +103,21 @@ In this quickstart we will train a Random Forest regressor using 10 repeated
*sklearndf* we can create a *pandas* DataFrame compatible workflow. However,
FACET provides additional enhancements to keep track of our feature matrix
and target vector using a sample object (`Sample`) and easily compare
hyperparameter configurations and even multiple learners with the `LearnerRanker`.
hyperparameter configurations and even multiple learners with the `LearnerSelector`.

.. code-block:: Python

# standard imports
import pandas as pd
from sklearn.model_selection import RepeatedKFold
from sklearn.model_selection import RepeatedKFold, GridSearchCV

# some helpful imports from sklearndf
from sklearndf.pipeline import RegressorPipelineDF
from sklearndf.regression import RandomForestRegressorDF

# relevant FACET imports
from facet.data import Sample
from facet.selection import LearnerRanker, LearnerGrid
from facet.selection import LearnerSelector, ParameterSpace

# declaring url with data
data_url = 'https://web.stanford.edu/~hastie/Papers/LARS/diabetes.data'
@@ -144,29 +144,27 @@ hyperparameter configurations and even multiple learners with the `LearnerRanker
regressor=RandomForestRegressorDF(n_estimators=200, random_state=42)
)

# define grid of models which are "competing" against each other
rnd_forest_grid = [
LearnerGrid(
pipeline=rnd_forest_reg,
learner_parameters={
"min_samples_leaf": [8, 11, 15],
"max_depth": [4, 5, 6],
}
),
]
# define parameter space for models which are "competing" against each other
rnd_forest_ps = ParameterSpace(rnd_forest_reg)
rnd_forest_ps.regressor.min_samples_leaf = [8, 11, 15]
rnd_forest_ps.regressor.max_depth = [4, 5, 6]

# create repeated k-fold CV iterator
rkf_cv = RepeatedKFold(n_splits=5, n_repeats=10, random_state=42)

# rank your candidate models by performance (default is mean CV score - 2*SD)
ranker = LearnerRanker(
grids=rnd_forest_grid, cv=rkf_cv, n_jobs=-3
# rank your candidate models by performance
selector = LearnerSelector(
searcher_type=GridSearchCV,
parameter_space=rnd_forest_ps,
cv=rkf_cv,
n_jobs=-3,
scoring="r2"
).fit(sample=diabetes_sample)

# get summary report
ranker.summary_report()
selector.summary_report()

.. image:: sphinx/source/_static/ranker_summary.png
.. image:: sphinx/source/_images/ranker_summary.png
:width: 600
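
To give a sense of the search size in the quickstart above, the following minimal sketch uses only the standard library to enumerate the candidates an exhaustive grid search would visit, and to count the fits implied by 5-fold CV repeated 10 times. It does not call FACET or scikit-learn. The variable names are illustrative, and the count leaves out any final refit on the full sample.

```python
from itertools import product

# parameter values taken from the quickstart's parameter space
param_grid = {
    "min_samples_leaf": [8, 11, 15],
    "max_depth": [4, 5, 6],
}

# an exhaustive grid search visits every combination of parameter values
names = list(param_grid)
candidates = [
    dict(zip(names, values)) for values in product(*param_grid.values())
]

# RepeatedKFold(n_splits=5, n_repeats=10) yields 5 * 10 train/test splits
n_splits, n_repeats = 5, 10
fits_per_candidate = n_splits * n_repeats
total_fits = len(candidates) * fits_per_candidate

print(len(candidates), fits_per_candidate, total_fits)
```

With 3 values for each of the 2 parameters, this gives 9 candidates and 450 model fits, which helps explain why the quickstart sets `n_jobs` to run fits in parallel.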

We can see based on this minimal workflow that a value of 11 for minimum
@@ -233,8 +231,10 @@ The key global metrics for each pair of features in a model are:

# fit the model inspector
from facet.inspection import LearnerInspector
inspector = LearnerInspector(n_jobs=-3)
inspector.fit(crossfit=ranker.best_model_crossfit_)
inspector = LearnerInspector(
pipeline=selector.best_estimator_,
n_jobs=-3
).fit(sample=diabetes_sample)

**Synergy**

@@ -245,7 +245,7 @@ The key global metrics for each pair of features in a model are:
synergy_matrix = inspector.feature_synergy_matrix()
MatrixDrawer(style="matplot%").draw(synergy_matrix, title="Synergy Matrix")

.. image:: sphinx/source/_static/synergy_matrix.png
.. image:: sphinx/source/_images/synergy_matrix.png
:width: 600

For any feature pair (A, B), the first feature (A) is the row, and the second
@@ -273,7 +273,7 @@ to 27% synergy of `LDL` with `LTG` for predicting progression after one year.
redundancy_matrix = inspector.feature_redundancy_matrix()
MatrixDrawer(style="matplot%").draw(redundancy_matrix, title="Redundancy Matrix")

.. image:: sphinx/source/_static/redundancy_matrix.png
.. image:: sphinx/source/_images/redundancy_matrix.png
:width: 600


@@ -312,7 +312,7 @@ Let's look at the example for redundancy.
redundancy = inspector.feature_redundancy_linkage()
DendrogramDrawer().draw(data=redundancy, title="Redundancy Dendrogram")

.. image:: sphinx/source/_static/redundancy_dendrogram.png
.. image:: sphinx/source/_images/redundancy_dendrogram.png
:width: 600

Based on the dendrogram we can see that the feature pairs (`LDL`, `TC`)
@@ -337,39 +337,30 @@ we do the following for the simulation:
of that partition.
- For each partition, the simulator creates an artificial copy of the original sample
assuming the variable to be simulated has the same value across all observations –
which is the value representing the partition. Using the best `LearnerCrossfit`
acquired from the ranker, the simulator now re-predicts all targets using the models
trained for all folds and determines the average uplift of the target variable
which is the value representing the partition. Using the best estimator
acquired from the selector, the simulator now re-predicts all targets using the models
trained for full sample and determines the uplift of the target variable
resulting from this.
- The FACET `SimulationDrawer` allows us to visualise the result; both in a
*matplotlib* and a plain-text style.
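
The partition-and-re-predict steps above can be sketched without FACET. The example below is a toy illustration only: `predict` is a hypothetical stand-in for a fitted model, the sample values are made up, and uplift is taken here as the mean prediction on the simulated sample minus the mean prediction on the original sample. That baseline is an assumption for the sketch, not necessarily the one FACET uses.

```python
from statistics import mean

# made-up sample: each observation has two features
sample = [
    {"BMI": 22.0, "AGE": 35.0},
    {"BMI": 27.0, "AGE": 50.0},
    {"BMI": 31.0, "AGE": 62.0},
]


def predict(row):
    """Hypothetical stand-in for a fitted model's prediction."""
    return 4.0 * row["BMI"] + 0.5 * row["AGE"]


def simulate_uplift(rows, feature, partition_values):
    """For each partition value, fix `feature` to that value in every row,
    re-predict, and report the change in mean prediction."""
    baseline = mean(predict(row) for row in rows)
    uplift = {}
    for value in partition_values:
        # artificial copy of the sample with the feature held constant
        simulated = [{**row, feature: value} for row in rows]
        uplift[value] = mean(predict(row) for row in simulated) - baseline
    return uplift


result = simulate_uplift(sample, "BMI", [20.0, 25.0, 30.0])
for value, change in result.items():
    print(value, round(change, 2))
```

Because the toy model is linear in `BMI`, the simulated uplift rises by the same amount between consecutive partitions. A real learner such as the random forest in the quickstart would generally produce a non-linear curve.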

Finally, because FACET can use bootstrap cross validation, we can create a crossfit
from our previous `LearnerRanker` best model to perform the simulation, so we can
quantify the uncertainty by using bootstrap confidence intervals.

.. code-block:: Python

# FACET imports
from facet.validation import BootstrapCV
from facet.crossfit import LearnerCrossfit
from facet.simulation import UnivariateUpliftSimulator
from facet.data.partition import ContinuousRangePartitioner
from facet.simulation.viz import SimulationDrawer

# create bootstrap CV iterator
bscv = BootstrapCV(n_splits=1000, random_state=42)

# create a bootstrap CV crossfit for simulation using best model
boot_crossfit = LearnerCrossfit(
pipeline=ranker.best_model_,
cv=bscv,
n_jobs=-3,
verbose=False,
).fit(sample=diabetes_sample)

SIM_FEAT = "BMI"
simulator = UnivariateUpliftSimulator(crossfit=boot_crossfit, n_jobs=-3)
simulator = UnivariateUpliftSimulator(
model=selector.best_estimator_,
sample=diabetes_sample,
n_jobs=-3
)

# split the simulation range into equal sized partitions
partitioner = ContinuousRangePartitioner()
@@ -380,7 +371,7 @@ quantify the uncertainty by using bootstrap confidence intervals.
# visualise results
SimulationDrawer().draw(data=simulation, title=SIM_FEAT)

.. image:: sphinx/source/_static/simulation_output.png
.. image:: sphinx/source/_images/simulation_output.png

We would conclude from the figure that higher values of `BMI` are associated with
an increase in disease progression after one year, and that for a `BMI` of 28
@@ -436,15 +427,15 @@ BCG GAMMA team. If you would like to know more you can find out about
or have a look at
`career opportunities <https://www.bcg.com/en-gb/beyond-consulting/bcg-gamma/careers>`_.

.. |pipe| image:: sphinx/source/_static/icons/pipe_icon.png
.. |pipe| image:: sphinx/source/_images/icons/pipe_icon.png
:width: 100px
:class: facet_icon

.. |inspect| image:: sphinx/source/_static/icons/inspect_icon.png
.. |inspect| image:: sphinx/source/_images/icons/inspect_icon.png
:width: 100px
:class: facet_icon

.. |sim| image:: sphinx/source/_static/icons/sim_icon.png
.. |sim| image:: sphinx/source/_images/icons/sim_icon.png
:width: 100px
:class: facet_icon
