|
31 | 31 | "\n",
|
32 | 32 | "**Context**\n",
|
33 | 33 | "\n",
|
34 |
| - "With the advanced capabilities FACET provides by extending SHAP-based model inspection, it is important to gain some intuition for how the newly introduced measures for feature redundancy and synergy can vary. As SHAP values represent post-processing after data preparation, feature engineering, preprocessing and model selection/tuning, minimal simulation studies offer a way to make the connection as direct as possible." |
| 34 | + "With the advaced capabilities FACET provides by extending SHAP-based model inspection, it is important to gain some intution for how the newly introduced measures for feature redundancy and synergy can vary. As SHAP values represent post-processing after data preparation, feature engineering, preprocessing and model selection/tuning, minimal simulation studies offer a way to make the connection as direct as possible." |
35 | 35 | ]
|
36 | 36 | },
|
37 | 37 | {
|
|
47 | 47 | "\n",
|
48 | 48 | "**Tutorial outline**\n",
|
49 | 49 | "\n",
|
50 |
| - "1. [Required imports](#Required-imports)\n", |
51 |
| - "2. [Redundancy, Synergy and SHAP](#Redundancy,-Synergy-and-SHAP)\n", |
52 |
| - "3. [Data simulation](#Data-simulation)\n", |
53 |
| - "4. [How redundancy and synergy change with feature correlation and interaction](#How-redundancy-and-synergy-change-with-feature-correlation-and-interaction)\n", |
54 |
| - "5. [How overfitting affects the accuracy of redundancy and synergy estimates](#How-overfitting-affects-the-accuracy-of-redundancy-and-synergy-estimates)\n", |
55 |
| - "6. [Summary](#Summary)\n", |
56 |
| - "7. [What can you do next?](#What-can-you-do-next?)\n", |
57 |
| - "8. [Appendix](#Appendix)" |
58 |
| - ] |
59 |
| - }, |
60 |
| - { |
61 |
| - "cell_type": "markdown", |
62 |
| - "metadata": {}, |
63 |
| - "source": [ |
64 |
| - "# Required imports" |
65 |
| - ] |
66 |
| - }, |
67 |
| - { |
68 |
| - "cell_type": "markdown", |
69 |
| - "metadata": {}, |
70 |
| - "source": [ |
71 |
| - "In order to run this notebook, we will import not only the FACET package, but also other packages useful to solve this task. Overall, we can break down the imports into three categories: \n", |
72 |
| - "\n", |
73 |
| - "1. Common packages (pandas, matplotlib, etc.)\n", |
74 |
| - "2. Required FACET classes (inpsection, selection, validation, simulation, etc.)\n", |
75 |
| - "3. Other BCG Gamma packages which simplify pipelining (`sklearndf`) and support visualization (`pytools`) when using FACET" |
76 |
| - ] |
77 |
| - }, |
78 |
| - { |
79 |
| - "cell_type": "markdown", |
80 |
| - "metadata": {}, |
81 |
| - "source": [ |
82 |
| - "**Common package imports**" |
| 50 | + "1. [Redundancy and Synergy](#Redundancy-and-Synergy)\n", |
| 51 | + "2. [Data simulation](#Data-simulation)\n", |
| 52 | + "3. [How redundancy and synergy change with feature correlation and interaction](#How-redundancy-and-synergy-change-with-feature-correlation-and-interaction)\n", |
| 53 | + "4. [How overfitting affects the accuracy of redundancy and synergy estimates](#How-overfitting-affects-the-accuracy-of-redundancy-and-synergy-estimates)\n", |
| 54 | + "5. [Summary](#Summary)\n", |
| 55 | + "6. [What can you do next?](#What-can-you-do-next?)\n", |
| 56 | + "7. [Appendix](#Appendix)" |
83 | 57 | ]
|
84 | 58 | },
|
85 | 59 | {
|
86 | 60 | "cell_type": "code",
|
87 |
| - "execution_count": 6, |
| 61 | + "execution_count": 4, |
88 | 62 | "metadata": {},
|
89 | 63 | "outputs": [],
|
90 | 64 | "source": [
|
|
96 | 70 | "import shap\n",
|
97 | 71 | "import itertools\n",
|
98 | 72 | "import seaborn as sns\n",
|
99 |
| - "from sklearn.model_selection import RepeatedKFold" |
100 |
| - ] |
101 |
| - }, |
102 |
| - { |
103 |
| - "cell_type": "markdown", |
104 |
| - "metadata": {}, |
105 |
| - "source": [ |
106 |
| - "**Gamma FACET imports**" |
107 |
| - ] |
108 |
| - }, |
109 |
| - { |
110 |
| - "cell_type": "code", |
111 |
| - "execution_count": 5, |
112 |
| - "metadata": {}, |
113 |
| - "outputs": [], |
114 |
| - "source": [ |
| 73 | + "from sklearn.model_selection import RepeatedKFold\n", |
| 74 | + "\n", |
115 | 75 | "# FACET imports\n",
|
116 | 76 | "from facet import Sample\n",
|
117 | 77 | "from facet.crossfit import LearnerCrossfit\n",
|
118 | 78 | "from facet.inspection import LearnerInspector\n",
|
119 | 79 | "from facet.selection import LearnerRanker, LearnerGrid\n",
|
120 |
| - "from facet.validation import BootstrapCV" |
121 |
| - ] |
122 |
| - }, |
123 |
| - { |
124 |
| - "cell_type": "markdown", |
125 |
| - "metadata": {}, |
126 |
| - "source": [ |
127 |
| - "**sklearndf imports**\n", |
| 80 | + "from facet.validation import BootstrapCV\n", |
128 | 81 | "\n",
|
129 |
| - "Instead of using the \"regular\" scikit-learn package, we are going to use `sklearndf` (see on [GitHub](https://github.com/orgs/BCG-Gamma/sklearndf/)). `sklearndf` is an open source library designed to address a common issue with scikit-learn: the outputs of transformers are numpy arrays, even when the input is a data frame. However, to inspect a model it is essential to keep track of the feature names. `sklearndf` retains all the functionality available through scikit-learn plus the feature traceability and usability associated with Pandas DataFrames. Additionally, the names of all your favourite scikit-learn functions are the same except for `DF` on the end. For example, the standard scikit-learn import:\n", |
130 |
| - "\n", |
131 |
| - "`from sklearn.pipeline import Pipeline`\n", |
132 |
| - "\n", |
133 |
| - "becomes:\n", |
134 |
| - "\n", |
135 |
| - "`from sklearndf.pipeline import PipelineDF`" |
136 |
| - ] |
137 |
| - }, |
138 |
| - { |
139 |
| - "cell_type": "code", |
140 |
| - "execution_count": 7, |
141 |
| - "metadata": {}, |
142 |
| - "outputs": [], |
143 |
| - "source": [ |
144 | 82 | "# sklearndf imports\n",
|
145 | 83 | "from sklearndf.pipeline import PipelineDF, ClassifierPipelineDF\n",
|
146 | 84 | "from sklearndf.classification import RandomForestClassifierDF"
|
147 | 85 | ]
|
148 | 86 | },
|
149 | 87 | {
|
150 | 88 | "cell_type": "markdown",
|
151 |
| - "metadata": { |
152 |
| - "heading_collapsed": true |
153 |
| - }, |
| 89 | + "metadata": {}, |
154 | 90 | "source": [
|
155 | 91 | "# Redundancy, Synergy and SHAP\n",
|
156 | 92 | "\n",
|
|
711 | 647 | "toc": {
|
712 | 648 | "base_numbering": 1,
|
713 | 649 | "nav_menu": {},
|
714 |
| - "number_sections": false, |
| 650 | + "number_sections": true, |
715 | 651 | "sideBar": true,
|
716 | 652 | "skip_h1_title": false,
|
717 | 653 | "title_cell": "Table of Contents",
|
|