From 54ed72c998f11cbb15f2df51667e41936e160257 Mon Sep 17 00:00:00 2001
From: Jason
Date: Tue, 29 Sep 2020 13:47:27 +0100
Subject: [PATCH] Revise and make minor corrections to NB tutorial template
 (#77)

---
 .../Facet_sphinx_tutorial_template.ipynb | 112 ++++++++++++++++--
 1 file changed, 103 insertions(+), 9 deletions(-)

diff --git a/sphinx/auxiliary/Facet_sphinx_tutorial_template.ipynb b/sphinx/auxiliary/Facet_sphinx_tutorial_template.ipynb
index 58cf4f9f0..1a01504c7 100644
--- a/sphinx/auxiliary/Facet_sphinx_tutorial_template.ipynb
+++ b/sphinx/auxiliary/Facet_sphinx_tutorial_template.ipynb
@@ -33,11 +33,11 @@
     "\n",
     "**Robust and impactful Data Science with FACET**\n",
     "\n",
-    "FACET enables us to perform a number of critical steps in best practice Data Science work flow easily, efficiently and reproducibly:\n",
+    "FACET enables us to perform several critical steps in a best-practice Data Science workflow easily, efficiently and reproducibly:\n",
     "\n",
-    "1. Create a robust pipeline for learner selection using LearnerRanker and enabling the use of bootstrap cross-validation.\n",
+    "1. Create a robust pipeline for learner selection using LearnerRanker and cross-validation.\n",
     "\n",
-    "2. Enhance our model inspection to understand drivers of predictions using local explanations of features via SHAP values by applying a novel methodology that decomposes SHAP values into measures of synergy, redundancy, and independence between each pair of features.\n",
+    "2. Enhance our model inspection to understand drivers of predictions using local explanations of features via [SHAP values](http://papers.nips.cc/paper/7062-a-unified-approach-to-interpreting-model-predictions), applying a novel methodology that decomposes SHAP values into measures of synergy, redundancy, and independence between each pair of features.\n",
     "\n",
     "3. Quickly apply historical simulation to gain key insights into feature values that minimize or maximize the predicted outcome.\n",
     "\n",
@@ -58,12 +58,106 @@
     "\n",
     "*The tutorial should have the following structure, and should always link to the same heading in the notebook:*\n",
     "\n",
-    "1. [*Your sections*](#Your-sections) \n",
+    "1. [Required imports](#Required-imports)\n",
+    "2. [*Your sections*](#Your-sections) \n",
     "... *more of your sections*\n",
     "\n",
-    "2. [Summary](#Summary)\n",
-    "3. [What can you do next?](#What-can-you-do-next?)\n",
-    "4. [Appendix](#Appendix)"
+    "3. [Summary](#Summary)\n",
+    "4. [What can you do next?](#What-can-you-do-next?)\n",
+    "5. [Appendix](#Appendix)"
    ]
  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Required imports"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "To run this notebook, we will import not only the FACET package, but also other packages useful for solving this task. Overall, we can break down the imports into three categories: \n",
+    "\n",
+    "1. Common packages (pandas, matplotlib, etc.)\n",
+    "2. Required FACET classes (inspection, selection, validation, simulation, etc.)\n",
+    "3. Other BCG Gamma packages which simplify pipelining (sklearndf, see on [GitHub](https://github.com/orgs/BCG-Gamma/sklearndf/)) and support visualisation (pytools, see on [GitHub](https://github.com/BCG-Gamma/pytools)) when using FACET"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "**Common package imports**"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 3,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# list your usual imports here, such as pandas, numpy and others\n",
+    "# not covered by FACET, sklearndf or pytools"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "**Gamma FACET imports**"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 1,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# list your Gamma FACET imports here"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "**sklearndf imports**\n",
+    "\n",
+    "Instead of using the \"regular\" scikit-learn package, we are going to use sklearndf (see on [GitHub](https://github.com/orgs/BCG-Gamma/sklearndf/)). sklearndf is an open-source library designed to address a common issue with scikit-learn: the outputs of transformers are numpy arrays, even when the input is a data frame. However, to inspect a model it is essential to keep track of the feature names. sklearndf retains all the functionality available through scikit-learn, plus the feature traceability and usability associated with pandas DataFrames. Additionally, the names of all your favourite scikit-learn classes are the same, except with `DF` appended. For example, the standard scikit-learn import:\n",
+    "\n",
+    "`from sklearn.pipeline import Pipeline`\n",
+    "\n",
+    "becomes:\n",
+    "\n",
+    "`from sklearndf.pipeline import PipelineDF`"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 4,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# list your sklearndf imports here"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "**pytools imports**\n",
+    "\n",
+    "pytools (see on [GitHub](https://github.com/BCG-Gamma/pytools)) is an open-source library containing general machine learning and visualisation utilities, some of which are useful for visualising the advanced model inspection capabilities of FACET."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 5,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "# list your pytools imports here"
+   ]
+  },
   {
@@ -73,7 +167,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
     "# *Your sections*\n",
     "\n",
     "1. *Your text providing an overview of this section.*\n",
-    "2. *Use as many sections as you need to provide a good high level structure to the tutorial, such as preprocessing, classifier development or model inspection*"
+    "2. *Use as many sections as you need to provide a good high-level structure to the tutorial, such as preprocessing, classifier development or model inspection*"
    ]
  },
  {
@@ -119,7 +213,7 @@
    "source": [
     "## Data source and study cohort\n",
     "\n",
-    "*This section should include all information relevant to obtaining the data used in the tutorial and reproducing the starting point for the tutorial. This could includes things like:* \n",
+    "*This section should include all information relevant to obtaining the data used in the tutorial and reproducing the starting point for the tutorial. This could include things like:* \n",
     "\n",
     "1. *Detailed listings of data sources, including links and how to access* \n",
     "2. *How the study population is defined, whether it be all transactions over a certain value or all patients above a certain age* \n",
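Note (reviewer's illustration, not part of the patch): the `Pipeline` to `PipelineDF` renaming described in the sklearndf imports cell can be made concrete with a minimal sketch. This assumes sklearndf is installed and that `StandardScalerDF` is available in `sklearndf.transformation` per the `DF`-suffix convention; class availability may vary by sklearndf version.

```python
import pandas as pd

# DF counterparts of scikit-learn classes: same names, with "DF" appended
# (StandardScalerDF is assumed here, following the DF-suffix convention)
from sklearndf.pipeline import PipelineDF
from sklearndf.transformation import StandardScalerDF

# a small data frame with named features
df = pd.DataFrame({"age": [23, 41, 35], "income": [48.0, 62.0, 55.0]})

# unlike sklearn's Pipeline, PipelineDF returns a data frame from
# fit_transform, so the feature names survive the transformation
pipeline = PipelineDF(steps=[("scale", StandardScalerDF())])
scaled = pipeline.fit_transform(df)

print(type(scaled).__name__)  # DataFrame, not a numpy array
print(list(scaled.columns))   # ['age', 'income'] -- feature names preserved
```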