diff --git a/examples/algorithm-stability-yaml/analyze-results.ipynb b/examples/algorithm-stability-yaml/analyze-results.ipynb new file mode 100644 index 0000000..e7b9c4e --- /dev/null +++ b/examples/algorithm-stability-yaml/analyze-results.ipynb @@ -0,0 +1,1476 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Hi! Let's analyze the results of the experiment you just ran. To recap:\n", + "\n", + "1. You just ran something similar to:\n", + "\n", + " `python benchmark.py --multirun ranker=\"glob(*)\" +callbacks.to_sql.url=\"sqlite:///$HOME/results.sqlite\"`\n", + "2. There should now be a `.sqlite` file at this path: `$HOME/results.sqlite`:\n", + "\n", + " ```\n", + " $ ls -al $HOME/results.sqlite\n", + " -rw-r--r-- 1 vscode vscode 20480 Sep 21 08:16 /home/vscode/results.sqlite\n", + " ```\n", + "\n", + "Let's now analyze the results! 📈" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "First, we will install `plotly-express`, so we can make nice plots later." + ] + }, + { + "cell_type": "code", + "execution_count": 1, + "metadata": {}, + "outputs": [ + { + "name": "stdout", + "output_type": "stream", + "text": [ + "Note: you may need to restart the kernel to use updated packages.\n" + ] + } + ], + "source": [ + "%pip install plotly-express nbconvert --quiet" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Figure out the SQL connection URI." 
+ ] + }, + { + "cell_type": "code", + "execution_count": 2, + "metadata": {}, + "outputs": [ + { + "data": { + "text/plain": [ + "'sqlite:////home/vscode/results.sqlite'" + ] + }, + "execution_count": 2, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "import os\n", + "\n", + "con: str = \"sqlite:///\" + os.environ[\"HOME\"] + \"/results.sqlite\"\n", + "con" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Read in the `experiments` table.
This table contains metadata for all 'experiments' that have been run." + ] + }, + { + "cell_type": "code", + "execution_count": 3, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
datasetdataset/ndataset/pdataset/taskdataset/groupdataset/domainrankervalidatorlocal_dirdate_created
id
38vqcwusSynclf hard1000050classificationSynclfsyntheticBorutak-NN/workspaces/fseval/examples/algorithm-stabilit...2022-09-21 08:22:28.965510
y6bb1hccSynclf hard100050classificationSynclfsyntheticBorutak-NN/workspaces/fseval/examples/algorithm-stabilit...2022-09-21 08:22:53.609396
3vtr13pgSynclf hard100050classificationSynclfsyntheticReliefFk-NN/workspaces/fseval/examples/algorithm-stabilit...2022-09-21 08:25:09.974370
\n", + "
" + ], + "text/plain": [ + " dataset dataset/n dataset/p dataset/task dataset/group \\\n", + "id \n", + "38vqcwus Synclf hard 10000 50 classification Synclf \n", + "y6bb1hcc Synclf hard 1000 50 classification Synclf \n", + "3vtr13pg Synclf hard 1000 50 classification Synclf \n", + "\n", + " dataset/domain ranker validator \\\n", + "id \n", + "38vqcwus synthetic Boruta k-NN \n", + "y6bb1hcc synthetic Boruta k-NN \n", + "3vtr13pg synthetic ReliefF k-NN \n", + "\n", + " local_dir \\\n", + "id \n", + "38vqcwus /workspaces/fseval/examples/algorithm-stabilit... \n", + "y6bb1hcc /workspaces/fseval/examples/algorithm-stabilit... \n", + "3vtr13pg /workspaces/fseval/examples/algorithm-stabilit... \n", + "\n", + " date_created \n", + "id \n", + "38vqcwus 2022-09-21 08:22:28.965510 \n", + "y6bb1hcc 2022-09-21 08:22:53.609396 \n", + "3vtr13pg 2022-09-21 08:25:09.974370 " + ] + }, + "execution_count": 3, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "import pandas as pd\n", + "\n", + "experiments: pd.DataFrame = pd.read_sql_table(\"experiments\", con=con, index_col=\"id\")\n", + "experiments" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "That's looking good 🙌🏻.\n", + "\n", + "Now, let's read in the `stability` table. We put data in this table by using our custom-made metric, defined in the `StabilityNogueira` class in `benchmark.py`. There, we push data to this table using `callbacks.on_table`." + ] + }, + { + "cell_type": "code", + "execution_count": 4, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
indexstability
id
y6bb1hcc00.933546
3vtr13pg01.000000
\n", + "
" + ], + "text/plain": [ + " index stability\n", + "id \n", + "y6bb1hcc 0 0.933546\n", + "3vtr13pg 0 1.000000" + ] + }, + "execution_count": 4, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "stability: pd.DataFrame = pd.read_sql_table(\"stability\", con=con, index_col=\"id\")\n", + "stability" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Cool. Now let's join the experiments with their actual metrics." + ] + }, + { + "cell_type": "code", + "execution_count": 5, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + "
\n", + "\n", + "\n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + " \n", + "
indexstabilitydatasetdataset/ndataset/pdataset/taskdataset/groupdataset/domainrankervalidatorlocal_dirdate_created
id
y6bb1hcc00.933546Synclf hard100050classificationSynclfsyntheticBorutak-NN/workspaces/fseval/examples/algorithm-stabilit...2022-09-21 08:22:53.609396
3vtr13pg01.000000Synclf hard100050classificationSynclfsyntheticReliefFk-NN/workspaces/fseval/examples/algorithm-stabilit...2022-09-21 08:25:09.974370
\n", + "
" + ], + "text/plain": [ + " index stability dataset dataset/n dataset/p dataset/task \\\n", + "id \n", + "y6bb1hcc 0 0.933546 Synclf hard 1000 50 classification \n", + "3vtr13pg 0 1.000000 Synclf hard 1000 50 classification \n", + "\n", + " dataset/group dataset/domain ranker validator \\\n", + "id \n", + "y6bb1hcc Synclf synthetic Boruta k-NN \n", + "3vtr13pg Synclf synthetic ReliefF k-NN \n", + "\n", + " local_dir \\\n", + "id \n", + "y6bb1hcc /workspaces/fseval/examples/algorithm-stabilit... \n", + "3vtr13pg /workspaces/fseval/examples/algorithm-stabilit... \n", + "\n", + " date_created \n", + "id \n", + "y6bb1hcc 2022-09-21 08:22:53.609396 \n", + "3vtr13pg 2022-09-21 08:25:09.974370 " + ] + }, + "execution_count": 5, + "metadata": {}, + "output_type": "execute_result" + } + ], + "source": [ + "stability_experiments = stability.join(experiments)\n", + "stability_experiments" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Finally, we can plot the results so we can get a better grasp of what's going on:" + ] + }, + { + "cell_type": "code", + "execution_count": 6, + "metadata": {}, + "outputs": [ + { + "data": { + "text/html": [ + " \n", + " " + ] + }, + "metadata": {}, + "output_type": "display_data" + }, + { + "data": { + "application/vnd.plotly.v1+json": { + "config": { + "plotlyServerURL": "https://plot.ly" + }, + "data": [ + { + "alignmentgroup": "True", + "hovertemplate": "ranker=%{x}
stability=%{y}", + "legendgroup": "", + "marker": { + "color": "#636efa", + "pattern": { + "shape": "" + } + }, + "name": "", + "offsetgroup": "", + "orientation": "v", + "showlegend": false, + "textposition": "auto", + "type": "bar", + "x": [ + "Boruta", + "ReliefF" + ], + "xaxis": "x", + "y": [ + 0.9335459861775651, + 1 + ], + "yaxis": "y" + } + ], + "layout": { + "barmode": "relative", + "legend": { + "tracegroupgap": 0 + }, + "margin": { + "t": 60 + }, + "template": { + "data": { + "bar": [ + { + "error_x": { + "color": "#2a3f5f" + }, + "error_y": { + "color": "#2a3f5f" + }, + "marker": { + "line": { + "color": "#E5ECF6", + "width": 0.5 + }, + "pattern": { + "fillmode": "overlay", + "size": 10, + "solidity": 0.2 + } + }, + "type": "bar" + } + ], + "barpolar": [ + { + "marker": { + "line": { + "color": "#E5ECF6", + "width": 0.5 + }, + "pattern": { + "fillmode": "overlay", + "size": 10, + "solidity": 0.2 + } + }, + "type": "barpolar" + } + ], + "carpet": [ + { + "aaxis": { + "endlinecolor": "#2a3f5f", + "gridcolor": "white", + "linecolor": "white", + "minorgridcolor": "white", + "startlinecolor": "#2a3f5f" + }, + "baxis": { + "endlinecolor": "#2a3f5f", + "gridcolor": "white", + "linecolor": "white", + "minorgridcolor": "white", + "startlinecolor": "#2a3f5f" + }, + "type": "carpet" + } + ], + "choropleth": [ + { + "colorbar": { + "outlinewidth": 0, + "ticks": "" + }, + "type": "choropleth" + } + ], + "contour": [ + { + "colorbar": { + "outlinewidth": 0, + "ticks": "" + }, + "colorscale": [ + [ + 0, + "#0d0887" + ], + [ + 0.1111111111111111, + "#46039f" + ], + [ + 0.2222222222222222, + "#7201a8" + ], + [ + 0.3333333333333333, + "#9c179e" + ], + [ + 0.4444444444444444, + "#bd3786" + ], + [ + 0.5555555555555556, + "#d8576b" + ], + [ + 0.6666666666666666, + "#ed7953" + ], + [ + 0.7777777777777778, + "#fb9f3a" + ], + [ + 0.8888888888888888, + "#fdca26" + ], + [ + 1, + "#f0f921" + ] + ], + "type": "contour" + } + ], + "contourcarpet": [ + { + "colorbar": { + 
"outlinewidth": 0, + "ticks": "" + }, + "type": "contourcarpet" + } + ], + "heatmap": [ + { + "colorbar": { + "outlinewidth": 0, + "ticks": "" + }, + "colorscale": [ + [ + 0, + "#0d0887" + ], + [ + 0.1111111111111111, + "#46039f" + ], + [ + 0.2222222222222222, + "#7201a8" + ], + [ + 0.3333333333333333, + "#9c179e" + ], + [ + 0.4444444444444444, + "#bd3786" + ], + [ + 0.5555555555555556, + "#d8576b" + ], + [ + 0.6666666666666666, + "#ed7953" + ], + [ + 0.7777777777777778, + "#fb9f3a" + ], + [ + 0.8888888888888888, + "#fdca26" + ], + [ + 1, + "#f0f921" + ] + ], + "type": "heatmap" + } + ], + "heatmapgl": [ + { + "colorbar": { + "outlinewidth": 0, + "ticks": "" + }, + "colorscale": [ + [ + 0, + "#0d0887" + ], + [ + 0.1111111111111111, + "#46039f" + ], + [ + 0.2222222222222222, + "#7201a8" + ], + [ + 0.3333333333333333, + "#9c179e" + ], + [ + 0.4444444444444444, + "#bd3786" + ], + [ + 0.5555555555555556, + "#d8576b" + ], + [ + 0.6666666666666666, + "#ed7953" + ], + [ + 0.7777777777777778, + "#fb9f3a" + ], + [ + 0.8888888888888888, + "#fdca26" + ], + [ + 1, + "#f0f921" + ] + ], + "type": "heatmapgl" + } + ], + "histogram": [ + { + "marker": { + "pattern": { + "fillmode": "overlay", + "size": 10, + "solidity": 0.2 + } + }, + "type": "histogram" + } + ], + "histogram2d": [ + { + "colorbar": { + "outlinewidth": 0, + "ticks": "" + }, + "colorscale": [ + [ + 0, + "#0d0887" + ], + [ + 0.1111111111111111, + "#46039f" + ], + [ + 0.2222222222222222, + "#7201a8" + ], + [ + 0.3333333333333333, + "#9c179e" + ], + [ + 0.4444444444444444, + "#bd3786" + ], + [ + 0.5555555555555556, + "#d8576b" + ], + [ + 0.6666666666666666, + "#ed7953" + ], + [ + 0.7777777777777778, + "#fb9f3a" + ], + [ + 0.8888888888888888, + "#fdca26" + ], + [ + 1, + "#f0f921" + ] + ], + "type": "histogram2d" + } + ], + "histogram2dcontour": [ + { + "colorbar": { + "outlinewidth": 0, + "ticks": "" + }, + "colorscale": [ + [ + 0, + "#0d0887" + ], + [ + 0.1111111111111111, + "#46039f" + ], + [ + 0.2222222222222222, + 
"#7201a8" + ], + [ + 0.3333333333333333, + "#9c179e" + ], + [ + 0.4444444444444444, + "#bd3786" + ], + [ + 0.5555555555555556, + "#d8576b" + ], + [ + 0.6666666666666666, + "#ed7953" + ], + [ + 0.7777777777777778, + "#fb9f3a" + ], + [ + 0.8888888888888888, + "#fdca26" + ], + [ + 1, + "#f0f921" + ] + ], + "type": "histogram2dcontour" + } + ], + "mesh3d": [ + { + "colorbar": { + "outlinewidth": 0, + "ticks": "" + }, + "type": "mesh3d" + } + ], + "parcoords": [ + { + "line": { + "colorbar": { + "outlinewidth": 0, + "ticks": "" + } + }, + "type": "parcoords" + } + ], + "pie": [ + { + "automargin": true, + "type": "pie" + } + ], + "scatter": [ + { + "fillpattern": { + "fillmode": "overlay", + "size": 10, + "solidity": 0.2 + }, + "type": "scatter" + } + ], + "scatter3d": [ + { + "line": { + "colorbar": { + "outlinewidth": 0, + "ticks": "" + } + }, + "marker": { + "colorbar": { + "outlinewidth": 0, + "ticks": "" + } + }, + "type": "scatter3d" + } + ], + "scattercarpet": [ + { + "marker": { + "colorbar": { + "outlinewidth": 0, + "ticks": "" + } + }, + "type": "scattercarpet" + } + ], + "scattergeo": [ + { + "marker": { + "colorbar": { + "outlinewidth": 0, + "ticks": "" + } + }, + "type": "scattergeo" + } + ], + "scattergl": [ + { + "marker": { + "colorbar": { + "outlinewidth": 0, + "ticks": "" + } + }, + "type": "scattergl" + } + ], + "scattermapbox": [ + { + "marker": { + "colorbar": { + "outlinewidth": 0, + "ticks": "" + } + }, + "type": "scattermapbox" + } + ], + "scatterpolar": [ + { + "marker": { + "colorbar": { + "outlinewidth": 0, + "ticks": "" + } + }, + "type": "scatterpolar" + } + ], + "scatterpolargl": [ + { + "marker": { + "colorbar": { + "outlinewidth": 0, + "ticks": "" + } + }, + "type": "scatterpolargl" + } + ], + "scatterternary": [ + { + "marker": { + "colorbar": { + "outlinewidth": 0, + "ticks": "" + } + }, + "type": "scatterternary" + } + ], + "surface": [ + { + "colorbar": { + "outlinewidth": 0, + "ticks": "" + }, + "colorscale": [ + [ + 0, + "#0d0887" + 
], + [ + 0.1111111111111111, + "#46039f" + ], + [ + 0.2222222222222222, + "#7201a8" + ], + [ + 0.3333333333333333, + "#9c179e" + ], + [ + 0.4444444444444444, + "#bd3786" + ], + [ + 0.5555555555555556, + "#d8576b" + ], + [ + 0.6666666666666666, + "#ed7953" + ], + [ + 0.7777777777777778, + "#fb9f3a" + ], + [ + 0.8888888888888888, + "#fdca26" + ], + [ + 1, + "#f0f921" + ] + ], + "type": "surface" + } + ], + "table": [ + { + "cells": { + "fill": { + "color": "#EBF0F8" + }, + "line": { + "color": "white" + } + }, + "header": { + "fill": { + "color": "#C8D4E3" + }, + "line": { + "color": "white" + } + }, + "type": "table" + } + ] + }, + "layout": { + "annotationdefaults": { + "arrowcolor": "#2a3f5f", + "arrowhead": 0, + "arrowwidth": 1 + }, + "autotypenumbers": "strict", + "coloraxis": { + "colorbar": { + "outlinewidth": 0, + "ticks": "" + } + }, + "colorscale": { + "diverging": [ + [ + 0, + "#8e0152" + ], + [ + 0.1, + "#c51b7d" + ], + [ + 0.2, + "#de77ae" + ], + [ + 0.3, + "#f1b6da" + ], + [ + 0.4, + "#fde0ef" + ], + [ + 0.5, + "#f7f7f7" + ], + [ + 0.6, + "#e6f5d0" + ], + [ + 0.7, + "#b8e186" + ], + [ + 0.8, + "#7fbc41" + ], + [ + 0.9, + "#4d9221" + ], + [ + 1, + "#276419" + ] + ], + "sequential": [ + [ + 0, + "#0d0887" + ], + [ + 0.1111111111111111, + "#46039f" + ], + [ + 0.2222222222222222, + "#7201a8" + ], + [ + 0.3333333333333333, + "#9c179e" + ], + [ + 0.4444444444444444, + "#bd3786" + ], + [ + 0.5555555555555556, + "#d8576b" + ], + [ + 0.6666666666666666, + "#ed7953" + ], + [ + 0.7777777777777778, + "#fb9f3a" + ], + [ + 0.8888888888888888, + "#fdca26" + ], + [ + 1, + "#f0f921" + ] + ], + "sequentialminus": [ + [ + 0, + "#0d0887" + ], + [ + 0.1111111111111111, + "#46039f" + ], + [ + 0.2222222222222222, + "#7201a8" + ], + [ + 0.3333333333333333, + "#9c179e" + ], + [ + 0.4444444444444444, + "#bd3786" + ], + [ + 0.5555555555555556, + "#d8576b" + ], + [ + 0.6666666666666666, + "#ed7953" + ], + [ + 0.7777777777777778, + "#fb9f3a" + ], + [ + 0.8888888888888888, + 
"#fdca26" + ], + [ + 1, + "#f0f921" + ] + ] + }, + "colorway": [ + "#636efa", + "#EF553B", + "#00cc96", + "#ab63fa", + "#FFA15A", + "#19d3f3", + "#FF6692", + "#B6E880", + "#FF97FF", + "#FECB52" + ], + "font": { + "color": "#2a3f5f" + }, + "geo": { + "bgcolor": "white", + "lakecolor": "white", + "landcolor": "#E5ECF6", + "showlakes": true, + "showland": true, + "subunitcolor": "white" + }, + "hoverlabel": { + "align": "left" + }, + "hovermode": "closest", + "mapbox": { + "style": "light" + }, + "paper_bgcolor": "white", + "plot_bgcolor": "#E5ECF6", + "polar": { + "angularaxis": { + "gridcolor": "white", + "linecolor": "white", + "ticks": "" + }, + "bgcolor": "#E5ECF6", + "radialaxis": { + "gridcolor": "white", + "linecolor": "white", + "ticks": "" + } + }, + "scene": { + "xaxis": { + "backgroundcolor": "#E5ECF6", + "gridcolor": "white", + "gridwidth": 2, + "linecolor": "white", + "showbackground": true, + "ticks": "", + "zerolinecolor": "white" + }, + "yaxis": { + "backgroundcolor": "#E5ECF6", + "gridcolor": "white", + "gridwidth": 2, + "linecolor": "white", + "showbackground": true, + "ticks": "", + "zerolinecolor": "white" + }, + "zaxis": { + "backgroundcolor": "#E5ECF6", + "gridcolor": "white", + "gridwidth": 2, + "linecolor": "white", + "showbackground": true, + "ticks": "", + "zerolinecolor": "white" + } + }, + "shapedefaults": { + "line": { + "color": "#2a3f5f" + } + }, + "ternary": { + "aaxis": { + "gridcolor": "white", + "linecolor": "white", + "ticks": "" + }, + "baxis": { + "gridcolor": "white", + "linecolor": "white", + "ticks": "" + }, + "bgcolor": "#E5ECF6", + "caxis": { + "gridcolor": "white", + "linecolor": "white", + "ticks": "" + } + }, + "title": { + "x": 0.05 + }, + "xaxis": { + "automargin": true, + "gridcolor": "white", + "linecolor": "white", + "ticks": "", + "title": { + "standoff": 15 + }, + "zerolinecolor": "white", + "zerolinewidth": 2 + }, + "yaxis": { + "automargin": true, + "gridcolor": "white", + "linecolor": "white", + "ticks": "", + 
"title": { + "standoff": 15 + }, + "zerolinecolor": "white", + "zerolinewidth": 2 + } + } + }, + "xaxis": { + "anchor": "y", + "domain": [ + 0, + 1 + ], + "title": { + "text": "ranker" + } + }, + "yaxis": { + "anchor": "x", + "domain": [ + 0, + 1 + ], + "title": { + "text": "stability" + } + } + } + }, + "text/html": [ + "
" + ] + }, + "metadata": {}, + "output_type": "display_data" + } + ], + "source": [ + "import plotly.express as px\n", + "\n", + "px.bar(stability_experiments,\n", + " x=\"ranker\",\n", + " y=\"stability\"\n", + ")" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "Comparing Boruta and ReliefF on this dataset, ReliefF is the more stable of the two: it selected exactly the same features in all 10 bootstraps, resulting in a stability of 1.0." + ] + } + ], + "metadata": { + "kernelspec": { + "display_name": "Python 3.9.12 64-bit", + "language": "python", + "name": "python3" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.9.12" + }, + "orig_nbformat": 4, + "vscode": { + "interpreter": { + "hash": "949777d72b0d2535278d3dc13498b2535136f6dfe0678499012e853ee9abcab1" + } + } + }, + "nbformat": 4, + "nbformat_minor": 2 +} diff --git a/examples/algorithm-stability-yaml/benchmark.py b/examples/algorithm-stability-yaml/benchmark.py new file mode 100644 index 0000000..099c905 --- /dev/null +++ b/examples/algorithm-stability-yaml/benchmark.py @@ -0,0 +1,111 @@ +from typing import Dict, Optional, Union + +import hydra +import numpy as np +import pandas as pd +from skrebate import ReliefF + +from fseval.config import PipelineConfig +from fseval.main import run_pipeline +from fseval.types import AbstractEstimator, AbstractMetric, Callback + +""" +The checkInputType and getStability functions come from the following paper: + +[1] On the Stability of Feature Selection. Sarah Nogueira, Konstantinos Sechidis, Gavin Brown. + Journal of Machine Learning Research (JMLR). 2017.
+You can find a full demo using this package at: +http://htmlpreview.github.io/?https://github.com/nogueirs/JMLR2017/blob/master/python/stabilityDemo.html +NB: This package requires the installation of the packages: numpy, scipy and math +""" + + +def checkInputType(Z): + """This function checks that Z is of the right type and dimension. + It raises an exception if not. + OUTPUT: The input Z as a numpy.ndarray + """ + ### We check that Z is a list or a numpy.array + if isinstance(Z, list): + Z = np.asarray(Z) + elif not isinstance(Z, np.ndarray): + raise ValueError("The input matrix Z should be of type list or numpy.ndarray") + ### We check if Z is a matrix (2 dimensions) + if Z.ndim != 2: + raise ValueError("The input matrix Z should be of dimension 2") + return Z + + +def getStability(Z): + """ + Let us assume we have M>1 feature sets and d>0 features in total. + This function computes the stability estimate as given in Definition 4 in [1]. + + INPUT: A BINARY matrix Z (given as a list or as a numpy.ndarray of size M*d). + Each row of the binary matrix represents a feature set, where a 1 at the f^th position + means the f^th feature has been selected and a 0 means it has not been selected.
+ + OUTPUT: The stability of the feature selection procedure + """ + Z = checkInputType(Z) + M, d = Z.shape + hatPF = np.mean(Z, axis=0) + kbar = np.sum(hatPF) + denom = (kbar / d) * (1 - kbar / d) + return 1 - (M / (M - 1)) * np.mean(np.multiply(hatPF, 1 - hatPF)) / denom + + +class StabilityNogueira(AbstractMetric): + def score_bootstrap( + self, + ranker: AbstractEstimator, + validator: AbstractEstimator, + callbacks: Callback, + scores: Dict, + **kwargs, + ) -> Dict: + # compute stability and send to table + Z = np.array(self.support_matrix) + Z = Z.astype(int) + stability = getStability(Z) + stability_df = pd.DataFrame([{"stability": stability}]) + callbacks.on_table(stability_df, "stability") + + # set in scores dict + scores["stability"] = stability + + return scores + + def score_ranking( + self, + scores: Union[Dict, pd.DataFrame], + ranker: AbstractEstimator, + bootstrap_state: int, + callbacks: Callback, + feature_importances: Optional[np.ndarray] = None, + ): + support_matrix = getattr(self, "support_matrix", []) + self.support_matrix = support_matrix + self.support_matrix.append(ranker.feature_support_) + + +class ReliefF_FeatureSelection(ReliefF): + def fit(self, X, y): + super(ReliefF_FeatureSelection, self).fit(X, y) + + # extract feature subset from ReliefF + feature_subset = self.top_features_[: self.n_features_to_select] + + # set `support_` vector + _, p = np.shape(X) + self.support_ = np.zeros(p, dtype=bool) + self.support_[feature_subset] = True + + +@hydra.main(config_path="conf", config_name="my_config") +def main(cfg: PipelineConfig) -> None: + run_pipeline(cfg) + + +if __name__ == "__main__": + main() diff --git a/examples/algorithm-stability-yaml/conf/dataset/synclf_hard.yaml b/examples/algorithm-stability-yaml/conf/dataset/synclf_hard.yaml new file mode 100644 index 0000000..325d064 --- /dev/null +++ b/examples/algorithm-stability-yaml/conf/dataset/synclf_hard.yaml @@ -0,0 +1,18 @@ +name: Synclf hard +task: classification +domain: 
synthetic +group: Synclf +adapter: + _target_: sklearn.datasets.make_classification + class_sep: 0.8 + n_classes: 3 + n_clusters_per_class: 3 + n_features: 50 + n_informative: 4 + n_redundant: 0 + n_repeated: 0 + n_samples: 1000 + random_state: 0 + shuffle: false +feature_importances: + X[:, 0:4]: 1.0 diff --git a/examples/algorithm-stability-yaml/conf/metrics/stability_nogueira.yaml b/examples/algorithm-stability-yaml/conf/metrics/stability_nogueira.yaml new file mode 100644 index 0000000..5915374 --- /dev/null +++ b/examples/algorithm-stability-yaml/conf/metrics/stability_nogueira.yaml @@ -0,0 +1,3 @@ +# @package metrics +ranking_scores: + _target_: benchmark.StabilityNogueira diff --git a/examples/algorithm-stability-yaml/conf/my_config.yaml b/examples/algorithm-stability-yaml/conf/my_config.yaml new file mode 100644 index 0000000..39d9662 --- /dev/null +++ b/examples/algorithm-stability-yaml/conf/my_config.yaml @@ -0,0 +1,11 @@ +defaults: + - base_pipeline_config + - _self_ + - override dataset: synclf_hard + - override validator: knn + - override /callbacks: + - to_sql + - override /metrics: + - stability_nogueira + +n_bootstraps: 10 diff --git a/examples/algorithm-stability-yaml/conf/ranker/boruta.yaml b/examples/algorithm-stability-yaml/conf/ranker/boruta.yaml new file mode 100644 index 0000000..ef7e0e4 --- /dev/null +++ b/examples/algorithm-stability-yaml/conf/ranker/boruta.yaml @@ -0,0 +1,11 @@ +name: Boruta +estimator: + _target_: boruta.boruta_py.BorutaPy + estimator: + _target_: sklearn.ensemble.RandomForestClassifier + n_estimators: auto +_estimator_type: classifier +multioutput: false +estimates_feature_importances: false +estimates_feature_support: true +estimates_feature_ranking: true \ No newline at end of file diff --git a/examples/algorithm-stability-yaml/conf/ranker/relieff.yaml b/examples/algorithm-stability-yaml/conf/ranker/relieff.yaml new file mode 100644 index 0000000..3b36a4c --- /dev/null +++ 
b/examples/algorithm-stability-yaml/conf/ranker/relieff.yaml @@ -0,0 +1,7 @@ +name: ReliefF +estimator: + _target_: benchmark.ReliefF_FeatureSelection + n_features_to_select: 10 # select best 10 features in feature subset. +_estimator_type: classifier +estimates_feature_importances: true +estimates_feature_support: true diff --git a/examples/algorithm-stability-yaml/conf/validator/knn.yaml b/examples/algorithm-stability-yaml/conf/validator/knn.yaml new file mode 100644 index 0000000..7a3b4c5 --- /dev/null +++ b/examples/algorithm-stability-yaml/conf/validator/knn.yaml @@ -0,0 +1,6 @@ +name: k-NN +estimator: + _target_: sklearn.neighbors.KNeighborsClassifier +_estimator_type: classifier +multioutput: false +estimates_target: true diff --git a/fseval/callbacks/to_sql.py b/fseval/callbacks/to_sql.py index 4c4f410..3d88cbe 100644 --- a/fseval/callbacks/to_sql.py +++ b/fseval/callbacks/to_sql.py @@ -3,11 +3,11 @@ from typing import Dict import pandas as pd -from fseval.config.callbacks.to_sql import ToSQLCallback -from fseval.types import TerminalColor from omegaconf import MISSING, DictConfig from sqlalchemy import create_engine -from sqlalchemy.pool import NullPool + +from fseval.config.callbacks.to_sql import ToSQLCallback +from fseval.types import TerminalColor from ._base_export_callback import BaseExportCallback diff --git a/fseval/pipelines/_experiment.py b/fseval/pipelines/_experiment.py index 14c7e70..06170a8 100644 --- a/fseval/pipelines/_experiment.py +++ b/fseval/pipelines/_experiment.py @@ -7,10 +7,10 @@ import numpy as np import pandas as pd +from humanfriendly import format_timespan + from fseval.pipeline.estimator import Estimator from fseval.types import AbstractEstimator, Callback, TerminalColor -from humanfriendly import format_timespan -from sqlalchemy.engine import Engine @dataclass diff --git a/fseval/pipelines/rank_and_validate/_support_validator.py b/fseval/pipelines/rank_and_validate/_support_validator.py index 9acda9f..f4db467 100644 --- 
a/fseval/pipelines/rank_and_validate/_support_validator.py +++ b/fseval/pipelines/rank_and_validate/_support_validator.py @@ -63,10 +63,11 @@ def score(self, X, y, **kwargs) -> Union[Dict, pd.DataFrame, np.generic, None]: scores = pd.DataFrame([scores_dict]) # add custom metrics + X_, y_ = self._prepare_data(X, y) + for metric_name, metric_class in self.metrics.items(): - X, y = self._prepare_data(X, y) scores_metric = metric_class.score_support( # type: ignore - scores, self.validator, X, y, self.callbacks + scores, self.validator, X_, y_, self.callbacks ) # type: ignore if scores_metric is not None: diff --git a/tests/integration/test_main.py b/tests/integration/test_main.py index 81d64c8..bbaf72d 100644 --- a/tests/integration/test_main.py +++ b/tests/integration/test_main.py @@ -2,9 +2,10 @@ import tempfile import pytest +from hydra.conf import ConfigStore + from fseval.config import EstimatorConfig, PipelineConfig from fseval.main import run_pipeline -from fseval.types import IncompatibilityError from fseval.utils.hydra_utils import get_config from hydra.conf import ConfigStore from hydra.errors import InstantiationException diff --git a/website/docs/_recipes/algorithm-stability.md b/website/docs/_recipes/algorithm-stability.md deleted file mode 100644 index 9ac8290..0000000 --- a/website/docs/_recipes/algorithm-stability.md +++ /dev/null @@ -1 +0,0 @@ -# Analyze algorithm stability \ No newline at end of file diff --git a/website/docs/_recipes/running-on-aws.md b/website/docs/_recipes/running-on-aws.md deleted file mode 100644 index ab76c2f..0000000 --- a/website/docs/_recipes/running-on-aws.md +++ /dev/null @@ -1,3 +0,0 @@ -# Running on AWS - -evaluate! 
\ No newline at end of file diff --git a/website/docs/_recipes/running-on-slurm.md b/website/docs/_recipes/running-on-slurm.md deleted file mode 100644 index 7474275..0000000 --- a/website/docs/_recipes/running-on-slurm.md +++ /dev/null @@ -1 +0,0 @@ -# Running on a SLURM cluster \ No newline at end of file diff --git a/website/docs/quick-start.mdx b/website/docs/quick-start.mdx index bdb48da..ca1b704 100644 --- a/website/docs/quick-start.mdx +++ b/website/docs/quick-start.mdx @@ -144,6 +144,14 @@ We can now decide how to export the results. We can upload our results to a live sql_con=sqlite:////Users/dunnkers/Downloads/results.sqlite # any well-defined database URL ``` +:::note Relative vs absolute paths + +If you define a _relative_ database URL, like `sql_con=sqlite:///./results.sqlite`, the results are saved in the directory where Hydra stores the files of each individual run. In other words, you end up with multiple `.sqlite` files, one in each of the `./multirun` subfolders. + +To prevent this and store all results in a single `.sqlite` file, use an **absolute** path, as above. Preferably, though, use a proper running database server; see the recipes for instructions. + +::: + We are now ready to run an experiment. In a terminal, `cd` into the unzipped example directory and run the following: ```shell python benchmark.py --multirun ranker='glob(*)' +callbacks.to_sql.url=$sql_con diff --git a/website/docs/_recipes/_category_.json b/website/docs/recipes/_category_.json similarity index 100% rename from website/docs/_recipes/_category_.json rename to website/docs/recipes/_category_.json diff --git a/website/docs/recipes/algorithm-stability.md b/website/docs/recipes/algorithm-stability.md new file mode 100644 index 0000000..3f9d42c --- /dev/null +++ b/website/docs/recipes/algorithm-stability.md @@ -0,0 +1,320 @@ +# Analyze algorithm stability + +For many applications, it is important that the algorithms used are sufficiently **stable**.
This means that when a different sample of data is drawn from the same distribution, the results should turn out similar. This, combined with any stochastic behavior inherent to the algorithm itself, determines the _stability_ of the algorithm. The same applies to Feature Selection and Feature Ranking algorithms. + +So, let's run such an experiment! We are going to compare the stability of [ReliefF](https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.56.4740&rep=rep1&type=pdf) to that of [Boruta](https://www.jstatsoft.org/article/view/v036i11), two popular feature selection algorithms, using a metric introduced in [Nogueira et al., 2018](https://www.jmlr.org/papers/volume18/17-514/17-514.pdf). + + +## The experiment + +We are going to run an experiment with the following [configuration](https://github.com/dunnkers/fseval/tree/master/examples/algorithm-stability-yaml/). + +Download the experiment config: [algorithm-stability-yaml.zip](pathname:///fseval/zipped-examples/algorithm-stability-yaml.zip) + +The most notable configuration settings are: + +```yaml title="my_config.yaml" +defaults: + - base_pipeline_config + - _self_ + - override dataset: synclf_hard + - override validator: knn + - override /callbacks: + - to_sql + // highlight-start + - override /metrics: + - stability_nogueira + // highlight-end + +// highlight-start +n_bootstraps: 10 +// highlight-end +``` + +This means we are going to generate a synthetic dataset and, because `n_bootstraps=10`, sample 10 subsets from it. After the feature selection algorithm has been fitted on each subset, a custom metric called `stability_nogueira` is executed. Its configuration lives in the `/conf/metrics` folder and refers to the `StabilityNogueira` class in the `benchmark.py` file.
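
To give a feel for what this metric measures, here is a minimal standalone sketch of the estimate that `getStability` in `benchmark.py` computes (Definition 4 in Nogueira et al.). The function name `nogueira_stability` is hypothetical and not part of fseval. It assumes a binary matrix `Z` with one row per bootstrap (the feature subset selected in that bootstrap) and one column per feature. A value of 1.0 means every bootstrap selected exactly the same features; lower values mean less agreement between the subsets.

```python
import numpy as np


def nogueira_stability(Z) -> float:
    """Hypothetical standalone version of the estimate computed by getStability.

    Z is a binary M x d matrix: row i is the feature subset selected in
    bootstrap i, and Z[i, f] == 1 means feature f was selected.
    """
    Z = np.asarray(Z, dtype=float)
    if Z.ndim != 2 or Z.shape[0] < 2:
        raise ValueError("Z must be a 2-D matrix with at least 2 feature sets")
    M, d = Z.shape
    p_hat = Z.mean(axis=0)  # how often each feature was selected
    k_bar = p_hat.sum()  # average number of selected features per subset
    denom = (k_bar / d) * (1 - k_bar / d)
    if denom == 0:
        # every subset is empty, or every subset contains all d features
        raise ValueError("stability is undefined when k_bar is 0 or d")
    # unbiased sample variance of each feature's selection indicator
    s2 = (M / (M - 1)) * p_hat * (1 - p_hat)
    return float(1 - s2.mean() / denom)


# Stack per-bootstrap boolean support vectors into Z, similar to how
# StabilityNogueira collects ranker.feature_support_ in benchmark.py.
identical = [np.arange(50) < 10 for _ in range(10)]  # same 10 of 50 features each time
print(nogueira_stability(np.array(identical).astype(int)))  # 1.0

print(nogueira_stability([[1, 1, 0, 0], [1, 0, 1, 0]]))  # 0.0 (half overlap)
print(nogueira_stability([[1, 1, 0, 0], [0, 0, 1, 1]]))  # -1.0 (disjoint subsets)
```

The zero-denominator guard is an addition in this sketch; `getStability` itself does not check for that case.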
+ +To run the experiment, execute the following command inside the `algorithm-stability-yaml` folder: + +```shell +python benchmark.py --multirun ranker="glob(*)" +callbacks.to_sql.url="sqlite:///$HOME/results.sqlite" +``` + +## Analyzing the results + +### Recap + +Hi! Let's analyze the results of the experiment you just ran. To **recap**: + +1. You just ran something similar to: + + `python benchmark.py --multirun ranker="glob(*)" +callbacks.to_sql.url="sqlite:///$HOME/results.sqlite"` +2. A `.sqlite` file should now exist at `$HOME/results.sqlite`: + + ``` + $ ls -al $HOME/results.sqlite + -rw-r--r-- 1 vscode vscode 20480 Sep 21 08:16 /home/vscode/results.sqlite + ``` + +Let's now analyze the results! 📈 + +### Analysis + +> The rest of the text assumes all code is run inside a Jupyter Notebook, in chronological order. The source notebook can be found [here](https://github.com/dunnkers/fseval/tree/master/examples/algorithm-stability-yaml/analyze-results.ipynb). + +First, we will install `plotly-express`, so we can make nice plots later. + + +```python +%pip install plotly-express --quiet +``` + + +Construct the SQL connection URI. + + +```python +import os + +con: str = "sqlite:///" + os.environ["HOME"] + "/results.sqlite" +con +``` + + + + + 'sqlite:////home/vscode/results.sqlite' + + + +Read in the `experiments` table. This table contains metadata for all 'experiments' that have been run. + + +```python +import pandas as pd + +experiments: pd.DataFrame = pd.read_sql_table("experiments", con=con, index_col="id") +experiments +``` + + + + +
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+| id | dataset | dataset/n | dataset/p | dataset/task | dataset/group | dataset/domain | ranker | validator | local_dir | date_created |
+|---|---|---|---|---|---|---|---|---|---|---|
+| 38vqcwus | Synclf hard | 10000 | 50 | classification | Synclf | synthetic | Boruta | k-NN | /workspaces/fseval/examples/algorithm-stabilit... | 2022-09-21 08:22:28.965510 |
+| y6bb1hcc | Synclf hard | 1000 | 50 | classification | Synclf | synthetic | Boruta | k-NN | /workspaces/fseval/examples/algorithm-stabilit... | 2022-09-21 08:22:53.609396 |
+| 3vtr13pg | Synclf hard | 1000 | 50 | classification | Synclf | synthetic | ReliefF | k-NN | /workspaces/fseval/examples/algorithm-stabilit... | 2022-09-21 08:25:09.974370 |
+
+ + + +That's looking good 🙌🏻. + +Now, let's read in the `stability` table. We put data in this table by using our custom-made metric, defined in the `StabilityNogueira` class in `benchmark.py`. There, we push data to this table using `callbacks.on_table`. + + +```python +stability: pd.DataFrame = pd.read_sql_table("stability", con=con, index_col="id") +stability +``` + + + + +
+ + + + + + + + + + + + + + + + + + + + + + + + + +
+| id | index | stability |
+|---|---|---|
+| y6bb1hcc | 0 | 0.933546 |
+| 3vtr13pg | 0 | 1.000000 |
+
+ + + +Cool. Now let's join the experiments with their actual metrics. + + +```python +stability_experiments = stability.join(experiments) +stability_experiments +``` + + + + +
+ + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + + +
+| id | index | stability | dataset | dataset/n | dataset/p | dataset/task | dataset/group | dataset/domain | ranker | validator | local_dir | date_created |
+|---|---|---|---|---|---|---|---|---|---|---|---|---|
+| y6bb1hcc | 0 | 0.933546 | Synclf hard | 1000 | 50 | classification | Synclf | synthetic | Boruta | k-NN | /workspaces/fseval/examples/algorithm-stabilit... | 2022-09-21 08:22:53.609396 |
+| 3vtr13pg | 0 | 1.000000 | Synclf hard | 1000 | 50 | classification | Synclf | synthetic | ReliefF | k-NN | /workspaces/fseval/examples/algorithm-stabilit... | 2022-09-21 08:25:09.974370 |
+
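Note that `DataFrame.join` performs a left join on the index by default, so an experiment without a matching row in `stability`, such as `38vqcwus` in the `experiments` table, is silently dropped from `stability_experiments`. The sketch below rebuilds a few columns of both tables by hand (values copied from the outputs in this recipe) to illustrate this, and uses an outer join to list experiments that produced no stability score.

```python
import pandas as pd

# A few columns of the `experiments` table, indexed by experiment id.
experiments = pd.DataFrame(
    {"ranker": ["Boruta", "Boruta", "ReliefF"]},
    index=pd.Index(["38vqcwus", "y6bb1hcc", "3vtr13pg"], name="id"),
)
# The `stability` table: only two of the three experiments have a score.
stability = pd.DataFrame(
    {"index": [0, 0], "stability": [0.933546, 1.0]},
    index=pd.Index(["y6bb1hcc", "3vtr13pg"], name="id"),
)

# Default join: how="left" on the index, so only ids present in `stability` remain.
left = stability.join(experiments)
# Outer join keeps every experiment; missing stability scores become NaN.
outer = stability.join(experiments, how="outer")
missing = outer[outer["stability"].isna()].index.tolist()

print(left.index.tolist())  # ['y6bb1hcc', '3vtr13pg']
print(missing)  # ['38vqcwus']
```

Keeping the default left join is reasonable for plotting, since experiments without a score have nothing to plot; the outer join is only useful for spotting runs that did not report a stability value.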
+ + + Finally, let's plot the results to get a better grasp of what's going on: + + +```python +import plotly.express as px + +px.bar(stability_experiments, + x="ranker", + y="stability" +) +``` + + +![feature selectors algorithm stability](/img/recipes/feature-selectors-stability-barplot.png) + + +We can now observe that, of Boruta and ReliefF, ReliefF is the more 'stable' algorithm on this dataset: with a stability of 1.0, it selected exactly the same features in all 10 bootstraps, whereas Boruta scored about 0.93. diff --git a/website/static/img/recipes/feature-selectors-stability-barplot.png b/website/static/img/recipes/feature-selectors-stability-barplot.png new file mode 100644 index 0000000..41c70bf Binary files /dev/null and b/website/static/img/recipes/feature-selectors-stability-barplot.png differ diff --git a/website/static/zipped-examples/algorithm-stability-yaml.zip b/website/static/zipped-examples/algorithm-stability-yaml.zip new file mode 100644 index 0000000..9c83ba2 Binary files /dev/null and b/website/static/zipped-examples/algorithm-stability-yaml.zip differ