diff --git a/examples/comparing-feature-selectors/analyze-results.ipynb b/examples/comparing-feature-selectors/analyze-results.ipynb
new file mode 100644
index 0000000..a6a0e09
--- /dev/null
+++ b/examples/comparing-feature-selectors/analyze-results.ipynb
@@ -0,0 +1,2343 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Comparing Feature Selectors\n",
+ "Hi! Want to compare the performance of multiple feature selectors? This example notebook shows you how to do such an analysis."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "First of all, to recap:\n",
+ "\n",
+ "1. You just ran something similar to:\n",
+ "\n",
+ " `python benchmark.py --multirun ranker=\"glob(*)\" +callbacks.to_sql.url=\"sqlite:////tmp/results.sqlite\"`\n",
+ "2. A `.sqlite` file should now exist at `/tmp/results.sqlite`:\n",
+ "\n",
+ " ```\n",
+ " $ ls -al /tmp/results.sqlite\n",
+ " -rw-r--r-- 1 vscode vscode 20480 Sep 21 08:16 /tmp/results.sqlite\n",
+ " ```\n",
+ "\n",
+ "Let's now analyze the results! 📈"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "We install `plotly-express` so we can make nice plots later, together with `nbconvert`."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 13,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Note: you may need to restart the kernel to use updated packages.\n"
+ ]
+ }
+ ],
+ "source": [
+ "%pip install plotly-express nbconvert --quiet"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Next, let's point to where our results are stored. In this case, they live in a local SQLite database at `/tmp/results.sqlite`, which we refer to using a SQLAlchemy connection URL."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 14,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "'sqlite:////tmp/results.sqlite'"
+ ]
+ },
+ "execution_count": 14,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "con: str = \"sqlite:////tmp/results.sqlite\"\n",
+ "con"
+ ]
+ },
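+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Optionally, we can first check which tables the benchmark wrote to this database. The cell below is a small sketch that assumes SQLAlchemy is installed (pandas needs it anyway to read a table from a connection URL such as `con`). We expect to find at least the `experiments` and `validation_scores` tables used below, but the exact list may differ per setup."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from sqlalchemy import create_engine, inspect\n",
+ "\n",
+ "# Build an engine from the connection URL above and list the tables it contains.\n",
+ "engine = create_engine(con)\n",
+ "inspect(engine).get_table_names()"
+ ]
+ },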
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Now we can read the `experiments` table, using the experiment `id` as the index."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 15,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ " dataset dataset/n dataset/p dataset/task \\\n",
+ "id \n",
+ "3lllxl48 My synthetic dataset 10000 20 classification \n",
+ "1944ropg My synthetic dataset 10000 20 classification \n",
+ "31gd56gf My synthetic dataset 10000 20 classification \n",
+ "a8washm5 My synthetic dataset 10000 20 classification \n",
+ "27i7uwg4 My synthetic dataset 10000 20 classification \n",
+ "3velt3b9 My synthetic dataset 10000 20 classification \n",
+ "3fdrxlt6 My synthetic dataset 10000 20 classification \n",
+ "14lecx0g My synthetic dataset 10000 20 classification \n",
+ "3sggjvu3 My synthetic dataset 10000 20 classification \n",
+ "dtt8bvo5 My synthetic dataset 10000 20 classification \n",
+ "\n",
+ " dataset/group dataset/domain ranker validator \\\n",
+ "id \n",
+ "3lllxl48 None None ANOVA F-value k-NN \n",
+ "1944ropg None None Boruta k-NN \n",
+ "31gd56gf None None Chi-Squared k-NN \n",
+ "a8washm5 None None Decision Tree k-NN \n",
+ "27i7uwg4 None None Infinite Selection k-NN \n",
+ "3velt3b9 None None MultiSURF k-NN \n",
+ "3fdrxlt6 None None Mutual Info k-NN \n",
+ "14lecx0g None None ReliefF k-NN \n",
+ "3sggjvu3 None None Stability Selection k-NN \n",
+ "dtt8bvo5 None None XGBoost k-NN \n",
+ "\n",
+ " local_dir \\\n",
+ "id \n",
+ "3lllxl48 /workspaces/fseval/examples/comparing-feature-... \n",
+ "1944ropg /workspaces/fseval/examples/comparing-feature-... \n",
+ "31gd56gf /workspaces/fseval/examples/comparing-feature-... \n",
+ "a8washm5 /workspaces/fseval/examples/comparing-feature-... \n",
+ "27i7uwg4 /workspaces/fseval/examples/comparing-feature-... \n",
+ "3velt3b9 /workspaces/fseval/examples/comparing-feature-... \n",
+ "3fdrxlt6 /workspaces/fseval/examples/comparing-feature-... \n",
+ "14lecx0g /workspaces/fseval/examples/comparing-feature-... \n",
+ "3sggjvu3 /workspaces/fseval/examples/comparing-feature-... \n",
+ "dtt8bvo5 /workspaces/fseval/examples/comparing-feature-... \n",
+ "\n",
+ " date_created \n",
+ "id \n",
+ "3lllxl48 2022-10-22 14:28:27.506838 \n",
+ "1944ropg 2022-10-22 14:28:31.230633 \n",
+ "31gd56gf 2022-10-22 14:29:19.633012 \n",
+ "a8washm5 2022-10-22 14:29:23.459190 \n",
+ "27i7uwg4 2022-10-22 14:29:27.506974 \n",
+ "3velt3b9 2022-10-22 14:29:31.758090 \n",
+ "3fdrxlt6 2022-10-22 14:35:04.289361 \n",
+ "14lecx0g 2022-10-22 14:35:08.614262 \n",
+ "3sggjvu3 2022-10-22 14:35:59.121416 \n",
+ "dtt8bvo5 2022-10-22 14:36:23.385401 "
+ ]
+ },
+ "execution_count": 15,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "import pandas as pd\n",
+ "\n",
+ "experiments: pd.DataFrame = pd.read_sql_table(\"experiments\", con=con, index_col=\"id\")\n",
+ "experiments"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Let's also read in the `validation_scores` table."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 16,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ " index n_features_to_select fit_time score bootstrap_state\n",
+ "id \n",
+ "3lllxl48 0 1 0.004433 0.7955 1\n",
+ "3lllxl48 0 2 0.004227 0.7910 1\n",
+ "3lllxl48 0 3 0.005183 0.7950 1\n",
+ "3lllxl48 0 4 0.003865 0.7965 1\n",
+ "3lllxl48 0 5 0.002902 0.7950 1\n",
+ "... ... ... ... ... ...\n",
+ "dtt8bvo5 0 16 0.000670 0.7805 1\n",
+ "dtt8bvo5 0 17 0.000480 0.7725 1\n",
+ "dtt8bvo5 0 18 0.003159 0.7760 1\n",
+ "dtt8bvo5 0 19 0.000848 0.7650 1\n",
+ "dtt8bvo5 0 20 0.000565 0.7590 1\n",
+ "\n",
+ "[160 rows x 5 columns]"
+ ]
+ },
+ "execution_count": 16,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "validation_scores: pd.DataFrame = pd.read_sql_table(\"validation_scores\", con=con, index_col=\"id\")\n",
+ "validation_scores"
+ ]
+ },
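+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "As a quick sanity check: the 160 rows above correspond to 8 experiments with 20 subset sizes each. Below is a minimal sketch that counts the validation rows per experiment ID (the index of the table). Experiments that produced no validation scores simply do not show up in this count."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Count validation rows per experiment; the index of `validation_scores` is the experiment ID.\n",
+ "validation_scores.groupby(level=\"id\").size()"
+ ]
+ },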
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "We can now merge them. Because we set the experiment ID as the _index_ of both tables, we can use `pd.DataFrame.join` to do this."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 17,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ " dataset dataset/n dataset/p dataset/task \\\n",
+ "id \n",
+ "14lecx0g My synthetic dataset 10000 20 classification \n",
+ "\n",
+ " dataset/group dataset/domain ranker validator \\\n",
+ "id \n",
+ "14lecx0g None None ReliefF k-NN \n",
+ "\n",
+ " local_dir \\\n",
+ "id \n",
+ "14lecx0g /workspaces/fseval/examples/comparing-feature-... \n",
+ "\n",
+ " date_created index n_features_to_select fit_time \\\n",
+ "id \n",
+ "14lecx0g 2022-10-22 14:35:08.614262 NaN NaN NaN \n",
+ "\n",
+ " score bootstrap_state \n",
+ "id \n",
+ "14lecx0g NaN NaN "
+ ]
+ },
+ "execution_count": 17,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "validation_scores_with_experiment_info = experiments.join(\n",
+ " validation_scores\n",
+ ")\n",
+ "validation_scores_with_experiment_info.head(1)"
+ ]
+ },
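+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Note that `join` performs a left join by default. Experiments without any validation scores are kept, with `NaN` in the score columns (as for ReliefF above). Below is a minimal sketch that lists those experiments."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Boolean mask of rows without a score; these belong to experiments that produced no validation results.\n",
+ "missing_score = validation_scores_with_experiment_info[\"score\"].isna().to_numpy()\n",
+ "validation_scores_with_experiment_info.loc[missing_score, [\"ranker\", \"validator\"]]"
+ ]
+ },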
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Cool! That is all the information we need. Let's first create an overview of all the rankers we benchmarked by averaging the numeric columns per ranker."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 25,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ " dataset/n dataset/p index n_features_to_select \\\n",
+ "ranker \n",
+ "Infinite Selection 10000.0 20.0 0.0 10.5 \n",
+ "XGBoost 10000.0 20.0 0.0 10.5 \n",
+ "Decision Tree 10000.0 20.0 0.0 10.5 \n",
+ "Stability Selection 10000.0 20.0 0.0 10.5 \n",
+ "Chi-Squared 10000.0 20.0 0.0 10.5 \n",
+ "ANOVA F-value 10000.0 20.0 0.0 10.5 \n",
+ "Mutual Info 10000.0 20.0 0.0 10.5 \n",
+ "Boruta 10000.0 20.0 0.0 10.5 \n",
+ "MultiSURF 10000.0 20.0 NaN NaN \n",
+ "ReliefF 10000.0 20.0 NaN NaN \n",
+ "\n",
+ " fit_time score bootstrap_state \n",
+ "ranker \n",
+ "Infinite Selection 0.004600 0.818925 1.0 \n",
+ "XGBoost 0.002998 0.818575 1.0 \n",
+ "Decision Tree 0.002810 0.817675 1.0 \n",
+ "Stability Selection 0.002406 0.803325 1.0 \n",
+ "Chi-Squared 0.002548 0.795975 1.0 \n",
+ "ANOVA F-value 0.003745 0.789275 1.0 \n",
+ "Mutual Info 0.002314 0.786475 1.0 \n",
+ "Boruta 0.002366 0.518075 1.0 \n",
+ "MultiSURF NaN NaN NaN \n",
+ "ReliefF NaN NaN NaN "
+ ]
+ },
+ "execution_count": 25,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "(\n",
+ "    validation_scores_with_experiment_info\n",
+ "    .groupby(\"ranker\")\n",
+ "    .mean(numeric_only=True)\n",
+ "    .sort_values(\"score\", ascending=False)\n",
+ ")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Already, we notice that MultiSURF and ReliefF have no scores (all `NaN`). This is because their experiments failed, which can happen in a big benchmark! We will ignore this for now and continue with the other feature selectors.\n",
+ "\n",
+ "👀 We can already observe that the _average_ classification accuracy (across all subset sizes) is highest for Infinite Selection (0.8189), narrowly ahead of XGBoost (0.8186) and Decision Tree (0.8177). Although it would be premature to call it the best, this indicates that it did well on this dataset.\n",
+ "\n",
+ "Let's plot the results _per_ `n_features_to_select`. Each value of `n_features_to_select` corresponds to a validation step that was run using a feature subset of that size."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 18,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "application/vnd.plotly.v1+json": {
+ "config": {
+ "plotlyServerURL": "https://plot.ly"
+ },
+ "data": [
+ {
+ "hovertemplate": "ranker=ReliefF<br>n_features_to_select=%{x}<br>score=%{y}<extra></extra>",
+ "legendgroup": "ReliefF",
+ "line": {
+ "color": "#636efa",
+ "dash": "solid"
+ },
+ "marker": {
+ "symbol": "circle"
+ },
+ "mode": "lines",
+ "name": "ReliefF",
+ "orientation": "v",
+ "showlegend": true,
+ "type": "scatter",
+ "x": [
+ null
+ ],
+ "xaxis": "x",
+ "y": [
+ null
+ ],
+ "yaxis": "y"
+ },
+ {
+ "hovertemplate": "ranker=Boruta<br>n_features_to_select=%{x}<br>score=%{y}<extra></extra>",
+ "legendgroup": "Boruta",
+ "line": {
+ "color": "#EF553B",
+ "dash": "solid"
+ },
+ "marker": {
+ "symbol": "circle"
+ },
+ "mode": "lines",
+ "name": "Boruta",
+ "orientation": "v",
+ "showlegend": true,
+ "type": "scatter",
+ "x": [
+ 1,
+ 2,
+ 3,
+ 4,
+ 5,
+ 6,
+ 7,
+ 8,
+ 9,
+ 10,
+ 11,
+ 12,
+ 13,
+ 14,
+ 15,
+ 16,
+ 17,
+ 18,
+ 19,
+ 20
+ ],
+ "xaxis": "x",
+ "y": [
+ 0.4975,
+ 0.505,
+ 0.508,
+ 0.4845,
+ 0.504,
+ 0.503,
+ 0.5155,
+ 0.5035,
+ 0.4955,
+ 0.4985,
+ 0.5155,
+ 0.497,
+ 0.5085,
+ 0.5055,
+ 0.5265,
+ 0.511,
+ 0.5085,
+ 0.511,
+ 0.504,
+ 0.759
+ ],
+ "yaxis": "y"
+ },
+ {
+ "hovertemplate": "ranker=Infinite Selection<br>n_features_to_select=%{x}<br>score=%{y}<extra></extra>",
+ "legendgroup": "Infinite Selection",
+ "line": {
+ "color": "#00cc96",
+ "dash": "solid"
+ },
+ "marker": {
+ "symbol": "circle"
+ },
+ "mode": "lines",
+ "name": "Infinite Selection",
+ "orientation": "v",
+ "showlegend": true,
+ "type": "scatter",
+ "x": [
+ 1,
+ 2,
+ 3,
+ 4,
+ 5,
+ 6,
+ 7,
+ 8,
+ 9,
+ 10,
+ 11,
+ 12,
+ 13,
+ 14,
+ 15,
+ 16,
+ 17,
+ 18,
+ 19,
+ 20
+ ],
+ "xaxis": "x",
+ "y": [
+ 0.7955,
+ 0.9075,
+ 0.892,
+ 0.882,
+ 0.876,
+ 0.8605,
+ 0.8365,
+ 0.8345,
+ 0.826,
+ 0.8165,
+ 0.8105,
+ 0.802,
+ 0.797,
+ 0.7955,
+ 0.795,
+ 0.773,
+ 0.776,
+ 0.776,
+ 0.7675,
+ 0.759
+ ],
+ "yaxis": "y"
+ },
+ {
+ "hovertemplate": "ranker=Chi-Squared<br>n_features_to_select=%{x}<br>score=%{y}<extra></extra>",
+ "legendgroup": "Chi-Squared",
+ "line": {
+ "color": "#ab63fa",
+ "dash": "solid"
+ },
+ "marker": {
+ "symbol": "circle"
+ },
+ "mode": "lines",
+ "name": "Chi-Squared",
+ "orientation": "v",
+ "showlegend": true,
+ "type": "scatter",
+ "x": [
+ 1,
+ 2,
+ 3,
+ 4,
+ 5,
+ 6,
+ 7,
+ 8,
+ 9,
+ 10,
+ 11,
+ 12,
+ 13,
+ 14,
+ 15,
+ 16,
+ 17,
+ 18,
+ 19,
+ 20
+ ],
+ "xaxis": "x",
+ "y": [
+ 0.7955,
+ 0.791,
+ 0.795,
+ 0.7965,
+ 0.795,
+ 0.78,
+ 0.8425,
+ 0.829,
+ 0.8255,
+ 0.814,
+ 0.806,
+ 0.804,
+ 0.799,
+ 0.794,
+ 0.7875,
+ 0.7765,
+ 0.783,
+ 0.778,
+ 0.7685,
+ 0.759
+ ],
+ "yaxis": "y"
+ },
+ {
+ "hovertemplate": "ranker=Mutual Info<br>n_features_to_select=%{x}<br>score=%{y}<extra></extra>",
+ "legendgroup": "Mutual Info",
+ "line": {
+ "color": "#FFA15A",
+ "dash": "solid"
+ },
+ "marker": {
+ "symbol": "circle"
+ },
+ "mode": "lines",
+ "name": "Mutual Info",
+ "orientation": "v",
+ "showlegend": true,
+ "type": "scatter",
+ "x": [
+ 1,
+ 2,
+ 3,
+ 4,
+ 5,
+ 6,
+ 7,
+ 8,
+ 9,
+ 10,
+ 11,
+ 12,
+ 13,
+ 14,
+ 15,
+ 16,
+ 17,
+ 18,
+ 19,
+ 20
+ ],
+ "xaxis": "x",
+ "y": [
+ 0.7955,
+ 0.798,
+ 0.7955,
+ 0.791,
+ 0.7895,
+ 0.798,
+ 0.786,
+ 0.7905,
+ 0.791,
+ 0.7925,
+ 0.782,
+ 0.782,
+ 0.7815,
+ 0.7815,
+ 0.8025,
+ 0.7905,
+ 0.7815,
+ 0.7755,
+ 0.766,
+ 0.759
+ ],
+ "yaxis": "y"
+ },
+ {
+ "hovertemplate": "ranker=ANOVA F-value<br>n_features_to_select=%{x}<br>score=%{y}<extra></extra>",
+ "legendgroup": "ANOVA F-value",
+ "line": {
+ "color": "#19d3f3",
+ "dash": "solid"
+ },
+ "marker": {
+ "symbol": "circle"
+ },
+ "mode": "lines",
+ "name": "ANOVA F-value",
+ "orientation": "v",
+ "showlegend": true,
+ "type": "scatter",
+ "x": [
+ 1,
+ 2,
+ 3,
+ 4,
+ 5,
+ 6,
+ 7,
+ 8,
+ 9,
+ 10,
+ 11,
+ 12,
+ 13,
+ 14,
+ 15,
+ 16,
+ 17,
+ 18,
+ 19,
+ 20
+ ],
+ "xaxis": "x",
+ "y": [
+ 0.7955,
+ 0.791,
+ 0.795,
+ 0.7965,
+ 0.795,
+ 0.78,
+ 0.7885,
+ 0.787,
+ 0.78,
+ 0.814,
+ 0.811,
+ 0.804,
+ 0.8015,
+ 0.794,
+ 0.7875,
+ 0.7765,
+ 0.783,
+ 0.778,
+ 0.7685,
+ 0.759
+ ],
+ "yaxis": "y"
+ },
+ {
+ "hovertemplate": "ranker=Stability Selection<br>n_features_to_select=%{x}<br>score=%{y}<extra></extra>",
+ "legendgroup": "Stability Selection",
+ "line": {
+ "color": "#FF6692",
+ "dash": "solid"
+ },
+ "marker": {
+ "symbol": "circle"
+ },
+ "mode": "lines",
+ "name": "Stability Selection",
+ "orientation": "v",
+ "showlegend": true,
+ "type": "scatter",
+ "x": [
+ 1,
+ 2,
+ 3,
+ 4,
+ 5,
+ 6,
+ 7,
+ 8,
+ 9,
+ 10,
+ 11,
+ 12,
+ 13,
+ 14,
+ 15,
+ 16,
+ 17,
+ 18,
+ 19,
+ 20
+ ],
+ "xaxis": "x",
+ "y": [
+ 0.491,
+ 0.9075,
+ 0.898,
+ 0.882,
+ 0.869,
+ 0.855,
+ 0.858,
+ 0.839,
+ 0.827,
+ 0.8235,
+ 0.8025,
+ 0.8015,
+ 0.7985,
+ 0.793,
+ 0.781,
+ 0.772,
+ 0.7675,
+ 0.7755,
+ 0.766,
+ 0.759
+ ],
+ "yaxis": "y"
+ },
+ {
+ "hovertemplate": "ranker=MultiSURF<br>n_features_to_select=%{x}<br>score=%{y}<extra></extra>",
+ "legendgroup": "MultiSURF",
+ "line": {
+ "color": "#B6E880",
+ "dash": "solid"
+ },
+ "marker": {
+ "symbol": "circle"
+ },
+ "mode": "lines",
+ "name": "MultiSURF",
+ "orientation": "v",
+ "showlegend": true,
+ "type": "scatter",
+ "x": [
+ null
+ ],
+ "xaxis": "x",
+ "y": [
+ null
+ ],
+ "yaxis": "y"
+ },
+ {
+ "hovertemplate": "ranker=Decision Tree<br>n_features_to_select=%{x}<br>score=%{y}<extra></extra>",
+ "legendgroup": "Decision Tree",
+ "line": {
+ "color": "#FF97FF",
+ "dash": "solid"
+ },
+ "marker": {
+ "symbol": "circle"
+ },
+ "mode": "lines",
+ "name": "Decision Tree",
+ "orientation": "v",
+ "showlegend": true,
+ "type": "scatter",
+ "x": [
+ 1,
+ 2,
+ 3,
+ 4,
+ 5,
+ 6,
+ 7,
+ 8,
+ 9,
+ 10,
+ 11,
+ 12,
+ 13,
+ 14,
+ 15,
+ 16,
+ 17,
+ 18,
+ 19,
+ 20
+ ],
+ "xaxis": "x",
+ "y": [
+ 0.7955,
+ 0.9075,
+ 0.892,
+ 0.888,
+ 0.8715,
+ 0.8575,
+ 0.85,
+ 0.826,
+ 0.8185,
+ 0.809,
+ 0.806,
+ 0.7965,
+ 0.786,
+ 0.7935,
+ 0.788,
+ 0.773,
+ 0.789,
+ 0.778,
+ 0.769,
+ 0.759
+ ],
+ "yaxis": "y"
+ },
+ {
+ "hovertemplate": "ranker=XGBoost<br>n_features_to_select=%{x}<br>score=%{y}<extra></extra>",
+ "legendgroup": "XGBoost",
+ "line": {
+ "color": "#FECB52",
+ "dash": "solid"
+ },
+ "marker": {
+ "symbol": "circle"
+ },
+ "mode": "lines",
+ "name": "XGBoost",
+ "orientation": "v",
+ "showlegend": true,
+ "type": "scatter",
+ "x": [
+ 1,
+ 2,
+ 3,
+ 4,
+ 5,
+ 6,
+ 7,
+ 8,
+ 9,
+ 10,
+ 11,
+ 12,
+ 13,
+ 14,
+ 15,
+ 16,
+ 17,
+ 18,
+ 19,
+ 20
+ ],
+ "xaxis": "x",
+ "y": [
+ 0.7955,
+ 0.9075,
+ 0.8975,
+ 0.886,
+ 0.879,
+ 0.8535,
+ 0.8455,
+ 0.83,
+ 0.818,
+ 0.8155,
+ 0.8055,
+ 0.801,
+ 0.8,
+ 0.796,
+ 0.788,
+ 0.7805,
+ 0.7725,
+ 0.776,
+ 0.765,
+ 0.759
+ ],
+ "yaxis": "y"
+ }
+ ],
+ "layout": {
+ "legend": {
+ "title": {
+ "text": "ranker"
+ },
+ "tracegroupgap": 0
+ },
+ "margin": {
+ "t": 60
+ },
+ "template": {
+ "data": {
+ "bar": [
+ {
+ "error_x": {
+ "color": "#2a3f5f"
+ },
+ "error_y": {
+ "color": "#2a3f5f"
+ },
+ "marker": {
+ "line": {
+ "color": "#E5ECF6",
+ "width": 0.5
+ },
+ "pattern": {
+ "fillmode": "overlay",
+ "size": 10,
+ "solidity": 0.2
+ }
+ },
+ "type": "bar"
+ }
+ ],
+ "barpolar": [
+ {
+ "marker": {
+ "line": {
+ "color": "#E5ECF6",
+ "width": 0.5
+ },
+ "pattern": {
+ "fillmode": "overlay",
+ "size": 10,
+ "solidity": 0.2
+ }
+ },
+ "type": "barpolar"
+ }
+ ],
+ "carpet": [
+ {
+ "aaxis": {
+ "endlinecolor": "#2a3f5f",
+ "gridcolor": "white",
+ "linecolor": "white",
+ "minorgridcolor": "white",
+ "startlinecolor": "#2a3f5f"
+ },
+ "baxis": {
+ "endlinecolor": "#2a3f5f",
+ "gridcolor": "white",
+ "linecolor": "white",
+ "minorgridcolor": "white",
+ "startlinecolor": "#2a3f5f"
+ },
+ "type": "carpet"
+ }
+ ],
+ "choropleth": [
+ {
+ "colorbar": {
+ "outlinewidth": 0,
+ "ticks": ""
+ },
+ "type": "choropleth"
+ }
+ ],
+ "contour": [
+ {
+ "colorbar": {
+ "outlinewidth": 0,
+ "ticks": ""
+ },
+ "colorscale": [
+ [
+ 0,
+ "#0d0887"
+ ],
+ [
+ 0.1111111111111111,
+ "#46039f"
+ ],
+ [
+ 0.2222222222222222,
+ "#7201a8"
+ ],
+ [
+ 0.3333333333333333,
+ "#9c179e"
+ ],
+ [
+ 0.4444444444444444,
+ "#bd3786"
+ ],
+ [
+ 0.5555555555555556,
+ "#d8576b"
+ ],
+ [
+ 0.6666666666666666,
+ "#ed7953"
+ ],
+ [
+ 0.7777777777777778,
+ "#fb9f3a"
+ ],
+ [
+ 0.8888888888888888,
+ "#fdca26"
+ ],
+ [
+ 1,
+ "#f0f921"
+ ]
+ ],
+ "type": "contour"
+ }
+ ],
+ "contourcarpet": [
+ {
+ "colorbar": {
+ "outlinewidth": 0,
+ "ticks": ""
+ },
+ "type": "contourcarpet"
+ }
+ ],
+ "heatmap": [
+ {
+ "colorbar": {
+ "outlinewidth": 0,
+ "ticks": ""
+ },
+ "colorscale": [
+ [
+ 0,
+ "#0d0887"
+ ],
+ [
+ 0.1111111111111111,
+ "#46039f"
+ ],
+ [
+ 0.2222222222222222,
+ "#7201a8"
+ ],
+ [
+ 0.3333333333333333,
+ "#9c179e"
+ ],
+ [
+ 0.4444444444444444,
+ "#bd3786"
+ ],
+ [
+ 0.5555555555555556,
+ "#d8576b"
+ ],
+ [
+ 0.6666666666666666,
+ "#ed7953"
+ ],
+ [
+ 0.7777777777777778,
+ "#fb9f3a"
+ ],
+ [
+ 0.8888888888888888,
+ "#fdca26"
+ ],
+ [
+ 1,
+ "#f0f921"
+ ]
+ ],
+ "type": "heatmap"
+ }
+ ],
+ "heatmapgl": [
+ {
+ "colorbar": {
+ "outlinewidth": 0,
+ "ticks": ""
+ },
+ "colorscale": [
+ [
+ 0,
+ "#0d0887"
+ ],
+ [
+ 0.1111111111111111,
+ "#46039f"
+ ],
+ [
+ 0.2222222222222222,
+ "#7201a8"
+ ],
+ [
+ 0.3333333333333333,
+ "#9c179e"
+ ],
+ [
+ 0.4444444444444444,
+ "#bd3786"
+ ],
+ [
+ 0.5555555555555556,
+ "#d8576b"
+ ],
+ [
+ 0.6666666666666666,
+ "#ed7953"
+ ],
+ [
+ 0.7777777777777778,
+ "#fb9f3a"
+ ],
+ [
+ 0.8888888888888888,
+ "#fdca26"
+ ],
+ [
+ 1,
+ "#f0f921"
+ ]
+ ],
+ "type": "heatmapgl"
+ }
+ ],
+ "histogram": [
+ {
+ "marker": {
+ "pattern": {
+ "fillmode": "overlay",
+ "size": 10,
+ "solidity": 0.2
+ }
+ },
+ "type": "histogram"
+ }
+ ],
+ "histogram2d": [
+ {
+ "colorbar": {
+ "outlinewidth": 0,
+ "ticks": ""
+ },
+ "colorscale": [
+ [
+ 0,
+ "#0d0887"
+ ],
+ [
+ 0.1111111111111111,
+ "#46039f"
+ ],
+ [
+ 0.2222222222222222,
+ "#7201a8"
+ ],
+ [
+ 0.3333333333333333,
+ "#9c179e"
+ ],
+ [
+ 0.4444444444444444,
+ "#bd3786"
+ ],
+ [
+ 0.5555555555555556,
+ "#d8576b"
+ ],
+ [
+ 0.6666666666666666,
+ "#ed7953"
+ ],
+ [
+ 0.7777777777777778,
+ "#fb9f3a"
+ ],
+ [
+ 0.8888888888888888,
+ "#fdca26"
+ ],
+ [
+ 1,
+ "#f0f921"
+ ]
+ ],
+ "type": "histogram2d"
+ }
+ ],
+ "histogram2dcontour": [
+ {
+ "colorbar": {
+ "outlinewidth": 0,
+ "ticks": ""
+ },
+ "colorscale": [
+ [
+ 0,
+ "#0d0887"
+ ],
+ [
+ 0.1111111111111111,
+ "#46039f"
+ ],
+ [
+ 0.2222222222222222,
+ "#7201a8"
+ ],
+ [
+ 0.3333333333333333,
+ "#9c179e"
+ ],
+ [
+ 0.4444444444444444,
+ "#bd3786"
+ ],
+ [
+ 0.5555555555555556,
+ "#d8576b"
+ ],
+ [
+ 0.6666666666666666,
+ "#ed7953"
+ ],
+ [
+ 0.7777777777777778,
+ "#fb9f3a"
+ ],
+ [
+ 0.8888888888888888,
+ "#fdca26"
+ ],
+ [
+ 1,
+ "#f0f921"
+ ]
+ ],
+ "type": "histogram2dcontour"
+ }
+ ],
+ "mesh3d": [
+ {
+ "colorbar": {
+ "outlinewidth": 0,
+ "ticks": ""
+ },
+ "type": "mesh3d"
+ }
+ ],
+ "parcoords": [
+ {
+ "line": {
+ "colorbar": {
+ "outlinewidth": 0,
+ "ticks": ""
+ }
+ },
+ "type": "parcoords"
+ }
+ ],
+ "pie": [
+ {
+ "automargin": true,
+ "type": "pie"
+ }
+ ],
+ "scatter": [
+ {
+ "fillpattern": {
+ "fillmode": "overlay",
+ "size": 10,
+ "solidity": 0.2
+ },
+ "type": "scatter"
+ }
+ ],
+ "scatter3d": [
+ {
+ "line": {
+ "colorbar": {
+ "outlinewidth": 0,
+ "ticks": ""
+ }
+ },
+ "marker": {
+ "colorbar": {
+ "outlinewidth": 0,
+ "ticks": ""
+ }
+ },
+ "type": "scatter3d"
+ }
+ ],
+ "scattercarpet": [
+ {
+ "marker": {
+ "colorbar": {
+ "outlinewidth": 0,
+ "ticks": ""
+ }
+ },
+ "type": "scattercarpet"
+ }
+ ],
+ "scattergeo": [
+ {
+ "marker": {
+ "colorbar": {
+ "outlinewidth": 0,
+ "ticks": ""
+ }
+ },
+ "type": "scattergeo"
+ }
+ ],
+ "scattergl": [
+ {
+ "marker": {
+ "colorbar": {
+ "outlinewidth": 0,
+ "ticks": ""
+ }
+ },
+ "type": "scattergl"
+ }
+ ],
+ "scattermapbox": [
+ {
+ "marker": {
+ "colorbar": {
+ "outlinewidth": 0,
+ "ticks": ""
+ }
+ },
+ "type": "scattermapbox"
+ }
+ ],
+ "scatterpolar": [
+ {
+ "marker": {
+ "colorbar": {
+ "outlinewidth": 0,
+ "ticks": ""
+ }
+ },
+ "type": "scatterpolar"
+ }
+ ],
+ "scatterpolargl": [
+ {
+ "marker": {
+ "colorbar": {
+ "outlinewidth": 0,
+ "ticks": ""
+ }
+ },
+ "type": "scatterpolargl"
+ }
+ ],
+ "scatterternary": [
+ {
+ "marker": {
+ "colorbar": {
+ "outlinewidth": 0,
+ "ticks": ""
+ }
+ },
+ "type": "scatterternary"
+ }
+ ],
+ "surface": [
+ {
+ "colorbar": {
+ "outlinewidth": 0,
+ "ticks": ""
+ },
+ "colorscale": [
+ [
+ 0,
+ "#0d0887"
+ ],
+ [
+ 0.1111111111111111,
+ "#46039f"
+ ],
+ [
+ 0.2222222222222222,
+ "#7201a8"
+ ],
+ [
+ 0.3333333333333333,
+ "#9c179e"
+ ],
+ [
+ 0.4444444444444444,
+ "#bd3786"
+ ],
+ [
+ 0.5555555555555556,
+ "#d8576b"
+ ],
+ [
+ 0.6666666666666666,
+ "#ed7953"
+ ],
+ [
+ 0.7777777777777778,
+ "#fb9f3a"
+ ],
+ [
+ 0.8888888888888888,
+ "#fdca26"
+ ],
+ [
+ 1,
+ "#f0f921"
+ ]
+ ],
+ "type": "surface"
+ }
+ ],
+ "table": [
+ {
+ "cells": {
+ "fill": {
+ "color": "#EBF0F8"
+ },
+ "line": {
+ "color": "white"
+ }
+ },
+ "header": {
+ "fill": {
+ "color": "#C8D4E3"
+ },
+ "line": {
+ "color": "white"
+ }
+ },
+ "type": "table"
+ }
+ ]
+ },
+ "layout": {
+ "annotationdefaults": {
+ "arrowcolor": "#2a3f5f",
+ "arrowhead": 0,
+ "arrowwidth": 1
+ },
+ "autotypenumbers": "strict",
+ "coloraxis": {
+ "colorbar": {
+ "outlinewidth": 0,
+ "ticks": ""
+ }
+ },
+ "colorscale": {
+ "diverging": [
+ [
+ 0,
+ "#8e0152"
+ ],
+ [
+ 0.1,
+ "#c51b7d"
+ ],
+ [
+ 0.2,
+ "#de77ae"
+ ],
+ [
+ 0.3,
+ "#f1b6da"
+ ],
+ [
+ 0.4,
+ "#fde0ef"
+ ],
+ [
+ 0.5,
+ "#f7f7f7"
+ ],
+ [
+ 0.6,
+ "#e6f5d0"
+ ],
+ [
+ 0.7,
+ "#b8e186"
+ ],
+ [
+ 0.8,
+ "#7fbc41"
+ ],
+ [
+ 0.9,
+ "#4d9221"
+ ],
+ [
+ 1,
+ "#276419"
+ ]
+ ],
+ "sequential": [
+ [
+ 0,
+ "#0d0887"
+ ],
+ [
+ 0.1111111111111111,
+ "#46039f"
+ ],
+ [
+ 0.2222222222222222,
+ "#7201a8"
+ ],
+ [
+ 0.3333333333333333,
+ "#9c179e"
+ ],
+ [
+ 0.4444444444444444,
+ "#bd3786"
+ ],
+ [
+ 0.5555555555555556,
+ "#d8576b"
+ ],
+ [
+ 0.6666666666666666,
+ "#ed7953"
+ ],
+ [
+ 0.7777777777777778,
+ "#fb9f3a"
+ ],
+ [
+ 0.8888888888888888,
+ "#fdca26"
+ ],
+ [
+ 1,
+ "#f0f921"
+ ]
+ ],
+ "sequentialminus": [
+ [
+ 0,
+ "#0d0887"
+ ],
+ [
+ 0.1111111111111111,
+ "#46039f"
+ ],
+ [
+ 0.2222222222222222,
+ "#7201a8"
+ ],
+ [
+ 0.3333333333333333,
+ "#9c179e"
+ ],
+ [
+ 0.4444444444444444,
+ "#bd3786"
+ ],
+ [
+ 0.5555555555555556,
+ "#d8576b"
+ ],
+ [
+ 0.6666666666666666,
+ "#ed7953"
+ ],
+ [
+ 0.7777777777777778,
+ "#fb9f3a"
+ ],
+ [
+ 0.8888888888888888,
+ "#fdca26"
+ ],
+ [
+ 1,
+ "#f0f921"
+ ]
+ ]
+ },
+ "colorway": [
+ "#636efa",
+ "#EF553B",
+ "#00cc96",
+ "#ab63fa",
+ "#FFA15A",
+ "#19d3f3",
+ "#FF6692",
+ "#B6E880",
+ "#FF97FF",
+ "#FECB52"
+ ],
+ "font": {
+ "color": "#2a3f5f"
+ },
+ "geo": {
+ "bgcolor": "white",
+ "lakecolor": "white",
+ "landcolor": "#E5ECF6",
+ "showlakes": true,
+ "showland": true,
+ "subunitcolor": "white"
+ },
+ "hoverlabel": {
+ "align": "left"
+ },
+ "hovermode": "closest",
+ "mapbox": {
+ "style": "light"
+ },
+ "paper_bgcolor": "white",
+ "plot_bgcolor": "#E5ECF6",
+ "polar": {
+ "angularaxis": {
+ "gridcolor": "white",
+ "linecolor": "white",
+ "ticks": ""
+ },
+ "bgcolor": "#E5ECF6",
+ "radialaxis": {
+ "gridcolor": "white",
+ "linecolor": "white",
+ "ticks": ""
+ }
+ },
+ "scene": {
+ "xaxis": {
+ "backgroundcolor": "#E5ECF6",
+ "gridcolor": "white",
+ "gridwidth": 2,
+ "linecolor": "white",
+ "showbackground": true,
+ "ticks": "",
+ "zerolinecolor": "white"
+ },
+ "yaxis": {
+ "backgroundcolor": "#E5ECF6",
+ "gridcolor": "white",
+ "gridwidth": 2,
+ "linecolor": "white",
+ "showbackground": true,
+ "ticks": "",
+ "zerolinecolor": "white"
+ },
+ "zaxis": {
+ "backgroundcolor": "#E5ECF6",
+ "gridcolor": "white",
+ "gridwidth": 2,
+ "linecolor": "white",
+ "showbackground": true,
+ "ticks": "",
+ "zerolinecolor": "white"
+ }
+ },
+ "shapedefaults": {
+ "line": {
+ "color": "#2a3f5f"
+ }
+ },
+ "ternary": {
+ "aaxis": {
+ "gridcolor": "white",
+ "linecolor": "white",
+ "ticks": ""
+ },
+ "baxis": {
+ "gridcolor": "white",
+ "linecolor": "white",
+ "ticks": ""
+ },
+ "bgcolor": "#E5ECF6",
+ "caxis": {
+ "gridcolor": "white",
+ "linecolor": "white",
+ "ticks": ""
+ }
+ },
+ "title": {
+ "x": 0.05
+ },
+ "xaxis": {
+ "automargin": true,
+ "gridcolor": "white",
+ "linecolor": "white",
+ "ticks": "",
+ "title": {
+ "standoff": 15
+ },
+ "zerolinecolor": "white",
+ "zerolinewidth": 2
+ },
+ "yaxis": {
+ "automargin": true,
+ "gridcolor": "white",
+ "linecolor": "white",
+ "ticks": "",
+ "title": {
+ "standoff": 15
+ },
+ "zerolinecolor": "white",
+ "zerolinewidth": 2
+ }
+ }
+ },
+ "xaxis": {
+ "anchor": "y",
+ "domain": [
+ 0,
+ 1
+ ],
+ "title": {
+ "text": "n_features_to_select"
+ }
+ },
+ "yaxis": {
+ "anchor": "x",
+ "domain": [
+ 0,
+ 1
+ ],
+ "title": {
+ "text": "score"
+ }
+ }
+ }
+ },
+ "text/html": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "import plotly.express as px\n",
+ "\n",
+ "px.line(\n",
+ " validation_scores_with_experiment_info,\n",
+ " x=\"n_features_to_select\",\n",
+ " y=\"score\",\n",
+ " color=\"ranker\"\n",
+ ")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Indeed, we can see XGBoost, Infinite Selection and Decision Tree are solid contenders for this dataset.\n",
+ "\n",
+ "🙌🏻"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "--- \n",
+ "\n",
+ "This has shown how easy it is to do a large benchmark with `fseval`. Cheers!"
+ ]
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "Python 3.9.14 64-bit",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.9.14"
+ },
+ "orig_nbformat": 4,
+ "vscode": {
+ "interpreter": {
+ "hash": "949777d72b0d2535278d3dc13498b2535136f6dfe0678499012e853ee9abcab1"
+ }
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
diff --git a/examples/comparing-feature-selectors/benchmark.py b/examples/comparing-feature-selectors/benchmark.py
new file mode 100644
index 0000000..7dcd3be
--- /dev/null
+++ b/examples/comparing-feature-selectors/benchmark.py
@@ -0,0 +1,60 @@
+import hydra
+import numpy as np
+from fseval.config import PipelineConfig
+from fseval.main import run_pipeline
+from infinite_selection import InfFS
+from sklearn.base import BaseEstimator
+from sklearn.feature_selection import chi2, f_classif, mutual_info_classif
+from sklearn.preprocessing import minmax_scale
+from stability_selection import StabilitySelection as RealStabilitySelection
+
+
+class StabilitySelection(RealStabilitySelection):
+    def fit(self, X, y):
+        super().fit(X, y)
+        self.support_ = self.get_support()
+        # Use each feature's highest stability score across the regularization grid.
+        self.feature_importances_ = np.max(self.stability_scores_, axis=1)
+        return self
+
+
+class InfiniteSelectionEstimator(BaseEstimator):
+    def fit(self, X, y):
+        inf = InfFS()
+        [RANKED, WEIGHT] = inf.infFS(X, y, alpha=0.5, supervision=1, verbose=1)
+
+        self.feature_importances_ = WEIGHT
+        self.ranking_ = RANKED
+        return self
+
+
+class Chi2Classifier(BaseEstimator):
+    def fit(self, X, y):
+        # chi2 requires non-negative features, so rescale them to [0, 1] first.
+        X = minmax_scale(X)
+        scores, _ = chi2(X, y)
+        self.feature_importances_ = scores
+        return self
+
+
+class ANOVAFValueClassifier(BaseEstimator):
+    def fit(self, X, y):
+        scores, _ = f_classif(X, y)
+        self.feature_importances_ = scores
+        return self
+
+
+class MutualInfoClassifier(BaseEstimator):
+    def fit(self, X, y):
+        scores = mutual_info_classif(X, y)
+        self.feature_importances_ = scores
+        return self
+
+
+@hydra.main(config_path="conf", config_name="my_config", version_base="1.1")
+def main(cfg: PipelineConfig) -> None:
+ run_pipeline(cfg)
+
+
+if __name__ == "__main__":
+ main()
diff --git a/examples/comparing-feature-selectors/conf/dataset/synthetic.yaml b/examples/comparing-feature-selectors/conf/dataset/synthetic.yaml
new file mode 100644
index 0000000..07d1835
--- /dev/null
+++ b/examples/comparing-feature-selectors/conf/dataset/synthetic.yaml
@@ -0,0 +1,13 @@
+name: My synthetic dataset
+task: classification
+adapter:
+ _target_: sklearn.datasets.make_classification
+ n_samples: 10000
+ n_informative: 2
+ n_classes: 2
+ n_features: 20
+ n_redundant: 0
+ random_state: 0
+ shuffle: false
+feature_importances:
+ X[:, 0:2]: 1.0
diff --git a/examples/comparing-feature-selectors/conf/my_config.yaml b/examples/comparing-feature-selectors/conf/my_config.yaml
new file mode 100644
index 0000000..bcbcc27
--- /dev/null
+++ b/examples/comparing-feature-selectors/conf/my_config.yaml
@@ -0,0 +1,9 @@
+defaults:
+ - base_pipeline_config
+ - _self_
+ - override dataset: synthetic
+ - override validator: knn
+ - override /callbacks:
+ - to_sql
+
+n_bootstraps: 1
diff --git a/examples/comparing-feature-selectors/conf/ranker/anova.yaml b/examples/comparing-feature-selectors/conf/ranker/anova.yaml
new file mode 100644
index 0000000..f6b4299
--- /dev/null
+++ b/examples/comparing-feature-selectors/conf/ranker/anova.yaml
@@ -0,0 +1,5 @@
+name: ANOVA F-value
+estimator:
+ _target_: benchmark.ANOVAFValueClassifier
+_estimator_type: classifier
+estimates_feature_importances: true
diff --git a/examples/comparing-feature-selectors/conf/ranker/boruta.yaml b/examples/comparing-feature-selectors/conf/ranker/boruta.yaml
new file mode 100644
index 0000000..ef7e0e4
--- /dev/null
+++ b/examples/comparing-feature-selectors/conf/ranker/boruta.yaml
@@ -0,0 +1,11 @@
+name: Boruta
+estimator:
+ _target_: boruta.boruta_py.BorutaPy
+ estimator:
+ _target_: sklearn.ensemble.RandomForestClassifier
+ n_estimators: auto
+_estimator_type: classifier
+multioutput: false
+estimates_feature_importances: false
+estimates_feature_support: true
+estimates_feature_ranking: true
\ No newline at end of file
diff --git a/examples/comparing-feature-selectors/conf/ranker/chi2.yaml b/examples/comparing-feature-selectors/conf/ranker/chi2.yaml
new file mode 100644
index 0000000..ad6be06
--- /dev/null
+++ b/examples/comparing-feature-selectors/conf/ranker/chi2.yaml
@@ -0,0 +1,6 @@
+name: Chi-Squared
+estimator:
+ _target_: benchmark.Chi2Classifier
+_estimator_type: classifier
+requires_positive_X: true
+estimates_feature_importances: true
\ No newline at end of file
diff --git a/examples/comparing-feature-selectors/conf/ranker/decision_tree_classifier.yaml b/examples/comparing-feature-selectors/conf/ranker/decision_tree_classifier.yaml
new file mode 100644
index 0000000..83b379e
--- /dev/null
+++ b/examples/comparing-feature-selectors/conf/ranker/decision_tree_classifier.yaml
@@ -0,0 +1,7 @@
+name: Decision Tree
+estimator:
+ _target_: sklearn.tree.DecisionTreeClassifier
+_estimator_type: classifier
+multioutput: true
+estimates_feature_importances: true
+estimates_target: true
\ No newline at end of file
diff --git a/examples/comparing-feature-selectors/conf/ranker/infinite_selection.yaml b/examples/comparing-feature-selectors/conf/ranker/infinite_selection.yaml
new file mode 100644
index 0000000..2dfbb23
--- /dev/null
+++ b/examples/comparing-feature-selectors/conf/ranker/infinite_selection.yaml
@@ -0,0 +1,6 @@
+name: Infinite Selection
+estimator:
+ _target_: benchmark.InfiniteSelectionEstimator
+_estimator_type: classifier
+estimates_feature_importances: true
+estimates_feature_ranking: true
\ No newline at end of file
diff --git a/examples/comparing-feature-selectors/conf/ranker/multisurf_classifier.yaml b/examples/comparing-feature-selectors/conf/ranker/multisurf_classifier.yaml
new file mode 100644
index 0000000..a9370af
--- /dev/null
+++ b/examples/comparing-feature-selectors/conf/ranker/multisurf_classifier.yaml
@@ -0,0 +1,6 @@
+name: MultiSURF
+estimator:
+ _target_: skrebate.MultiSURF
+_estimator_type: classifier
+multioutput: false
+estimates_feature_importances: true
\ No newline at end of file
diff --git a/examples/comparing-feature-selectors/conf/ranker/mutual_info.yaml b/examples/comparing-feature-selectors/conf/ranker/mutual_info.yaml
new file mode 100644
index 0000000..bcec11c
--- /dev/null
+++ b/examples/comparing-feature-selectors/conf/ranker/mutual_info.yaml
@@ -0,0 +1,6 @@
+name: Mutual Info
+estimator:
+ _target_: benchmark.MutualInfoClassifier
+_estimator_type: classifier
+multioutput: false
+estimates_feature_importances: true
diff --git a/examples/comparing-feature-selectors/conf/ranker/relieff_classifier.yaml b/examples/comparing-feature-selectors/conf/ranker/relieff_classifier.yaml
new file mode 100644
index 0000000..2571683
--- /dev/null
+++ b/examples/comparing-feature-selectors/conf/ranker/relieff_classifier.yaml
@@ -0,0 +1,5 @@
+name: ReliefF
+estimator:
+ _target_: skrebate.ReliefF
+_estimator_type: classifier
+estimates_feature_importances: true
\ No newline at end of file
diff --git a/examples/comparing-feature-selectors/conf/ranker/stability_selection.yaml b/examples/comparing-feature-selectors/conf/ranker/stability_selection.yaml
new file mode 100644
index 0000000..0e212cb
--- /dev/null
+++ b/examples/comparing-feature-selectors/conf/ranker/stability_selection.yaml
@@ -0,0 +1,10 @@
+name: Stability Selection
+estimator:
+ _target_: benchmark.StabilitySelection
+ base_estimator:
+ _target_: sklearn.linear_model.LogisticRegression
+ penalty: l2
+ bootstrap_func: stratified
+_estimator_type: classifier
+estimates_feature_importances: true
+estimates_feature_support: true
\ No newline at end of file
diff --git a/examples/comparing-feature-selectors/conf/ranker/xgb_classifier.yaml b/examples/comparing-feature-selectors/conf/ranker/xgb_classifier.yaml
new file mode 100644
index 0000000..94fb1e8
--- /dev/null
+++ b/examples/comparing-feature-selectors/conf/ranker/xgb_classifier.yaml
@@ -0,0 +1,8 @@
+name: XGBoost
+estimator:
+ _target_: xgboost.XGBClassifier
+ use_label_encoder: False
+_estimator_type: classifier
+multioutput: false
+estimates_feature_importances: true
+estimates_target: true
\ No newline at end of file
diff --git a/examples/comparing-feature-selectors/conf/validator/knn.yaml b/examples/comparing-feature-selectors/conf/validator/knn.yaml
new file mode 100644
index 0000000..7a3b4c5
--- /dev/null
+++ b/examples/comparing-feature-selectors/conf/validator/knn.yaml
@@ -0,0 +1,6 @@
+name: k-NN
+estimator:
+ _target_: sklearn.neighbors.KNeighborsClassifier
+_estimator_type: classifier
+multioutput: false
+estimates_target: true
diff --git a/examples/comparing-feature-selectors/requirements.txt b/examples/comparing-feature-selectors/requirements.txt
new file mode 100644
index 0000000..0f63acd
--- /dev/null
+++ b/examples/comparing-feature-selectors/requirements.txt
@@ -0,0 +1,6 @@
+fseval
+-e git+https://github.com/dunnkers/infinite-selection.git@6c9db1d5fe1b12bc34eb2af5893a4f3ca385aaff#egg=infinite_selection
+-e git+https://github.com/dunnkers/stability-selection.git@baf54e7526bbce57d80871fcd93cdfdd67972a43#egg=stability_selection
+Boruta>=0.3
+skrebate>=0.62
+xgboost>=1
diff --git a/website/docs/recipes/comparing-feature-selectors.md b/website/docs/recipes/comparing-feature-selectors.md
new file mode 100644
index 0000000..35fc8d4
--- /dev/null
+++ b/website/docs/recipes/comparing-feature-selectors.md
@@ -0,0 +1,617 @@
+# Comparing Feature Selectors
+Hi! You want to compare the performance of multiple feature selectors? This example notebook shows you how to do such an analysis.
+
+## Prerequisites
+
+We are going to use roughly the same configuration as in the [Quick start](../../quick-start) example, but with more feature selectors. Again, start by downloading the example project: [comparing-feature-selectors.zip](pathname:///fseval/zipped-examples/comparing-feature-selectors.zip)
+
+### Installing the required packages
+
+Now, let's install the required packages. Make sure you are in the `comparing-feature-selectors` folder, containing the `requirements.txt` file, and then run the following:
+
+```
+pip install -r requirements.txt
+```
+
+## Running the experiment
+
+Run the following command to start the experiment:
+
+```
+python benchmark.py --multirun ranker="glob(*)" +callbacks.to_sql.url="sqlite:////tmp/results.sqlite"
+```
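The `ranker="glob(*)"` override asks Hydra to sweep over every option in the `ranker` config group, so `--multirun` launches one experiment per config in `conf/ranker`. To preview which rankers that will be, a rough sketch is to list the config files with `pathlib`. The helper `list_ranker_configs` is hypothetical, and it assumes that each `.yaml` file directly inside `conf/ranker` is one option of the group.

```python
from pathlib import Path
from typing import List


def list_ranker_configs(conf_dir: str = "conf") -> List[str]:
    """Return the ranker names a `ranker="glob(*)"` sweep is expected to pick up.

    Assumption: each `.yaml` file directly inside `<conf_dir>/ranker` is one
    option of the `ranker` config group.
    """
    ranker_dir = Path(conf_dir) / "ranker"
    # Path.stem drops the ".yaml" suffix, leaving the config option name.
    return sorted(path.stem for path in ranker_dir.glob("*.yaml"))
```

Run from the `comparing-feature-selectors` folder, this should list the ten ranker configs of this example, from `anova` to `xgb_classifier`.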
+
+## Analyzing the results
+
+There should now be a `.sqlite` file at `/tmp/results.sqlite`:
+
+ ```
+ $ ls -al /tmp/results.sqlite
+ -rw-r--r-- 1 vscode vscode 20480 Sep 21 08:16 /tmp/results.sqlite
+ ```
+
+Is that the case? Then let's now analyze the results! 📈
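
If you would rather check from Python, a minimal sketch using the standard-library `sqlite3` module follows. The helper `list_tables` is hypothetical; it opens the file read-only so that a wrong path fails loudly instead of creating an empty database. Assuming the `to_sql` callback ran, the result should include the `experiments` and `validation_scores` tables that are read below.

```python
import sqlite3
from contextlib import closing
from typing import List


def list_tables(db_path: str) -> List[str]:
    """Return the sorted names of all tables in the SQLite file at `db_path`."""
    # mode=ro opens the file read-only: a missing file raises
    # sqlite3.OperationalError instead of silently creating a new database.
    uri = f"file:{db_path}?mode=ro"
    with closing(sqlite3.connect(uri, uri=True)) as conn:
        rows = conn.execute(
            "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
        ).fetchall()
    return [name for (name,) in rows]
```

For example, `list_tables("/tmp/results.sqlite")`.
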
+
+We will install `plotly-express`, so we can make nice plots later.
+
+
+```python
+%pip install plotly-express nbconvert --quiet
+```
+
+
+Next, let's define where our results are stored: in this case, a local SQLite database located at `/tmp/results.sqlite`.
+
+
+```python
+con: str = "sqlite:////tmp/results.sqlite"
+con
+```
+
+
+
+
+ 'sqlite:////tmp/results.sqlite'
+
+
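The four slashes are not a typo. In the SQLAlchemy-style URL that `pd.read_sql_table` accepts, `sqlite:///` is followed by the file path, so an absolute path such as `/tmp/results.sqlite` contributes a fourth slash, while `sqlite:///results.sqlite` would point to a file relative to the working directory. The hypothetical helper below only makes that rule explicit; it handles file-based SQLite URLs and does not handle query parameters.

```python
def sqlite_url_to_path(url: str) -> str:
    """Convert a SQLAlchemy-style SQLite URL into a filesystem path (sketch)."""
    prefix = "sqlite:///"
    if not url.startswith(prefix):
        # e.g. "sqlite://" (in-memory) or a URL for another database backend
        raise ValueError(f"not a file-based SQLite URL: {url!r}")
    # Everything after "sqlite:///" is the path; a leading "/" makes it absolute.
    return url[len(prefix):]
```
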
+
+Now, we can read the `experiments` table.
+
+
+```python
+import pandas as pd
+
+experiments: pd.DataFrame = pd.read_sql_table("experiments", con=con, index_col="id")
+experiments
+```
+
+| id | dataset | dataset/n | dataset/p | dataset/task | dataset/group | dataset/domain | ranker | validator | local_dir | date_created |
+|---|---|---|---|---|---|---|---|---|---|---|
+| 3lllxl48 | My synthetic dataset | 10000 | 20 | classification | None | None | ANOVA F-value | k-NN | /workspaces/fseval/examples/comparing-feature-... | 2022-10-22 14:28:27.506838 |
+| 1944ropg | My synthetic dataset | 10000 | 20 | classification | None | None | Boruta | k-NN | /workspaces/fseval/examples/comparing-feature-... | 2022-10-22 14:28:31.230633 |
+| 31gd56gf | My synthetic dataset | 10000 | 20 | classification | None | None | Chi-Squared | k-NN | /workspaces/fseval/examples/comparing-feature-... | 2022-10-22 14:29:19.633012 |
+| a8washm5 | My synthetic dataset | 10000 | 20 | classification | None | None | Decision Tree | k-NN | /workspaces/fseval/examples/comparing-feature-... | 2022-10-22 14:29:23.459190 |
+| 27i7uwg4 | My synthetic dataset | 10000 | 20 | classification | None | None | Infinite Selection | k-NN | /workspaces/fseval/examples/comparing-feature-... | 2022-10-22 14:29:27.506974 |
+| 3velt3b9 | My synthetic dataset | 10000 | 20 | classification | None | None | MultiSURF | k-NN | /workspaces/fseval/examples/comparing-feature-... | 2022-10-22 14:29:31.758090 |
+| 3fdrxlt6 | My synthetic dataset | 10000 | 20 | classification | None | None | Mutual Info | k-NN | /workspaces/fseval/examples/comparing-feature-... | 2022-10-22 14:35:04.289361 |
+| 14lecx0g | My synthetic dataset | 10000 | 20 | classification | None | None | ReliefF | k-NN | /workspaces/fseval/examples/comparing-feature-... | 2022-10-22 14:35:08.614262 |
+| 3sggjvu3 | My synthetic dataset | 10000 | 20 | classification | None | None | Stability Selection | k-NN | /workspaces/fseval/examples/comparing-feature-... | 2022-10-22 14:35:59.121416 |
+| dtt8bvo5 | My synthetic dataset | 10000 | 20 | classification | None | None | XGBoost | k-NN | /workspaces/fseval/examples/comparing-feature-... | 2022-10-22 14:36:23.385401 |
+
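`pd.read_sql_table` needs SQLAlchemy: it takes a table name plus a connection string or SQLAlchemy connectable, and does not accept a plain DBAPI connection. If SQLAlchemy is not installed, one possible workaround is `pd.read_sql_query` with a standard-library `sqlite3` connection, which pandas supports directly. The helper `read_table` below is a hypothetical sketch of that approach.

```python
import sqlite3

import pandas as pd


def read_table(conn: sqlite3.Connection, table: str, index_col: str = "id") -> pd.DataFrame:
    """Read a whole table through a plain sqlite3 connection (no SQLAlchemy)."""
    # Table names cannot be bound as SQL parameters, so only accept plain
    # identifiers such as "experiments" or "validation_scores".
    if not table.isidentifier():
        raise ValueError(f"unexpected table name: {table!r}")
    return pd.read_sql_query(f'SELECT * FROM "{table}"', conn, index_col=index_col)
```

With `conn = sqlite3.connect("/tmp/results.sqlite")`, `read_table(conn, "experiments")` should return the same rows. Without SQLAlchemy's type information, columns such as `date_created` may come back as plain strings.
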
+Let's also read in the `validation_scores`.
+
+
+```python
+validation_scores: pd.DataFrame = pd.read_sql_table("validation_scores", con=con, index_col="id")
+validation_scores
+```
+
+| id | index | n_features_to_select | fit_time | score | bootstrap_state |
+|---|---|---|---|---|---|
+| 3lllxl48 | 0 | 1 | 0.004433 | 0.7955 | 1 |
+| 3lllxl48 | 0 | 2 | 0.004227 | 0.7910 | 1 |
+| 3lllxl48 | 0 | 3 | 0.005183 | 0.7950 | 1 |
+| 3lllxl48 | 0 | 4 | 0.003865 | 0.7965 | 1 |
+| 3lllxl48 | 0 | 5 | 0.002902 | 0.7950 | 1 |
+| ... | ... | ... | ... | ... | ... |
+| dtt8bvo5 | 0 | 16 | 0.000670 | 0.7805 | 1 |
+| dtt8bvo5 | 0 | 17 | 0.000480 | 0.7725 | 1 |
+| dtt8bvo5 | 0 | 18 | 0.003159 | 0.7760 | 1 |
+| dtt8bvo5 | 0 | 19 | 0.000848 | 0.7650 | 1 |
+| dtt8bvo5 | 0 | 20 | 0.000565 | 0.7590 | 1 |
+
+160 rows × 5 columns
+
+We can now merge them. Because we set the experiment ID as the _index_ of both tables, we can use `pd.DataFrame.join` to do this.
+
+
+```python
+validation_scores_with_experiment_info = experiments.join(
+ validation_scores
+)
+validation_scores_with_experiment_info.head(1)
+```
+
+| id | dataset | dataset/n | dataset/p | dataset/task | dataset/group | dataset/domain | ranker | validator | local_dir | date_created | index | n_features_to_select | fit_time | score | bootstrap_state |
+|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
+| 14lecx0g | My synthetic dataset | 10000 | 20 | classification | None | None | ReliefF | k-NN | /workspaces/fseval/examples/comparing-feature-... | 2022-10-22 14:35:08.614262 | NaN | NaN | NaN | NaN | NaN |
+
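The all-`NaN` row above is expected: `DataFrame.join` performs a left join on the index by default, so an experiment without any rows in `validation_scores` (here ReliefF) is kept, with its score columns filled with `NaN`. The toy example below, with made-up IDs and scores, illustrates this; passing `how="inner"` instead keeps only the experiments that produced scores.

```python
import pandas as pd

# Made-up data: three experiments, but only e1 and e2 produced validation scores.
experiments = pd.DataFrame(
    {"ranker": ["XGBoost", "Boruta", "ReliefF"]},
    index=pd.Index(["e1", "e2", "e3"], name="id"),
)
validation_scores = pd.DataFrame(
    {"n_features_to_select": [1, 2, 1, 2], "score": [0.80, 0.82, 0.55, 0.52]},
    index=pd.Index(["e1", "e1", "e2", "e2"], name="id"),
)

# Default how="left": e3 is kept once, with NaN in the score columns.
left = experiments.join(validation_scores)

# how="inner": only experiments with at least one score row are kept.
inner = experiments.join(validation_scores, how="inner")
```
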
+Cool! That is all the information we need. Let's first create an overview of all the rankers we benchmarked.
+
+
+```python
+validation_scores_with_experiment_info \
+ .groupby("ranker") \
+ .mean(numeric_only=True) \
+ .sort_values("score", ascending=False)
+```
+
+| ranker | dataset/n | dataset/p | index | n_features_to_select | fit_time | score | bootstrap_state |
+|---|---|---|---|---|---|---|---|
+| Infinite Selection | 10000.0 | 20.0 | 0.0 | 10.5 | 0.004600 | 0.818925 | 1.0 |
+| XGBoost | 10000.0 | 20.0 | 0.0 | 10.5 | 0.002998 | 0.818575 | 1.0 |
+| Decision Tree | 10000.0 | 20.0 | 0.0 | 10.5 | 0.002810 | 0.817675 | 1.0 |
+| Stability Selection | 10000.0 | 20.0 | 0.0 | 10.5 | 0.002406 | 0.803325 | 1.0 |
+| Chi-Squared | 10000.0 | 20.0 | 0.0 | 10.5 | 0.002548 | 0.795975 | 1.0 |
+| ANOVA F-value | 10000.0 | 20.0 | 0.0 | 10.5 | 0.003745 | 0.789275 | 1.0 |
+| Mutual Info | 10000.0 | 20.0 | 0.0 | 10.5 | 0.002314 | 0.786475 | 1.0 |
+| Boruta | 10000.0 | 20.0 | 0.0 | 10.5 | 0.002366 | 0.518075 | 1.0 |
+| MultiSURF | 10000.0 | 20.0 | NaN | NaN | NaN | NaN | NaN |
+| ReliefF | 10000.0 | 20.0 | NaN | NaN | NaN | NaN | NaN |
+
+We immediately notice that MultiSURF and ReliefF have no validation scores: apart from `dataset/n` and `dataset/p`, all their columns are `NaN`. This is because those experiments failed, which can happen in a big benchmark! We will ignore this for now and continue with the other feature selectors.
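
Because a failed experiment still appears in the `experiments` table, it shows up in the joined frame with `NaN` scores. One way to list such experiments explicitly, and to keep them out of the summary, is sketched below on made-up data. The variable names are illustrative, and the sketch assumes that a ranker without a single non-`NaN` score failed.

```python
import pandas as pd

# Made-up stand-in for `validation_scores_with_experiment_info`.
scores = pd.DataFrame(
    {
        "ranker": ["XGBoost", "XGBoost", "Boruta", "Boruta", "ReliefF"],
        "score": [0.81, 0.83, 0.52, 0.51, float("nan")],
    }
)

# A ranker whose scores are all NaN never produced a validation result.
has_scores = scores["score"].notna().groupby(scores["ranker"]).any()
failed_rankers = sorted(has_scores[~has_scores].index)

# Mean score per ranker, using only rows that actually have a score.
summary = (
    scores.dropna(subset=["score"])
    .groupby("ranker")["score"]
    .mean()
    .sort_values(ascending=False)
)
```
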
+
+👀 We can already observe that the _average_ classification accuracy is the highest for Infinite Selection. Although it would be premature to call it the best, this indicates that it did well on this dataset.
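
The mean weighs every subset size equally. A complementary view, sketched below on made-up numbers, is each ranker's best score together with the `n_features_to_select` at which it was reached. It uses `groupby(...).idxmax()`, which returns row labels, so it assumes a unique row index; the joined frame repeats the experiment ID, so call `reset_index()` on it first.

```python
import pandas as pd

# Made-up stand-in for `validation_scores_with_experiment_info.reset_index()`.
scores = pd.DataFrame(
    {
        "ranker": ["A", "A", "A", "B", "B", "B"],
        "n_features_to_select": [1, 2, 3, 1, 2, 3],
        "score": [0.70, 0.90, 0.85, 0.80, 0.82, 0.81],
    }
)

# Row label of the highest score within each ranker.
best_rows = scores.groupby("ranker")["score"].idxmax()

# One row per ranker: its best score and the subset size that achieved it.
best = scores.loc[best_rows, ["ranker", "n_features_to_select", "score"]]
best = best.sort_values("score", ascending=False).reset_index(drop=True)
```
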
+
+Let's plot the results _per_ `n_features_to_select`. Note that each value of `n_features_to_select` corresponds to a validation step that was run using a feature subset of that size.
+
+
+```python
+import plotly.express as px
+
+px.line(
+ validation_scores_with_experiment_info,
+ x="n_features_to_select",
+ y="score",
+ color="ranker"
+)
+```
+
+
+![feature selectors comparison plot](/img/recipes/feature-selectors-comparison-plot.png)
+
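To read off exact values instead of hovering over the plot, the same data can be laid out with one column per ranker using `DataFrame.pivot_table`. A sketch on made-up data follows; on the real frame you would pass `validation_scores_with_experiment_info` instead of `scores`.

```python
import pandas as pd

# Made-up stand-in for `validation_scores_with_experiment_info`.
scores = pd.DataFrame(
    {
        "ranker": ["A", "A", "B", "B"],
        "n_features_to_select": [1, 2, 1, 2],
        "score": [0.70, 0.90, 0.80, 0.82],
    }
)

# Rows: subset size; columns: ranker; cells: mean score over bootstraps
# (this example's config uses n_bootstraps: 1, so each cell is a single run).
table = scores.pivot_table(
    index="n_features_to_select", columns="ranker", values="score", aggfunc="mean"
)
```
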
+
+Indeed, we can see that XGBoost, Infinite Selection and Decision Tree are solid contenders for this dataset.
+
+🙌🏻
+
+---
+
+This shows how easy it is to run a large benchmark with `fseval`. Cheers!
diff --git a/website/static/img/recipes/feature-selectors-comparison-plot.png b/website/static/img/recipes/feature-selectors-comparison-plot.png
new file mode 100644
index 0000000..6f32139
Binary files /dev/null and b/website/static/img/recipes/feature-selectors-comparison-plot.png differ
diff --git a/website/static/zipped-examples/comparing-feature-selectors.zip b/website/static/zipped-examples/comparing-feature-selectors.zip
new file mode 100644
index 0000000..24e3f0c
Binary files /dev/null and b/website/static/zipped-examples/comparing-feature-selectors.zip differ