(feat) Final model

Final model with all improvements was added, along with some fixes.
pabramber01 · Apr 12, 2024 · 62d07f4
1 parent 8d9e50a
commit 62d07f4
Showing 29 changed files with 16,484 additions and 16 deletions.
diff --git a/_toc.yml b/_toc.yml
@@ -110,3 +110,22 @@ parts:
         sections:
           - file: f1_prediction/other_models/advanced_tuning_posvar
           - file: f1_prediction/other_models/advanced_tuning_pdm
+
+  - caption: Final model
+    chapters:
+      - file: f1_prediction/final_model/model_validation
+        sections:
+          - file: f1_prediction/final_model/model_validation_posvar
+          - file: f1_prediction/final_model/model_validation_pdm
+      - file: f1_prediction/final_model/simple_tuning
+        sections:
+          - file: f1_prediction/final_model/simple_tuning_posvar
+          - file: f1_prediction/final_model/simple_tuning_pdm
+      - file: f1_prediction/final_model/feature_selection
+        sections:
+          - file: f1_prediction/final_model/feature_selection_posvar
+          - file: f1_prediction/final_model/feature_selection_pdm
+      - file: f1_prediction/final_model/advanced_tuning
+        sections:
+          - file: f1_prediction/final_model/advanced_tuning_posvar
+          - file: f1_prediction/final_model/advanced_tuning_pdm
diff --git a/f1_prediction/adding_data/feature_selection_posvar.ipynb b/f1_prediction/adding_data/feature_selection_posvar.ipynb
@@ -62,7 +62,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "First we will do the tuning of the model that predicts the final position of each driver at a ±1 interval.\n"
+    "First we will do the selection of the model that predicts the final position of each driver at a ±1 interval.\n"
    ]
   },
   {

diff --git a/f1_prediction/adding_data/model_validation_add_posvar.ipynb b/f1_prediction/adding_data/model_validation_add_posvar.ipynb
@@ -65,7 +65,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "First we will do the tuning of the model that predicts the final position of each driver at a ±1 interval.\n"
+    "First we will do the validation of the model that predicts the final position of each driver at a ±1 interval.\n"
    ]
   },
   {

diff --git a/f1_prediction/adding_data/model_validation_all_posvar.ipynb b/f1_prediction/adding_data/model_validation_all_posvar.ipynb
@@ -65,7 +65,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "First we will do the tuning of the model that predicts the final position of each driver at a ±1 interval.\n"
+    "First we will do the validation of the model that predicts the final position of each driver at a ±1 interval.\n"
    ]
   },
   {

diff --git a/f1_prediction/adding_data/model_validation_cir_posvar.ipynb b/f1_prediction/adding_data/model_validation_cir_posvar.ipynb
@@ -65,7 +65,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "First we will do the tuning of the model that predicts the final position of each driver at a ±1 interval.\n"
+    "First we will do the validation of the model that predicts the final position of each driver at a ±1 interval.\n"
    ]
   },
   {

diff --git a/f1_prediction/adding_data/model_validation_drv_posvar.ipynb b/f1_prediction/adding_data/model_validation_drv_posvar.ipynb
@@ -65,7 +65,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "First we will do the tuning of the model that predicts the final position of each driver at a ±1 interval.\n"
+    "First we will do the validation of the model that predicts the final position of each driver at a ±1 interval.\n"
    ]
   },
   {

diff --git a/f1_prediction/adding_data/model_validation_wea_posvar.ipynb b/f1_prediction/adding_data/model_validation_wea_posvar.ipynb
@@ -65,7 +65,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "First we will do the tuning of the model that predicts the final position of each driver at a ±1 interval.\n"
+    "First we will do the validation of the model that predicts the final position of each driver at a ±1 interval.\n"
    ]
   },
   {

diff --git a/f1_prediction/assets/data/processed/final_model.csv b/f1_prediction/assets/data/processed/final_model.csv
diff --git a/f1_prediction/assets/data/processed/final_model_X.csv b/f1_prediction/assets/data/processed/final_model_X.csv
diff --git a/f1_prediction/base_model/feature_selection_posvar.ipynb b/f1_prediction/base_model/feature_selection_posvar.ipynb
@@ -62,7 +62,7 @@
    "cell_type": "markdown",
    "metadata": {},
    "source": [
-    "First we will do the tuning of the model that predicts the final position of each driver at a ±1 interval.\n"
+    "First we will do the selection of the model that predicts the final position of each driver at a ±1 interval.\n"
    ]
   },
   {

diff --git a/f1_prediction/final_model/advanced_tuning.ipynb b/f1_prediction/final_model/advanced_tuning.ipynb
@@ -0,0 +1,32 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Advanced hyperparameter tuning\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Tuning methods\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "After feature selection, we proceed to advanced hyperparameter tuning. For this purpose we will use Optuna, an automatic hyperparameter optimization framework designed for machine learning.\n"
+   ]
+  }
+ ],
+ "metadata": {
+  "language_info": {
+   "name": "python"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
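Note on the tuning step described in advanced_tuning.ipynb: the notebook names Optuna but shows no code, so the following is only a minimal sketch of the objective-and-trial loop that such a framework automates. It is a plain random search written with the standard library, not Optuna's API or its samplers. The hyperparameter names (`learning_rate`, `subsample`), their ranges, and `toy_objective` are hypothetical stand-ins for the real model's cross-validation score.

```python
import random


def random_search(objective, space, n_trials, seed=0):
    """Sample n_trials parameter sets uniformly and keep the best (higher is better)."""
    rng = random.Random(seed)
    best_params, best_value = None, float("-inf")
    for _ in range(n_trials):
        # Draw one candidate value per hyperparameter from its (low, high) range.
        params = {name: rng.uniform(low, high) for name, (low, high) in space.items()}
        value = objective(params)
        if value > best_value:
            best_params, best_value = params, value
    return best_params, best_value


def toy_objective(params):
    # Hypothetical stand-in for a mean cross-validation score.
    # It peaks at learning_rate=0.1 and subsample=0.8.
    return -((params["learning_rate"] - 0.1) ** 2) - (params["subsample"] - 0.8) ** 2


# Hypothetical search space: name -> (low, high).
space = {"learning_rate": (0.01, 0.3), "subsample": (0.5, 1.0)}
best_params, best_value = random_search(toy_objective, space, n_trials=200)
print(best_params, best_value)
```

In the real notebooks the objective would presumably wrap the cross-validation of the estimator, and the sampling would be delegated to the tuning framework instead of uniform draws.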
diff --git a/f1_prediction/final_model/advanced_tuning_pdm.ipynb b/f1_prediction/final_model/advanced_tuning_pdm.ipynb
diff --git a/f1_prediction/final_model/advanced_tuning_posvar.ipynb b/f1_prediction/final_model/advanced_tuning_posvar.ipynb
diff --git a/f1_prediction/final_model/feature_selection.ipynb b/f1_prediction/final_model/feature_selection.ipynb
@@ -0,0 +1,112 @@
+{
+ "cells": [
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "# Feature selection\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Dependencies\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "The dependencies used are as follows:\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 1,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "from sklearn.model_selection import cross_val_score"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "## Selection methods\n"
+   ]
+  },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "After hyperparameter tuning, we proceed to attribute selection. Three methods will be used:\n",
+    "\n",
+    "- PermutationImportance, to see the most important attributes\n",
+    "- SequentialForwardSelector, for a more exhaustive search\n",
+    "- GeneticAlgorithms, for a more stochastic search\n",
+    "\n",
+    "The main method will be sequential forward selection (SFS), in which features are added one at a time to an initially empty candidate set until adding further features no longer improves the criterion.\n",
+    "\n",
+    "We will use PermutationImportance to corroborate the results and to see which attributes contribute the most to the performance of the model. The results of this method and of SFS may differ because an attribute that has little relevance on its own can still improve the model significantly when combined with others.\n",
+    "\n",
+    "Finally, we will use genetic algorithms to run a small stochastic search for other attribute combinations that improve performance. This is useful because SequentialForwardSelector builds its subset greedily, one attribute at a time, so it does not check all combinations and may miss a better one. In the genetic algorithm, each individual is a binary vector in which a 1 in position i means that attribute i is used for the evaluation and a 0 means that it is not. The fitness of an individual is the cross-validation score obtained with the attributes it selects.\n"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "metadata": {},
+   "outputs": [],
+   "source": [
+    "def fitness_func(ga_instance, individual, individual_idx):\n",
+    "    # X, y, estimator, tscv, scor and rank are not defined in this cell;\n",
+    "    # they are expected to exist as globals in the notebook that uses it.\n",
+    "\n",
+    "    # When rank is set, always keep the \"qid\" column selected.\n",
+    "    if rank:\n",
+    "        individual[X.columns.get_loc(\"qid\")] = 1\n",
+    "\n",
+    "    # Positions set to 1 are the columns of X used for this individual.\n",
+    "    idx = [i for i in range(len(individual)) if individual[i] == 1]\n",
+    "    attributes = X.iloc[:, idx]\n",
+    "\n",
+    "    # An individual that selects no attributes gets a large penalty.\n",
+    "    if attributes.empty:\n",
+    "        return -10000\n",
+    "\n",
+    "    scores = cross_val_score(\n",
+    "        estimator=estimator,\n",
+    "        X=attributes,\n",
+    "        y=y,\n",
+    "        cv=tscv,\n",
+    "        scoring=scor,\n",
+    "        n_jobs=-1,\n",
+    "    )\n",
+    "\n",
+    "    # Fitness is the mean cross-validation score over all folds.\n",
+    "    return sum(scores) / len(scores)"
+   ]
+  }
+ ],
+ "metadata": {
+  "kernelspec": {
+   "display_name": "Python 3",
+   "language": "python",
+   "name": "python3"
+  },
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": "3.10.11"
+  }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
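Note on the selection methods described in feature_selection.ipynb: the notebook explains sequential forward selection in prose only, so the greedy loop can be sketched as below. This is an illustration under stated assumptions, not the selector the notebooks actually call. The function name, the `toy_score` criterion and its weights are hypothetical; in the notebooks the criterion would presumably be the mean cross-validation score, where higher is better.

```python
from __future__ import annotations

from typing import Callable, Sequence


def sequential_forward_selection(
    n_features: int,
    score: Callable[[Sequence[int]], float],
) -> tuple[list[int], float]:
    """Greedily add the best single feature per round; stop when nothing improves."""
    selected: list[int] = []
    best_score = float("-inf")
    remaining = set(range(n_features))
    while remaining:
        # Score every subset obtained by adding one unused feature.
        candidates = {f: score(selected + [f]) for f in sorted(remaining)}
        best_feature = max(candidates, key=candidates.get)
        if candidates[best_feature] <= best_score:
            break  # No candidate improves the criterion.
        selected.append(best_feature)
        best_score = candidates[best_feature]
        remaining.remove(best_feature)
    return selected, best_score


def toy_score(subset: Sequence[int]) -> float:
    # Hypothetical criterion: features 0 and 2 help, and every
    # selected feature costs a small penalty.
    useful = {0: 0.5, 2: 0.3}
    return sum(useful.get(f, 0.0) for f in subset) - 0.05 * len(subset)


selected, best = sequential_forward_selection(4, toy_score)
print(selected, round(best, 2))  # prints: [0, 2] 0.7
```

Because each round only extends the current subset by one feature, the loop evaluates far fewer subsets than an exhaustive search, which is also why it can miss a better combination that a stochastic search might find.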
diff --git a/f1_prediction/final_model/feature_selection_pdm.ipynb b/f1_prediction/final_model/feature_selection_pdm.ipynb
diff --git a/f1_prediction/final_model/feature_selection_posvar.ipynb b/f1_prediction/final_model/feature_selection_posvar.ipynb