Commit
This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.
Final model with all improvements was added, along with some fixes.
1 parent
8d9e50a
commit 62d07f4
Showing 29 changed files with 16,484 additions and 16 deletions.
5,740 changes: 5,740 additions & 0 deletions
f1_prediction/assets/data/processed/final_model.csv
Large diffs are not rendered by default.
5,740 changes: 5,740 additions & 0 deletions
f1_prediction/assets/data/processed/final_model_X.csv
Large diffs are not rendered by default.
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,32 @@ | ||
{ | ||
"cells": [ | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"# Advanced hyperparameter tuning\n" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"## Tuning methods\n" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"After feature selection, we proceed to advanced hyperparameter tuning. For this we will use Optuna, an automatic hyperparameter optimization framework designed for machine learning.\n" | ||
] | ||
} | ||
], | ||
"metadata": { | ||
"language_info": { | ||
"name": "python" | ||
} | ||
}, | ||
"nbformat": 4, | ||
"nbformat_minor": 2 | ||
} |
Original file line number | Diff line number | Diff line change |
---|---|---|
@@ -0,0 +1,112 @@ | ||
{ | ||
"cells": [ | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"# Feature selection\n" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"## Dependencies\n" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"The dependencies used are as follows:\n" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 1, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
"from sklearn.model_selection import cross_val_score" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"## Selection methods\n" | ||
] | ||
}, | ||
{ | ||
"cell_type": "markdown", | ||
"metadata": {}, | ||
"source": [ | ||
"After hyperparameter tuning, we proceed to attribute selection. Three methods will be used:\n", | ||
"\n", | ||
"- PermutationImportance, to see which attributes matter most\n", | ||
"- SequentialForwardSelector, for a more exhaustive search\n", | ||
"- GeneticAlgorithms, for a more stochastic search\n", | ||
"\n", | ||
"The main method will be sequential forward selection (SFS), in which features are added one at a time to an initially empty candidate set until adding another feature no longer improves the criterion.\n", | ||
"\n", | ||
"We will use PermutationImportance to corroborate the results and to see which attributes contribute most to the model's performance. The results of this method and of SFS may differ because an attribute that has little relevance on its own can still improve the model significantly when combined with others.\n", | ||
"\n", | ||
"Finally, we will use genetic algorithms to run a small stochastic search for other attribute combinations that improve performance. This is needed because SequentialForwardSelector grows the set one attribute at a time, so it does not check every combination and may miss a better one. In the genetic algorithm, each individual is a binary vector in which a 1 in position i means that attribute i is used for the evaluation and a 0 means that it is not; the fitness of an individual is its cross-validation score.\n" | ||
] | ||
}, | ||
{ | ||
"cell_type": "code", | ||
"execution_count": 2, | ||
"metadata": {}, | ||
"outputs": [], | ||
"source": [ | ||
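"def fitness_func(ga_instance, individual, individual_idx):\n", | ||
"    # Relies on notebook-level names: X, y, estimator, tscv, scor and rank.\n", | ||
"    res = []\n", | ||
"\n", | ||
"    # When rank is set, always keep the \"qid\" column in the subset.\n", | ||
"    if rank:\n", | ||
"        individual[X.columns.get_loc(\"qid\")] = 1\n", | ||
"\n", | ||
"    # Positions holding a 1 are the columns selected by this individual.\n", | ||
"    idx = [i for i in range(len(individual)) if individual[i] == 1]\n", | ||
"\n", | ||
"    attributes = X.iloc[:, idx]\n", | ||
"    objective = y\n", | ||
"\n", | ||
"    # Cross-validate only when at least one column is selected.\n", | ||
"    if not attributes.empty:\n", | ||
"        res.extend(\n", | ||
"            cross_val_score(\n", | ||
"                estimator=estimator,\n", | ||
"                X=attributes,\n", | ||
"                y=objective,\n", | ||
"                cv=tscv,\n", | ||
"                scoring=scor,\n", | ||
"                n_jobs=-1,\n", | ||
"            )\n", | ||
"        )\n", | ||
"\n", | ||
"    # Mean fold score, or a large penalty for an empty selection.\n", | ||
"    avg_cross_val_score = sum(res) / len(res) if not attributes.empty else -10000\n", | ||
"    return avg_cross_val_score" | ||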
] | ||
} | ||
], | ||
"metadata": { | ||
"kernelspec": { | ||
"display_name": "Python 3", | ||
"language": "python", | ||
"name": "python3" | ||
}, | ||
"language_info": { | ||
"codemirror_mode": { | ||
"name": "ipython", | ||
"version": 3 | ||
}, | ||
"file_extension": ".py", | ||
"mimetype": "text/x-python", | ||
"name": "python", | ||
"nbconvert_exporter": "python", | ||
"pygments_lexer": "ipython3", | ||
"version": "3.10.11" | ||
} | ||
}, | ||
"nbformat": 4, | ||
"nbformat_minor": 2 | ||
} |
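
The notebook text above argues that sequential forward selection can miss a better attribute combination, and that a binary individual can encode any subset. The following is a minimal, library-free sketch of both ideas, not the notebook's implementation. Everything in it is an assumption made for illustration: the feature names `a`, `b`, `c`, the made-up score table in `toy_score` (standing in for a cross-validated score, higher is better), and the helper names `forward_selection`, `decode`, `mask_fitness` and `exhaustive_best`. Only the empty-selection penalty of -10000 mirrors `fitness_func`.

```python
from itertools import combinations

# Hypothetical feature names; they do not come from the notebook's data.
FEATURES = ["a", "b", "c"]


def toy_score(subset):
    """Made-up score per feature subset (higher is better).

    "a" and "b" are weak alone but strong together; "c" is the best single feature.
    """
    table = {
        frozenset("a"): 0.50,
        frozenset("b"): 0.50,
        frozenset("c"): 0.60,
        frozenset("ab"): 0.90,
        frozenset("ac"): 0.58,
        frozenset("bc"): 0.58,
        frozenset("abc"): 0.85,
    }
    return table[frozenset(subset)]


def forward_selection(features, score):
    """Greedy SFS: add the best next feature until the score stops improving."""
    selected, best = [], float("-inf")
    remaining = list(features)
    while remaining:
        candidate = max(remaining, key=lambda f: score(selected + [f]))
        candidate_score = score(selected + [candidate])
        if candidate_score <= best:
            break  # no remaining feature improves the criterion
        selected.append(candidate)
        remaining.remove(candidate)
        best = candidate_score
    return selected, best


def decode(individual, features):
    """Map a binary individual to the feature names whose bit is 1."""
    return [f for f, bit in zip(features, individual) if bit == 1]


def mask_fitness(individual, features, score, empty_penalty=-10000):
    """Fitness of a binary individual; an empty selection gets a large penalty."""
    subset = decode(individual, features)
    return score(subset) if subset else empty_penalty


def exhaustive_best(features, score):
    """Check every non-empty subset (only feasible for very few features)."""
    subsets = (
        list(combo)
        for size in range(1, len(features) + 1)
        for combo in combinations(features, size)
    )
    best = max(subsets, key=score)
    return best, score(best)


if __name__ == "__main__":
    print(forward_selection(FEATURES, toy_score))        # (['c'], 0.6)
    print(exhaustive_best(FEATURES, toy_score))          # (['a', 'b'], 0.9)
    print(mask_fitness([1, 1, 0], FEATURES, toy_score))  # 0.9
```

With these toy scores the greedy search stops at `["c"]`, because neither `a` nor `b` helps once `c` is chosen, while the individual `[1, 1, 0]` reaches the better subset `["a", "b"]`. A stochastic search over such individuals can reach subsets that the greedy path never visits, which is the motivation given for the genetic algorithm.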
598 changes: 598 additions & 0 deletions
f1_prediction/final_model/feature_selection_posvar.ipynb
Large diffs are not rendered by default.