remove model saving and header
lisa-sousa committed Sep 11, 2024
1 parent 1c05740 commit d6fd9eb
Showing 2 changed files with 8 additions and 22 deletions.
20 changes: 0 additions & 20 deletions xai-for-random-forest/Bio-1-Tutorial_RandomForest_Models.ipynb
@@ -3480,26 +3480,6 @@
"source": [
"Using the balanced accuracy shows us that the model performs very well on the training set but very poorly on the test set, i.e. fails to generalize to unseen data. This behaviour is a sign of model **overfitting**, where the model learns the training data too well, to the point that it captures not only the underlying patterns but also the noise and random fluctuations in the data. "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Let's now save the model in a ``pickle`` file, such that we can load the trained model into other notebooks later on."
]
},
{
"cell_type": "code",
"execution_count": 42,
"metadata": {},
"outputs": [],
"source": [
"# Save the model with joblib\n",
"data_and_model = [X_train, X_test, y_train, y_test, rf]\n",
"\n",
"with open('../models/model_rf_cervicalcancer.pickle', 'wb') as handle:\n",
" pickle.dump(data_and_model, handle, protocol=pickle.HIGHEST_PROTOCOL)"
]
}
],
"metadata": {
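The notebook text kept above diagnoses overfitting by comparing the balanced accuracy on the training and test splits. A minimal sketch of that check (illustrative, not part of this commit), assuming the fitted classifier `rf` and the `X_train`/`X_test`/`y_train`/`y_test` splits defined earlier in the notebook:

```python
from sklearn.metrics import balanced_accuracy_score

# Balanced accuracy on the training split: close to 1.0 when the model overfits
train_score = balanced_accuracy_score(y_train, rf.predict(X_train))

# Balanced accuracy on the held-out test split: much lower if the model
# fails to generalize to unseen data
test_score = balanced_accuracy_score(y_test, rf.predict(X_test))

print(f"Balanced accuracy - train: {train_score:.3f}, test: {test_score:.3f}")
```

A large gap between the two scores is the overfitting signal the notebook describes.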
@@ -287,8 +287,6 @@
"id": "029ea700",
"metadata": {},
"source": [
"### Permutation Feature Importance\n",
"\n",
"Now, let's use Permutation Feature Importance to get insights into the Random Forest Classification model we loaded above. We can use the scikit-learn implementation called `permutation_importance` to get the importance values for the features in our model. For measuring the performance drop when permuting a feature, we use the standard metric of our trained model, which is, in our case, the accuracy score. Using the same score enables us to evaluate the performance drop in relation to the baseline performance. We do 40 repetitions of permutation for each feature to get more reliable results.\n",
"\n",
"*Note: this method is a **global** method, which means that it only provides explanations for the full dataset but not for individual examples.*\n",
@@ -395,6 +393,14 @@
"As mentioned before, permutation feature importance assumes feature independence. High correlation among features breaks this assumption and hence, can have an impact on the feature importance analysis."
]
},
{
"cell_type": "markdown",
"id": "3f21ac5c",
"metadata": {},
"source": [
"---"
]
},
{
"cell_type": "markdown",
"id": "92f910a6",
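The surrounding notebook text describes computing Permutation Feature Importance with scikit-learn's `permutation_importance`, scoring with the model's standard metric (accuracy) and 40 repetitions per feature. A minimal sketch of that call (illustrative, not part of this commit), assuming the fitted model `rf` and a pandas DataFrame test split `X_test`/`y_test` loaded as in the notebook; the random seed is a hypothetical choice for reproducibility:

```python
from sklearn.inspection import permutation_importance

# Score drop when each feature is shuffled; the default scorer of a
# classifier is accuracy, and n_repeats=40 matches the notebook text.
result = permutation_importance(
    rf, X_test, y_test,
    n_repeats=40,
    random_state=0,   # hypothetical seed for reproducible shuffles
    n_jobs=-1,
)

# Report features sorted by mean importance (mean accuracy drop over 40 shuffles)
for idx in result.importances_mean.argsort()[::-1]:
    print(f"{X_test.columns[idx]}: "
          f"{result.importances_mean[idx]:.4f} +/- {result.importances_std[idx]:.4f}")
```

Because the importance is measured with the same score as the baseline, the drop can be read directly relative to the model's test accuracy; as the notebook notes, strongly correlated features violate the independence assumption and can distort these values.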
