- Introduction
- Import Libraries and Load Data
- Data Preprocessing
- Define Independent and Dependent Features
- Train-Test Split
- Feature Selection Based on Correlation
- Feature Scaling
- Linear Regression Model
- Lasso Regression
- Hyperparameter Tuning for Lasso Regression
- Ridge Regression
- Hyperparameter Tuning for Ridge Regression
- Elastic Net
- Hyperparameter Tuning for Elastic Net
- Save Models and Scaler
- Results
- Conclusion
- Future Work
This project predicts the Fire Weather Index (FWI) using several regression models: linear regression, Lasso, Ridge, and ElasticNet, each with and without cross-validated hyperparameter tuning. The dataset is the Algerian Forest Fires dataset, which has already been cleaned and preprocessed.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
%matplotlib inline
df = pd.read_csv(r'N:\Personal_Projects\Machine-Learning\Algerianforestfire\Algerian_forest_fires_cleaned_dataset.csv')
df.head()
df.tail()
df.info()
df.dtypes
df.describe()
df.columns
df.drop(['day', 'month', 'year'], axis=1, inplace=True)  # drop date columns not used as predictors
df.head()
df['Classes'] = np.where(df['Classes'].str.contains("not fire"), 0, 1)  # encode Classes: 0 = not fire, 1 = fire
df.head()
df['Classes'].value_counts()
X = df.drop('FWI', axis=1)
y = df['FWI']
X.head()
y
from sklearn.model_selection import train_test_split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)
X_train.shape, X_test.shape
X_train.corr()
plt.figure(figsize=(12,10))
corr = X_train.corr()
sns.heatmap(corr, annot=True)
def correlation(dataset, threshold):
    # Collect the names of features whose absolute correlation with an
    # earlier column exceeds the threshold, so one of each pair can be dropped.
    col_corr = set()
    corr_matrix = dataset.corr()
    for i in range(len(corr_matrix.columns)):
        for j in range(i):
            if abs(corr_matrix.iloc[i, j]) > threshold:
                colname = corr_matrix.columns[i]
                col_corr.add(colname)
    return col_corr
corr_features = correlation(X_train, 0.85)  # features with |correlation| > 0.85 to another feature
X_train.drop(corr_features, axis=1, inplace=True)
X_test.drop(corr_features, axis=1, inplace=True)
X_train.shape, X_test.shape
from sklearn.preprocessing import StandardScaler
scaler = StandardScaler()
X_train_scaled = scaler.fit_transform(X_train)
X_test_scaled = scaler.transform(X_test)
X_train_scaled
X_test_scaled
plt.subplots(figsize=(15, 5))
plt.subplot(1, 2, 1)
sns.boxplot(data=X_train)
plt.title('X_train Before Scaling')
plt.subplot(1, 2, 2)
sns.boxplot(data=X_train_scaled)
plt.title('X_train After Scaling')
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, r2_score
linearreg = LinearRegression()
linearreg.fit(X_train_scaled, y_train)
y_pred = linearreg.predict(X_test_scaled)
mae = mean_absolute_error(y_test, y_pred)
score = r2_score(y_test, y_pred)
print("Mean absolute error", mae)
print("R2 score", score)
plt.scatter(y_test, y_pred)
from sklearn.linear_model import Lasso
lasso = Lasso()
lasso.fit(X_train_scaled, y_train)
y_pred = lasso.predict(X_test_scaled)
mae = mean_absolute_error(y_test, y_pred)
score = r2_score(y_test, y_pred)
print("Mean absolute error", mae)
print("R2 Score", score)
plt.scatter(y_test, y_pred)
from sklearn.linear_model import LassoCV
lass = LassoCV(cv=5)
lass.fit(X_train_scaled, y_train)
y_pred = lass.predict(X_test_scaled)
plt.scatter(y_test, y_pred)
mae = mean_absolute_error(y_test, y_pred)
score = r2_score(y_test, y_pred)
print("Mean absolute error", mae)
print("R2 Score", score)
from sklearn.linear_model import Ridge
rid = Ridge()
rid.fit(X_train_scaled, y_train)
y_pred = rid.predict(X_test_scaled)
mae = mean_absolute_error(y_test, y_pred)
score = r2_score(y_test, y_pred)
print("Mean absolute error", mae)
print("R2 Score", score)
plt.scatter(y_test, y_pred)
from sklearn.linear_model import RidgeCV
ridgecv = RidgeCV(cv=5)
ridgecv.fit(X_train_scaled, y_train)
y_pred = ridgecv.predict(X_test_scaled)
plt.scatter(y_test, y_pred)
mae = mean_absolute_error(y_test, y_pred)
score = r2_score(y_test, y_pred)
print("Mean absolute error", mae)
print("R2 Score", score)
from sklearn.linear_model import ElasticNet
elnet = ElasticNet()
elnet.fit(X_train_scaled, y_train)
y_pred = elnet.predict(X_test_scaled)
mae = mean_absolute_error(y_test, y_pred)
score = r2_score(y_test, y_pred)
print("Mean absolute error", mae)
print("R2 Score", score)
plt.scatter(y_test, y_pred)
from sklearn.linear_model import ElasticNetCV
elasticcv = ElasticNetCV(cv=5)
elasticcv.fit(X_train_scaled, y_train)
y_pred = elasticcv.predict(X_test_scaled)
plt.scatter(y_test, y_pred)
mae = mean_absolute_error(y_test, y_pred)
score = r2_score(y_test, y_pred)
print("Mean absolute error", mae)
print("R2 Score", score)
import pickle
pickle.dump(scaler, open('scaler.pkl', 'wb'))
pickle.dump(rid, open('rid.pkl', 'wb'))
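To reuse the pipeline outside this notebook, the pickled scaler and Ridge model can be loaded back and applied to new rows with the same columns as X_train. A minimal sketch, using the first test row as a stand-in for a new observation:

loaded_scaler = pickle.load(open('scaler.pkl', 'rb'))
loaded_model = pickle.load(open('rid.pkl', 'rb'))
sample = X_test.iloc[[0]]  # stand-in for a new observation with the training columns
print(loaded_model.predict(loaded_scaler.transform(sample)))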
- Linear Regression: Mean Absolute Error = 0.5468236465249978, R2 Score = 0.9847657384266951
- Lasso Regression: Mean Absolute Error = 1.1331759949144085, R2 Score = 0.9492020263112388
- LassoCV (tuned Lasso): Mean Absolute Error = 0.6199701158263433, R2 Score = 0.9820946715928275
- Ridge Regression: Mean Absolute Error = 0.5642305340105693, R2 Score = 0.9842993364555513
- RidgeCV (tuned Ridge): Mean Absolute Error = 0.5642305340105693, R2 Score = 0.9842993364555513
- ElasticNet: Mean Absolute Error = 1.8822353634896, R2 Score = 0.8753460589519703
- ElasticNetCV (tuned ElasticNet): Mean Absolute Error = 0.6575946731430904, R2 Score = 0.9814217587854941
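The same comparison can be regenerated in a single pass over the estimators fitted above (a minimal sketch that reuses the variables defined in this notebook):

models = [("Linear Regression", linearreg), ("Lasso", lasso), ("LassoCV", lass),
          ("Ridge", rid), ("RidgeCV", ridgecv),
          ("ElasticNet", elnet), ("ElasticNetCV", elasticcv)]
for name, model in models:
    pred = model.predict(X_test_scaled)
    print(f"{name}: MAE = {mean_absolute_error(y_test, pred):.4f}, R2 = {r2_score(y_test, pred):.4f}")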
This project demonstrates several linear regression variants for predicting the Fire Weather Index (FWI), evaluated with Mean Absolute Error and R2 Score. Cross-validated hyperparameter tuning markedly improved the default Lasso and ElasticNet models, while plain linear regression and Ridge performed well out of the box. The Ridge model and the fitted scaler were pickled for reuse.
- Explore more advanced models such as Random Forest, Gradient Boosting, or Neural Networks.
- Perform feature engineering to create new features that might improve model performance.
- Conduct a more thorough hyperparameter search using GridSearchCV or RandomizedSearchCV (a sketch is given below).
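As an illustration of the last point, a GridSearchCV run for Ridge might look like the following. This is a minimal sketch, not part of the original notebook; the alpha grid and the scoring choice are assumptions.

from sklearn.model_selection import GridSearchCV
from sklearn.linear_model import Ridge

param_grid = {'alpha': [0.01, 0.1, 1.0, 10.0, 100.0]}  # assumed search space
grid = GridSearchCV(Ridge(), param_grid, cv=5, scoring='neg_mean_absolute_error')
grid.fit(X_train_scaled, y_train)
print("Best params:", grid.best_params_)
print("Best CV MAE:", -grid.best_score_)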