The goal of this project is to predict house prices from the California Housing dataset, using an XGBoost regression model to make the predictions.
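The California Housing dataset, loaded with `fetch_california_housing` from scikit-learn's `sklearn.datasets` module, serves as the foundation for this project. Each of its 20,640 rows describes a California census block group through eight numeric features (such as median income, house age, and location), and the target variable is the median house value, expressed in units of $100,000.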
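For reference, a minimal loading sketch. It assumes scikit-learn's `fetch_california_housing` with `as_frame=True`, which downloads the data on first use (network access needed) and caches it locally; the helper name `load_housing` and the constant names are illustrative, not taken from the project.

```python
import pandas as pd
from sklearn.datasets import fetch_california_housing

# Column names as returned by fetch_california_housing(as_frame=True).
FEATURE_COLUMNS = [
    "MedInc", "HouseAge", "AveRooms", "AveBedrms",
    "Population", "AveOccup", "Latitude", "Longitude",
]
TARGET_COLUMN = "MedHouseVal"


def load_housing() -> tuple[pd.DataFrame, pd.Series]:
    """Hypothetical helper: return the features and the target as pandas objects."""
    # Downloads on the first call, then reuses the copy cached under
    # scikit-learn's data home (~/scikit_learn_data by default).
    bunch = fetch_california_housing(as_frame=True)
    X = bunch.data[FEATURE_COLUMNS]
    # Series named "MedHouseVal", in units of $100,000.
    y = bunch.target
    return X, y
```

Calling `X, y = load_housing()` should return a 20,640 × 8 feature DataFrame and the matching target Series.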
Initial exploration includes:
- Displaying the first few rows of the dataset.
- Computing basic summary statistics.
- Checking for missing values.
- Visualizing the correlation matrix.
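The exploration steps above can be sketched as one helper. This is a minimal sketch assuming the data already sits in a single pandas DataFrame `df` (for example, the eight features plus the `MedHouseVal` column); the function name `explore`, the color map, and the figure size are illustrative choices rather than the project's own code.

```python
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns


def explore(df: pd.DataFrame, n_rows: int = 5) -> pd.DataFrame:
    """Hypothetical helper covering the four exploration steps."""
    # 1. First few rows.
    print(df.head(n_rows))
    # 2. Count, mean, std, min, quartiles, and max for each column.
    print(df.describe())
    # 3. Number of missing values per column.
    print(df.isnull().sum())
    # 4. Pairwise Pearson correlations, drawn as an annotated heatmap.
    corr = df.corr()
    fig, ax = plt.subplots(figsize=(8, 6))
    sns.heatmap(corr, annot=True, fmt=".2f", cmap="coolwarm", ax=ax)
    ax.set_title("Correlation matrix")
    fig.tight_layout()
    return corr
```

In a notebook the heatmap renders at the end of the cell; in a plain script, follow the call with `plt.show()`.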
XGBoost, a widely used gradient boosting library, is used to build the house price prediction model. The model is trained on a training split of the data and evaluated on a held-out test set.
The model's performance is measured with mean absolute error (MAE), mean squared error (MSE), and R-squared (R²).
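All three metrics are available in `sklearn.metrics`. A small sketch, with `evaluate` as a hypothetical helper name; the comments give the definition each function computes.

```python
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score


def evaluate(y_true, y_pred) -> dict:
    """Hypothetical helper returning MAE, MSE and R² as plain floats."""
    return {
        # MAE = mean(|y - y_hat|), in the target's own units.
        "MAE": float(mean_absolute_error(y_true, y_pred)),
        # MSE = mean((y - y_hat)^2), which penalizes large errors more heavily.
        "MSE": float(mean_squared_error(y_true, y_pred)),
        # R² = 1 - SS_res / SS_tot, the share of target variance explained.
        "R2": float(r2_score(y_true, y_pred)),
    }
```

Given a fitted regressor `model` and a held-out `X_test`, `y_test`, call `evaluate(y_test, model.predict(X_test))`. Because the target is in units of $100,000, an MAE of 0.3 would mean predictions are off by about $30,000 on average.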
To use this project:
- Install the required dependencies (pandas, numpy, matplotlib, seaborn, sklearn, xgboost).
- Run the Jupyter Notebook or Python script.
Install the dependencies using:

```bash
pip install pandas numpy matplotlib seaborn scikit-learn xgboost
```