
This repository contains a project that aims to show the basic workflow of training a machine learning algorithm, from importing the data to evaluating the model’s performance.


adrgryn/house_classification_hs


house_classification_hs

This project aims to predict house prices based on various features using a DecisionTreeClassifier. The model's performance is evaluated using three different encoding techniques: OneHotEncoder, OrdinalEncoder, and TargetEncoder.

Project Structure

The project is divided into several scripts, each responsible for a different part of the machine learning pipeline:

- data.py: Contains the logic to download and load the dataset.
- one_hot_encode.py, ordinal_encoder.py, target_encoder.py: Each script applies a different encoding technique to the categorical features and evaluates the model's performance.
- solution.py: Compares the F1-score macro average values.

Installation

Ensure you have Python installed on your system. Then install the necessary libraries (package names are separated by spaces, not commas):

    pip install pandas scikit-learn requests category_encoders

Usage

Run the main script to evaluate the model's performance with different encoders:

    python solution.py

Dataset

The dataset house_class.csv includes features such as Area, Room, Longitude (Lon), Latitude (Lat), Zip_area, and Zip_loc, with the target variable being Price.
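As a rough sketch of how the features and target might be separated, the snippet below reads a few rows with the column names listed above and makes a train/test split. The rows are made up for illustration and are read from an in-memory string rather than from house_class.csv; the split ratio and `random_state` are assumptions, not values taken from the project.

```python
import io

import pandas as pd
from sklearn.model_selection import train_test_split

# Made-up rows using the documented column names (not real dataset values).
csv_text = """Area,Room,Lon,Lat,Zip_area,Zip_loc,Price
100,3,4.90,52.37,10,AB,1
55,2,4.88,52.36,11,CD,0
140,5,4.95,52.35,10,AB,1
70,2,4.85,52.38,12,EF,0
90,3,4.91,52.34,11,CD,1
60,2,4.87,52.39,12,EF,0
"""

df = pd.read_csv(io.StringIO(csv_text))

# Price is the target; every other column is a feature.
X = df.drop(columns="Price")
y = df["Price"]

# Assumed split settings: 30% test, fixed seed for repeatability.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1
)
```

With the real file, `pd.read_csv("house_class.csv")` would take the place of the in-memory string.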

Model Evaluation

The models are evaluated based on precision, recall, and F1-score metrics for each encoding technique. The main script prints out the F1-score macro average value for comparison:

- OneHotEncoder: F1-score
- OrdinalEncoder: F1-score
- TargetEncoder: F1-score
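To make the macro average concrete, here is a minimal sketch using scikit-learn's metrics on made-up labels (the labels are illustrative, not project results). The macro F1 is the unweighted mean of the per-class F1-scores, so every class counts equally regardless of how many samples it has.

```python
from sklearn.metrics import classification_report, f1_score

# Illustrative labels only; these are not outputs of the project's models.
y_true = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]

# Per-class F1, then their unweighted mean (the macro average).
per_class_f1 = f1_score(y_true, y_pred, average=None)
macro_f1 = f1_score(y_true, y_pred, average="macro")

# The same figure can be read from the report's "macro avg" row.
report = classification_report(y_true, y_pred, output_dict=True)
macro_f1_from_report = report["macro avg"]["f1-score"]

print(f"macro F1: {macro_f1:.4f}")
```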

Encoding Techniques

- OneHotEncoder: Replaces each categorical feature with one binary (0/1) column per category, so the model does not assume any order among the categories.
- OrdinalEncoder: Encodes each categorical feature as a single integer column. Unless an explicit category order is supplied, the integers follow the encoder's default ordering rather than a meaningful ranking.
- TargetEncoder: Replaces each category with the mean of the target variable observed for that category.
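The three encodings can be compared on a tiny made-up column. This sketch uses scikit-learn's `OneHotEncoder` and `OrdinalEncoder`, and it imitates target encoding with a plain per-category mean in pandas. That last step is a simplification: it is not the `TargetEncoder` from category_encoders, which may also apply smoothing. The category values and labels are hypothetical.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder, OrdinalEncoder

# Toy data, invented for illustration (not taken from house_class.csv).
toy = pd.DataFrame({
    "Zip_loc": ["north", "south", "north", "east"],
    "Price": [1, 0, 0, 0],
})

# One-hot: one 0/1 column per category; scikit-learn sorts the
# categories, so the columns are east, north, south.
one_hot = OneHotEncoder().fit_transform(toy[["Zip_loc"]]).toarray()

# Ordinal: one integer per category in sorted order
# (east -> 0, north -> 1, south -> 2).
ordinal = OrdinalEncoder().fit_transform(toy[["Zip_loc"]])

# Simplified target encoding: per-category mean of the target, no smoothing.
category_means = toy.groupby("Zip_loc")["Price"].mean()
target_encoded = toy["Zip_loc"].map(category_means)
```

In practice the target means should be computed on the training split only, so that information from the test labels does not leak into the features.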

Model

A DecisionTreeClassifier with the following parameters is used for prediction:

- criterion: "entropy"
- max_features: 3
- splitter: 'best'
- max_depth: 6
- min_samples_split: 4
- random_state: 3
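The parameters listed above map directly onto the scikit-learn constructor. The sketch below builds the classifier with those values and fits it on synthetic data from `make_classification`, which stands in for the encoded house features. The synthetic data and its size are assumptions made only to keep the example runnable.

```python
from sklearn.datasets import make_classification
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for the encoded features (not the house dataset).
X, y = make_classification(n_samples=100, n_features=6, random_state=0)

model = DecisionTreeClassifier(
    criterion="entropy",   # split quality measured by information gain
    max_features=3,        # consider 3 features at each split
    splitter="best",       # choose the best split among those features
    max_depth=6,           # cap the depth of the tree
    min_samples_split=4,   # a node needs at least 4 samples to be split
    random_state=3,        # fixed seed so feature sampling is repeatable
)
model.fit(X, y)
predictions = model.predict(X)
```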

Conclusion

This project demonstrates how different encoding techniques can affect the performance of a machine learning model. By comparing the F1-score macro average values, one can select the most suitable encoding technique for this dataset.
