In the realm of professional football, a player transfer takes place when a player, under contractual obligations, transitions from one club to another. This intricate process involves the official relocation of a player's registration from their current football club to a new one. Typically, the transfer initiation occurs when a representative from an interested club officially inquires with the club where their prospective player is currently registered. If the selling club expresses an openness to the idea, negotiations commence for a transfer fee. These negotiations are often facilitated by intermediaries and involve determining the financial compensation to be paid by the acquiring club. However, price negotiations between clubs are time-consuming and lack standardization. Nowadays, the player's current club often aims to maximize the price for their player, contributing to significant inflation in the football player market value.
This project is dedicated to predicting football players' market values, drawing on their in-game, profile, and attribute statistics. Using machine learning techniques, we aim to provide a valuable tool for clubs, agents, and enthusiasts, enabling them to assess and understand the market value of players. The predictive player valuation model can help professional clubs set a reasonable starting point for negotiations over a player's price.
The dataset for this project is taken from Kaggle (see the Kaggle Dataset link). We do not use all of the data available on Kaggle; the files used are: apperances.csv, games.csv, players.csv. These files can be checked and downloaded in this repository, specifically in the data-raw folder.
To run this project, you will need the following dependencies:
- Python 3.9
- Flask==3.0.0
- gunicorn==21.2.0
- scikit-learn==1.3.0
Project dependencies can be installed by running:
pip install -r requirements.txt
Or, alternatively, a virtual environment can be created from the prepared Pipfile using Pipenv:
- Create the environment from the Pipfile:
pipenv install
- Enter the created environment:
pipenv shell
The models explored in this project are listed below (a minimal training sketch follows the list):
- Linear Regression: a fundamental statistical technique for modeling the relationship between a dependent variable and one or more independent variables by fitting a linear equation to the observed data. It finds the best-fitting line (or hyperplane in multi-dimensional space) that minimizes the sum of squared differences between predicted and actual values, making it a valuable tool for prediction and for understanding linear relationships within data.
- Decision Tree: a supervised machine learning algorithm used for both classification and regression tasks. It recursively partitions the dataset into subsets based on the most significant attribute at each level, resulting in a tree-like structure. Decision trees are interpretable and make predictions by traversing the tree from the root to a leaf node, providing insight into the decision-making process in a visual and intuitive way.
- XGBoost: XGBoost (Extreme Gradient Boosting) is a powerful and efficient ensemble machine learning algorithm that combines gradient boosting with tree-based methods. It is widely used for both classification and regression tasks and is known for its speed, scalability, and effectiveness in improving predictive accuracy. XGBoost builds an ensemble of decision trees, iteratively improving the model by minimizing the loss function and controlling overfitting through regularization.
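A minimal sketch of how the three candidate models might be fit and compared with scikit-learn and xgboost. The placeholder data, feature count, and hyperparameters are assumptions for illustration and are not taken from the project notebook.

```python
import numpy as np
from sklearn.datasets import make_regression  # placeholder data for illustration only
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression
from sklearn.tree import DecisionTreeRegressor
from sklearn.metrics import mean_squared_error
from xgboost import XGBRegressor

# Placeholder data standing in for the prepared player features and market values.
X, y = make_regression(n_samples=1000, n_features=9, noise=10.0, random_state=1)
X_train, X_val, y_train, y_val = train_test_split(X, y, test_size=0.2, random_state=1)

models = {
    "linear_regression": LinearRegression(),
    "decision_tree": DecisionTreeRegressor(max_depth=6, random_state=1),
    "xgboost": XGBRegressor(n_estimators=200, learning_rate=0.1, random_state=1),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    rmse = np.sqrt(mean_squared_error(y_val, model.predict(X_val)))
    print(f"{name}: validation RMSE = {rmse:.3f}")
```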
In this project, we use RMSE as the evaluation metric. The Root Mean Squared Error (RMSE) measures the average deviation between predicted values and actual values in a dataset and is often used to evaluate the accuracy of a predictive model.
The formula for RMSE is:

$$\text{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(y_i - \hat{y}_i\right)^2}$$

Where:
- $n$ is the total number of data points.
- $y_i$ is the actual (observed) value for data point $i$.
- $\hat{y}_i$ is the predicted value for data point $i$.
The RMSE value quantifies the typical error or "residuals" between actual and predicted values, with lower RMSE values indicating a better-fitting model.
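As a quick illustration of the formula above, here is a direct NumPy implementation; the sample arrays are made up for demonstration.

```python
import numpy as np

def rmse(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Root Mean Squared Error: square root of the mean squared residual."""
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Example with small illustrative arrays (not project data).
print(rmse(np.array([3.0, 5.0, 2.5]), np.array([2.5, 5.0, 4.0])))  # ≈ 0.913
```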
The following features are used as model input:
- height_in_cm: Player's height in centimeters.
- goals_2022: Goals scored by the player in the 2022 season.
- games_2022: Games played by the player in the 2022 season.
- assists_2022: Assists created by the player in the 2022 season.
- minutes_played_2022: Total minutes of play by the player in the 2022 season.
- goals_for_2022: Total goals scored by the player's club in the 2022 season.
- goals_against_2022: Total goals conceded by the player's club in the 2022 season.
- clean_sheet_2022: Total clean sheets kept by the player's club in the 2022 season.
- age: Player's age.
From the exploration, we determined that XGBoost is the best model, achieving RMSE scores of 0.859 on the validation data and 0.891 on the test data. For detailed exploration, please refer to this notebook.
The trained model should be deployed so its functionality is easy to access. One of the most effective methods is to serve the model as a web service. In this project we use Flask (a Python web service framework). The implementation of the Flask web service in this repo can be found here.
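A minimal sketch of what such a Flask prediction service might look like. The file name predict.py, the model artifact model.bin, and the use of pickle with a DictVectorizer are assumptions for illustration, not necessarily the repo's actual implementation.

```python
# predict.py -- illustrative Flask service; file and artifact names are assumptions.
import pickle

from flask import Flask, jsonify, request

app = Flask("player-valuation")

# Load the trained model (and any preprocessing artifacts) once at startup.
with open("model.bin", "rb") as f_in:
    dv, model = pickle.load(f_in)

@app.route("/predict", methods=["POST"])
def predict():
    player = request.get_json()           # feature payload sent by the client
    X = dv.transform([player])            # vectorize the incoming feature dict
    market_value = float(model.predict(X)[0])
    return jsonify({"predicted_market_value": market_value})

if __name__ == "__main__":
    app.run(debug=True, host="0.0.0.0", port=9696)
```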
To run the web service locally:
- Go to the root directory of the project
- Serve the app using gunicorn with the command below:
gunicorn --bind 0.0.0.0:9696 predict:app
- Hit the prepared /predict API route with a POST request and send the required payload input, for example:
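One illustrative way to call the local service is with Python's requests library; the payload values below are made up for demonstration, and the field names follow the feature list above.

```python
import requests

# Hypothetical player payload; field names follow the features listed above.
player = {
    "height_in_cm": 183,
    "goals_2022": 12,
    "games_2022": 34,
    "assists_2022": 7,
    "minutes_played_2022": 2890,
    "goals_for_2022": 68,
    "goals_against_2022": 41,
    "clean_sheet_2022": 12,
    "age": 24,
}

url = "http://localhost:9696/predict"
response = requests.post(url, json=player)
print(response.json())
```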


Before we deploy our model web service, we first wrap it in a Docker container. A Docker container packages the app together with all of its dependencies, so we can avoid dependency conflicts when running the app in a cloud environment.
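For reference, an illustrative Dockerfile for this kind of service might look like the sketch below; the base image, the Pipfile-based install, the copied files, and the predict:app entrypoint are assumptions, and the repo's actual Dockerfile may differ.

```dockerfile
# Illustrative Dockerfile sketch; the repo's actual file may differ.
FROM python:3.9-slim

RUN pip install pipenv

WORKDIR /app
COPY ["Pipfile", "Pipfile.lock", "./"]

# Install dependencies into the container's system interpreter.
RUN pipenv install --system --deploy

COPY ["predict.py", "model.bin", "./"]

EXPOSE 9696
ENTRYPOINT ["gunicorn", "--bind=0.0.0.0:9696", "predict:app"]
```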
- Install Docker Desktop
- Build a Docker image from the Dockerfile:
docker build -t <tag-name> <Dockerfile-location-path>
- Make sure the Docker image was created successfully with:
docker images
- Run the Docker image on port 9696:
docker run -p 9696:9696 <tag-name>
When the Docker container is running, we can access the Flask web service inside the container through port 9696.
After we have successfully wrapped our web service app in a Docker container, we are ready to deploy it to a cloud environment. Here we use one of AWS's cloud computing services, Elastic Beanstalk. Elastic Beanstalk is a Platform-as-a-Service (PaaS) offering from AWS that simplifies application deployment and management. It allows us to easily deploy, monitor, and scale web applications and services without dealing with the underlying infrastructure, making it a convenient choice for quickly launching web applications.
Step-by-step AWS Elastic Beanstalk (EB) deployment:
- Initialize the EB application:
eb init -p "Docker running on 64bit Amazon Linux 2" -r <region-code-name> <desired-application-name>
- Test that the EB server runs well locally:
eb local run --port 9696
- Create an environment and deploy our app to it:
eb create <desired-env-name>
This project has already been deployed to AWS Elastic Beanstalk and can be accessed at:
[POST] http://player-valuation2-env.eba-hi5iceym.ap-southeast-1.elasticbeanstalk.com/predict

Example of how to access our deployed model on the cloud:
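For instance, the same illustrative payload shown in the local example can be sent to the deployed endpoint:

```python
import requests

# Reuse the `player` payload dict from the local example above.
url = "http://player-valuation2-env.eba-hi5iceym.ap-southeast-1.elasticbeanstalk.com/predict"
response = requests.post(url, json=player)
print(response.json())
```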
This repo is intended as the midterm project submission for mlzoomcamp.