thisisgurpal/Premier-League-Analytics

Premier League Team & Player Dashboards

This project is for football lovers. Football lovers who want to stay updated. Football lovers who want to know the stats.

Is Rodri the top passer per match?

Does Virgil van Dijk have the best tackle success rate?

What is the shooting accuracy of Erling Haaland?

Which team gets the most cards?

Click here to go to the live dashboard and get answers to all these questions, and more.

Premier League Dashboard Tableau

Table of Contents

1. Introduction
2. Data Collection
3. Data Cleaning
4. Data Transformation
5. Dashboard Building
6. Future Considerations
7. Conclusion

1. Introduction

I wanted to learn web scraping. I thought about certifications. I thought about textbooks. But no, I think the best way to learn is by doing. So I chose a passion of mine: football. I scraped 2,000+ rows of data, across multiple attributes, from the Premier League website. Python, Selenium and pandas were my weapons of choice. The cleaning and transformations were done in SQL. Then it was time to create the visualisations, where I used Tableau. The result is an interactive dashboard, updated weekly, showing the statistics you want to know about your favourite players and teams this season.

2. Data Collection

Collecting the data. This is where the idea for the project started: to learn web scraping. I built Python classes, each containing multiple functions, for getting table stats, team stats and player stats from the Premier League website. To explain what I did, I'll use LeagueTableScraper() from league_table_class.py as an example.

2.1 Imports

Firstly, let's import the modules we need: 'webdriver' from Selenium, to control the browser; 'By' from Selenium, to locate HTML elements (it supports several locator strategies, and the ones I've used are 'CSS_SELECTOR', 'TAG_NAME', 'ID' and 'CLASS_NAME'); and lastly the 'time' module, used to delay the execution of the next piece of code so that HTML elements have time to load.

Imports

2.2 Configuring the class

To configure the class, we give it a name, mine being 'LeagueTableScraper'. I then decided to store the data within the class, so the constructor creates a dictionary (self.premier_league) containing a key mapped to an empty list ('premier_league': []) where the data gets appended. This lets me loop through pages, append the data, and return everything collected at the end.

ConfiguringTheClass
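As a rough sketch of that configuration (everything beyond the self.premier_league dictionary described above is an assumption), the constructor might look like:

```python
class LeagueTableScraper:
    """Scrapes Premier League table data and stores it on the instance."""

    def __init__(self):
        # One key mapped to an empty list; every scraped row is appended
        # here as we loop through the season pages.
        self.premier_league = {'premier_league': []}
```

Storing the results on the instance means any of the class's functions can append to the same list while the scrape runs.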

2.3 The scraper function

I want to call only one function to do all the scraping. That is my function scrape_data(). It takes two parameters: the season we want to scrape (season_text), and the initial URL to open.

scrape_data

Let's break this down to get to know this function better.

2.3a Start the chrome session

The first line of the function creates a variable called 'driver', which starts my Chrome browser session from the webdriver.

StartChromeSession

2.3b Try and get URL

Now we have our Chrome driver, we want to try to go to the URL passed into the function. To do this we use the driver's 'get' method. I have placed this inside a try block; if it fails for any reason, the except part prints the error. Finally, after the try and except, we stop the Chrome session using the 'quit' method.

TryGetUrl
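That try/except/finally shape can be sketched as a small helper. 'open_page' is a hypothetical name for illustration; in the real class the scraping steps sit inside the try block before the session is closed:

```python
def open_page(driver, url):
    """Try to load the page; print any failure; always close the session."""
    try:
        driver.get(url)    # navigate to the url passed in
        # ...scraping would happen here, while the session is open...
    except Exception as e:
        print(e)           # report why the page could not be scraped
    finally:
        driver.quit()      # stop the Chrome session either way
```

The finally clause is the important part: even if the page never loads, the browser process is shut down rather than left hanging.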

2.3c Handling blockers

At the moment we are just testing the URL. I reviewed the page and paid close attention: we got pop-ups, we got ads. To handle this I located the HTML elements and clicked their close buttons, which formed the function handle_blockers(). This function needs to run after we get the URL, but the page must load first, so we use the time module. Inside our try block, it now looks like this.

TryWithHandleBlocks

I found only two blockers that occur on the website, so within the handle_blockers() function I have two try/except blocks. For each, when the blocker is present, the try section locates the close button and clicks it. When it's not present, the except section does a 'pass' and moves on.

HandleBlockers
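A minimal sketch of that shape, assuming made-up locators (the real selectors are in the screenshot above). The string strategies 'id' and 'css selector' are what Selenium's By.ID and By.CSS_SELECTOR resolve to:

```python
def handle_blockers(driver):
    """Close the two known pop-ups if they are on screen; otherwise move on."""
    # Cookie banner: this locator is an assumption for illustration.
    try:
        driver.find_element('id', 'accept-cookies').click()
    except Exception:
        pass  # banner not present this time, move on
    # Ad overlay: this locator is also an assumption.
    try:
        driver.find_element('css selector', '.ad-close').click()
    except Exception:
        pass  # ad not present this time, move on
```

Keeping each blocker in its own try/except means a missing cookie banner never stops the ad overlay from being closed, and vice versa.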

2.3d Get the data

Now we're at the point where we want to start getting the data and storing it. To do this, I've created a function called get_all_premier_leagues_data(). This function takes 'season_text', to know which season to get data for, and 'driver', to access page elements. So inside the current try block, under the handle_blockers() call, we put another try block. This handles errors when getting the data.

GetData

Let's take a closer look at this function.

Firstly, in the get_all_premier_leagues_data() function we need to get the seasons to filter by. open_season_dropdown() is a function I've written to find and open the season dropdown, and find_seasons() returns a list of the seasons available.

StartGetPremierLeaguesData
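find_seasons() presumably maps the dropdown's option elements to their text. A sketch of that idea, where the CSS selector is purely a placeholder and not the site's real markup:

```python
def find_seasons(driver):
    """Return the label of every season option in the open dropdown."""
    # Placeholder selector: the real one targets the site's dropdown list.
    options = driver.find_elements('css selector', 'ul.dropdownList li')
    return [option.text for option in options]
```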

Now we can loop through these seasons. This is where season_text is used. If the user gives season_text as 'All', we collect league table data for every season going back to 1992. If they specify a season like '2012/13', we only collect league table data for that season.

The count is used when season_text is 'All'. It lets us reopen the dropdown on every iteration after the first, as the dropdown was already opened once before the loop.

So when season_text is 'All', we loop through all the individual season filters, ignoring the 'All seasons' filter: we want the specific data for each season, not the sum of all seasons. After we click a season, we get the data and append it to our dictionary using another function, get_premier_league_data().

season_text_all

For season_text being an individual season like '2012/13', we check whether the season we are looping over matches '2012/13'. When it does, the filter is clicked, and the data is collected and appended using get_premier_league_data(). Finally we break out of the season loop, because we only wanted that one season.

season_text_individual

So you're now asking yourself: what's inside get_premier_league_data()? Well, I'm here to tell you.

Once we have clicked a season filter, the league table is updated, ready for us to collect stats. Essentially, the function locates the body where the teams sit. Each team is within an element keyed by its position in the league. As the Premier League always has 20 positions, we can loop through each of these numbers. For each position, we get the element of the team in that position; this is done using the function locate_team_tags(). We can use the elements returned to get our data. Other than the team name and season, we use the get_team_info() function to retrieve our information. Now that we have all the stats, it's time to append them to our dictionary using append_data(), which takes the data points as parameters.

get_data_pl
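Stripped of the Selenium calls, the shape of that function is a loop over the 20 league positions. In this sketch, locate_row and get_stats are stand-ins for the real locate_team_tags() and get_team_info() helpers, and the function name is hypothetical:

```python
def collect_league_table(season, locate_row, get_stats, store):
    """Read all 20 rows of the table for one season and append them."""
    for position in range(1, 21):   # the league always has 20 positions
        row = locate_row(position)  # element for the team at this position
        store['premier_league'].append({
            'season': season,
            'position': position,
            'team': row['team'],
            **get_stats(row),       # played, won, drawn, lost, points, ...
        })
```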

The append function looks like this. Quite simple.

append

Lastly in the scrape_data() function, we return the data.

return_data

All that's left to do now is initialise our class and call our function. Let it finish running, and boom. Our data is collected.

boom

3. Data Cleaning

I could have cleaned the data in Python, but where is the fun in that? Why not use SQL and test my skills. Let's look at the Player Stats 2324 (Data Clean) file. This file cleans the player stats data after the collected data has been imported into SQL Server. Before running the SQL code to clean and save this data, we have to check whether the cleaned table already exists, so that we can replace it or create it. That looks like this, where our SELECT statement for the cleaning goes within the quotes in EXEC('').

replace_playerstats_SQL
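In T-SQL, that check-then-replace pattern commonly looks something like the sketch below. The table names here are assumptions for illustration, not the repository's actual ones:

```sql
-- If a previous cleaned table exists, drop it so it can be rebuilt.
IF OBJECT_ID('dbo.Player_Stats_2324_Clean', 'U') IS NOT NULL
    DROP TABLE dbo.Player_Stats_2324_Clean;

-- The cleaning SELECT runs inside EXEC('') as dynamic SQL,
-- writing its result into the new table.
EXEC('
    SELECT /* cleaned columns go here */ *
    INTO dbo.Player_Stats_2324_Clean
    FROM dbo.Player_Stats_2324_Raw
');
```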

The SELECT statement for the cleaning. In this code we join new columns, replace values, amend values and specify data types. As an example, look at Date_of_Birth. The values in the data are either '20/12/1995' or '20/12/1995 (28)', where 28 is the age. I just want the date. This calls for a CASE expression, which lets us write WHEN, THEN and ELSE clauses (like an if/else statement). Within the WHEN, we check for '(' and get its index. If the index is greater than 0, the age is present. We use THEN to make the change: take the substring of Date_of_Birth from index 1 up to the character before the '(', then CONVERT it to the specified data type.

player_stats_clean_SQL
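As a sketch of that Date_of_Birth logic in SQL Server syntax (column and table names assumed; style 103 tells CONVERT the input is dd/mm/yyyy):

```sql
SELECT
    CASE
        -- '20/12/1995 (28)': '(' found, so cut the string just before it
        WHEN CHARINDEX('(', Date_of_Birth) > 0
            THEN CONVERT(date,
                         RTRIM(SUBSTRING(Date_of_Birth, 1,
                                         CHARINDEX('(', Date_of_Birth) - 1)),
                         103)
        -- '20/12/1995': already just the date
        ELSE CONVERT(date, Date_of_Birth, 103)
    END AS Date_of_Birth
FROM dbo.Player_Stats_2324_Raw;
```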

4. Data Transformation

Our data is nice and clean now. Ready for some transformations using SQL. You can find the transformation queries here - Data Export Queries. These queries help summarise statistics to be used in the dashboard. Here is an example.

The Height (cm) vs Features query needed a transformation of the data to allow for filtering in Tableau. Instead of having the data like this:

HeightFeaturesQueryDataBefore

I wanted it like this:

HeightFeaturesQueryDataAfter

To do that, this is the SQL query I wrote. As a side note, I found that when Team is NULL, the player is no longer in the league, as you can see handled in the query too.

HeightFeaturesQuery

5. Dashboard building

Now I've got my data transformations, the tables are exported and uploaded to Tableau Desktop. The main data table that everything goes through, for filtering purposes, is 'Player_and_Team PL 2324.csv'. This table is a distinct list of all players in the current Premier League season, with their teams and positions. 'Team Stats PL 2324.csv' goes through 'Teams PL 2324.csv', because the chart related to 'Team Stats PL 2324.csv' needs a table of distinct teams, which is what 'Teams PL 2324.csv' provides.

TableauRelationships

I add the position, team and player columns from 'Player_and_Team PL 2324.csv' to filter each chart. These columns are also added to the context of each chart, so the data is filtered by these columns before the charts are built.

ContectTableau

After some trial and error, more filtering and more designing, it's finished. The dashboard has been built. Most of the charts can be used as filters too, so feel free to click away.

Premier League Dashboard Tableau

6. Future Considerations

Going forward, if I were to advance this dashboard, I would scrape not only players in the 2023/24 season but every season in the Premier League. This would allow the dashboard to contain players like Cristiano Ronaldo and Gareth Bale. The comparison between players across different seasons would then be quite interesting to add and look into: how players have changed over the seasons.

Another data source I would like to add is players' injury history. This would allow for some statistical calculations, such as predicting a player's risk of injury.

7. Conclusion

In conclusion, my journey into web scraping was a great learning experience. Instead of traditional methods like certifications or textbooks, I opted for a project driven by my passion for football. Using Python, Selenium and pandas, I collected over 2,000 rows of data from the Premier League website. SQL made cleaning and transforming the data straightforward. Then, with Tableau, I crafted an interactive dashboard showcasing insights into player and team statistics. This project improved my technical skills and broadened my experience.
