
huangwentao123/football-data-collection

 
 


Collecting football data

Welcome!

This is an open source project that provides tools for collecting and formatting large sets of data about football matches and players. The project is essentially a crawler written in Python and relies on two sources:

Using Scrapy

To facilitate crawling, I use an open source Python library called Scrapy. Have a look at the tutorials on its website if you're not already familiar with the library.

Collection process

  • 1: collect the match stats and team lineups using the Match Crawler
  • 2: build a list of unique player names
  • 3: loop over this list with the Player Crawler. Build a list of the players you haven't successfully crawled, then repeat this step on that list with adjusted crawling parameters. Repeat until you've got all the players you need.
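Steps 2 and 3 can be sketched in plain Python as follows. This is only an illustration of the retry loop, not the project's actual code: the match record layout (`home_lineup`, `away_lineup`), the `crawl_player` callback, the `strict` parameter, and the player names are all hypothetical stand-ins for the real crawlers and their settings.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple


def unique_player_names(matches: Iterable[dict]) -> List[str]:
    """Step 2: collect every distinct player name from the match lineups."""
    names = set()
    for match in matches:
        # "home_lineup" / "away_lineup" are assumed field names.
        for side in ("home_lineup", "away_lineup"):
            names.update(match.get(side, []))
    return sorted(names)


def crawl_until_done(
    names: Iterable[str],
    crawl_player: Callable[[str, dict], Optional[dict]],
    param_sets: Iterable[dict],
) -> Tuple[Dict[str, dict], List[str]]:
    """Step 3: crawl each player, retrying failures with the next parameter set."""
    found: Dict[str, dict] = {}
    remaining = list(names)
    for params in param_sets:
        if not remaining:
            break
        failed = []
        for name in remaining:
            result = crawl_player(name, params)
            if result is None:
                failed.append(name)  # keep for the next, adjusted pass
            else:
                found[name] = result
        remaining = failed
    return found, remaining


# Made-up sample data standing in for the Match Crawler output.
sample_matches = [
    {"home_lineup": ["A. Keeper", "B. Striker"], "away_lineup": ["C. Winger"]},
    {"home_lineup": ["B. Striker"], "away_lineup": ["D. Défenseur"]},
]


def fake_crawl_player(name: str, params: dict) -> Optional[dict]:
    """Stand-in for the Player Crawler: the strict pass misses accented names."""
    if params["strict"] and not name.isascii():
        return None
    return {"name": name}


names = unique_player_names(sample_matches)
found, missing = crawl_until_done(
    names, fake_crawl_player, [{"strict": True}, {"strict": False}]
)
```

Each pass only revisits the players that failed on the previous one, so widening the parameters does not repeat work already done.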

Using Search Engines

Sometimes a player name is rather complicated or not consistent across different sources. To help identify a player, the algorithm can be parameterized to make use of search engines. Google is a prime choice thanks to its large database and its tolerance of misspelled player names. Unfortunately, the Google API has a limited usage rate per day. Hence I suggest you use Yahoo or Bing first and only use Google for those players you struggle to find.
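The suggested ordering (cheaper engines first, Google last and only while its daily quota lasts) could look roughly like the sketch below. No real search API is called here: the stub search functions, the `budgets` dictionary, the player name, and the example URL are all hypothetical, and the project's own parameterization may differ.

```python
import difflib
from typing import Callable, Dict, List, Optional, Tuple

SearchFn = Callable[[str], Optional[str]]

# Hypothetical index of known player pages, used only by the stubs below.
KNOWN_PLAYERS = {"jan kowalski": "https://example.com/players/jan-kowalski"}


def exact_stub(name: str) -> Optional[str]:
    """Stand-in for an engine that only matches exact spellings."""
    return KNOWN_PLAYERS.get(name.lower())


def tolerant_stub(name: str) -> Optional[str]:
    """Stand-in for an engine that tolerates small misspellings."""
    close = difflib.get_close_matches(
        name.lower(), list(KNOWN_PLAYERS), n=1, cutoff=0.8
    )
    return KNOWN_PLAYERS[close[0]] if close else None


def resolve_player(
    name: str,
    engines: List[Tuple[str, SearchFn]],
    budgets: Dict[str, int],
) -> Optional[Tuple[str, str]]:
    """Try each engine in order, skipping any whose daily budget is spent."""
    for engine_name, search in engines:
        remaining = budgets.get(engine_name)  # None means no tracked limit
        if remaining is not None:
            if remaining <= 0:
                continue
            budgets[engine_name] = remaining - 1
        url = search(name)
        if url is not None:
            return engine_name, url
    return None


# Rate-limited engine goes last so its quota is spent only on hard cases.
engines = [("yahoo", exact_stub), ("bing", exact_stub), ("google", tolerant_stub)]
budgets = {"google": 1}
```

Calling `resolve_player("Jan Kowalsky", engines, budgets)` falls through the two exact-match stubs and consumes one unit of the `google` budget, while a correctly spelled name is resolved by the first engine and leaves the budget untouched.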

About

Web scraper used to create the Kaggle European Soccer database
