Local Python app that scrapes a LinkedIn profile and stores it to train an ML model
The project's main objective is to develop an automated tool to analyze LinkedIn profiles and determine whether they match a certain position. The project consists of the following main files:
- app.py: This is the main app. Once executed, the program opens LinkedIn's main page so you can log in with your account. From the app's UI, you can scrape any LinkedIn profile given the profile's link.
- scraper.py: This is where the scraper is coded and where the main logic lives.
- lang.txt: This file determines the language of the LinkedIn app. Write EN for English or ES for Español. The default language is Español.
- jobs.txt: This file stores the positions to match. You can edit it freely and add your own jobs without crashing the program (a restart is required for the changes to take effect), as long as you keep the style of the file: one job per line.
- dataset.csv/backup_dataset.csv: These files store the scraped profiles and their labels. Each time the program is restarted, dataset.csv's data is copied to backup_dataset.csv.
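To make the role of these files more concrete, here is a minimal sketch of how they could be read at startup. This is not the code from app.py: the function names (`read_language`, `read_jobs`, `backup_dataset`) are hypothetical, and falling back to ES when lang.txt is missing or contains anything other than EN/ES is my assumption based on the default described above.

```python
import shutil
from pathlib import Path

SUPPORTED_LANGUAGES = {"EN", "ES"}
DEFAULT_LANGUAGE = "ES"  # Español is the documented default


def read_language(path="lang.txt"):
    """Return 'EN' or 'ES'.

    Assumption: a missing file or an unknown code falls back to the default,
    and the code is matched case-insensitively.
    """
    try:
        code = Path(path).read_text(encoding="utf-8").strip().upper()
    except FileNotFoundError:
        return DEFAULT_LANGUAGE
    return code if code in SUPPORTED_LANGUAGES else DEFAULT_LANGUAGE


def read_jobs(path="jobs.txt"):
    """Return the positions listed one per line, skipping blank lines."""
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [line.strip() for line in lines if line.strip()]


def backup_dataset(src="dataset.csv", dst="backup_dataset.csv"):
    """Copy the dataset over the backup; return False if there is no dataset yet."""
    if not Path(src).exists():
        return False
    shutil.copyfile(src, dst)
    return True
```

Because the jobs are read once at startup in this sketch, edits to jobs.txt only show up after a restart, which matches the behavior described above.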
The program is pretty straightforward to use, so I will not go into great depth about any step:
- Open the app.py file
- Log in to your LinkedIn account once the main page is opened
- Copy and paste the link of the LinkedIn profile that you wish to scrape
- Select the position for which the profile is going to be labeled
- Select 'Match' or 'No match' depending on whether the profile fits the selected position
- Press 'Scrape and save!' to scrape the profile and save it (only experience and aptitudes will be scraped)
- The scraped information will be displayed in the app and saved to the dataset file
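The last step can be pictured roughly as follows. This is only a sketch of appending one labeled profile to dataset.csv with Python's `csv` module: the function name `save_profile`, the column names, and the `" | "` separator used to join list entries are assumptions for illustration, and the real dataset layout may differ.

```python
import csv
from pathlib import Path

# Hypothetical column layout; the real dataset.csv may use different columns.
FIELDNAMES = ["profile_url", "position", "label", "experience", "aptitudes"]


def save_profile(profile_url, position, is_match, experience, aptitudes,
                 path="dataset.csv"):
    """Append one labeled profile to the CSV, writing the header on first use."""
    target = Path(path)
    write_header = not target.exists() or target.stat().st_size == 0
    with target.open("a", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDNAMES)
        if write_header:
            writer.writeheader()
        writer.writerow({
            "profile_url": profile_url,
            "position": position,
            "label": "Match" if is_match else "No match",
            # Lists are flattened into one cell each (separator is an assumption).
            "experience": " | ".join(experience),
            "aptitudes": " | ".join(aptitudes),
        })
```

Opening the file in append mode means each press of 'Scrape and save!' adds a row without rewriting the profiles that are already stored.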
This tool is not meant to make any profit and is not intended to copy or damage LinkedIn in any way. It is just a more human-friendly way of building a dataset for the incoming LLM era. Please use the tool in moderation and do not press the 'Scrape and save!' button too frequently, to avoid being banned from the platform. Thank you!!!