Python web scraper that collects job listings from Indeed.com (https://www.indeed.com).
-
Install IntelliJ IDEA Community 2020.3.1 or Visual Studio Code 1.52.1. I use both; the choice is a matter of personal preference.
-
I use the Windows 64-bit build of Python 3.8.0: https://www.python.org/downloads/release/python-380/
-
i) IntelliJ IDEA: install the Python plugin from within IntelliJ. ii) VS Code: install the Python extension.
-
If you are familiar with setting up a virtual environment for each project, you can set one up here. IntelliJ IDEA has its own way of creating a virtual environment when you create a project. With VS Code, the virtual environment needs to be set up from the terminal.
-
Install pip and the packages listed in requirement.txt at their respective versions.
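To confirm that the installed packages match the pinned versions, something like the following could be run inside the environment. It is only a sketch: the helper name find_mismatches is hypothetical, and it assumes requirement.txt uses plain name==version lines (other line formats are skipped).

```python
from importlib import metadata


def find_mismatches(requirement_lines):
    """Return (name, wanted, installed) for each pinned package that differs.

    `installed` is None when the package is not installed at all.
    Assumption: only simple 'name==version' lines are checked.
    """
    mismatches = []
    for line in requirement_lines:
        line = line.split("#", 1)[0].strip()  # drop comments and blank lines
        if "==" not in line:
            continue
        name, wanted = (part.strip() for part in line.split("==", 1))
        try:
            installed = metadata.version(name)
        except metadata.PackageNotFoundError:
            installed = None
        if installed != wanted:
            mismatches.append((name, wanted, installed))
    return mismatches
```

Passing the open requirement.txt file to find_mismatches returns an empty list when every pinned package is installed at the listed version.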
-
Within .vscode, settings.json is configured for my specific virtual environment. If you do not plan to use a virtual environment, make sure the Python path points to your default python.exe location (the global Python installation).
-
The .idea folder and Indeed-jobs_scraping.iml are meant for IntelliJ IDEA.
- At line 79 of the code, specify the directory where the CSV file of scraped job listings should be saved. You only need to change the directory in path = 'C:\Users\Hubert\Desktop\'. The rest of that line builds the CSV file name; leave it alone unless you want a different naming format.
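One way to build such a path without hand-escaping Windows backslashes is pathlib. In a regular Python string literal, \U starts a Unicode escape and a trailing backslash escapes the closing quote, so a raw string without the trailing backslash is easier to get right. The sketch below is not the script's code: build_csv_path is a hypothetical helper, and the jobs_<position>.csv pattern is an assumption inferred from the example files in this repository.

```python
from pathlib import PureWindowsPath


def build_csv_path(output_dir, position):
    """Return the CSV path for a position, e.g. jobs_data_scientist.csv.

    Hypothetical helper; the naming pattern is an assumption based on
    the example CSV files, not copied from the script.
    """
    # Collapse whitespace and join the words with underscores.
    slug = "_".join(position.lower().split())
    # PureWindowsPath inserts the separator, so no trailing backslash is needed.
    return PureWindowsPath(output_dir) / f"jobs_{slug}.csv"


# A raw string keeps the backslashes literal.
path = build_csv_path(r"C:\Users\Hubert\Desktop", "data scientist")
print(path)
```

On Windows, pathlib.Path can be used in place of PureWindowsPath if the path also needs to touch the file system.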
-
You will be prompted for the position, city, state, search radius around the city, and the number of pages you want to scrape.
-
Type multi-word positions with spaces between the words. Example: (mechanical engineer or real estate agent)
-
Make sure the city and state are valid inputs. Example: (Boston, MA or San Francisco, CA)
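The prompted values could be turned into search-page URLs roughly as follows. This is a sketch rather than the script's actual code: build_search_urls is a hypothetical name, and the query parameters (q, l, radius, start) and the step of 10 results per page are assumptions about Indeed's search URLs that may not match the current site.

```python
from urllib.parse import urlencode

BASE_URL = "https://www.indeed.com/jobs"  # assumed search endpoint


def build_search_urls(position, city, state, radius, pages):
    """Return one search URL per results page (hypothetical helper)."""
    # Normalize the location to "City, ST".
    location = f"{city.strip()}, {state.strip().upper()}"
    urls = []
    for page in range(pages):
        params = {
            "q": " ".join(position.split()),  # collapse repeated spaces
            "l": location,
            "radius": radius,
            "start": page * 10,  # assumption: 10 results per page
        }
        # urlencode escapes spaces as "+" and commas as "%2C".
        urls.append(f"{BASE_URL}?{urlencode(params)}")
    return urls


for url in build_search_urls("mechanical engineer", "Boston", "Ma", 25, 2):
    print(url)
```

Because urlencode handles the escaping, the position can be typed with ordinary spaces, as described above.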
-
jobs_data_scientist.csv and jobs_software_engineer.csv are example CSV files.
-
The output will vary depending on when you run the scraper, because job listings are updated frequently.