- Scrape a chosen number of profiles from the results of a LinkedIn search URL.
- Export the content of the profiles to Excel and JSON files.
- Clone the project:
git clone https://github.com/khaleddallah/GoogleImageScrapyDownloader.git
- Get into the project directory:
cd LinkedinScraperProject
- Use the package manager pip to install the requirements, including Scrapy (Anaconda recommended):
pip install -r requirements.txt
- To get help:
python LinkedinScraper -h
usage: python LinkedinScraper [-h] [-n NUM] [-o OUTPUT] [-p] [-f FORMAT] [-m EXCELMODE] (searchUrl or profilesUrl)

positional arguments:
  searchUrl        URL of a LinkedIn search, or one or more profile URLs

optional arguments:
  -h, --help       show this help message and exit
  -n NUM           number of profiles; must be lower than or equal to the number of results.
                   'page' parses the profiles of the URL page (10 profiles) (default)
  -o OUTPUT        output file
  -p               enable parsing of single profiles
  -f FORMAT        json   JSON output file
                   excel  Excel output file
                   all    JSON and Excel output files
  -m EXCELMODE     1  each profile appears in one row of the Excel file
                   m  each profile appears in multiple rows of the Excel file
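For reference, below is a minimal argparse sketch that mirrors the options in the help text above. It is an illustration only; the argument names, defaults, and structure are assumptions, not the project's actual source.

```python
# Hypothetical sketch of an argument parser matching the help text above;
# the real LinkedinScraper entry point may be organised differently.
import argparse

def build_parser():
    parser = argparse.ArgumentParser(prog="python LinkedinScraper")
    parser.add_argument("urls", nargs="+",
                        help="LinkedIn search URL or one or more profile URLs")
    parser.add_argument("-n", dest="num", default="page",
                        help="number of profiles to parse, or 'page' for the "
                             "profiles of the current result page (10 profiles)")
    parser.add_argument("-o", dest="output",
                        help="base name of the output file(s)")
    parser.add_argument("-p", action="store_true",
                        help="parse single profile URLs instead of a search URL")
    parser.add_argument("-f", dest="format", choices=["json", "excel", "all"],
                        default="json", help="output format")
    parser.add_argument("-m", dest="excel_mode", choices=["1", "m"],
                        help="1: one row per profile, m: multiple rows per profile")
    return parser

if __name__ == "__main__":
    print(build_parser().parse_args())
```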
- Parse the profiles https://www.linkedin.com/in/khaled-dallah/ and https://www.linkedin.com/in/linustorvalds/ and export their content to ABC.xlsx and ABC.json.
The -p flag is needed because single profiles are being parsed:
python LinkedinScraper -p -o 'ABC' 'https://www.linkedin.com/in/khaled-dallah/' 'https://www.linkedin.com/in/linustorvalds/'
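To post-process the exported JSON, a minimal sketch follows. It assumes ABC.json contains a list of profile objects and that a "name" field exists; both the structure and the field name are assumptions about the scraper's output, not documented guarantees.

```python
# Minimal sketch: load the exported JSON and print one field per profile.
# Assumes ABC.json holds a list of profile dicts; the "name" field is
# illustrative and may not match the project's actual output keys.
import json

with open("ABC.json", encoding="utf-8") as f:
    profiles = json.load(f)

for profile in profiles:
    print(profile.get("name", "<no name field>"))
```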
- Parse 23 profiles from the search URL https://www.linkedin.com/.../?keywords=Robotic&...&
If you do not set an output name with -o, the result files are named after the value of the keywords parameter (Robotic):
python LinkedinScraper -n 23 'https://www.linkedin.com/search/results/all/?keywords=Robotic&origin=GLOBAL_SEARCH_HEADER'
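The default output name is taken from the keywords query parameter of the search URL. A small standard-library sketch of how that value can be recovered is shown below; it is an illustration, not the project's own code.

```python
# Illustration only: extract the 'keywords' query parameter that the
# default output name is based on.
from urllib.parse import urlparse, parse_qs

url = "https://www.linkedin.com/search/results/all/?keywords=Robotic&origin=GLOBAL_SEARCH_HEADER"
keywords = parse_qs(urlparse(url).query).get("keywords", [""])[0]
print(keywords)  # -> Robotic
```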
- Parse 17 profiles from the search URL https://www.linkedin.com/.../?keywords=Robotic&...&
and write the output as an Excel file with the information of each profile in a single row:
python LinkedinScraper -n 17 -f excel -m 1 'https://www.linkedin.com/search/results/all/?keywords=Robotic&origin=GLOBAL_SEARCH_HEADER'
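To read the resulting workbook back with openpyxl (one row per profile because of -m 1), a hedged sketch is given below. The file name "Robotic.xlsx" assumes the default naming from the keywords value, and the row layout is an assumption about the export format.

```python
# Sketch: iterate over the exported workbook with openpyxl.
# Assumes the default output name "Robotic.xlsx" and one profile per row;
# both are assumptions, not guaranteed by the project.
from openpyxl import load_workbook

wb = load_workbook("Robotic.xlsx", read_only=True)
ws = wb.active

for row in ws.iter_rows(values_only=True):
    print(row)  # each tuple corresponds to one profile row
```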
- Python 3.7
- Scrapy
- openpyxl
- Khaled Dallah - Software Engineer | Python/C++ Developer
khaled.dallah0@gmail.com
Report bugs and feature requests here.
Contributions are always welcome!
This project is licensed under the LGPL-V3.0 License - see the LICENSE.md file for details