Using Python scripts, I scraped 19 Expo websites and collected 26,453 company names. A tailored Google Custom Search, run in batches to stay within quota limits, located each company's website. An async Python script then crawled each site, capturing the essential details: Source, Company Name, Website, Contact Name(s), Contact Email(s), Contact Phone Number(s), and Social Media Accounts.
I wrote 19 Python scripts using the Beautiful Soup and Requests libraries to scrape data from the various Expo websites, yielding a total of 26,453 company names.
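Each Expo site has its own page layout, so the scrapers differed only in URL and CSS selector. A minimal sketch of the shared pattern (the URL and selector below are placeholders, not the real sites):

```python
import requests
from bs4 import BeautifulSoup

def extract_company_names(html: str, selector: str) -> list[str]:
    """Pull company names out of an exhibitor-list page."""
    soup = BeautifulSoup(html, "html.parser")
    return [el.get_text(strip=True) for el in soup.select(selector)]

def scrape_expo(url: str, selector: str) -> list[str]:
    """Fetch one Expo exhibitor page and extract the names on it."""
    resp = requests.get(url, timeout=30)
    resp.raise_for_status()
    return extract_company_names(resp.text, selector)

# Hypothetical usage; the selector varies per Expo site:
# names = scrape_expo("https://example-expo.com/exhibitors", ".exhibitor-card h3")
```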
With the data collected from the Expo websites, I extracted the pertinent details and saved them to an Excel file named company_details_from_expo.xlsx.
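Writing the collected rows to that Excel file is straightforward with pandas (the .xlsx format requires the openpyxl engine). A sketch with placeholder rows standing in for the 26,453 scraped entries:

```python
import pandas as pd

# Placeholder rows; the real data held one entry per scraped company
rows = [
    {"Source": "expo-a.example.com", "Company Name": "Acme Corp"},
    {"Source": "expo-b.example.com", "Company Name": "Globex"},
]
df = pd.DataFrame(rows, columns=["Source", "Company Name"])

if __name__ == "__main__":
    # Requires the openpyxl package for the .xlsx format
    df.to_excel("company_details_from_expo.xlsx", index=False)
```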
Over the next three days, I used a custom Google search engine to programmatically look up each company's website. Because of a daily quota limit of 10,000 queries, the searches were run in batches.
Once I obtained the websites, I developed an asynchronous Python script to crawl each entire website for more comprehensive details, including:
- Website Scrape Source
- Company Name
- Website
- Contact Name(s)
- Contact Email(s)
- Contact Phone Number(s)
- Social Media Accounts (Facebook, Instagram, LinkedIn, etc.)
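Emails and social-media links were pulled out of each page's raw HTML; a minimal regex-based sketch of that extraction (the patterns are deliberately simple and will miss edge cases):

```python
import re

# Simplified patterns; real-world email/URL matching has many more edge cases
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
SOCIAL_RE = re.compile(
    r"https?://(?:www\.)?(?:facebook|instagram|linkedin)\.com/[\w./-]+"
)

def extract_contacts(html: str) -> dict:
    """Collect unique emails and social-media links from raw page HTML."""
    return {
        "emails": sorted(set(EMAIL_RE.findall(html))),
        "social": sorted(set(SOCIAL_RE.findall(html))),
    }
```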
The data was extracted asynchronously in chunks of 1,000 companies to keep the process efficient and manageable.
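The chunked async pattern looks roughly like this. The real crawler fetched pages with an HTTP client (such as aiohttp); here the fetch is a placeholder coroutine so the chunking logic is the focus:

```python
import asyncio

def chunked(items: list, size: int) -> list[list]:
    """Split a list into consecutive chunks of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]

async def crawl_site(url: str) -> dict:
    """Placeholder for the real crawler, which fetched pages over HTTP."""
    await asyncio.sleep(0)  # stand-in for network I/O
    return {"Website": url, "Contact Email(s)": []}

async def crawl_all(urls: list[str], chunk_size: int = 1000) -> list[dict]:
    """Crawl each chunk concurrently; chunks run one after another."""
    results: list[dict] = []
    for batch in chunked(urls, chunk_size):
        results.extend(await asyncio.gather(*(crawl_site(u) for u in batch)))
    return results
```

Processing one chunk at a time caps the number of in-flight requests, which keeps memory use and connection counts bounded.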
The final data is stored in an Excel file with detailed contact information for each company.