This project is a Python-based web scraper built on Selenium WebDriver. It retrieves elements by class name or ID, handles pagination, and is configured through a JSON file for flexible usage.
- Scrapes data from web pages based on class names or IDs
- Handles paginated web pages
- Configurable through a JSON file
- Supports headless mode for background execution
- Parallel processing for faster data retrieval
- Logs errors and progress
- Python 3.x
- Google Chrome browser
- ChromeDriver compatible with your version of Chrome
- Required Python packages (see below)
- Clone the repository:

  ```bash
  git clone https://github.com/gtrtuugii/python-web-scraper.git
  cd python-web-scraper
  ```
- Create and activate a virtual environment:

  ```bash
  python -m venv venv
  source venv/bin/activate  # On Windows use `venv\Scripts\activate`
  ```
- Install the required Python packages:

  ```bash
  pip install -r requirements.txt
  ```
- Download ChromeDriver and place it in your PATH, or specify its path in the `config.json` file.
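Before running the scraper, it can help to confirm that a usable driver binary is actually where the configuration points. A minimal sketch using only the standard library (the `check_driver` helper is illustrative, not part of the project):

```python
import os
import shutil

def check_driver(driver_path: str) -> bool:
    """Return True if a usable chromedriver binary can be found.

    Checks the configured path first, then falls back to searching PATH
    (covering the "place it in your PATH" installation option).
    """
    if driver_path and os.path.isfile(driver_path) and os.access(driver_path, os.X_OK):
        return True
    return shutil.which("chromedriver") is not None
```

If this returns `False`, Selenium will fail at startup with a driver-not-found error, so it is worth checking early.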
The scraper reads its settings from a configuration file, `config.json`. An example configuration is provided below:
```json
{
  "driver_path": "path/to/chromedriver",
  "implicit_wait_time": 10,
  "base_url": "https://www.playhq.com/basketball-victoria/org/melbourne-central-basketball-association/sunday-cyms-senior-domestic-summer-202324/sunday-senior-men-a/a112a9d0/R{}",
  "pagination_pattern": "{}",
  "output_file": "output.csv",
  "headless": true
}
```
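The scraper presumably loads these values at startup; a minimal sketch of such a loader, filling in defaults for optional keys (the `load_config` helper and the chosen defaults are illustrative, not the project's actual code):

```python
import json

# Assumed defaults for optional keys; required keys (driver_path,
# base_url) must come from the file itself.
DEFAULTS = {
    "implicit_wait_time": 10,   # seconds Selenium waits for elements
    "output_file": "output.csv",
    "headless": True,
}

def load_config(path: str = "config.json") -> dict:
    """Read the JSON config and merge it over the defaults."""
    with open(path, encoding="utf-8") as f:
        config = json.load(f)
    return {**DEFAULTS, **config}
```

Merging over a defaults dict keeps the config file short: only keys that differ from the defaults need to be listed.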
Run the scraper with:

```bash
python webscraper.py
```
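The `{}` placeholder in `base_url`, together with `pagination_pattern`, suggests that page URLs are built by string substitution. A sketch of how the per-page URLs might be generated (this is assumed behavior; the actual logic lives in `webscraper.py`, and `example.com` is a stand-in URL):

```python
def page_urls(base_url: str, pagination_pattern: str, num_pages: int):
    """Yield one URL per page by filling the {} placeholder in base_url.

    pagination_pattern ("{}" in the example config) is formatted with the
    1-based page number, and the result is substituted into base_url.
    """
    for page in range(1, num_pages + 1):
        yield base_url.format(pagination_pattern.format(page))

# With a base_url ending in "R{}", pages become ...R1, ...R2, ...
for url in page_urls("https://example.com/R{}", "{}", 3):
    print(url)
```

Keeping the pattern separate from the base URL lets sites with other schemes (e.g. `?page=N`) be supported by changing only the config.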