This project is a web scraper designed to extract information from Immobilienscout24 property listings. It uses Playwright for browser automation and handles cookie management for seamless operation.
- Scrapes detailed information from Immobilienscout24 property listings
- Handles cookie management to maintain session
- Provides CAPTCHA detection and manual solving option
- Supports repeated scraping of the same listing for monitoring changes
- Python 3.10.11 (tested version)
- pip (Python package installer)
WebScrapingProject/
├── .idea/
├── venv/
├── __pycache__/
├── config.py
├── cookie-saver.py
├── main.py
├── requirements.txt
└── cookies.json (generated after running cookie-saver.py)
-
Clone this repository:
git clone <repository-url> cd WebScrapingProject
-
Create a virtual environment:
python -m venv venv
-
Activate the virtual environment:
- On Windows:
venv\Scripts\activate
- On macOS and Linux:
source venv/bin/activate
- On Windows:
-
Install the required packages:
pip install -r requirements.txt
-
Install Playwright browsers:
playwright install
-
First, run the cookie-saver script to set up your session:
python cookie-saver.py
Follow the prompts to manually accept cookie terms in the browser window.
-
Update the
TARGET_URL
inconfig.py
with the Immobilienscout24 listing URL you want to scrape. -
Run the main scraper:
python main.py
-
The script will scrape the listing and display the results. Press Enter to scrape again or 'q' to quit.
You can modify the FIELDS_TO_FETCH
dictionary in config.py
to adjust which fields are scraped from the listing.
- If you encounter a CAPTCHA, the script will pause and allow you to solve it manually.
- In case of errors, the script will save a screenshot as 'error_screenshot.png' for debugging.
Contributions are welcome! Please feel free to submit a Pull Request.
This project is open source and available under the MIT License.
This scraper is for educational purposes only. Always respect the terms of service of the websites you interact with.