The Google Patents Image Scraper is a versatile Python tool that automates the downloading of patent images from Google Patents. Designed to support both GUI and non-GUI interactions within a single application, it caters to a wide range of users, from those who prefer a visual interface to those who need the efficiency of command-line execution.
- Unified GUI and Non-GUI Functionality: Seamlessly switch between a Graphical User Interface and a Command-Line Interface within the same application.
- Robust Error Handling: The tool is equipped with error handling to manage and retry after exceptions, ensuring reliability and stability.
- Progress Summary Generation: After completion, the tool generates a summary of the scraping session, detailing the number of images successfully downloaded, any skipped or failed attempts, and other relevant metrics.
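To illustrate the retry behavior mentioned under Robust Error Handling, here is a minimal sketch of a retry helper. The name `retry`, its parameters, and the linearly growing delay are assumptions made for illustration; the project's actual error handling may be structured differently.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry(operation: Callable[[], T], attempts: int = 3, delay: float = 1.0,
          sleep: Callable[[float], None] = time.sleep) -> T:
    """Call `operation` until it succeeds or `attempts` tries are used up.

    Hypothetical helper: not taken from the project's code.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except Exception:
            if attempt == attempts:
                raise  # out of retries: let the caller record the failure
            sleep(delay * attempt)  # wait a little longer after each failure
    raise AssertionError("unreachable")
```

A call such as `retry(lambda: download(url), attempts=3)` would then re-run a hypothetical `download` function up to three times before giving up.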
The tool works in the following way:
- Upon launch, users can choose to operate in GUI mode or proceed with non-GUI mode.
- The scraper navigates to Google Patents, searches for specified patents, and retrieves image URLs.
- Images are then downloaded and saved to a specified directory.
- Throughout the scraping process, progress updates are displayed, either within the GUI or in the command line.
- A summary report is provided at the end, summarizing the session's outcomes.
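The download, progress, and summary steps above can be sketched as follows. This is only an outline under stated assumptions: `SessionSummary`, `save_images`, and the injected `fetch` callable are hypothetical names, and the real tool may track different metrics or name files differently.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Callable, Iterable, List


@dataclass
class SessionSummary:
    """Counters for one scraping session (hypothetical structure)."""
    downloaded: int = 0
    skipped: int = 0
    failed: List[str] = field(default_factory=list)

    def report(self) -> str:
        return (f"Downloaded: {self.downloaded}, "
                f"skipped: {self.skipped}, failed: {len(self.failed)}")


def save_images(urls: Iterable[str], out_dir: Path,
                fetch: Callable[[str], bytes]) -> SessionSummary:
    """Save each image URL into out_dir and return a session summary.

    `fetch` is any function that turns a URL into raw bytes, so the
    network layer can be swapped out or replaced by a stub in tests.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    summary = SessionSummary()
    for url in urls:
        # Assumption: the last URL path segment is a usable file name.
        target = out_dir / url.rsplit("/", 1)[-1]
        if target.exists():
            summary.skipped += 1  # already on disk, do not download again
        else:
            try:
                target.write_bytes(fetch(url))
                summary.downloaded += 1
            except Exception:
                summary.failed.append(url)  # keep the URL for the final report
        print(summary.report())  # progress update after every image
    return summary
```

With the standard library, `fetch` could be as simple as `lambda url: urllib.request.urlopen(url).read()`; a GUI front end could replace the `print` call with an update to a progress widget.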
Follow these steps to use the Google Patents Image Scraper:
- Clone this repository to your local machine.
- Install the required dependencies from requirements.txt.
- Execute the scraper file Automation.ipynb.
- Provide the patent numbers as input and begin the scraping process.
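As a rough illustration of the input step, the sketch below normalizes a block of patent numbers and builds a page URL for each. The function names are hypothetical, the accepted separators are an assumption, and the URL pattern reflects how public Google Patents pages are commonly addressed rather than anything confirmed by this project.

```python
import re
from typing import List


def parse_patent_numbers(raw: str) -> List[str]:
    """Split raw input on commas, semicolons, or newlines and normalize it.

    Spaces and hyphens inside a number are removed, letters are
    upper-cased, and duplicates are dropped while keeping input order.
    """
    numbers: List[str] = []
    for token in re.split(r"[,;\n]+", raw):
        number = re.sub(r"[\s\-]", "", token).upper()
        if number and number not in numbers:
            numbers.append(number)
    return numbers


def patent_url(number: str) -> str:
    """Build a Google Patents page URL (assumed pattern, English page)."""
    return f"https://patents.google.com/patent/{number}/en"


if __name__ == "__main__":
    for n in parse_patent_numbers("us1234567b2, US-7654321-B2"):
        print(patent_url(n))
```

Because commas are treated as separators, numbers written with thousands separators (for example "7,654,321") would need to be cleaned up before being passed in.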
Feedback and contributions are highly appreciated. If you'd like to contribute or suggest improvements, please fork the repository, push your changes, and create a pull request. For larger changes or feature suggestions, please open an issue first to discuss what you would like to change.
Please star the project if you find it helpful!