
Web Archive Scraper is a Python script designed to retrieve archived web pages from the Wayback Machine. It allows you to extract the titles of archived web pages and other relevant data based on specified file extensions.


husseinphp/Web-Archive-Scraper


Web Archive Scraper

Overview 💻

Web Archive Scraper is a Python script designed to fetch archived web pages from the Wayback Machine. The script allows users to extract titles and other relevant metadata of archived web pages based on specified file extensions.

Features 🤠

  • Fetch archived URLs from the Wayback Machine.
  • Filter results based on file extensions (e.g., .php, .html).
  • Extract and display the title, status code, and content length of each archived page.
  • Save the results in both .txt and .html formats.
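The fetch, filter, and title-extraction steps listed above could be sketched roughly as follows. This is a minimal illustration, not the code in was.py: it assumes the Wayback Machine CDX search endpoint is used to list captures, and the names build_cdx_url, filter_by_extension, and extract_title are hypothetical helpers introduced only for this example.

```python
from html.parser import HTMLParser
from urllib.parse import urlencode, urlsplit

# Assumption: archived URLs are listed through the Wayback Machine CDX endpoint.
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_url(domain):
    """Build a CDX query that lists original URLs captured under a domain."""
    params = {
        "url": f"{domain}/*",   # every path under the domain
        "output": "json",       # JSON rows instead of plain text
        "fl": "original",       # return only the original URL field
        "collapse": "urlkey",   # drop repeated captures of the same URL
    }
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


def filter_by_extension(urls, extensions):
    """Keep URLs whose path ends with one of the extensions (e.g. 'php')."""
    suffixes = tuple("." + ext.lower().lstrip(".") for ext in extensions)
    return [u for u in urls if urlsplit(u).path.lower().endswith(suffixes)]


class _TitleParser(HTMLParser):
    """Collect the text found inside the <title> element."""

    def __init__(self):
        super().__init__()
        self.in_title = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self.in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self.in_title = False

    def handle_data(self, data):
        if self.in_title:
            self.parts.append(data)


def extract_title(html_text):
    """Return the page title, or an empty string if there is none."""
    parser = _TitleParser()
    parser.feed(html_text)
    parser.close()
    return "".join(parser.parts).strip()
```

Matching on the URL path rather than the whole URL means a query string such as ?id=1 does not hide the extension. The status code and content length would come from the HTTP response for each archived page, which is omitted here.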

Usage 💥

To run the script, use the following command:

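python was.py -l <linkfile> -e <extensions>

Here <linkfile> is the file containing the links to process, and <extensions> is one or more file extensions to keep. For example:

python was.py -l liveurls.txt -e php html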
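For readers curious how a command line like this is typically parsed, here is a small argparse sketch. It is an assumption about the interface, not the parser from was.py: only the -l and -e flags come from the usage line, while the destination names, help texts, and the choice to make -l required are hypothetical.

```python
import argparse


def build_parser():
    """Return a parser that accepts the -l and -e flags from the usage line."""
    parser = argparse.ArgumentParser(
        prog="was.py",
        description="Fetch archived pages from the Wayback Machine.",
    )
    # Assumption: -l names a file with the links to process.
    parser.add_argument("-l", dest="linkfile", required=True,
                        help="file containing the links to process")
    # Assumption: -e takes one or more extensions, written without a dot.
    parser.add_argument("-e", dest="extensions", nargs="+", default=[],
                        help="file extensions to keep, e.g. php html")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args()
    print(args.linkfile, args.extensions)
```

Using nargs="+" is what lets several extensions follow a single -e flag, as in the example command.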

Output 💾

Text File: All fetched URLs are saved in a urls.txt file.

HTML File: The script generates a results.html file containing a table with the following columns: URL, Title, Status Code, and Content Length.
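The two output files described above could be produced along these lines. This is a hedged sketch rather than the script's actual writer: the function name save_results, the row layout (url, title, status_code, content_length), and the bare HTML skeleton are assumptions made for illustration. Only the file names and column headings come from the description.

```python
import html
from pathlib import Path


def save_results(rows, out_dir="."):
    """Write urls.txt and results.html from (url, title, status, length) rows."""
    out = Path(out_dir)
    rows = list(rows)

    # urls.txt: one fetched URL per line.
    (out / "urls.txt").write_text(
        "".join(f"{row[0]}\n" for row in rows), encoding="utf-8"
    )

    # results.html: a table with the four documented columns.
    header = ("<tr><th>URL</th><th>Title</th>"
              "<th>Status Code</th><th>Content Length</th></tr>")
    body = "\n".join(
        "<tr>"
        + "".join(f"<td>{html.escape(str(cell))}</td>" for cell in row)
        + "</tr>"
        for row in rows
    )
    page = (
        "<!DOCTYPE html>\n<html><body><table>\n"
        f"{header}\n{body}\n"
        "</table></body></html>\n"
    )
    (out / "results.html").write_text(page, encoding="utf-8")
```

Escaping each cell with html.escape keeps a page title that contains markup from breaking the generated table.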

Installation 💿

Clone the repository:

git clone https://github.com/husseinphp/web-archive-scraper.git
cd web-archive-scraper
