Web Archive Scraper

Overview 💻

Web Archive Scraper is a Python script that fetches archived URLs from the Wayback Machine, filters them by file extension, and extracts the title, status code, and content length of each matching page.

Features 🤠

Fetch archived URLs from the Wayback Machine.
Filter results based on file extensions (e.g., .php, .html).
Extract and display the title, status code, and content length of each archived page.
Save the results in both .txt and .html formats.
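
The features above can be sketched roughly as follows. This is a minimal illustration and not the code in was.py: the function names build_cdx_query, has_extension, and extract_title are hypothetical, and the use of the Wayback Machine CDX endpoint with these query parameters is an assumption about how archived URLs might be listed. No network request is made here.

```python
import re
from urllib.parse import urlencode, urlparse

# Assumption: archived URLs are listed through the Wayback Machine CDX endpoint.
CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_query(domain):
    """Build a CDX query URL that lists archived URLs under a domain (hypothetical helper)."""
    params = {
        "url": f"{domain}/*",   # everything under the domain
        "output": "json",       # JSON rows instead of plain text
        "fl": "original",       # return only the original URL field
        "collapse": "urlkey",   # drop repeated captures of the same URL
    }
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


def has_extension(url, extensions):
    """Return True if the URL path ends with one of the given extensions."""
    path = urlparse(url).path.lower()
    return any(path.endswith("." + ext.lower().lstrip(".")) for ext in extensions)


TITLE_RE = re.compile(r"<title[^>]*>(.*?)</title>", re.IGNORECASE | re.DOTALL)


def extract_title(html):
    """Return the page title with whitespace collapsed, or an empty string if none is found."""
    match = TITLE_RE.search(html)
    return " ".join(match.group(1).split()) if match else ""


urls = ["http://example.com/index.php?id=1", "http://example.com/logo.png"]
matching = [u for u in urls if has_extension(u, ["php", "html"])]
print(matching)
```

Matching on the parsed path rather than the raw URL keeps query strings such as ?id=1 from hiding the extension.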

Usage 💥

To run the script, use the following command:

python was.py -l <linkfile> -e <extensions>

For example:

python was.py -l liveurls.txt -e php html
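
For readers curious how the -l and -e options might be parsed, here is a small sketch using argparse. It is an assumption about the command-line handling, not the actual parser in was.py; the function name parse_args, the attribute names linkfile and extensions, and the help texts are hypothetical.

```python
import argparse


def parse_args(argv=None):
    """Parse the -l and -e options shown in the usage line (hypothetical sketch)."""
    parser = argparse.ArgumentParser(description="Fetch archived URLs from the Wayback Machine.")
    # -l takes a single file path
    parser.add_argument("-l", dest="linkfile", required=True,
                        help="path to the link file")
    # -e takes one or more extensions separated by spaces
    parser.add_argument("-e", dest="extensions", nargs="+", required=True,
                        help="file extensions to keep, e.g. php html")
    return parser.parse_args(argv)


args = parse_args(["-l", "liveurls.txt", "-e", "php", "html"])
print(args.linkfile, args.extensions)
```

Using nargs="+" lets several extensions follow a single -e flag, which matches the example command above.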

Output 💾

Text File: All fetched URLs are saved in a urls.txt file.

HTML File: The script generates a results.html file containing a table with the following columns: URL, Title, Status Code, and Content Length.
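
As an illustration of the HTML output, the sketch below renders a table with those four columns. The function name render_results_table and the row keys (url, title, status, length) are hypothetical, and the real results.html may be laid out differently.

```python
from html import escape

COLUMNS = [("url", "URL"), ("title", "Title"),
           ("status", "Status Code"), ("length", "Content Length")]


def render_results_table(rows):
    """Render a list of result dicts as an HTML table (hypothetical sketch)."""
    header = "<tr>" + "".join(f"<th>{label}</th>" for _, label in COLUMNS) + "</tr>"
    lines = [header]
    for row in rows:
        # Escape every value so titles containing markup cannot break the table.
        cells = "".join(f"<td>{escape(str(row[key]))}</td>" for key, _ in COLUMNS)
        lines.append(f"<tr>{cells}</tr>")
    return "<table>\n" + "\n".join(lines) + "\n</table>"


rows = [{"url": "http://example.com/a.php", "title": "<b>Hi</b>",
         "status": 200, "length": 1234}]
print(render_results_table(rows))
```

Escaping matters here because page titles come from untrusted archived content.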

Installation 💿

Clone the repository:

git clone https://github.com/husseinphp/web-archive-scraper.git
cd web-archive-scraper
