The Internet Archive Data Analyser is a Streamlit web application built on the Wayback Machine's CDX server. It analyses and visualizes how a website has changed over time, covering its URL inventory, folder structure, HTTP status codes, and robots.txt history.
- URL Retrieval: Fetches all archived URLs for a specified domain from the Internet Archive.
- Folder Structure Visualization: Displays how a website's folder structure has changed over time.
- Status Code Analysis: Shows the distribution of HTTP status codes across the site's archived captures.
- Frequently Changed Pages: Lists the pages that were modified most often, indicating where content is updated regularly.
- robots.txt Evolution: Tracks and visualizes changes to the site's robots.txt file, highlighting shifts in crawling policies.
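
The features above can be approximated with a few small functions over CDX server output. The following is a minimal sketch, not the application's actual code: the function names (`build_cdx_url`, `fetch_records`, `rows_to_records`, `status_code_distribution`, `most_changed_pages`, `folders_by_year`) are hypothetical, and it assumes the CDX server's `output=json` format, where the first row is a header. Counting distinct content digests per URL is one possible way to define a "frequently changed" page.

```python
import json
from collections import Counter, defaultdict
from urllib.parse import urlencode, urlsplit
from urllib.request import urlopen

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"
FIELDS = ["timestamp", "original", "statuscode", "digest"]


def build_cdx_url(domain, limit=None):
    """Build a CDX query for every capture under a domain (hypothetical helper)."""
    params = {
        "url": domain,
        "matchType": "domain",   # include subdomains and all paths
        "output": "json",        # first row of the response is the header
        "fl": ",".join(FIELDS),  # restrict the returned fields
    }
    if limit is not None:
        params["limit"] = limit
    return f"{CDX_ENDPOINT}?{urlencode(params)}"


def rows_to_records(rows):
    """Turn CDX JSON rows (header row first) into a list of dicts."""
    if not rows:
        return []
    header, *data = rows
    return [dict(zip(header, row)) for row in data]


def fetch_records(domain, limit=None):
    """Fetch captures over the network; requires internet access."""
    with urlopen(build_cdx_url(domain, limit)) as response:
        return rows_to_records(json.load(response))


def status_code_distribution(records):
    """Count captures per HTTP status code."""
    return Counter(r["statuscode"] for r in records)


def most_changed_pages(records, top_n=10):
    """Rank URLs by the number of distinct content digests captured."""
    digests = defaultdict(set)
    for r in records:
        digests[r["original"]].add(r["digest"])
    counts = Counter({url: len(seen) for url, seen in digests.items()})
    return counts.most_common(top_n)


def folders_by_year(records):
    """Map each capture year to the set of directories seen that year."""
    folders = defaultdict(set)
    for r in records:
        path = urlsplit(r["original"]).path
        directory = path.rsplit("/", 1)[0] + "/"  # drop the file name, keep the folder
        folders[r["timestamp"][:4]].add(directory)
    return dict(folders)
```

A robots.txt history could be built the same way, for example by filtering the records to URLs whose path is `/robots.txt` and comparing consecutive digests. For large domains, passing `limit` keeps the response size manageable while experimenting.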