Smullle/WebArchiver


WebArchiver

Downloads the HTML of a URL provided at runtime, then parses that HTML, extracts the embedded links, and downloads the HTML of each linked page.

  • Uses the requests library to GET the HTML; embedded links are then parsed from the response.
  • Normalises a URL by removing the http:// or https:// prefix and replacing every non-alphanumeric character with _, e.g. https://python.org/ => python_org_
  • Saves the main URL and each embedded URL as a .html file and produces lookup.json so the original URLs can be reconstructed from the normalised filenames.
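
The steps above can be sketched roughly as follows. This is a minimal illustration rather than the repository's actual code: the names normalise_url, LinkExtractor, extract_links, save_page and write_lookup are hypothetical, "embedded links" is assumed to mean href values of <a> tags, and lookup.json is assumed to map each filename to its original URL. The download step is left out so the sketch runs offline; with requests it would typically be `requests.get(url, timeout=10).text`.

```python
import json
import re
from html.parser import HTMLParser
from pathlib import Path
from urllib.parse import urljoin


def normalise_url(url: str) -> str:
    """Strip the http(s):// prefix, then replace each non-alphanumeric char with '_'."""
    without_scheme = re.sub(r"^https?://", "", url)
    return re.sub(r"[^A-Za-z0-9]", "_", without_scheme)


class LinkExtractor(HTMLParser):
    """Collects href values from <a> tags (an assumed reading of 'embedded links')."""

    def __init__(self) -> None:
        super().__init__()
        self.links: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html: str, base_url: str) -> list[str]:
    """Return absolute http(s) links found in the page."""
    parser = LinkExtractor()
    parser.feed(html)
    # Resolve relative links against the page URL and drop non-web schemes.
    absolute = (urljoin(base_url, link) for link in parser.links)
    return [link for link in absolute if link.startswith(("http://", "https://"))]


def save_page(url: str, html: str, out_dir: Path, lookup: dict[str, str]) -> Path:
    """Write the page as <normalised name>.html and record it in the lookup table."""
    filename = normalise_url(url) + ".html"
    path = out_dir / filename
    path.write_text(html, encoding="utf-8")
    lookup[filename] = url  # assumed layout: filename -> original URL
    return path


def write_lookup(out_dir: Path, lookup: dict[str, str]) -> Path:
    """Dump the filename-to-URL table as lookup.json."""
    path = out_dir / "lookup.json"
    path.write_text(json.dumps(lookup, indent=2), encoding="utf-8")
    return path
```

Because normalisation is lossy (different URLs can collapse to the same string of letters, digits and underscores), the original URL cannot be recovered from a filename alone, which is why a lookup file is needed alongside the saved pages.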


