This Python script mirrors/copies the web pages listed in a provided file. It is tuned to work best with sites hosted on WEBNODE.COM. A copy of each listed site is saved in the chosen folder, together with copies of the external resources it uses (so the result can be somewhat bulky, tens of megabytes). The output folder is automatically ZIP-compressed into a file named like "PREFIX 20240222 webnode site.zip", where 20240222 is the current date in YYYYMMdd format and PREFIX is the name_prefix argument you supply.
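As an illustration of that naming scheme, here is a minimal sketch using only the standard library (the function name `zip_mirror` is hypothetical, not part of the script):

```python
# Sketch of the archive-naming scheme described above; not the script's
# actual code. shutil.make_archive appends '.zip' and returns the path.
import shutil
from datetime import datetime

def zip_mirror(prefix: str, out_folder: str) -> str:
    stamp = datetime.now().strftime("%Y%m%d")   # e.g. 20240222
    base_name = f"{prefix} {stamp} webnode site"
    return shutil.make_archive(base_name, "zip", out_folder)
```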
Webnode is used in a few high school textbooks as an example of a website design tool. This script makes it possible to store the students' work for future reference.
You need: Python 3 and the wget command.
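If you want to fail early when wget is missing, a check along these lines works (a sketch, not taken from the script):

```python
# shutil.which returns None when the command is not found in PATH.
import shutil
import sys

if shutil.which("wget") is None:
    sys.exit("Error: 'wget' not found in PATH. Please install it first.")
```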
Command usage:
usage: download.py [-h] filename name_prefix [dest_folder] [num_threads] [num_levels]
positional arguments:
filename      Name of the text file containing the list of site URLs, one URL per row.
name_prefix   Prefix to add to the zip archive name, e.g. a class name.
dest_folder   Optional. Destination folder for the site mirrors. Default: the 'mirror' folder in the current execution path.
num_threads   Optional. Number of threads; downloads N sites at the same time. Default: 4.
num_levels    Optional. Number of site levels to dig into (including external links/resources). Default: 1.
options:
-h, --help show this help message and exit
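The usage above maps onto an argparse setup roughly like this sketch (not necessarily the script's exact code); nargs='?' is what makes the trailing arguments optional:

```python
import argparse

parser = argparse.ArgumentParser(prog="download.py")
parser.add_argument("filename",
                    help="Text file with the list of site URLs, one per row")
parser.add_argument("name_prefix",
                    help="Prefix added to the zip archive name, e.g. a class name")
parser.add_argument("dest_folder", nargs="?", default="mirror",
                    help="Destination folder for the site mirrors")
parser.add_argument("num_threads", nargs="?", type=int, default=4,
                    help="Number of sites downloaded at the same time")
parser.add_argument("num_levels", nargs="?", type=int, default=1,
                    help="Recursion depth for pages and external resources")
args = parser.parse_args()
```

A typical invocation would then be `python3 download.py sites.txt 3B` (sites.txt and the class name 3B are placeholders).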
Issues:
- The copy is not perfect: iframes may not be replicated, so you lose embedded maps and similar content.
- It creates local copies of external resources, EXCEPT those that are loaded dynamically.
- The cookie preference panel is stripped from the downloaded HTML files; the approach is rough, but it works (see the sketch after this list).
- I have chosen reasonable wget options (see the example after this list); if you find better ones, please send a pull request or open an issue.
- wget must be in your PATH, so the script is somewhat simpler to use under Linux.
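On the cookie panel removal: the idea is roughly the sketch below. The markup pattern is an assumption about how Webnode tags its consent banner, not verified against the script, and the regex is deliberately naive (it does not handle nested <div>s), which matches the "rough but it works" caveat:

```python
import re
from pathlib import Path

def strip_cookie_panel(html_path: Path) -> None:
    """Drop a hypothetical consent <div> from one downloaded page."""
    html = html_path.read_text(encoding="utf-8", errors="ignore")
    # Assumed marker: any <div> whose class attribute mentions "cookie".
    # Non-greedy match stops at the first </div>, so nesting is not handled.
    cleaned = re.sub(
        r'<div[^>]*class="[^"]*cookie[^"]*"[^>]*>.*?</div>',
        "", html, flags=re.DOTALL | re.IGNORECASE)
    html_path.write_text(cleaned, encoding="utf-8")
```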
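On the wget options: a baseline invocation along these lines is a reasonable starting point for comparison. All flags are standard GNU wget, but the exact set the script uses may differ:

```python
import subprocess

def mirror_site(url: str, dest_folder: str, levels: int = 1) -> None:
    """Mirror one site with wget; flags shown are a plausible baseline."""
    subprocess.run([
        "wget",
        "--recursive", f"--level={levels}",  # dig N levels deep
        "--page-requisites",    # fetch the CSS/JS/images a page needs
        "--convert-links",      # rewrite links for offline browsing
        "--adjust-extension",   # save pages with an .html extension
        "--span-hosts",         # also fetch external resources
        "--directory-prefix", dest_folder,
        url,
    ], check=False)             # a failed site should not stop the batch
```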