# 💾 datahoarder-website-to-markdown 🏴‍☠️

## Description ⚡

The script takes a cookie and a list of forum/webpage index URLs as input, scrapes every URL from each index, and downloads the associated pages as HTML. Each HTML file is converted to a lightweight Markdown page (~15-20 KB), trimmed with sed (the trimming parameters must be edited because they differ from website to website), and saved in a folder named after its index (see the list at the top of the script). All scraped content is then uploaded to a remote git repository; if you store your git credentials via git config, the whole process can run unattended. A minimal sketch of the pipeline follows the notes below.

- Forums that hide content behind a "click Like to show the thread" gate are supported.
- If there is a connection error or the website blocks the scraping, the script can be resumed without losing the previously scraped files.
- Deleted files are moved to the trash bin (the script uses gio trash instead of rm).
- The script must be edited before it can run correctly.
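As a rough illustration only, here is a minimal sketch of the pipeline the description outlines. It assumes curl for downloads, pandoc for the HTML→Markdown step, and placeholder sed markers; the actual script's variable names, tools, and per-site trimming expressions differ, so treat every identifier below as hypothetical.

```bash
#!/usr/bin/env bash
# Sketch only, not the actual script: scrape each index page, download the
# linked pages, convert them to Markdown, trim them, and push the result.
set -euo pipefail

COOKIE='session=PASTE_YOUR_COOKIE_HERE'            # placeholder, edit this
INDEXES=('https://example.com/forum/section-1')    # one output folder per index URL

for index in "${INDEXES[@]}"; do
    dir=$(basename "$index")
    mkdir -p "$dir"

    # 1. collect the absolute links listed on the index page
    curl -s -b "$COOKIE" "$index" \
        | grep -oE 'href="https?://[^"]+"' \
        | sed -E 's/^href="//; s/"$//' \
        | sort -u > "$dir/urls.txt"

    # 2. download each page, convert it to Markdown, trim it, save it
    while read -r url; do
        out="$dir/$(basename "$url").md"
        [ -f "$out" ] && continue                   # resume: skip already-scraped pages

        curl -s -b "$COOKIE" "$url" \
            | pandoc -f html -t gfm \
            | sed -e '1,/START-MARKER/d' -e '/END-MARKER/,$d' \
            > "$out"                                # sed markers are site-specific
    done < "$dir/urls.txt"
done

# 3. push everything to the remote repository (credentials stored via git config)
git add . && git commit -m "scrape $(date -I)" && git push
```

Because already-converted files are skipped, re-running the sketch after a connection error picks up where it left off, which mirrors the resume behaviour noted above.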

## Screens 🖼️

(screenshot)

## Dependencies 📜
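The original README leaves this section empty. A hedged guess at the required tools, based only on what the description mentions (sed, gio, git) plus the helpers assumed in the sketch above (curl, pandoc), on Debian/Ubuntu:

```bash
# Hedged guess, not a confirmed dependency list: adjust to the actual script.
sudo apt install curl git sed pandoc libglib2.0-bin   # gio ships in libglib2.0-bin
```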
