-
Notifications
You must be signed in to change notification settings - Fork 0
rplessl/wk2txt
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
NAME wk2txt - A Website to Text Converter SYNOPSIS wk2txt *options* Options: --input URL convert *URL* to text --urlfile file convert URLs in *file* to text --output directory save output in directory *directory* --xml dump in XML format --timeout inverval timeout after *interval* seconds (NOT IMPLEMENTED) --debug print diagnostic information --help print help on options --version print version number DESCRIPTION wk2txt is a command line tool based on the WebKit Engine to get the text content of a websites as seen by current webbrowsers which includes the evaluation of javascript code present in many modern webpages. wk2txt can process a whole list of URLs, which can be built by using the "--input" option multiple times or by passing the filename of a list of URLs to the "--urlfile". This file is expected to have 1 URL per line. The contents of the webpages is written to one file per webpage. The files are stored in the directory specified by the "--output" option. The filename is derived from the URL, it is the SHA1 hash code for the URL. By default, wk2txt dumps the contents and the meta data of the webpage in plain text format. If the "--xml" option is used, content and meta data is dumped in XML format. AUTHOR Christian Plessl <christian.plessl@oetiker.ch> Roman Plessl <roman.plessl@oetiker.ch>
About
a website to text converter using webkit
Resources
Stars
Watchers
Forks
Releases
No releases published
Packages 0
No packages published