Add back support for "-utf8" flag for backwards compatibility #52

Closed
thp opened this issue Aug 29, 2022 · 1 comment


thp commented Aug 29, 2022

We historically used -utf8 with older versions of html2text, but the new version defaults to UTF-8 and no longer accepts -utf8 as a command-line argument.

thp/urlwatch#718

Version 2.1.1 help output:

% html2text -help
This is html2text, version 2.1.1

Usage:
  html2text -help
  html2text -version
  html2text [ -check ] [ -debug-scanner ] [ -debug-parser ] \
     [ -rcfile <file> ] [ -width <w> ] [ -nobs ] [ -links ]\
     [ -from_encoding ] [ -to_encoding ] [ -ascii ]\
     [ -o <file> ] [ <input-file> ] ...
Formats HTML document(s) read from <input-file> or STDIN and generates ASCII
text.
  -help          Print this text and exit
  -version       Print program version and copyright notice
  -check         Do syntax checking only
  -debug-scanner Report parsed tokens on STDERR (debugging)
  -debug-parser  Report parser activity on STDERR (debugging)
  -rcfile <file> Read <file> instead of "$HOME/.html2textrc"
  -width <w>     Optimize for screen widths other than 79
  -nobs          Do not render boldface and underlining (using backspaces)
  -links         Generate reference list with link targets
  -from_encoding Treat input encoded as given encoding
  -to_encoding   Output using given encoding
  -ascii         Use plain ASCII for output instead of UTF-8
                 alias for: -to_encoding ASCII//TRANSLIT 
  -o <file>      Redirect output into <file>

Old version help:

$ html2text -help
This is html2text, version 1.3.2a

Usage:
  html2text -help
  html2text -version
  html2text [ -unparse | -check ] [ -debug-scanner ] [ -debug-parser ] \
     [ -rcfile <file> ] [ -style ( compact | pretty ) ] [ -width <w> ] \
     [ -o <file> ] [ -nobs ] [ -ascii | -utf8 ] [ <input-url> ] ...
Formats HTML document(s) read from <input-url> or STDIN and generates ASCII
text.
  -help          Print this text and exit
  -version       Print program version and copyright notice
  -unparse       Generate HTML instead of ASCII output
  -check         Do syntax checking only
  -debug-scanner Report parsed tokens on STDERR (debugging)
  -debug-parser  Report parser activity on STDERR (debugging)
  -rcfile <file> Read <file> instead of "$HOME/.html2textrc"
  -style compact Create a "compact" output format (default)
  -style pretty  Insert some vertical space for nicer output
  -width <w>     Optimize for screen widths other than 79
  -o <file>      Redirect output into <file>
  -nobs          Do not use backspaces for boldface and underlining
  -ascii         Use plain ASCII for output instead of ISO-8859-1
  -utf8          Assume both terminal and input stream are in UTF-8 mode
  -nometa        Don't try to recode input using 'meta' tag

It would be nice to keep accepting -utf8 (maybe even unlisted in the -help output) as a no-op, since the default is now UTF-8, so that existing scripts using html2text work with both versions.

For now, I worked around this by first feature-checking -utf8 via -help's output and then either adding it or leaving it out.
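A minimal sketch of that feature check, in Python. The function names and the regular expression are illustrative assumptions, not code taken from urlwatch or html2text. It also assumes the help text may arrive on either stdout or stderr, so both streams are captured, and the exit status of `-help` is ignored.

```python
import re
import subprocess


def supports_utf8_flag(help_text):
    """Return True if the help text lists -utf8 as a standalone option.

    The lookarounds avoid matching a longer option that merely
    contains the same characters.
    """
    return re.search(r"(?<![\w-])-utf8(?![\w-])", help_text) is not None


def build_html2text_command(executable="html2text"):
    """Build the argument list, adding -utf8 only when -help advertises it.

    Hypothetical helper: 'executable' is assumed to be on PATH.
    """
    result = subprocess.run(
        [executable, "-help"],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge streams; the help may be on either
        text=True,
        check=False,  # do not rely on the exit status of -help
    )
    command = [executable]
    if supports_utf8_flag(result.stdout):
        command.append("-utf8")
    return command
```

If a build accepts -utf8 without listing it in -help, this check just leaves the flag out, which should be harmless when UTF-8 is already the default.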

Owner

grobian commented Sep 23, 2022

I'm not in favour of this, but it really is a minimal effort if it makes other people happy, so I pushed this.

Thanks!

grobian closed this as completed Sep 23, 2022