Update man pages and other docs (#112)
dharple committed Mar 31, 2024
1 parent 35031a0 commit 56246a7
Showing 10 changed files with 43 additions and 46 deletions.
3 changes: 2 additions & 1 deletion README.md
@@ -29,7 +29,8 @@ into the ASCII character space. The focus will be on truly problematic
characters.

Older releases and version-specific branches are still available if you need
-that functionality.
+that functionality. During this transition, the old tables are also available
+in `table/legacy/`

---

28 changes: 13 additions & 15 deletions man/detox.1
@@ -6,7 +6,7 @@
.\" For the full copyright and license information, please view the LICENSE
.\" file that was distributed with this source code.
.\"
-.Dd February 24, 2021
+.Dd March 31, 2024
.Dt DETOX 1
.Os
.Sh NAME
@@ -32,12 +32,12 @@
.Sh DESCRIPTION
The
.Nm
-utility renames files to make them easier to work with under Unix and Unix-like
-operating systems.
+utility renames files to make them easier to work with under Linux and other
+Unix-like operating systems.
It replaces characters that make it hard to type out a filename with dashes and
underscores.
-It also provides transliteration-based filters, converting ISO 8859-1 or UTF-8
-to ASCII, in part or in whole.
+It also provides transcoding-based filters, converting ISO-8859-1 or CP-1252 to
+UTF-8.
An additional filter unescapes CGI-escaped filenames.
.Ss Sequences
.Nm
@@ -55,8 +55,8 @@ filters.
Other examples of pre-configured sequences are
.Ar iso8859_1
and
-.Ar utf_8 ,
-which both provide transliteration to ASCII and then finish with the
+.Ar iso8859_1-legacy ,
+which both provide transcoding to UTF-8, and then finish with the
.Ar safe
and
.Ar wipeup
@@ -125,16 +125,14 @@ unless
.Fl f
has been specified, in which case, it is ignored.
.It Pa /usr/share/detox/cp1252.tbl
-The provided CP-1252 transliteration table.
+The provided CP-1252 transcoding table.
.It Pa /usr/share/detox/iso8859_1.tbl
-The provided ISO 8859-1 transliteration table.
+The provided ISO-8859-1 transcoding table.
.It Pa /usr/share/detox/safe.tbl
The provided safe character translation table.
.It Pa /usr/share/detox/unicode.tbl
-The provided Unicode transliteration table, used by the UTF-8 filter.
-.It Pa /usr/share/detox/unidecode.tbl
-An additional Unicode tranlsiteration table, based on
-.Xr Text::Unidecode 3pm .
+The provided Unicode control character filtering table, used by the UTF-8
+filter.
.El
.Sh EXAMPLES
.Bl -tag -width Fl
@@ -151,7 +149,6 @@ showing their filters and options.
.El
.Sh SEE ALSO
.Xr inline-detox 1 ,
-.Xr Text::Unidecode 3pm ,
.Xr detox.tbl 5 ,
.Xr detoxrc 5 ,
.Xr ascii 7 ,
@@ -172,7 +169,8 @@ I created
to clean up these files.
.Pp
Version 2.0 stepped back from transliteration out of the box, instead focusing
-on ease of use.
+on ease of use. Version 3.0 further shifted this, by removing most of the
+transliteration from the tables.
The primary motivations for this were user-provided feedback, and the fact that
many modern Unix-like OSs use UTF-8 as their primary character set.
Transliterating from UTF-8 to ASCII in this scenario is lossy and pointless.
Binary file modified man/detox.1.pdf
Binary file not shown.
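
Reviewer note: the man page changes above replace "transliteration" with "transcoding" throughout. The difference can be illustrated with a minimal Python sketch. This is not detox's implementation (detox works from its `.tbl` translation tables); the function names are hypothetical, and the "few exceptions" the man page mentions are not modelled here.

```python
import unicodedata


def transcode_latin1_to_utf8(raw: bytes) -> bytes:
    """Re-encode ISO-8859-1 bytes as UTF-8; every character survives."""
    return raw.decode("iso-8859-1").encode("utf-8")


def transliterate_latin1_to_ascii(raw: bytes) -> bytes:
    """Approximate ISO-8859-1 bytes in ASCII; accents and unmappable
    characters are dropped, so the conversion is lossy."""
    decomposed = unicodedata.normalize("NFKD", raw.decode("iso-8859-1"))
    return decomposed.encode("ascii", "ignore")


name = b"caf\xe9.txt"  # "café.txt" stored as ISO-8859-1

print(transcode_latin1_to_utf8(name))       # b'caf\xc3\xa9.txt'
print(transliterate_latin1_to_ascii(name))  # b'cafe.txt'
```

Transcoding keeps the "é" and only changes its byte representation, whereas transliteration discards the accent, which is the lossy behaviour the HISTORY section describes.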
2 changes: 1 addition & 1 deletion man/detox.tbl.5
@@ -15,7 +15,7 @@
.Xr detox 1
.Sh OVERVIEW
.Cm detox
-allows for configuration of how the safe, ISO 8859-1, and UTF-8 (Unicode)
+allows for configuration of how the safe, ISO-8859-1, and UTF-8 (Unicode)
filters operate.
Through text-based translation tables, it is possible to tune how these
character sets are interpreted.
Binary file modified man/detox.tbl.5.pdf
Binary file not shown.
24 changes: 12 additions & 12 deletions man/detoxrc.5
@@ -81,8 +81,8 @@ block.
.It Cm iso8859_1 ;
.It Cm iso8859_1 Bro Cm builtin Qo Ar name Qc ; Brc ;
.It Cm iso8859_1 Bro Cm filename Qo Ar /path/to/filename Qc ; Brc ;
-This transliterates ISO 8859-1 characters between 0xA0 and 0xFF into lower
-ASCII equivalents.
+This transcodes ISO-8859-1 characters between 0xA0 and 0xFF into their UTF-8
+equivalents, with a few exceptions.
The output is not necessarily safe, and should also be run through the
.Ar safe
filter.
@@ -95,7 +95,7 @@ Under normal circumstances, the filename syntax is not needed.
.Cm detox
looks in several locations for a file called
.Pa iso8859_1.tbl ,
-which is a set of rules defining how an ISO 8859-1 character should be
+which is a set of rules defining how an ISO-8859-1 character should be
translated.
If
.Cm detox
@@ -118,8 +118,7 @@ filter.
.It Cm utf_8 ;
.It Cm utf_8 Bro Cm builtin Qo Ar name Qc ; Brc ;
.It Cm utf_8 Bro Cm filename Qo Ar /path/to/filename Qc ; Brc ;
-This transliterations Unicode characters, encoded using UTF-8, into lower ASCII
-equivalents.
+This filters Unicode control characters, encoded using UTF-8.
.Pp
This operates in a manner similar to
.Ar iso8859_1 ,
@@ -179,22 +178,23 @@ It only works on ASCII characters.
.Sh BUILTIN TABLES
.Bl -tag -width 0.25i
.It cp1252
-A translation table for transliterating CP-1252 characters to ASCII.
+A translation table for transcoding CP-1252 characters to UTF-8, with a few
+exceptions.
This is no longer a common use case, and has been moved to a separate table.
.It iso8859_1
-A translation table for transliterating single-byte characters with the high
-bit set from ISO 8859-1 to ASCII.
+A translation table for transcoding single-byte characters with the high bit
+set from ISO-8859-1 to UTF-8.
.It safe
A replacement table for characters that are hard to work with under Unix and
Unix-like OSs.
.It unicode
-A translation table for transliterating multi-byte characters encoded in UTF-8
-to ASCII.
+A translation table for converting multi-byte control characters encoded in
+UTF-8 to safe characters.
.El
.Sh EXAMPLES
.Bd -literal
.\" START SAMPLE
-# transliterate UTF-8 to ASCII (using chained tables), clean up
+# filter UTF-8 control characters to ASCII (using chained tables), clean up
sequence utf8 {
utf_8 {
filename "/usr/local/share/detox/custom.tbl";
@@ -212,7 +212,7 @@ sequence utf8 {
length 128;
};
};
-# decode CGI, transliterate CP-1252 to ASCII, clean up
+# decode CGI, transcode CP-1252 to UTF-8, clean up
sequence "cgi-cp1252" {
uncgi;
iso8859_1 {
Binary file modified man/detoxrc.5.pdf
Binary file not shown.
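
Reviewer note: the "cgi-cp1252" sample sequence in man/detoxrc.5 chains filters, each one consuming the previous filter's output. A rough Python sketch of that idea follows, assuming the first two steps behave as the updated comment describes (decode CGI escapes, then transcode CP-1252 to UTF-8). The function names are hypothetical, only `%XX` escapes are handled, and the `safe` and `wipeup` steps are omitted because their tables are not shown in this diff.

```python
from urllib.parse import unquote_to_bytes


def uncgi(name: bytes) -> bytes:
    """Decode %XX escapes (hypothetical stand-in for the uncgi filter)."""
    return unquote_to_bytes(name)


def cp1252_to_utf8(name: bytes) -> bytes:
    """Transcode CP-1252 bytes to UTF-8.

    Python's codec raises UnicodeDecodeError on the five bytes CP-1252
    leaves undefined; detox's table may treat them differently.
    """
    return name.decode("cp1252").encode("utf-8")


def run_sequence(name: bytes, filters) -> bytes:
    """Apply each filter in order, feeding its output to the next one."""
    for apply_filter in filters:
        name = apply_filter(name)
    return name


result = run_sequence(b"r%E9sum%E9%20%93final%94.txt", [uncgi, cp1252_to_utf8])
print(result.decode("utf-8"))
```

The order matters: the escapes have to be decoded before transcoding, because `%E9` is plain ASCII until `uncgi` turns it into the single byte 0xE9.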
28 changes: 13 additions & 15 deletions man/inline-detox.1
@@ -6,7 +6,7 @@
.\" For the full copyright and license information, please view the LICENSE
.\" file that was distributed with this source code.
.\"
-.Dd February 24, 2021
+.Dd March 31, 2024
.Dt INLINE-DETOX 1
.Os
.Sh NAME
@@ -33,12 +33,12 @@
.Sh DESCRIPTION
The
.Nm
-utility generates new filenames to make them easier to work with under Unix and
-Unix-like operating systems.
+utility generates new filenames to make them easier to work with under Linux
+and other Unix-like operating systems.
It replaces characters that make it hard to type out a filename with dashes and
underscores.
-It also provides transliteration-based filters, converting ISO 8859-1 or UTF-8
-to ASCII, in part or in whole.
+It also provides transcoding-based filters, converting ISO-8859-1 or CP-1252 to
+UTF-8.
An additional filter unescapes CGI-escaped filenames.
.Pp
.Nm
@@ -70,8 +70,8 @@ filters.
Other examples of pre-configured sequences are
.Ar iso8859_1
and
-.Ar utf_8 ,
-which both provide transliteration to ASCII and then finish with the
+.Ar iso8859_1-legacy ,
+which both provide transcoding to UTF-8, and then finish with the
.Ar safe
and
.Ar wipeup
@@ -115,16 +115,14 @@ unless
.Fl f
has been specified, in which case, it is ignored.
.It Pa /usr/share/detox/cp1252.tbl
-The provided CP-1252 transliteration table.
+The provided CP-1252 transcoding table.
.It Pa /usr/share/detox/iso8859_1.tbl
-The provided ISO 8859-1 transliteration table.
+The provided ISO-8859-1 transcoding table.
.It Pa /usr/share/detox/safe.tbl
The provided safe character translation table.
.It Pa /usr/share/detox/unicode.tbl
-The provided Unicode transliteration table, used by the UTF-8 filter.
-.It Pa /usr/share/detox/unidecode.tbl
-An additional Unicode tranlsiteration table, based on
-.Xr Text::Unidecode 3pm .
+The provided Unicode control character filtering table, used by the UTF-8
+filter.
.El
.Sh EXAMPLES
.Bl -tag -width Fl
@@ -135,7 +133,6 @@ listing any changes and returning the result to the output stream.
.El
.Sh SEE ALSO
.Xr detox 1 ,
-.Xr Text::Unidecode 3pm ,
.Xr detox.tbl 5 ,
.Xr detoxrc 5 ,
.Xr ascii 7 ,
@@ -156,7 +153,8 @@ I created
to clean up these files.
.Pp
Version 2.0 stepped back from transliteration out of the box, instead focusing
-on ease of use.
+on ease of use. Version 3.0 further shifted this, by removing most of the
+transliteration from the tables.
The primary motivations for this were user-provided feedback, and the fact that
many modern Unix-like OSs use UTF-8 as their primary character set.
Transliterating from UTF-8 to ASCII in this scenario is lossy and pointless.
Binary file modified man/inline-detox.1.pdf
Binary file not shown.
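
Reviewer note: with this commit the man pages describe the Unicode table as filtering multi-byte control characters rather than transliterating to ASCII. A minimal Python sketch of that kind of filter is below. It is an assumption-laden illustration, not detox's table: the function name, the choice of `_` as the replacement, and the use of the Unicode "Cc" category to decide what counts as a control character are all hypothetical.

```python
import unicodedata


def filter_control_chars(name: str, replacement: str = "_") -> str:
    """Replace non-ASCII control characters (Unicode category Cc, such as
    the C1 range U+0080 to U+009F) and leave every other character alone."""
    return "".join(
        replacement if ord(ch) > 0x7F and unicodedata.category(ch) == "Cc" else ch
        for ch in name
    )


# U+0085 (NEXT LINE) takes two bytes in UTF-8 and is awkward in a filename.
print(filter_control_chars("report\u0085v2.txt"))  # report_v2.txt
print(filter_control_chars("caf\u00e9.txt"))       # café.txt (unchanged)
```

Unlike the old transliteration behaviour, printable non-ASCII characters such as "é" pass through untouched.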
4 changes: 2 additions & 2 deletions tests/legacy/man-page-example/detoxrc.detoxrc.5
@@ -1,5 +1,5 @@
# START SAMPLE
-# transliterate UTF-8 to ASCII (using chained tables), clean up
+# filter UTF-8 control characters to ASCII (using chained tables), clean up
sequence utf8 {
utf_8 {
filename "/usr/local/share/detox/custom.tbl";
@@ -17,7 +17,7 @@ sequence utf8 {
length 128;
};
};
-# decode CGI, transliterate CP-1252 to ASCII, clean up
+# decode CGI, transcode CP-1252 to UTF-8, clean up
sequence "cgi-cp1252" {
uncgi;
iso8859_1 {
