A line-splitting (paragraph shaping) library for Perl.
Text::KnuthPlass uses the famed Knuth-Plass line-splitting algorithm (used in TeX and LaTeX) to break up a text string into a properly "shaped" paragraph. It follows certain rules, not only to pack the paragraph efficiently into lines, but also to minimize hyphenation, keep line density fairly constant, and take other measures to ensure that the output is typographically "nice looking". It works with both fixed-width fonts and proportional (variable-width) fonts. For proportional fonts, you supply the font library that calculates word widths (e.g., PDF::Builder's advancewidth() method). Text::KnuthPlass permits varying line lengths, so that text can flow around other objects such as illustrations. By default, it also uses Text::Hyphen, a library that indicates where words can be split for hyphenation.
See also this blog on Paragraph Shaping for a deeper dive into the subject.
Home Page, including Documentation and Examples.
Text::KnuthPlass is a Perl and XS (C) implementation of the well-known Knuth-Plass TeX paragraph-shaping (a.k.a. line-breaking) algorithm, as created by Donald E. Knuth and Michael F. Plass in 1981.
Given a long string containing the text of a paragraph, this module decides where to split a line (possibly hyphenating a word in the process), while attempting to:
- maintain text "tightness" within a reasonable, comfortably readable, range (neither jammed together nor excessively loose).
- maintain fairly consistent text "tightness" (limited change from line to line).
- minimize the amount of hyphenation overall (words split at a line end).
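The "limited change from line to line" goal is handled in the Knuth-Plass paper through fitness classes, based on each line's adjustment ratio r (0 means the spaces are at their natural width, positive means stretched, negative means shrunk). The Perl sketch below is illustrative only. The function names are hypothetical and not part of Text::KnuthPlass, the class boundaries (r = -0.5, 0.5, and 1) follow the paper, and the 3000 extra demerits is an assumed value, not necessarily what this module uses.

```perl
use strict;
use warnings;

# Hypothetical sketch; not Text::KnuthPlass's own code.
# Fitness class by adjustment ratio r:
#   0 = tight, 1 = normal, 2 = loose, 3 = very loose
sub fitness_class {
    my ($r) = @_;
    return 0 if $r < -0.5;
    return 1 if $r <= 0.5;
    return 2 if $r <= 1;
    return 3;
}

# Extra demerits when two adjacent lines' classes differ by more than one
# (e.g. a tight line followed by a loose one). The 3000 default is an
# assumed, illustrative value.
sub fitness_demerits {
    my ($r_prev, $r_this, $extra) = @_;
    $extra //= 3000;
    return abs(fitness_class($r_prev) - fitness_class($r_this)) > 1 ? $extra : 0;
}

printf "%5.2f -> class %d\n", $_, fitness_class($_) for (-0.8, 0.2, 0.7, 1.4);
print fitness_demerits(-0.8, 0.7), "\n";   # tight then loose: penalized
print fitness_demerits(0.2, 0.7), "\n";    # normal then loose: no penalty
```

Because the penalty only applies to jumps of more than one class, gradual changes in spacing are tolerated while abrupt tight-to-loose transitions are discouraged.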
Stated objectives of Knuth-Plass that I don't think this implementation directly pursues:
- minimize the number of lines resulting.
- not have two or more hyphenated lines in a row.
- not have entire words "floating" over the next line (particularly when not fully justified, e.g., "ragged right").
- not hyphenate the paragraph's penultimate line.
What it definitely doesn't do:
- attempt to avoid widows and orphans. This is the job of the calling routine, as Text::KnuthPlass doesn't know how much of the paragraph fits on this page (or column) and how much has to be spilled to the next page or column. It simply splits up the entire paragraph, and leaves it to your code to render.
- attempt to avoid hyphenating the last word of the last line of a split paragraph on a page or column (as before, it doesn't know where you're going to be splitting the paragraph between columns or pages).
- attempt to optimize over an entire page (it handles one paragraph at a time).
- avoid having the same word (or fragment) starting or ending two lines in a row (a "stack"). This is undesirable because it makes it easier to mistrack while reading, and accidentally skip up or down a line.
- avoid near-vertical "rivers" of whitespace.
- avoid a very short or single word last line (a "cub").
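Since widow and orphan control is left to the caller, here is one possible caller-side approach, as a minimal sketch. lines_for_this_column() is a hypothetical helper, not part of Text::KnuthPlass. It assumes you already know how many lines the paragraph was broken into and how many lines remain free in the current column. Terminology varies; here an orphan is a lone first line at the bottom of a column, and a widow is a lone last line at the top of the next column.

```perl
use strict;
use warnings;

# Hypothetical caller-side helper; not part of Text::KnuthPlass.
#   $total: number of lines the paragraph was broken into
#   $room:  number of lines still free in the current column or page
# Returns how many of the paragraph's lines to set here
# (0 means: move the whole paragraph to the next column).
sub lines_for_this_column {
    my ($total, $room) = @_;
    return $total if $total <= $room;     # everything fits, no split needed
    return 0      if $room < 2;           # would strand the first line (orphan)
    my $here = $room;
    $here-- if $total - $here == 1;       # don't carry a lone last line (widow)
    return $here >= 2 ? $here : 0;        # pulling a line back may strand the first
}

for my $case ([5, 10], [5, 4], [3, 2], [10, 5]) {
    my ($total, $room) = @$case;
    printf "%2d lines, room for %2d: set %d here\n",
        $total, $room, lines_for_this_column($total, $room);
}
```

When the function returns 0, the whole paragraph moves to the next column; a real layout routine would still have to check that it fits there.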
In spite of these limitations, the Knuth-Plass ("TeX line splitting") algorithm is still pretty much the gold standard for paragraph shaping.
The Knuth-Plass algorithm does this by defining "boxes" (words or word fragments), "glue" (the stretchable and shrinkable space between them), and "penalties" (possible break points, such as hyphenation points, with their cost) for the paragraph text, and fiddling with line break points to minimize the overall sum of demerits (a penalty value for various "bad typesetting" gaffes). This may mean "breaking" one or more of the listed rules, if doing so gives an overall better-scoring ("better looking") layout.
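The per-line scoring can be sketched in a few lines of Perl. This follows the formulas in the Knuth-Plass paper (an adjustment ratio r, a badness of 100|r|^3, and squared demerits), but the helper names are hypothetical and this is not the module's internal code. It leaves out forced breaks, the extra demerits for things like consecutive hyphenated lines, and the search over all possible break points that the real algorithm performs.

```perl
use strict;
use warnings;

# Hypothetical helpers illustrating the Knuth-Plass cost model;
# not Text::KnuthPlass's own code.

# Adjustment ratio r for a line whose boxes and glue add up to $natural
# width, with total glue stretch $stretch and shrink $shrink.
# r > 0 stretches the spaces, r < 0 shrinks them. Returns undef when the
# line cannot be set $target wide.
sub adjustment_ratio {
    my ($natural, $stretch, $shrink, $target) = @_;
    return 0 if $natural == $target;
    my $r;
    if ($natural < $target) {
        return undef unless $stretch > 0;
        $r = ($target - $natural) / $stretch;
    } else {
        return undef unless $shrink > 0;
        $r = ($target - $natural) / $shrink;
    }
    return $r < -1 ? undef : $r;    # glue may not shrink past its limit
}

# Badness grows with the cube of r, so loose or tight lines get costly fast.
sub badness {
    my ($r) = @_;
    return 100 * abs($r) ** 3;
}

# Demerits for a line ending at a break with penalty $p (0 for an ordinary
# space, positive for a less desirable break such as a hyphen).
# $line_penalty is a fixed per-line cost; plain TeX's default is 10.
sub demerits {
    my ($r, $p, $line_penalty) = @_;
    $line_penalty //= 10;
    my $base = ($line_penalty + badness($r)) ** 2;
    return $p >= 0 ? $base + $p ** 2 : $base - $p ** 2;
}

# Three words 5, 4 and 6 units wide, separated by two spaces that are each
# 1 unit wide with 1 unit of stretch and 0.5 unit of shrink.
my ($natural, $stretch, $shrink) = (5 + 4 + 6 + 2 * 1, 2 * 1, 2 * 0.5);
for my $target (18, 20) {
    my $r = adjustment_ratio($natural, $stretch, $shrink, $target);
    printf "target %d: r = %.2f, demerits = %.2f\n", $target, $r, demerits($r, 0);
}
```

The 18-unit line (r = 0.50) scores far fewer demerits than the 20-unit line (r = 1.50), because badness rises with the cube of the stretch; that is what pushes the algorithm toward evenly filled lines.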
Text::KnuthPlass handles word widths either by character count or by a user-supplied width function (such as one based on the current font and font size). It can also handle varying-length lines, if your column is not a perfect rectangle (see examples).
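To illustrate the two kinds of width measurement and a non-rectangular column, here is a small Perl sketch. The per-character width table, the 556 default width, and length_of_line() are all hypothetical, and reusing the last listed length for any extra lines is an assumption of this sketch. See perldoc Text::KnuthPlass for how the module itself accepts a width function and line lengths.

```perl
use strict;
use warnings;

# Two interchangeable width functions: each takes a string, returns its width.
# 1. Fixed-width text: the width is simply the character count.
my $by_chars = sub { return length $_[0] };

# 2. Proportional text: a hypothetical table of per-character widths
#    (in 1/1000 em). A real program would ask its font library instead.
my %glyph_width = (i => 278, l => 278, m => 833, w => 722, ' ' => 278);
my $by_glyphs = sub {
    my $w = 0;
    $w += $glyph_width{$_} // 556 for split //, $_[0];   # 556: assumed default
    return $w;
};

# Varying line lengths, one entry per line (e.g. a column widening downward).
# This sketch assumes the last entry is reused for any further lines.
my @line_lengths = (30, 40, 50);
sub length_of_line {
    my ($n, $lengths) = @_;    # $n counts lines from 0
    return $n < @$lengths ? $lengths->[$n] : $lengths->[-1];
}

printf "%s: %d chars, %d units\n", $_, $by_chars->($_), $by_glyphs->($_)
    for qw(mill wham);
print join(', ', map { length_of_line($_, \@line_lengths) } 0 .. 4), "\n";
```

Both words are four characters long, but in a proportional font their widths differ considerably, which is why a font-aware width function matters for anything other than fixed-width output.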
To build, test, and install from the source distribution:
perl Build.PL
./Build
./Build test
./Build install
Note that if the XS (C) code fails to build and install for some reason, or if you enjoy watching paint dry, you can still run the "pure Perl" code; it's much slower, but will always run. In lib/Text/KnuthPlass.pm, look for the flag setting
use constant purePerl => 0;
and change its value to 1.
After installation, documentation can be found via
perldoc Text::KnuthPlass
or
pod2html lib/Text/KnuthPlass.pm > KnuthPlass.html
Bug tracking is via GitHub issues at
"https://github.com/PhilterPaper/Text-KnuthPlass/issues?q=is%3Aissue+sort%3Aupdated-desc+is%3Aopen"
(you will need a GitHub account to create a ticket or contribute to a discussion, but anyone can read tickets). The old RT ticket system is closed.
Do NOT under ANY circumstances open a PR (Pull Request) to report a bug. It is a waste of both your and our time and effort. A PR is an offering of code that you think belongs permanently in the product. Instead, simply open a regular ticket (issue) in GitHub, and attach a Perl (.pl) program illustrating the problem, if possible. If you believe that you have a good program patch, and offer to share it as a PR, we may give the go-ahead. Unsolicited PRs may be closed without further action.
This product is licensed under the Perl license. You may redistribute under the GPL license, if desired, but you will have to add a copy of that license to your distribution, per its terms.
(c) Copyright 2020-2023 by Phil M Perry; earlier copyrights held by Simon Cozens.
Around 2009, Bram Stein wrote a JavaScript implementation of the Knuth-Plass paragraph-fitting algorithm named typeset (not to be confused with the TypeScript language, nor the publishing system Typeset). It may be found on GitHub in bramstein/typeset, and does not appear to be maintained (last update 2017). In 2011, Simon Cozens ported typeset to Perl, called it Text::KnuthPlass, and maintained it for only a short time. In 2020, Phil Perry took over maintenance of this package.
Note: gitpan/Text-KnuthPlass (on GitHub) appears to be a Read-Only archive of Text::KnuthPlass from before Perry took over maintenance. It is old, and thus not very useful. See PhilterPaper/Text-KnuthPlass for the latest code.
There are many copies of the Knuth-Plass paper/thesis, as well as discussions and explanations of the algorithm, floating around on the Web, so I will leave it to you to find some examples. Just the keywords Knuth and Plass should get you there. Rather than my listing everything here, pay a visit to my discussion on the subject. There is also a list of criteria of what makes good paragraph shaping.
There is also a refactored (still JavaScript) version of typeset, intended for use as a library, in frobnitzem/typeset.
Finally, there are a number of Knuth-Plass implementations in other languages, such as Python (akuchling/texlib) and TypeScript (avery-laird/breaker), that could be studied. And of course, there is the original Knuth-Plass paper and the annotated listing in TeX: The Program. It's just a matter of finding the time to go through all these sources and extend Text::KnuthPlass in various ways.
Find an example of using Text::KnuthPlass in examples/PDF/Flatland.pl, derived from the example in typeset. It assumes that Text::Hyphen and PDF::Builder are installed; you can easily substitute PDF::API2 by changing the PDF::Builder references in the code. You can change many settings, such as the font, font size, indentation amount, leading, line length (in points), and whether output is flush right or ragged right. The output file is Flatland.pdf.
There are more examples, including KP.pl and Triangle.pl, which show how to get various effects for a variety of input texts. Both PDF and text-file outputs are produced.