
Enhancements #4

Open
PhilterPaper opened this issue Dec 13, 2020 · 8 comments
Labels
enhancement New feature or request help wanted Extra attention is needed

Comments

@PhilterPaper
Owner

PhilterPaper commented Dec 13, 2020

See also [/forum/general-12/textknuthplass-general-discussion/] for general discussions of direction of Text::KnuthPlass package.

Forum is closed down. Roadmap and discussions here at GitHub.

@PhilterPaper PhilterPaper added the enhancement New feature or request label Dec 13, 2020
@PhilterPaper
Owner Author

Per bramstein/typeset#30, it would be very good, if an otherwise unbreakable word (or fragment) is longer than the entire available line, to break it at some arbitrary point so that it does fit. First try condensing the text with charspace() and/or hscale(), by a reasonable amount. If that isn't enough to fit it, try breaking at "reasonable" points, such as between a lowercase and an uppercase letter (camelCase text) or a letter and a number, or next to some punctuation. Finally, if all else fails, break where it would just fit into the line. This might be improved by looking at what happens to the remainder of the text, if splitting a bit earlier (to the left) results in an overall better appearance.

Note that this might not fit into the standard Knuth-Plass boxes & glue structure, as it would have to be done on an ad hoc basis, whenever a word or fragment larger than the current line is encountered. Remember that line lengths may vary, so it's difficult to presplit the word. There is also the issue of unsplittable words or fragments that are not longer than the line, but where moving them to the next line would force unacceptably large stretching to fill the current line. It might be better to go ahead and force a split (with an appropriate penalty) than to accept such wide spaces between words (a very, very loose line)!

@PhilterPaper
Owner Author

It might be cleanest to take any "word" over 8 or so characters, and force a split in (or near) Text::Hyphen, should that module be otherwise unable to split up the word. Split first at hyphens and dashes, then at camelCase, then between letters and digits, maybe then around punctuation, and finally (if any chunks longer than 8 characters are left) every 3 or 4 characters. Just make sure at least 2 characters are left in the leftmost chunk, and 3 in the rightmost. This would then fit in with the standard Knuth-Plass box and glue structure.
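
A rough sketch of that split order, in Python purely for illustration (the package itself is Perl). The function name, the constants, and the exact regular expressions are my assumptions, not anything that exists in Text::Hyphen or Text::KnuthPlass:

```python
import re

# Hypothetical tuning values taken from the numbers mentioned above.
MAX_CHUNK = 8   # chunks longer than this are split further
MIN_LEFT = 2    # minimum characters in the leftmost piece
MIN_RIGHT = 3   # minimum characters in the rightmost piece
STEP = 4        # size of the last-resort slices

# Zero-width split points, tried in order of preference.
_STAGES = [
    r"(?<=[-\u2013\u2014])(?=.)",                   # after hyphens and dashes
    r"(?<=[a-z])(?=[A-Z])",                         # camelCase boundary
    r"(?<=[A-Za-z])(?=\d)|(?<=\d)(?=[A-Za-z])",     # letter/digit boundary
    r"(?<=[^\w\s])(?=\w)",                          # after other punctuation
]


def _merge_short_ends(pieces):
    """Glue too-short end pieces onto their neighbours."""
    pieces = list(pieces)
    while len(pieces) > 1 and len(pieces[0]) < MIN_LEFT:
        pieces[1] = pieces[0] + pieces[1]
        del pieces[0]
    while len(pieces) > 1 and len(pieces[-1]) < MIN_RIGHT:
        pieces[-2] = pieces[-2] + pieces[-1]
        del pieces[-1]
    return pieces


def split_unbreakable(word):
    """Return candidate fragments of a word that the hyphenator gave up on."""
    chunks = [word]
    for pattern in _STAGES:
        refined = []
        for chunk in chunks:
            if len(chunk) > MAX_CHUNK:
                refined.extend(p for p in re.split(pattern, chunk) if p)
            else:
                refined.append(chunk)
        chunks = refined
    # Last resort: slice whatever is still too long into fixed-size pieces.
    result = []
    for chunk in chunks:
        while len(chunk) > MAX_CHUNK:
            result.append(chunk[:STEP])
            chunk = chunk[STEP:]
        result.append(chunk)
    return _merge_short_ends(result)
```

Each boundary between returned fragments would become a penalty node (possible break) in the usual box and glue list. Whether a visible hyphen is shown at such a forced break is a separate decision this sketch does not make.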

@PhilterPaper
Owner Author

PhilterPaper commented Feb 6, 2021

  1. I would like to be able to specify a line number within a paragraph that I expect will be the last line in this column (paragraph will be split in the middle), and therefore should have a very high penalty for hyphenation. The line should still be justified, but hyphenated (split) words should be avoided, as this is considered bad typesetting.
  2. Text::KnuthPlass needs a way to account for changes in typeface (affecting widths), font size, and any variants that affect width, so that it can work with paragraphs containing something other than monotone text. This might involve a wrapper around KP that contains information on the font, size, etc. for any chunk of text, and can somehow feed it into the algorithm that determines the width of that chunk (usually a single word?). Don't forget that with any change in font or font size, the width of a space will also change.
  3. In a manner compatible with item 2, the language used for words might change (e.g., a foreign word embedded in text), requiring a different hyphenation call (one that is language-sensitive). Along with font face, size, variant, etc., the language might be embedded in the input text. It may require a new version of Text::Hyphen to support multiple languages at once, in some manner, allowing on-the-fly changes to which dictionary to use when hyphenating a word.
  4. Mixed bidirectional (RTL) support would be good, in case a line ends up partly RTL (e.g., Hebrew) and then has some LTR text (e.g., German). A big problem is where do you hyphenate a long word in LTR if it's to the left of some RTL text (i.e., word break is not at the right margin)?
  5. In general, compatibility with HarfBuzz::Shaper for anything other than simple script text. It might use the same wrapper around KP as in 2, 3, and 4. However, we need to think about the order of KP and HS invocation, as pertains to the glyph list returned by HS, including ligatures (which may end up being split by KP).

See #2 and #3 for some more details.
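
To make item 2 a bit more concrete, here is a minimal sketch of what such a wrapper might feed to the line breaker, written in Python for illustration only. The `Style` record, the `ADVANCE` table (a fake "average glyph width" per font), and the node dictionaries are all assumptions; a real wrapper would ask the font for actual glyph widths:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Style:
    font: str          # typeface name (hypothetical keys, see ADVANCE)
    size: float        # font size in points
    lang: str = "en"   # would select the hyphenation dictionary (item 3)


# Made-up metrics: average glyph advance as a fraction of the font size.
ADVANCE = {"Serif": 0.5, "Mono": 0.6}


def measure(text, style):
    """Approximate width of text in points for the given style."""
    return len(text) * ADVANCE[style.font] * style.size


def to_nodes(runs):
    """Turn (text, Style) runs into a flat list of box and glue nodes."""
    nodes = []
    for text, style in runs:
        for word in text.split():
            if nodes:
                # The space takes its width from the style of the word
                # that follows it; that is a design choice, not a rule.
                nodes.append({"type": "glue",
                              "width": measure(" ", style),
                              "style": style})
            nodes.append({"type": "box", "text": word,
                          "width": measure(word, style),
                          "style": style})
    return nodes
```

For example, `to_nodes([("plain text", Style("Serif", 10)), ("code", Style("Mono", 12))])` gives three boxes and two glues, with the second glue wider than the first because the font and size changed.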

@PhilterPaper PhilterPaper added the help wanted Extra attention is needed label Feb 6, 2021
@PhilterPaper
Owner Author

PhilterPaper commented Feb 7, 2021

Knuth-Plass can request that spaces (glue) be stretched or shrunk to tune the length of the line (justification). For extreme stretching and/or shrinking, it might look better if intercharacter spacing could be adjusted (charspace() method in PDF::Builder) to put some of the line stretch within words, rather than only between them. With splitting of "unsplittable" words enabled, it might never get to this point, but it's worth considering. This might be part of a wrapper around KP itself to move excessive glue stretching to be partly taken up by charspace().
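
A sketch of how a wrapper might divide up the stretch, in Python for illustration. The function and its parameters are hypothetical, and it only covers stretching (not shrinking). One thing to watch: in PDF, character spacing is added after every glyph, spaces included, so the caller has to count the gaps accordingly:

```python
def split_stretch(extra, n_spaces, n_char_gaps, max_glue_stretch):
    """Divide extra line width between word gaps and character gaps.

    extra            -- total width still needed to fill the line (>= 0)
    n_spaces         -- number of interword glues in the line
    n_char_gaps      -- number of gaps that character spacing will widen
    max_glue_stretch -- most we are willing to add to any one glue

    Returns (extra per word gap, extra per character gap).
    """
    glue_capacity = n_spaces * max_glue_stretch
    glue_part = min(extra, glue_capacity)   # glue absorbs what it can
    overflow = extra - glue_part            # the rest goes inside the words
    per_space = glue_part / n_spaces if n_spaces else 0.0
    per_gap = overflow / n_char_gaps if n_char_gaps else 0.0
    return per_space, per_gap
```

With 12 points to absorb, 4 spaces capped at 2 points each, and 40 character gaps, the glue takes 8 points and the remaining 4 points are spread as 0.1 point per character gap.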

For extreme stretching or condensing, PDF::Builder has the hscale() method to warp individual characters in a line, adding a little more width change, at the cost of characters looking odd compared to other lines. Worth keeping in mind, though.

Add: Another thought -- for PDF::Builder usage (not necessarily in general), if KP gives the same glue (spacing) size between all words in a line (with the same stretch or shrink), the wordspace() call might be used to adjust all interword spaces in one go, eliminating the need to position and output each word separately. It may be possible to set an entire line in one PDF command, resulting in faster running and a smaller PDF file size.
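
The check for "same glue everywhere in the line" could look something like this (Python sketch, hypothetical helper name; the tolerance is an arbitrary choice). It assumes the word spacing setting is an amount added on top of the font's natural space width, which is how PDF word spacing behaves:

```python
def uniform_wordspace(glue_widths, natural_space, tol=1e-6):
    """Return the extra word spacing for a line, or None if glue varies.

    glue_widths   -- adjusted widths of every glue in the line, in points
    natural_space -- width of an ordinary space in the current font/size
    """
    if not glue_widths:
        return 0.0                      # single-word line, nothing to adjust
    first = glue_widths[0]
    if all(abs(w - first) <= tol for w in glue_widths):
        return first - natural_space    # may be negative for a shrunk line
    return None                         # fall back to placing words one by one
```

When this returns a number, the whole line could be written as one string after a single word spacing call; when it returns None, each word still has to be positioned separately.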

Add: If using charspace() in PDF to move some of the glue stretch into the words (boxes) themselves, watch out for kerning. You would want to stretch any reduced distance between letters by the same ratio as the rest of the word. This could be a problem if kerning is handled by starting the next word fragment at a fixed place, which would have to be moved to accommodate the stretch. I'll have to check how PDF's 'TJ' operator (used for output with kerning) interacts with charspace().

Add: Be careful with some languages and typographic customs, as s p a c e d l e t t e r s may be used for emphasis. If an entire line is charspace'd that may not be too bad, but still it could accidentally "hide" emphasis (deliberate extra spacing), or make it appear that an entire line is emphasized.

@PhilterPaper
Owner Author

PhilterPaper commented Mar 6, 2021

Per bramstein/typeset#27 and #5, it sounds like it would be useful to be able to change at least some parameter values at new(), such as the demerits for consecutive hyphenated lines. That's already somewhere on my list, prompted by documentation that says (to the effect of), "if you know your way around Knuth Plass, there are plenty more settings that could be changed."

Add: Quite a few parameters already were modifiable, but undocumented. I have added them to the POD. Maybe more are lurking out there somewhere!

@PhilterPaper
Owner Author

PhilterPaper commented Mar 27, 2021

In line with number (1) above, and specification of line lengths in general, keep in mind that a page-fitting system might want to slightly increase or decrease a page's leading in order to deal with widows and orphans. This might have an undesirable effect on line lengths intended to fit around images and other inserts, as the specific lines indented might now begin and end a little early or late. To fit a single line, this might not be too visible, and no special treatment might be needed, but don't forget it. It's possible that once in a while some manual intervention might be needed. Or, feed back notice that the leading was changed, and adjust all image and insert y-locations? While on the subject, what if (1)'s break point ends up producing a widow? Since the penultimate line should not be hyphenated anyway, that might not be too bad.

A wrap-around for KP that handles these things should probably accept image/insert dimensions in points anyway, so that it can float them up and down as leading changes, and set line lengths on-the-fly. We just want to avoid having to go back and rerun KP on a page multiple times to fine tune placement.

Paragraph indentation/outdents should be handled by the wrapper adding fixed-length glue to the front of the affected line(s), so that from the programming POV, all lines are the same length and start at the same 'x'. Don't forget to allow for styles where the first paragraph after a heading is unindented. Allow for indents specified in absolute amounts (e.g., mm) or relative amounts (e.g., ems or % of column width). Done in 1.07 release.
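
For reference, converting the absolute and relative indent amounts to points is straightforward. This is only an illustrative Python sketch with a made-up spec syntax ("5mm", "2em", "10%"); it is not how the 1.07 release actually accepts indents:

```python
import re

MM_TO_PT = 72 / 25.4   # 72 points per inch, 25.4 mm per inch


def indent_to_points(spec, font_size, column_width):
    """Convert an indent like '5mm', '2em', '10%' or '18pt' to points.

    Negative values describe an outdent.
    """
    m = re.fullmatch(r"\s*(-?\d+(?:\.\d+)?)\s*(pt|mm|em|%)\s*", spec)
    if not m:
        raise ValueError(f"unrecognized indent: {spec!r}")
    value, unit = float(m.group(1)), m.group(2)
    if unit == "pt":
        return value
    if unit == "mm":
        return value * MM_TO_PT
    if unit == "em":
        return value * font_size           # 1 em = current font size
    return value * column_width / 100      # percentage of column width
```

The resulting amount would be the width of the fixed glue put at the front of the affected line, with 0 for an unindented first paragraph after a heading.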

@PhilterPaper
Owner Author

PhilterPaper commented Apr 18, 2021

Rather than giving a set of line lengths to use (or in addition to...), give a vertical length (and width) of indentation and a starting offset from the top of the paragraph. Return any unused indentation for use by the next paragraph (the wrapper would handle this).
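
A sketch of what the wrapper would do with such a specification, in Python for illustration. All names are hypothetical. It also assumes the number of lines is known or estimated up front, which in practice it is not (the line count depends on the line lengths), so a real wrapper would probably generate lengths lazily or overestimate:

```python
def line_lengths(column_width, leading, n_lines,
                 indent_width, indent_height, offset=0.0):
    """Line lengths for a paragraph with a rectangular cutout.

    The cutout starts `offset` below the top of the paragraph and is
    `indent_height` tall and `indent_width` wide. Returns the list of
    line lengths plus the cutout height left over for the next paragraph.
    """
    block_top = offset
    block_bottom = offset + indent_height
    lengths = []
    for i in range(n_lines):
        line_top = i * leading
        line_bottom = line_top + leading
        # A line is shortened if any part of it overlaps the cutout.
        overlaps = line_top < block_bottom and line_bottom > block_top
        lengths.append(column_width - indent_width if overlaps
                       else column_width)
    unused = max(0.0, block_bottom - n_lines * leading)
    return lengths, unused
```

For a 300 point column with 12 point leading and a 100 by 30 point cutout starting 12 points down, a five-line paragraph gets lengths 300, 200, 200, 200, 300 with nothing left over, while a two-line paragraph gets 300, 200 and passes 18 points of cutout on to the next paragraph.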

How to extend this to non-rectangular situations? This would include the various examples of triangular, circular, etc. paragraphs, and non-rectangular cutouts for images and asides. A possibility would be to define a path (lines, arcs, splines) bounding the paragraph, with some provision to allow an indefinite continuation on the bottom. For PDF::Builder (PDF generation), I want to do something like this anyway for arbitrary paragraph shapes, so maybe we can kill two birds with one stone.
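
As the simplest non-rectangular case, a circular paragraph reduces to computing a chord per line. This Python sketch is only an illustration (hypothetical function, circle only, no general path handling):

```python
import math


def circle_line_lengths(radius, leading):
    """Line lengths for text set inside a circle of the given radius.

    Each line is a strip `leading` tall; its usable width is the shorter
    of the chords at its top and bottom edges, so the whole strip stays
    inside the circle.
    """
    lengths = []
    y = 0.0                                  # distance below top of circle
    while y + leading <= 2 * radius:
        d_top = abs(y - radius)              # edge distances from center
        d_bottom = abs(y + leading - radius)
        d = max(d_top, d_bottom)             # farther edge = shorter chord
        lengths.append(2 * math.sqrt(max(0.0, radius ** 2 - d ** 2)))
        y += leading
    return lengths
```

Note that the first and last strips touch the circle at a single point and come out with zero width, so a caller would have to skip lines that are too short to hold any text. The centered lines also need a per-line x offset, which this does not return.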

Any "path" defining a non-rectangular overall paragraph is going to have the problem of what to do when leading gets tweaked to avoid a widow or orphan... which will change the shape of all paragraphs on the page! This, in turn, could even change the line breaking to create or eliminate widows and orphans.

@PhilterPaper
Owner Author

I have seen suggestions that hyphenation use soft hyphens (SHY) rather than - for the output. This way, screen readers and reflow/reformatters (editors) can realize that the hyphen isn't real, and can be omitted when gluing the word back together. Of course, there's still the issue of Dutch/German (etc.) repeating or adding letters when splitting words, which would need to be corrected when reforming the split words! See #2.
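
What a reflow tool could then do is roughly this (Python sketch, hypothetical function). It only handles the simple case; it does nothing about the Dutch/German spelling changes mentioned above:

```python
SHY = "\u00ad"   # U+00AD SOFT HYPHEN


def rejoin(lines):
    """Join typeset lines back into running text.

    A trailing soft hyphen marks a break inserted by hyphenation, so it
    is dropped and the word is glued back together. A trailing real
    hyphen is kept, and the next line follows without a space.
    """
    out = []
    for line in lines:
        if line.endswith(SHY):
            out.append(line[:-1])     # discard the soft hyphen
        elif line.endswith("-"):
            out.append(line)          # real hyphen, e.g. "well-known"
        else:
            out.append(line + " ")
    return "".join(out).rstrip()
```

With a plain "-" at the break, the tool cannot tell these two cases apart, which is the whole argument for emitting SHY.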
