Word splitting in non-English languages #2
See PDF::Builder's
UniWrap.pm does not appear to be used anywhere in PDF::Builder, and may be badly out of date compared with the line-breaking classes table in https://unicode.org/reports/tr14/ (UAX #14, the Unicode Line Breaking Algorithm). That Unicode report covers many cases of how to handle line breaking, and could be a good starting point (for example, for updating UniWrap).
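To illustrate what the UAX #14 class table is about, here is a heavily simplified sketch (in Python, for brevity) of class-based break opportunities. It hard-codes only a handful of classes (SP, CL, ID, AL) and a few of the report's rules (LB7, LB13, LB18, LB28, LB31). The function names and the character-to-class mapping are hypothetical; a real implementation, or an updated UniWrap, would need the full Line_Break property data and the complete rule set.

```python
# Heavily simplified subset of UAX #14 (Unicode Line Breaking Algorithm).
# Rule numbers refer to UAX #14; the class mapping below is a hypothetical
# stand-in for the real Line_Break property data.


def line_break_class(ch: str) -> str:
    """Map a character to a (very) simplified UAX #14 line-break class."""
    if ch == " ":
        return "SP"  # space
    if ch in "}\u3001\u3002":
        return "CL"  # close punctuation: "}", ideographic comma, ideographic full stop
    if "\u4e00" <= ch <= "\u9fff":
        return "ID"  # CJK Unified Ideographs (main block only)
    return "AL"  # treat everything else as alphabetic


def break_opportunities(text: str) -> list[int]:
    """Return the indices i at which a line may break before text[i]."""
    breaks = []
    for i in range(1, len(text)):
        before = line_break_class(text[i - 1])
        after = line_break_class(text[i])
        if after in ("SP", "CL"):
            continue  # LB7, LB13: no break before a space or closing punctuation
        if before == "SP":
            breaks.append(i)  # LB18: break after spaces
        elif before == "AL" and after == "AL":
            continue  # LB28: no break inside a run of alphabetic characters
        else:
            breaks.append(i)  # LB31: break everywhere else (e.g. between ideographs)
    return breaks
```

For example, in "ab 日本" this allows a break after the space and between the two ideographs, but not inside "ab"; in "日本語。" it never allows a break before the ideographic full stop.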
See PhilterPaper/Perl-PDF-Builder#183 for further thoughts on hyphenation in non-English languages (in both Latin and non-Latin scripts).
See Alex Holkner's thesis (https://citeseerx.ist.psu.edu/pdf/ee95750a9dd047b52901efda59819864bb9ede4a), page 11, for some interesting thoughts on how to represent splittable words, including those with German or Dutch orthography. In any case, you can't simply break a word into syllables: you also need to record any "funny business" (letters changed or repeated) where the word is split or joined back together, because that changes the lengths of the fragments being measured.
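One way to represent that "funny business" (not necessarily the representation Holkner proposes) is to borrow the idea behind TeX's `\discretionary{pre-break}{post-break}{no-break}`: each possible break point carries both the text used if the line breaks there and the text used if it does not. The Python sketch below uses hypothetical names and the traditional (pre-1996) German rule that "ck" is hyphenated as "k-k", so "Zucker" becomes "Zuk-ker". Character counts stand in for real glyph widths.

```python
from dataclasses import dataclass


@dataclass
class Discretionary:
    """A possible break point, modelled on TeX's \\discretionary{pre}{post}{nobreak}."""
    pre_break: str   # text ending the line if we break here (usually includes "-")
    post_break: str  # text starting the next line if we break here
    no_break: str    # text used if we do not break here


# Traditional German "Zucker" hyphenates as "Zuk-ker": when broken, the "c" becomes "k-".
zucker = ["Zu", Discretionary(pre_break="k-", post_break="", no_break="c"), "ker"]


def render(items: list, break_at=None) -> list[str]:
    """Join items into line fragments, breaking only at the Discretionary at index break_at."""
    lines, current = [], ""
    for i, item in enumerate(items):
        if isinstance(item, Discretionary):
            if i == break_at:
                lines.append(current + item.pre_break)
                current = item.post_break
            else:
                current += item.no_break
        else:
            current += item
    lines.append(current)
    return lines
```

Unbroken, `render(zucker)` gives `["Zucker"]` (6 characters); broken, `render(zucker, break_at=1)` gives `["Zuk-", "ker"]` (4 + 3 characters). That mismatch is the length-accounting problem a line breaker has to handle: it cannot measure the unbroken word once and simply split the total.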
I am aware that some other languages, such as Dutch and German, have specific rules about changing or repeating letters when a word is split. These rules will need to be built either into Text::KnuthPlass itself (which in turn would need to know which human language is being used) or into a code layer that handles paragraph shaping. It might even be an extension to Text::Hyphen or other hyphenation code. Currently, you need to invoke the appropriate Text::Hyphen::XX (where XX is the language code) to find the right places to split a word, but I don't think it goes beyond that.
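A minimal Python sketch of the "separate layer" option, assuming a hyphenator that returns a word as plain pieces split at the allowed points. The piece format, the function names, and the `de-1901` rule-table key are all assumptions for illustration, not Text::Hyphen's actual interface. A per-language rule turns each plain split into a discretionary break, so the line breaker only needs to measure the pre-break, post-break, and no-break texts. Only the traditional German "ck" rule is implemented, and it assumes the hyphenator placed the split between the "c" and the "k".

```python
from dataclasses import dataclass


@dataclass
class Discretionary:
    pre_break: str   # text ending the line if we break here
    post_break: str  # text starting the next line if we break here
    no_break: str    # text used if we do not break here


def default_rule(left: str, right: str):
    # Plain hyphenation: add "-" only when breaking; pieces are unchanged.
    return left, Discretionary("-", "", "")


def german_1901_rule(left: str, right: str):
    # Traditional German: "ck" is hyphenated as "k-k" (Zucker -> Zuk-ker).
    # Assumes the hyphenator split between the "c" and the "k".
    if left.endswith("c") and right.startswith("k"):
        return left[:-1], Discretionary("k-", "", "c")
    return default_rule(left, right)


# Hypothetical registry of per-language rules; unknown languages use default_rule.
RULES = {"de-1901": german_1901_rule}


def to_discretionaries(pieces: list[str], lang: str) -> list:
    """Interleave word pieces with Discretionary break points for language `lang`."""
    if not pieces:
        return []
    rule = RULES.get(lang, default_rule)
    out = [pieces[0]]
    for right in pieces[1:]:
        left = out.pop()  # the last element is always a plain string here
        new_left, disc = rule(left, right)
        out.extend([new_left, disc, right])
    return out


def render(items: list, break_at=None) -> list[str]:
    """Join items into line fragments, breaking only at index `break_at`."""
    lines, current = [], ""
    for i, item in enumerate(items):
        if isinstance(item, Discretionary):
            if i == break_at:
                lines.append(current + item.pre_break)
                current = item.post_break
            else:
                current += item.no_break
        else:
            current += item
    lines.append(current)
    return lines
```

With this split of responsibilities, supporting another language means adding an entry to the rule table rather than changing the line breaker itself.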