Process bidirectional text in conformance to Unicode standards #1096
Codecov Report
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1096      +/-   ##
==========================================
+ Coverage   93.32%   93.43%   +0.11%
==========================================
  Files          29       30       +1
  Lines        8701     9111     +410
  Branches     1929     2067     +138
==========================================
+ Hits         8120     8513     +393
- Misses        362      370       +8
- Partials      219      228       +9
View full report in Codecov by Sentry.
…nto bidirectional
@Lucas-C do you have any input on this PR?
Will this fix #901?
Yes, this PR fixes it.
This PR implements the Unicode BiDi algorithm to handle bidirectional text correctly.
Changes on line break
Currently fpdf2 keeps a separate width counter on CurrentLine, adding the width of each character. This causes inconsistent breaks when text shaping is used (the shaped width can differ from the sum of each character's width).
Now CurrentLine uses the calculated width of each fragment.
There are opportunities to simplify and optimise the line break algorithm. I will try to tackle them in a future PR.
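To illustrate why summing per-character widths can disagree with the shaped width, here is a toy sketch. It is not fpdf2's code: the width table, the ligature rule, and the function names are all hypothetical and only model the effect.

```python
# Hypothetical per-glyph advance widths (arbitrary units), not from a real font.
CHAR_WIDTHS = {"f": 3.0, "i": 2.5, "n": 5.0}
# Hypothetical shaping rule: the pair "fi" becomes one ligature glyph.
LIGATURE_WIDTHS = {"fi": 4.5}


def naive_width(text):
    """Sum the width of each character, ignoring shaping."""
    return sum(CHAR_WIDTHS[ch] for ch in text)


def shaped_width(text):
    """Width after applying the toy ligature substitution."""
    width = 0.0
    i = 0
    while i < len(text):
        pair = text[i:i + 2]
        if pair in LIGATURE_WIDTHS:
            # Two characters collapse into a single, narrower glyph.
            width += LIGATURE_WIDTHS[pair]
            i += 2
        else:
            width += CHAR_WIDTHS[text[i]]
            i += 1
    return width


print(naive_width("fin"), shaped_width("fin"))
```

A line breaker that trusts the per-character sum would believe "fin" is wider than it is once shaped, so it could break a line earlier than necessary.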
Changes on fallback font
Currently fpdf2 creates a new fragment for each character taken from a fallback font.
Because text shaping is executed per fragment, this did not produce correct output. For example, an accent could not be positioned on its base character when the two ended up in different fragments.
Now, a sequence of characters taken from the same fallback font produces a single Fragment.
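The grouping idea can be sketched as follows. This is an illustration under stated assumptions, not the code in this PR: `pick_font` and `split_into_runs` are hypothetical names, and font coverage is modelled as a plain set of characters.

```python
from itertools import groupby


def pick_font(char, main_font_chars):
    """Hypothetical font selection: use the main font when it covers the character."""
    return "main" if char in main_font_chars else "fallback"


def split_into_runs(text, main_font_chars):
    """Group consecutive characters that resolve to the same font into one run."""
    runs = []
    for font, chars in groupby(text, key=lambda ch: pick_font(ch, main_font_chars)):
        runs.append((font, "".join(chars)))
    return runs


# "e" followed by U+0301 (combining acute accent), neither covered by the main font.
print(split_into_runs("abe\u0301c", set("abc")))
```

With this grouping, the base letter and its combining accent land in the same run, so a shaper working run by run sees them together.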
New file bidi.py
The main class is BidiParagraph, which takes an input string and executes the Unicode bidirectional algorithm.
The method BidiParagraph.get_bidi_fragments() returns the calculated segments along with their direction ("L" or "R").
BiDi conformance test
Unicode offers 2 files to test bidi algorithm conformance (https://www.unicode.org/reports/tr41/tr41-32.html#Tests9)
BidiTest.txt (7.59MB) has 770,241 tests
BidiCharacterTest.txt (6.65MB) has 91,707 tests
I am still unsure about adding files this big to the project. The tests also take a long time to complete and I am not sure about the cost/benefit of running them on our automated tests.
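For context on what running these tests involves, below is a minimal sketch of reading one record in the BidiCharacterTest.txt layout. The field order assumed here (code points; paragraph direction; resolved paragraph level; resolved levels; visual order) is my reading of the file's header comments and should be checked against them. The function name and the sample record are hypothetical and written by hand, not copied from the file.

```python
def parse_bidi_character_test_line(line):
    """Parse one record; return None for blank or comment-only lines."""
    line = line.split("#", 1)[0].strip()
    if not line:
        return None
    fields = line.split(";")
    if len(fields) != 5:
        raise ValueError(f"expected 5 fields, got {len(fields)}")
    return {
        # Field 0: space-separated hexadecimal code points.
        "text": "".join(chr(int(cp, 16)) for cp in fields[0].split()),
        # Field 1 (assumed): 0 = LTR, 1 = RTL, 2 = auto-detect.
        "paragraph_direction": int(fields[1]),
        # Field 2 (assumed): resolved paragraph embedding level.
        "paragraph_level": int(fields[2]),
        # Field 3 (assumed): resolved levels, "x" for characters removed by the algorithm.
        "levels": [None if lv == "x" else int(lv) for lv in fields[3].split()],
        # Field 4 (assumed): indices of the characters in visual order.
        "visual_order": [int(i) for i in fields[4].split()],
    }


# Hand-written sample record in the assumed layout.
record = parse_bidi_character_test_line("0061 0062 05D0;0;0;0 0 1;0 1 2")
print(record)
```

A conformance runner would feed `record["text"]` and the paragraph direction to the implementation and compare its levels and ordering with the expected values, which is why hundreds of thousands of records take a while to run.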
Checklist:
The GitHub pipeline is OK (green), meaning that both pylint (static code analyzer) and black (code formatter) are happy with the changes of this PR.
A unit test is covering the code added / modified by this PR
This PR is ready to be merged
In case of a new feature, docstrings have been added, with also some documentation in the docs/ folder
A mention of the change is present in CHANGELOG.md
By submitting this pull request, I confirm that my contribution is made under the terms of the GNU LGPL 3.0 license.