Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Process bidirectional text in conformance to Unicode standards #1096

Merged
merged 15 commits into from
Feb 9, 2024

Conversation

andersonhc
Copy link
Collaborator

@andersonhc andersonhc commented Jan 19, 2024

This PR implements Implements the Unicode BiDi algorithm to handle bidirectional texts correctly.

Changes on line break
Currently fpdf2 is keeping a separate width control on CurrentLine, adding the width of each character.
This is causing inconsistent breaks when text shaping is used (width can be different than the sum of each character's width).
Now CurrentLine is going to use calculated width of each fragment.
There are opportunities for simplification and optimisation of the line break algorithm, I will try to tackle it in a future PR.

Changes on fallback font
Currently fpdf2 is creating a new fragment for each character taken from a fallback font.
As text shaping is executed per fragment, it wasn't producing correct output, like adjusting an accent on a character since they were on different fragments.
Now, a sequence of characters taken from the same fallback font will produce a single Fragment

New file bidi.py
The main class is BidiParagraph, that takes an input string and execute the Unicode bidirectional algorithm.
The method BidiParagraph.get_bidi_fragments() will return the calculated segments along with their direction ("L" or "R").

BiDi conformance test
Unicode offers 2 files to test bidi algorithm conformance (https://www.unicode.org/reports/tr41/tr41-32.html#Tests9)
BidiTest.txt (7.59MB) has 770,241 tests
BidiCharacterTest.txt (6.65MB) has 91,707 tests
I am still unsure about adding files this big to the project. The tests also take a long time to complete and I am not sure about the cost/benefit of running them on our automated tests.

Checklist:

  • The GitHub pipeline is OK (green),
    meaning that both pylint (static code analyzer) and black (code formatter) are happy with the changes of this PR.

  • A unit test is covering the code added / modified by this PR

  • This PR is ready to be merged

  • In case of a new feature, docstrings have been added, with also some documentation in the docs/ folder

  • A mention of the change is present in CHANGELOG.md

By submitting this pull request, I confirm that my contribution is made under the terms of the GNU LGPL 3.0 license.

@codecov-commenter
Copy link

codecov-commenter commented Jan 19, 2024

Codecov Report

Attention: 22 lines in your changes are missing coverage. Please review.

Comparison is base (1effbb0) 93.32% compared to head (92496cf) 93.43%.

❗ Current head 92496cf differs from pull request most recent head 77bb5dc. Consider uploading reports for the commit 77bb5dc to get more accurate results

Files Patch % Lines
fpdf/bidi.py 95.93% 8 Missing and 7 partials ⚠️
fpdf/fpdf.py 94.11% 2 Missing and 3 partials ⚠️
fpdf/fonts.py 80.00% 0 Missing and 1 partial ⚠️
fpdf/line_break.py 97.36% 1 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##           master    #1096      +/-   ##
==========================================
+ Coverage   93.32%   93.43%   +0.11%     
==========================================
  Files          29       30       +1     
  Lines        8701     9111     +410     
  Branches     1929     2067     +138     
==========================================
+ Hits         8120     8513     +393     
- Misses        362      370       +8     
- Partials      219      228       +9     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

@andersonhc
Copy link
Collaborator Author

@Lucas-C do you have any input on this PR?
After this we have enough changes for a new release.

@andrewlutz
Copy link

Will this fix #901 ?

@andersonhc
Copy link
Collaborator Author

Will this fix #901 ?

Yes, This PR fixes it.
To you have any example you want to test with?

@andersonhc andersonhc merged commit 3718231 into py-pdf:master Feb 9, 2024
11 checks passed
@andersonhc andersonhc deleted the bidirectional branch February 9, 2024 23:09
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants