Process bidirectional text in conformance to Unicode standards #1096

andersonhc · 2024-01-19T22:23:04Z

This PR implements Implements the Unicode BiDi algorithm to handle bidirectional texts correctly.

Changes on line break
Currently fpdf2 is keeping a separate width control on CurrentLine, adding the width of each character.
This is causing inconsistent breaks when text shaping is used (width can be different than the sum of each character's width).
Now CurrentLine is going to use calculated width of each fragment.
There are opportunities for simplification and optimisation of the line break algorithm, I will try to tackle it in a future PR.

Changes on fallback font
Currently fpdf2 is creating a new fragment for each character taken from a fallback font.
As text shaping is executed per fragment, it wasn't producing correct output, like adjusting an accent on a character since they were on different fragments.
Now, a sequence of characters taken from the same fallback font will produce a single Fragment

New file bidi.py
The main class is BidiParagraph, that takes an input string and execute the Unicode bidirectional algorithm.
The method BidiParagraph.get_bidi_fragments() will return the calculated segments along with their direction ("L" or "R").

BiDi conformance test
Unicode offers 2 files to test bidi algorithm conformance (https://www.unicode.org/reports/tr41/tr41-32.html#Tests9)
BidiTest.txt (7.59MB) has 770,241 tests
BidiCharacterTest.txt (6.65MB) has 91,707 tests
I am still unsure about adding files this big to the project. The tests also take a long time to complete and I am not sure about the cost/benefit of running them on our automated tests.

Checklist:

The GitHub pipeline is OK (green),
meaning that both pylint (static code analyzer) and black (code formatter) are happy with the changes of this PR.
A unit test is covering the code added / modified by this PR
This PR is ready to be merged
In case of a new feature, docstrings have been added, with also some documentation in the docs/ folder
A mention of the change is present in CHANGELOG.md

By submitting this pull request, I confirm that my contribution is made under the terms of the GNU LGPL 3.0 license.

codecov-commenter · 2024-01-19T22:39:26Z

Codecov Report

Attention: 22 lines in your changes are missing coverage. Please review.

Comparison is base (1effbb0) 93.32% compared to head (92496cf) 93.43%.

❗ Current head 92496cf differs from pull request most recent head 77bb5dc. Consider uploading reports for the commit 77bb5dc to get more accurate results

Files	Patch %	Lines
fpdf/bidi.py	95.93%	8 Missing and 7 partials ⚠️
fpdf/fpdf.py	94.11%	2 Missing and 3 partials ⚠️
fpdf/fonts.py	80.00%	0 Missing and 1 partial ⚠️
fpdf/line_break.py	97.36%	1 Missing ⚠️

Additional details and impacted files

@@            Coverage Diff             @@
##           master    #1096      +/-   ##
==========================================
+ Coverage   93.32%   93.43%   +0.11%     
==========================================
  Files          29       30       +1     
  Lines        8701     9111     +410     
  Branches     1929     2067     +138     
==========================================
+ Hits         8120     8513     +393     
- Misses        362      370       +8     
- Partials      219      228       +9

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

…nto bidirectional

andersonhc · 2024-02-02T18:00:16Z

@Lucas-C do you have any input on this PR?
After this we have enough changes for a new release.

andrewlutz · 2024-02-07T20:11:34Z

Will this fix #901 ?

andersonhc · 2024-02-07T20:21:22Z

Will this fix #901 ?

Yes, This PR fixes it.
To you have any example you want to test with?

andersonhc added 5 commits January 3, 2024 07:44

initial bidi implementation

1710e49

fix several issues

b5a14ce

fix pylint alarms

0ce7e04

fix black formatting

1467142

line break, font fallback and tests

8e59638

andersonhc requested a review from Lucas-C January 19, 2024 22:23

andersonhc requested a review from gmischler as a code owner January 19, 2024 22:23

andersonhc mentioned this pull request Jan 19, 2024

(draft) Process bidirectional text in conformance to Unicode standards #972

Closed

5 tasks

andersonhc added 8 commits January 19, 2024 18:34

update doc

3420e7e

fix missing import

adec4e7

change from local to remote conformance files

6af425b

Merge branch 'master' into bidirectional

4f720c2

black

db8cec8

Merge branch 'bidirectional' of https://github.com/andersonhc/fpdf2 i…

8d2f0bf

…nto bidirectional

black

7cf25da

remove traces + performance

76ef81e

andersonhc added 2 commits February 2, 2024 14:21

Merge branch 'py-pdf:master' into bidirectional

9a14cf3

update changelog

77bb5dc

andersonhc merged commit 3718231 into py-pdf:master Feb 9, 2024
11 checks passed

andersonhc deleted the bidirectional branch February 9, 2024 23:09

This was referenced Feb 10, 2024

RTL multi-line text gets bottom-up #901

Closed

New feature: Handle bidirectional text #882

Closed

This was referenced Feb 24, 2025

Implement line wrap support cropio/prawn-rtl-support#5

Open

Advanced text layout prawnpdf/prawn#1295

Open

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Process bidirectional text in conformance to Unicode standards #1096

Process bidirectional text in conformance to Unicode standards #1096

andersonhc commented Jan 19, 2024 •

edited

Loading

codecov-commenter commented Jan 19, 2024 •

edited

Loading

andersonhc commented Feb 2, 2024

andrewlutz commented Feb 7, 2024

andersonhc commented Feb 7, 2024

Process bidirectional text in conformance to Unicode standards #1096

Process bidirectional text in conformance to Unicode standards #1096

Conversation

andersonhc commented Jan 19, 2024 • edited Loading

codecov-commenter commented Jan 19, 2024 • edited Loading

Codecov Report

andersonhc commented Feb 2, 2024

andrewlutz commented Feb 7, 2024

andersonhc commented Feb 7, 2024

andersonhc commented Jan 19, 2024 •

edited

Loading

codecov-commenter commented Jan 19, 2024 •

edited

Loading