text_extraction invalid for habibi.pdf #1619

pubpub-zz · 2023-02-08T20:59:02Z

I've opened https://github.com/py-pdf/sample-files/pull/13 to put `habibi.pdf` in the sample-files repo. I recommend including a test for it before merging this.

The extracted text shows the Arabic characters in reversed order.

Originally posted by @dkg in #1126 (comment)


pubpub-zz · 2023-02-08T21:01:44Z

@dkg
thanks for the sample :)

fixes py-pdf#1619

Fixes #1619

MartinThoma · 2023-02-10T06:57:40Z

Thank you for the improvement @pubpub-zz 🙏

I'll make a release this weekend :-)

pubpub-zz added a commit to pubpub-zz/pypdf that referenced this issue Feb 8, 2023

BUG: Text extraction not working with one glyph to char sequence

fe40441

fixes py-pdf#1619

pubpub-zz mentioned this issue Feb 8, 2023

BUG: Text extraction not working with one glyph to char sequence #1620

Merged

pubpub-zz changed the title ~~I've opened https://github.com/py-pdf/sample-files/pull/13 to put habibi.pdf in the sample-files repo. i recommend including a test for it before merging this.~~ text_extraction invalid for habibi.pdf Feb 8, 2023

pubpub-zz added the is-bug (From a user's perspective, this is a bug - a violation of the expected behavior with a compliant PDF) and soon (PRs that are almost ready to be merged, issues that get solved pretty soon) labels Feb 8, 2023

MartinThoma closed this as completed in #1620 Feb 10, 2023

MartinThoma pushed a commit that referenced this issue Feb 10, 2023

BUG: Text extraction not working with one glyph to char sequence (#1620)

f5ac79b

Fixes #1619
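The fix title refers to a single glyph that maps to a sequence of characters. In a PDF font's ToUnicode CMap, a `bfchar` entry maps a source glyph code to a UTF-16BE destination string, and that string may hold more than one character (for example an Arabic lam followed by alef). The sketch below illustrates that decoding on a simplified, made-up CMap fragment; `parse_bfchar` and `decode_glyphs` are hypothetical helpers, not pypdf's API, and the sketch does not reproduce the actual change in #1620.

```python
from __future__ import annotations

import re


def parse_bfchar(block: str) -> dict[int, str]:
    """Parse `<src> <dst>` pairs from a simplified beginbfchar/endbfchar block.

    The destination is UTF-16BE hex, so one source glyph code may decode
    to several Unicode characters.
    """
    mapping: dict[int, str] = {}
    for src, dst in re.findall(r"<([0-9A-Fa-f]+)>\s*<([0-9A-Fa-f]+)>", block):
        mapping[int(src, 16)] = bytes.fromhex(dst).decode("utf-16-be")
    return mapping


def decode_glyphs(codes: list[int], mapping: dict[int, str]) -> str:
    # Emit the full character sequence for each glyph, not just its first
    # character; unknown codes become U+FFFD (replacement character).
    return "".join(mapping.get(code, "\ufffd") for code in codes)


# Hypothetical CMap fragment: glyph 3 -> lam, glyph 4 -> lam + alef.
cmap_block = """
2 beginbfchar
<0003> <0644>
<0004> <06440627>
endbfchar
"""

mapping = parse_bfchar(cmap_block)
text = decode_glyphs([3, 4], mapping)
```

Here glyph code 4 decodes to two characters, so an extractor that assumes one character per glyph would lose or misplace part of the text.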

MartinThoma removed the soon (PRs that are almost ready to be merged, issues that get solved pretty soon) label Feb 26, 2023
