Hebrew text displayed backwards #97
Find attached a PDF file, created in LibreOffice Writer, with the following structure:
I've then highlighted the text. Here is the output:
Note that
Thanks for the report! The headings are extracted correctly in this case because they come from the PDF metadata as strings. The problem is that pdfminer's text extraction routines don't support right-to-left text: pdfminer/pdfminer.six#515. pdfannots also makes some similar assumptions, which affect things like the relative order in which two annotations are reported when they appear on the same line of text. I could probably fix that, but the bigger issue is the one linked above.
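For anyone who needs a stopgap in the meantime, here is a rough sketch of post-processing the extracted text, assuming the extractor returns RTL lines in reversed character order. The function name `unreverse_line` is hypothetical; it is not part of pdfannots or pdfminer, and it uses only the standard library. It is a naive workaround, not a real bidi implementation: lines that mix Hebrew with numbers or Latin words will come out wrong.

```python
import unicodedata

# Bidi classes for "strong" characters: L = left-to-right,
# R = right-to-left (e.g. Hebrew), AL = Arabic letter.
STRONG_CLASSES = ("L", "R", "AL")
RTL_CLASSES = ("R", "AL")


def unreverse_line(line: str) -> str:
    """Hypothetical helper: reverse a line if it is predominantly RTL.

    Assumes the line was extracted character by character in the wrong
    direction. Mixed-direction lines are NOT handled correctly.
    """
    strong = [c for c in line if unicodedata.bidirectional(c) in STRONG_CLASSES]
    if not strong:
        # Only digits, punctuation or whitespace: leave untouched.
        return line
    rtl_count = sum(1 for c in strong if unicodedata.bidirectional(c) in RTL_CLASSES)
    # Reverse only when more than half of the strong characters are RTL.
    return line[::-1] if rtl_count * 2 > len(strong) else line
```

It would be applied line by line to the text of each highlight, for example `"\n".join(unreverse_line(l) for l in text.splitlines())`.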
Thank you. That bug report points to a fork, PdfMiner.RTL, which has experimental RTL support. I tried it, and in general the tool works well.
This tool is terrific, thank you.
Highlighted and underlined Hebrew text is displayed backwards. Interestingly, the title blurb preceding the highlighted text is not backwards.