error with PDFTextExtractor #529

Closed
Wugengxian opened this issue Apr 26, 2021 · 1 comment

Wugengxian commented Apr 26, 2021

Describe the bug
The /ToUnicode entry is wrong, so PdfTextExtractor returns incorrect text.

To Reproduce
The following test reproduces the problem:

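import java.io.FileOutputStream;
import com.lowagie.text.Chunk;
import com.lowagie.text.Document;
import com.lowagie.text.Font;
import com.lowagie.text.pdf.PdfReader;
import com.lowagie.text.pdf.PdfWriter;
import com.lowagie.text.pdf.parser.PdfTextExtractor;
import org.junit.jupiter.api.Assertions;

void testToUnicode() throws Exception {
    Document document = new Document();
    Document.compress = false; // write uncompressed streams so /ToUnicode can be inspected
    FileOutputStream outputStream = new FileOutputStream("output.pdf");
    PdfWriter.getInstance(document, outputStream);
    document.open();

    // Write three Greek letters using the built-in Symbol font.
    document.add(new Chunk("ετε", new Font(Font.SYMBOL)));
    document.close();
    // Read the file back; the extracted text should match what was written.
    PdfTextExtractor pdfTextExtractor = new PdfTextExtractor(new PdfReader("output.pdf"));
    Assertions.assertEquals("ετε", pdfTextExtractor.getTextFromPage(1));
}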

Actual behavior
When "ετε" is copied in HTML or extracted with PdfTextExtractor, the result is "ͧͶͧ", which is wrong.
Expected behavior
When "ετε" is copied in HTML or extracted with PdfTextExtractor, the result should be "ετε".
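
To compare the expected and extracted strings without relying on how the glyphs render, it can help to print their code points. The following is a minimal sketch; the class name CodePointDump and the method describe are hypothetical helpers and not part of the library.

```java
import java.util.stream.Collectors;

public class CodePointDump {
    // Formats each code point of the input as U+XXXX, separated by spaces.
    public static String describe(String text) {
        return text.codePoints()
                .mapToObj(cp -> String.format("U+%04X", cp))
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        // Greek small letters epsilon, tau, epsilon (the string written to the PDF).
        String expected = "\u03b5\u03c4\u03b5";
        System.out.println("expected: " + describe(expected));
        // In the test above, the value returned by getTextFromPage(1) could be
        // passed to describe() in the same way to see which code points come back.
    }
}
```

Printing both strings this way shows which code point each character code was mapped to, which is the information a /ToUnicode mapping is supposed to carry.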

Screenshots
image

System (please complete the following information):

  • OS: Windows 10
  • Used Font:

Additional context
I have a fix for this; the error is in /ToUnicode.
The incorrect /ToUnicode:
image

@Wugengxian
Author

There are also other problems when Font.SYMBOL is used in a Chunk. I will provide a pull request in the future.

@asturio asturio added this to the 1.3.26 milestone May 2, 2021