Getting "Advanced encoding /UniGB-UTF16-H not implemented" error for Chinese text #2812
Comments
@Aswin4070 can you share your test document for analysis?
After looking on the internet and running a few tests, I would be more inclined to propose gb18030:
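The snippet from this comment is not preserved here. As a rough illustration of why the codec choice matters (not the commenter's code), the sketch below encodes the same Chinese sample with `utf-16-be` and `gb18030` using only Python's built-in codecs. The helper name `try_decode` and the sample string are assumptions made for this example.

```python
from typing import Optional


def try_decode(raw: bytes, codec: str) -> Optional[str]:
    """Decode raw bytes with the given codec; return None if the bytes are invalid for it."""
    try:
        return raw.decode(codec)
    except UnicodeDecodeError:
        return None


sample = "中文"  # arbitrary sample text, chosen for illustration

# The same characters produce different byte sequences under each codec.
utf16_bytes = sample.encode("utf-16-be")
gb_bytes = sample.encode("gb18030")

# Decoding with the matching codec round-trips the text.
roundtrip_gb = try_decode(gb_bytes, "gb18030")
roundtrip_utf16 = try_decode(utf16_bytes, "utf-16-be")

# Decoding with the wrong codec does not give the original text back.
mismatched = try_decode(gb_bytes, "utf-16-be")

print(utf16_bytes.hex(), gb_bytes.hex())
print(roundtrip_gb, roundtrip_utf16, mismatched)
```

Which codec actually fits the strings in a given PDF has to be checked against the document itself, which is why a test file was requested.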
@Aswin4070
Hi @pubpub-zz, sorry for the delay. You can get the PDF from here:
Thx
Replace this: What happened? What were you trying to achieve?
Environment
Which environment were you using when you encountered the problem?
from pypdf import PdfReader
reader = PdfReader("chinese-text.pdf")
Error:
When I looked at the code for the cmap file in pypdf:
https://github.com/py-pdf/pypdf/blob/main/pypdf/_cmap.py
I see that this particular encoding, "/UniGB-UTF16-H", is not declared in "_predefined_cmap" in the code.
Can we just add the encoding type under "_predefined_cmap",
or should we create something like this:
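The snippet from the original post is not preserved here. For illustration only, the first option (adding just the encoding name to a lookup table) could be sketched as follows. The names `predefined_cmap_sketch` and `codec_for` are hypothetical, the table is not pypdf's actual `_predefined_cmap`, and `gb18030` is simply the codec proposed in the comments, not a verified answer.

```python
from typing import Dict

# Hypothetical stand-in for a CMap-name-to-codec table; the real
# `_predefined_cmap` in pypdf may differ in shape and contents.
predefined_cmap_sketch: Dict[str, str] = {
    "/Identity-H": "utf-16-be",
}


def codec_for(cmap_name: str) -> str:
    """Return the Python codec registered for a predefined CMap name."""
    try:
        return predefined_cmap_sketch[cmap_name]
    except KeyError:
        # Mirrors the wording of the error reported in this issue.
        raise NotImplementedError(
            f"Advanced encoding {cmap_name} not implemented"
        ) from None


# Option 1 from the post: register only the encoding name.
# Assumption: "gb18030" is the codec suggested in the comments.
predefined_cmap_sketch["/UniGB-UTF16-H"] = "gb18030"

print(codec_for("/UniGB-UTF16-H"))
```

With a table like this, supporting another predefined CMap is a one-line change, provided a single Python codec really matches how that CMap encodes text.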