Getting Advanced encoding /UniGB-UTF16-H not implemented error for chinese text #2812

Closed
Aswin4070 opened this issue Aug 24, 2024 · 5 comments · Fixed by #2819

Comments

@Aswin4070

Aswin4070 commented Aug 24, 2024

Replace this: What happened? What were you trying to achieve?

Environment

Which environment were you using when you encountered the problem?

from pypdf import PdfReader
reader = PdfReader("chinese-text.pdf")

Error:

"Advanced encoding /UniGB-UTF16-H not implemented yet" error continuously for all the lines.

When I looked at the code for the cmap file in pypdf:
https://github.com/py-pdf/pypdf/blob/main/pypdf/_cmap.py

I see that this particular encoding, "/UniGB-UTF16-H", is not declared in "_predefined_cmap" in the code.

Can we just add the encoding type under "_predefined_cmap":

UniGB-UTF16-H | ISO/IEC 10646 (Unicode), UTF-16 encoding

or should we create something like this:

elif "UniGB-" in enc:
    encoding = "utf-16-be"
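
The two options above can be compared in a small standalone sketch. The names `PREDEFINED` and `resolve_encoding` are hypothetical and are not pypdf's actual code, and the `utf-16-be` value is only the one proposed in this comment, not a confirmed fix.

```python
from typing import Optional

# Hypothetical stand-in for a predefined-CMap table; not pypdf's real table.
PREDEFINED = {
    # Option 1: add an explicit entry for the missing CMap name.
    "/UniGB-UTF16-H": "utf-16-be",
}


def resolve_encoding(enc: str) -> Optional[str]:
    """Return a Python codec name for a CMap name, or None if unknown."""
    if enc in PREDEFINED:
        return PREDEFINED[enc]
    # Option 2: a substring check that covers every name containing "UniGB-".
    if "UniGB-" in enc:
        return "utf-16-be"
    return None
```

The explicit entry only covers the names that are listed, while the substring check also catches variants such as "/UniGB-UTF16-V" at the cost of assuming they all share one codec.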
@pubpub-zz
Collaborator

@Aswin4070 can you share your test document for analysis?

@pubpub-zz
Collaborator

After looking on the internet and running a few tests, I would rather propose using gb18030.
As a test, add the following at the end of _predefined_cmap:

    "\UniGB-UTF16-H": "gb18030",
    "\UniGB-UTF16-V": "gb18030",

@pubpub-zz
Collaborator

@Aswin4070
Any update? Is it possible to get a copy of the document?

@Aswin4070
Author

pubpub-zz added a commit to pubpub-zz/pypdf that referenced this issue Aug 27, 2024
@pubpub-zz
Collaborator

Thanks.
gb18030 was the right choice, but I mistyped "\" instead of "/".
A PR has been issued.
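
Why the single character mattered can be shown with a plain dictionary lookup. This is only a sketch: the two tables are hypothetical and are not pypdf's code. The name being looked up is spelled with a leading "/", as in the error message.

```python
# Hypothetical tables keyed by CMap name; not pypdf's real table.
# The backslash is escaped because "\U" would otherwise start a Unicode escape.
mistyped = {"\\UniGB-UTF16-H": "gb18030"}
corrected = {"/UniGB-UTF16-H": "gb18030"}

# The name as it appears in the error message, with a leading "/".
name = "/UniGB-UTF16-H"

print(mistyped.get(name))   # None: the key starts with a backslash, so no match
print(corrected.get(name))  # gb18030: the key spelled with "/" matches
```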
