Cannot display U+xxxxx utf-32 symbols #2965

Open
shopping0421 opened this issue Nov 19, 2024 · 1 comment


shopping0421 commented Nov 19, 2024

Describe the bug
I want to display all Chinese text, including rare characters.
Common Chinese characters display correctly.
However, some rare characters that are encoded with 4 bytes in UTF-8 (code points above U+FFFF) are not displayed correctly and are replaced by other characters.
e.g.
https://www.compart.com/en/unicode/U+24256
U+24256 "𤉖" is displayed as 'PV' in the PDF.

To Reproduce
Steps to reproduce the behavior including code snippet (if applies):

  1. Register a font from:
    https://fonts.google.com/specimen/Cactus+Classical+Serif

which supports the character '𤉖':
https://fonts.google.com/specimen/Chocolate+Classical+Sans?preview.text=%F0%A4%89%96

  2. Render a PDF containing the text '𤉖'.


Expected behavior
'𤉖' should be displayed correctly in the PDF.

Desktop (please complete the following information):

  • OS: Mac OS
  • Browser: Chrome
  • React-pdf version: 4.1.4
@shopping0421 shopping0421 changed the title Cannot display U+xxxxx unicode-32 symbols Cannot display U+xxxxx utf-32 symbols Nov 19, 2024
@shopping0421
Author

Hi there,
I finally fixed this issue with the attached patches. I hope someone can review them and release the fix.
@react-pdf+pdfkit+4.0.0.patch
@react-pdf+layout+4.1.2.patch

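I think the main cause of the issue may be the following
(suppose c is a character in the range U+010000 - U+10FFFF, which JavaScript stores as a UTF-16 surrogate pair):
c is composed of 2 UTF-16 code units (4 bytes).
c.length => 2
c[0].codePointAt(0) => only the first code unit (the high surrogate) is looked up, and the font has no glyph mapped to that value.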
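The behaviour described above can be checked in plain JavaScript. This is a minimal sketch that does not touch react-pdf or pdfkit; the variable names are illustrative only. It shows that the character spans two UTF-16 code units, and that reading the first unit on its own gives a high surrogate instead of U+24256.

```javascript
// "𤉖" is U+24256, which lies outside the Basic Multilingual Plane.
const c = "\u{24256}";

// length counts UTF-16 code units, not characters.
const unitCount = c.length; // 2

// Indexing splits the surrogate pair, so c[0] is a lone high surrogate.
const firstUnit = c[0].codePointAt(0); // 0xD850

// Reading from the start of the intact string decodes the whole pair.
const fullCodePoint = c.codePointAt(0); // 0x24256

// The string iterator walks code points, so the spread yields one entry.
const charCount = [...c].length; // 1

console.log(unitCount, firstUnit.toString(16), fullCodePoint.toString(16), charCount);
```

Any code that walks a string by index and looks up each unit separately would therefore ask the font for 0xD850 and 0xDE56, neither of which maps to a glyph.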

So, what I did was:

  1. Correct the code point calculation for these characters.
  2. Fix the font selection in the layout library (previously it always fell back to the default font because the code point was not valid).
  3. Fix pdfkit so that it computes the correct glyphs and encoded text for these characters.
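As an illustration of the first step, the sketch below combines surrogate pairs into code points using the formula from the UTF-16 specification. It is not the content of the attached patches: `combineSurrogates` and `toCodePoints` are hypothetical helper names, and the real changes in `@react-pdf/layout` and `@react-pdf/pdfkit` may be structured differently.

```javascript
// Hypothetical helper: combine a high/low surrogate pair into one code point.
function combineSurrogates(high, low) {
  return (high - 0xd800) * 0x400 + (low - 0xdc00) + 0x10000;
}

// Hypothetical helper: return the code points of a string,
// treating each valid surrogate pair as a single character.
function toCodePoints(text) {
  const result = [];
  for (let i = 0; i < text.length; i++) {
    const unit = text.charCodeAt(i);
    const next = text.charCodeAt(i + 1); // NaN when i + 1 is past the end
    const isHigh = unit >= 0xd800 && unit <= 0xdbff;
    const isLow = next >= 0xdc00 && next <= 0xdfff;
    if (isHigh && isLow) {
      result.push(combineSurrogates(unit, next));
      i++; // skip the low surrogate that was just consumed
    } else {
      result.push(unit);
    }
  }
  return result;
}

// "中" is U+4E2D (one code unit), "𤉖" is U+24256 (two code units).
console.log(toCodePoints("中\u{24256}a").map((cp) => cp.toString(16)));
```

In practice the built-in `String.prototype.codePointAt` and `for...of` iteration give the same result; the manual version is shown only to make the arithmetic explicit.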

With these changes, the characters display correctly in my PDF:
image

reference:
https://en.wikipedia.org/wiki/UTF-16#Code_points_from_U+010000_to_U+10FFFF
