Unicode is broken #293

Closed
wisedier opened this issue Jan 18, 2016 · 7 comments

@wisedier

I tried to convert HTML to PDF using the Python code below:

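from io import BytesIO

from weasyprint import HTML


def html2pdf(html):
    """
    :param html: rendered jinja2 html text
    :type html: unicode
    :return: PDF bytes
    """
    html = HTML(string=html, encoding="utf-8")
    with BytesIO() as pdf_buffer:
        html.write_pdf(pdf_buffer)
        # getvalue() returns the whole buffer regardless of the current
        # position, so no seek(0) is needed before reading it
        pdf = pdf_buffer.getvalue()

    return pdf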

The following HTML is passed to the html2pdf function:

<span class="warning">본 쿠폰의 무단사용 및 불법복제 시, 법적인 제제를 받을 수 있습니다.</span>
<span class="warning">또한 한 번 발급된 쿠폰은 재발행이 불가하며 사용된 쿠폰은 재발행 및 삭제가 불가능합니다.</span>

The screenshot below shows the broken Unicode rendering.
[Screenshot: 2016-01-18 7 48 26]

I tested this with Python 2.7.11 on Ubuntu 14.04.3 LTS (3.13.0-48-generic #80-Ubuntu SMP Thu Mar 12 11:16:15 UTC 2015 x86_64 x86_64 x86_64 GNU/Linux) after upgrading all packages to their latest versions. On OS X, however, it works properly. I think the problem depends on one of the required packages.

@liZe
Member

liZe commented Jan 18, 2016

This bug report is funny in my browser:
[Screenshot from 2016-01-18 13-13-52]

More seriously, the problem is not Unicode; it's the font used to render your document. On OS X, the font used (probably the default one, if you didn't set one in the stylesheet) has the requested characters, but on Ubuntu the document is rendered with a fallback font (as it is in my browser).
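To make the "not Unicode, but font" point concrete: the text itself is ordinary, well-formed Unicode, and the only question is whether the chosen font has glyphs for its code points. The small Python 3 sketch below (no WeasyPrint involved; the function name `hangul_syllables` is just for illustration) lists the distinct characters of the snippet that fall in the Hangul Syllables block, U+AC00 to U+D7A3. Whatever font ends up rendering the document must cover these.

```python
# Hangul Syllables block: U+AC00..U+D7A3 (11,172 precomposed syllables)
HANGUL_FIRST = 0xAC00
HANGUL_LAST = 0xD7A3


def hangul_syllables(text):
    """Return the sorted distinct Hangul syllables found in `text`."""
    return sorted({ch for ch in text if HANGUL_FIRST <= ord(ch) <= HANGUL_LAST})


snippet = "본 쿠폰의 무단사용 및 불법복제 시, 법적인 제제를 받을 수 있습니다."
needed = hangul_syllables(snippet)
print(len(needed), "distinct syllables need a Korean-capable font")
```

If a font lacks any of these, the renderer has to fall back to another font for them, which is exactly where the Ubuntu output goes wrong.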

You can set the font you want to use in the stylesheet, and your problem will be fixed.
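A minimal sketch of that fix, passing an extra stylesheet through WeasyPrint's `stylesheets` argument to `write_pdf()`. The family names ("Noto Sans CJK KR", "NanumGothic") are only examples of Korean-capable fonts and are assumptions: they must actually be installed and visible to fontconfig, otherwise Pango silently falls back again. The helper names `font_stylesheet` and `html2pdf_with_font` are hypothetical.

```python
# Example font stack (assumption): the named families must be installed.
KOREAN_FONT_STACK = ("Noto Sans CJK KR", "NanumGothic", "sans-serif")

# CSS generic families are keywords and must not be quoted.
GENERIC_FAMILIES = ("serif", "sans-serif", "cursive", "fantasy", "monospace")


def font_stylesheet(families, selector="body"):
    """Build a CSS rule that sets `font-family` from a list of family names."""
    parts = []
    for name in families:
        if name in GENERIC_FAMILIES:
            parts.append(name)
        else:
            parts.append('"%s"' % name)
    return "%s { font-family: %s; }" % (selector, ", ".join(parts))


def html2pdf_with_font(html, families=KOREAN_FONT_STACK):
    # Imported here so font_stylesheet() can be used without WeasyPrint.
    from weasyprint import CSS, HTML

    css = CSS(string=font_stylesheet(families))
    # write_pdf() with no target returns the PDF as bytes.
    return HTML(string=html).write_pdf(stylesheets=[css])
```

Ending the list with a generic family keeps some rendering available if none of the named fonts is present, though on a system without Korean fonts the Hangul text would still be broken.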

liZe closed this as completed on Jan 18, 2016
@SimonSapin
Member

To be fair, there is a bug here. When a font doesn't have a requested character, the CSS spec says the rendering engine should try harder than we do to find some font that has it, and only use a fallback font as a last resort.

However, font selection is in the realm of Pango, so we likely can't fix this short of rewriting all of WeasyPrint's text and font handling without Pango (which unfortunately isn't likely to happen any time soon).
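The per-character fallback described above can be modeled in a few lines. This is a deliberately simplified illustration, not how Pango or fontconfig work internally: each font is reduced to a hypothetical set of characters it covers, every character goes to the first family in the `font-family` list that covers it, and only characters no listed family covers are sent to a last-resort font (the source of "tofu" boxes). All font names here are made up.

```python
def choose_fonts(text, families, coverage, last_resort="LastResort"):
    """Split `text` into (font, run) pairs using per-character fallback.

    `coverage` maps a family name to the set of characters it has glyphs
    for (a stand-in for real cmap data). Each character uses the first
    family in `families` that covers it, else `last_resort`.
    """
    runs = []
    for ch in text:
        font = next((f for f in families if ch in coverage.get(f, ())), last_resort)
        # Merge consecutive characters that use the same font into one run.
        if runs and runs[-1][0] == font:
            runs[-1] = (font, runs[-1][1] + ch)
        else:
            runs.append((font, ch))
    return runs


coverage = {"Latin": set("abc "), "Hangul": set("본쿠폰 ")}
print(choose_fonts("ab 본", ["Latin", "Hangul"], coverage))
```

The key property is that a missing glyph in the first family does not immediately trigger the last-resort font; later families in the list get a chance first.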

@plahoti

plahoti commented Dec 26, 2016

@liZe: I explicitly added a font family to my stylesheet and still see a similar issue.

Platform: CentOS release 6.8 (Final)
Software/package versions:
WeasyPrint 0.34
Pango 1.40.3 on top of cairo 1.14.8 (FreeType 2.7, fontconfig 2.12.1), GLib 2.38.2 and HarfBuzz 1.3.4

@liZe
Copy link
Member

liZe commented Dec 26, 2016

@plahoti Could you please provide a short sample of HTML+CSS that doesn't work for you?

@plahoti
Copy link

plahoti commented Dec 26, 2016

@liZe: As it turns out, I did not have the specified font family installed in my environment. After installing it, everything worked like a charm. Apologies, and thanks for your help.

@khaledhosny

Pango should do font fallback just fine; it might be that no fonts with Korean support were installed on the system. Either that, or something is seriously broken.
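One quick way to check that hypothesis on the affected machine is to ask fontconfig (which Pango uses on Linux) which installed families declare Korean support, with `fc-list :lang=ko family`. The Python wrapper below is only a convenience sketch assuming Python 3.7+ and that `fc-list` is on the `PATH`; running the command directly in a shell gives the same information. An empty list would support the "no Korean fonts installed" explanation.

```python
import shutil
import subprocess


def parse_family_list(output):
    """Parse `fc-list ... family` output into a sorted list of family names.

    Each line lists one font's family names, comma-separated (fonts often
    carry localized aliases, e.g. a Latin and a Hangul name).
    """
    names = set()
    for line in output.splitlines():
        for name in line.split(","):
            name = name.strip()
            if name:
                names.add(name)
    return sorted(names)


def korean_font_families():
    """Return families fontconfig reports as supporting Korean.

    Returns None if the fc-list command is not available.
    """
    if shutil.which("fc-list") is None:
        return None
    result = subprocess.run(
        ["fc-list", ":lang=ko", "family"],
        capture_output=True,
        text=True,
    )
    return parse_family_list(result.stdout)
```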

@liZe
Copy link
Member

liZe commented Dec 26, 2016

Here's the only reliable source I've found about how Fontconfig (used by Pango, at least on UNIX-like platforms) relates to CSS2 font name fallback. I can't find any other information about this.


5 participants