Handle non-ASCII characters in titles #69

Merged
merged 1 commit into from
Aug 16, 2017


calleluks
Contributor

Hey, thanks for a great gem!

I tried using an ö in the title of an inlined svg and noticed it got stripped from the output.

I think this happens because to_html emits HTML named entities for these characters by default. When that output is parsed again with Nokogiri::XML::Document.parse, the entities are dropped, because XML predefines only &amp;, &lt;, &gt;, &quot; and &apos;, so &ouml; is undefined there:

irb(main):006:0> Nokogiri::XML::Document.parse("<svg>ö</svg>\n")
=> #<Nokogiri::XML::Document:0x3ff16b585394 name="document" children=[#<Nokogiri::XML::Element:0x3ff16b585060 name="svg" children=[#<Nokogiri::XML::Text:0x3ff16b584e80 "ö">]>]>
irb(main):007:0> Nokogiri::XML::Document.parse("<svg>ö</svg>\n").to_html
=> "<svg>&ouml;</svg>\n"
irb(main):008:0> Nokogiri::XML::Document.parse("<svg>&ouml;</svg>\n")
=> #<Nokogiri::XML::Document:0x3ff16af2862c name="document" children=[#<Nokogiri::XML::Element:0x3ff16af282f8 name="svg">]>

Explicitly setting the encoding to "UTF-8" when calling to_html disables generating the entities:

irb(main):009:0> Nokogiri::XML::Document.parse("<svg>ö</svg>\n").to_html(encoding: "UTF-8")
=> "<svg>ö</svg>\n"

Setting the encoding to "UTF-8" when parsing the document instead makes sure it's implicitly set to "UTF-8" wherever to_html is called later:

irb(main):010:0> Nokogiri::XML::Document.parse("<svg>ö</svg>\n", nil, "UTF-8").to_html
=> "<svg>ö</svg>\n"

@jamesmartin
Owner

@calleerlandsson thank you for the detailed explanation of the problem and your patch, I really appreciate the contribution.

@jamesmartin jamesmartin merged commit 7818663 into jamesmartin:master Aug 16, 2017
@jamesmartin jamesmartin added this to the v1.2.3 milestone Aug 16, 2017
@jamesmartin
Owner

Released in v1.2.3
