Support WordPress installs with non-UTF-8 charsets #855
I tried to fix this as part of #888 but I failed. My abandoned patch with lots of debug code: https://gist.github.com/westonruter/04d479e809409e1f12a5944701f6f24f
The problem came down to:
<?php
$opening_quote = '“';
$closing_quote = '”';
$output = html_entity_decode( $opening_quote . 'Hello World' . $closing_quote, ENT_QUOTES, 'UTF-8' );
$output = mb_convert_encoding( $output, 'ISO-8859-1', 'UTF-8' );
echo $output;
Output is:
The left and right double quotes are both getting converted into I wasn't able to prevent |
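A way to see why the curly quotes are lossy in this conversion: ISO-8859-1 has no code points for U+201C and U+201D, whereas Windows-1252 does. The sketch below assumes the mbstring extension is loaded. The helper name survives_round_trip is hypothetical and not part of the plugin. It checks whether a UTF-8 string survives a round trip through a target charset.

```php
<?php
// Hypothetical helper (not part of the AMP plugin): reports whether a UTF-8
// string can be converted to $charset and back without losing characters.
function survives_round_trip( string $utf8, string $charset ): bool {
	$encoded = mb_convert_encoding( $utf8, $charset, 'UTF-8' );
	$decoded = mb_convert_encoding( $encoded, 'UTF-8', $charset );
	return $decoded === $utf8;
}

// Left and right double quotation marks around plain ASCII text.
$quoted = "\u{201C}Hello World\u{201D}";

// Curly quotes are not representable in ISO-8859-1, so the round trip is lossy.
var_dump( survives_round_trip( $quoted, 'ISO-8859-1' ) );

// Windows-1252 maps the curly quotes to 0x93 and 0x94, so nothing is lost.
var_dump( survives_round_trip( $quoted, 'Windows-1252' ) );

// An accented Latin letter is representable in ISO-8859-1.
var_dump( survives_round_trip( "Caf\u{E9}", 'ISO-8859-1' ) );
```

Comparing the round-tripped string against the input avoids depending on which substitute character mbstring emits for unmappable code points.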
I think this should be fixed now via #3758 and the automated conversion happening inside of |
Please add QA instructions, @schlessera |
@csossi I fail to come up with an easy way to test this on the staging server, it would mean changing the default encoding of the server to test it. @westonruter Do you have a suggestion on how we can test this in QA? Or do you think the automated tests in place are enough in this case? |
@schlessera Yeah, I don't think there's a way to test this on staging. The only way to test would be to create a new WordPress install with a non-UTF-8 charset, populate content with non-ASCII characters, and then verify they pass through as UTF-8 without mojibake. So this would need to be developer QA, most likely. |
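The verification step in developer QA could be partly scripted. The following is a minimal sketch, assuming the mbstring extension is available. looks_like_clean_utf8 is a hypothetical helper name, and the sample strings stand in for a fetched response body. It flags both raw Latin-1 bytes and double-encoded text.

```php
<?php
// Hypothetical QA helper: true when $html is valid UTF-8 and contains the
// expected non-ASCII text exactly as it was entered.
function looks_like_clean_utf8( string $html, string $expected ): bool {
	if ( ! mb_check_encoding( $html, 'UTF-8' ) ) {
		return false; // Raw non-UTF-8 bytes leaked into the response.
	}
	// Double-encoded text is still valid UTF-8, so also look for the exact text.
	return false !== strpos( $html, $expected );
}

$expected   = "Caf\u{E9}";
$clean      = "<p>Caf\u{E9}</p>";   // Proper UTF-8 output.
$raw_latin1 = "<p>Caf\xE9</p>";     // Latin-1 byte passed through unconverted.
// Simulate mojibake: UTF-8 bytes misread as ISO-8859-1 and encoded again.
$double     = mb_convert_encoding( $clean, 'UTF-8', 'ISO-8859-1' );

var_dump( looks_like_clean_utf8( $clean, $expected ) );
var_dump( looks_like_clean_utf8( $raw_latin1, $expected ) );
var_dump( looks_like_clean_utf8( $double, $expected ) );
```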
Moving to "Done" |
Was it QA'd though? I'd like @kienstra perhaps to check this in a dev environment. |
Ah, ok - moving back - @kienstra - can you take a look? |
Sure, I'll test this after the sync |
Testing
Hi @westonruter,
Sorry for the delay here. It looks like a site with a charset of
Still, I don't know how meaningful this is, or how realistic these steps to reproduce are. Especially step 7 below. There might not be many people who set the
This issue doesn't occur if
It only occurs if the option was somehow changed to be
Steps To Reproduce
|
Thank you very much. This appears to not be working. |
Thanks for the review, @kienstra ! I'll use your steps to recreate on my end and investigate. |
@kienstra You mention in your instruction |
The problem seems to persist on |
The AMP spec requires
<meta charset="utf-8">
and this is naturally problematic for sites that still use a Latin1 charset. We need to do a few things:
Use mb_convert_encoding() to convert from the get_bloginfo('charset') to utf-8 when reading from the DB. We'll also need to forcibly add header( 'Content-Type: text/html; charset=utf-8' ) to override what WordPress is sending by default.
Fix \AMP_DOM_Utils::get_dom_from_content() in how it is currently assuming UTF-8 content for the input.
https://github.com/Automattic/amp-wp/blob/e60cb152a50e2ffbf5504d41aee859ad1d0e0baa/includes/class-amp-theme-support.php#L192
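The conversion described in this issue could be sketched roughly as follows. This is not the plugin's implementation. amp_sketch_to_utf8 is a hypothetical name, the charset is passed in as a parameter so the snippet runs outside WordPress (in WordPress it would come from get_bloginfo( 'charset' )), and the mbstring extension is assumed.

```php
<?php
// Hypothetical helper: normalize content stored in the site charset to UTF-8.
// In WordPress, $site_charset would come from get_bloginfo( 'charset' ).
function amp_sketch_to_utf8( string $content, string $site_charset ): string {
	if ( 0 === strcasecmp( $site_charset, 'UTF-8' ) || 0 === strcasecmp( $site_charset, 'utf8' ) ) {
		return $content; // Already UTF-8; converting again would double-encode.
	}
	return mb_convert_encoding( $content, 'UTF-8', $site_charset );
}

// Usage sketch: a Latin-1 byte sequence standing in for content read from the DB.
$latin1 = "Caf\xE9";
$utf8   = amp_sketch_to_utf8( $latin1, 'ISO-8859-1' );

// Override the charset WordPress would otherwise send, if headers are still open.
if ( ! headers_sent() ) {
	header( 'Content-Type: text/html; charset=utf-8' );
}

echo $utf8, "\n";
```

Skipping the conversion when the site charset is already UTF-8 matters because mb_convert_encoding() from a single-byte charset would otherwise reinterpret every multibyte sequence and produce mojibake.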