This repository has been archived by the owner on Jan 29, 2020. It is now read-only.
This issue has been moved from the zendframework repository as part of the bug migration program as outlined here - http://framework.zend.com/blog/2016-04-11-issue-closures.html

Original Issue: https://api.github.com/repos/zendframework/zendframework/issues/7618
User: @mtrippodi
Created On: 2015-08-26T13:51:12Z
Updated At: 2015-11-06T22:17:32Z
Body
...will result in something like:
... will solve the problem and result in correct rendering.
For convenience I extended Zend\Dom\Query:
Now I wonder if this could perhaps be implemented in Zend\Dom\Query. Or am I missing something and there's a better solution?
Thanks
m.
Comment
User: @mtrippodi
Created On: 2015-08-26T18:15:20Z
Updated At: 2015-08-26T19:17:05Z
Body
OK, forget my first "solution". It's bad because e.g. ...
...will result in:
This is because all that utf8_decode() does is convert a string encoded in UTF-8 to ISO-8859-1. This is of course not good, because UTF-8 can represent many more characters than ISO-8859-1. See this comment at PHP Man.

The real problem is that DOMDocument::loadHTML() by default will always treat the source string as ISO-8859-1-encoded. Unfortunately, you can only change this behavior by specifying the encoding in the HTML head at the beginning of the source string. This comment at PHP Man still seems to apply even though it is 10 years old and UTF-8 is so common nowadays!

So, based on this comment I again extended Zend\Dom\Query as follows:

Still, two questions remain:
Should a solution be implemented in Zend\Dom\Query?

Comment
User: @croensch
Created On: 2015-08-28T14:15:05Z
Updated At: 2015-08-28T14:15:05Z
Body
AFAIK if no header is present the passed encoding is used; if the header is present the passed encoding is ignored. So if your documents are always in ISO-8859-1 then just try setDocument() as it is?
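
The encoding behaviour discussed in this thread can be sketched with plain DOMDocument. This is only a minimal illustration, not the Zend\Dom\Query extension from the comments above: the helper name loadUtf8Html() is hypothetical, and prepending a meta tag is just one way of "specifying the encoding in the HTML head at the beginning of the source string". It assumes the input really is UTF-8.

```php
<?php
// Collect libxml parse warnings internally instead of emitting PHP warnings.
libxml_use_internal_errors(true);

/**
 * Hypothetical helper (not part of Zend\Dom\Query): load a UTF-8 HTML string
 * by prepending a charset declaration so libxml does not fall back to a
 * single-byte default encoding.
 */
function loadUtf8Html(string $html): DOMDocument
{
    $doc = new DOMDocument();
    // The meta tag ends up in the document head and tells the parser
    // to read the remaining bytes as UTF-8.
    $doc->loadHTML(
        '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">' . $html
    );
    libxml_clear_errors();

    return $doc;
}

$html = '<html><body><p>Grüße, 日本語</p></body></html>';

// Without a declared encoding the non-ASCII bytes are typically misread.
$naive = new DOMDocument();
$naive->loadHTML($html);
libxml_clear_errors();
$naiveText = $naive->getElementsByTagName('p')->item(0)->textContent;

// With the declaration prepended the text should round-trip unchanged.
$fixedText = loadUtf8Html($html)->getElementsByTagName('p')->item(0)->textContent;

var_dump($naiveText === 'Grüße, 日本語');
var_dump($fixedText === 'Grüße, 日本語');
```

Unlike utf8_decode(), this approach does not convert the string to ISO-8859-1 first, so characters outside that charset (such as the Japanese text here) are not lost. The prepended meta tag does become part of the parsed document, which may matter if the markup is serialized again afterwards.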