HTML entities replaced by reserved characters in output HTML #139
Comments
This is intentional, as mentioned in the README. The output is still parsed correctly by the browser, and leaving these characters unescaped allows some extra compression. This is only done when safe, so there shouldn't be any misinterpretation.
I missed that part, "will not pass validation". Isn't that a gamble on how the different engines parse HTML (and whether they'll continue to do so)? I can't reproduce the corrupted pages I got, so maybe I had another issue, but I'm definitely not comfortable with it. I have tried with the
The behaviour is defined by the WHATWG HTML spec, in the rules for the tag open state (section 13.2.5.6), so browsers and any other spec-compliant parser handle it correctly; it's not undefined behaviour. Browsers have well-defined behaviour for consistency and compatibility reasons, including for malformed HTML. I may look into adding an option to disable this specific minification.
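For anyone who wants to check this locally, here is a minimal sketch that compares how a parser sees the escaped and unescaped forms of the two snippets from the report. It uses Python's standard-library html.parser, which is not a full WHATWG tokenizer, so treat it as a sanity check rather than proof of spec conformance. The names EventCollector and parse_events are made up for this example.

```python
from html.parser import HTMLParser


class EventCollector(HTMLParser):
    """Records start tags, end tags, and text, merging adjacent text chunks."""

    def __init__(self):
        # convert_charrefs=True decodes entities such as &lt; inside text.
        super().__init__(convert_charrefs=True)
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start", tag, attrs))

    def handle_endtag(self, tag):
        self.events.append(("end", tag))

    def handle_data(self, data):
        # The parser may deliver text in several pieces; join them so that
        # the comparison does not depend on how the text was chunked.
        if self.events and self.events[-1][0] == "data":
            self.events[-1] = ("data", self.events[-1][1] + data)
        else:
            self.events.append(("data", data))


def parse_events(markup):
    """Parse a markup string and return the list of recorded events."""
    parser = EventCollector()
    parser.feed(markup)
    parser.close()
    return parser.events


# Escaped originals paired with the minified forms quoted in the report.
pairs = [
    ("<a class=disabled href>&lt;</a>", "<a class=disabled href><</a>"),
    (
        "<a href=http://192.168.1.38:1111/page/2>&gt;</a>",
        "<a href=http://192.168.1.38:1111/page/2>></a>",
    ),
]

for escaped, minified in pairs:
    same = parse_events(escaped) == parse_events(minified)
    print(minified, "parses the same as the escaped form:", same)
```

If the two forms produce the same events, the link text is a literal < or > in both cases, which is what the tag open state rules lead one to expect when the character after < cannot start a tag name.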
That's interesting, thanks! If it's safe, no need for an additional option. It's easy enough to filter out those "issues" in the validator logs. 🙂
It may still cause issues for projects that need clean validation results to pass external audits. I maintain some where that is a client requirement. To tell the truth, I'd rather switch off minification than fight for an exception (been there, done that, gave up as it was not worth my time).
In version
Problem
HTML entities like &lt; and &gt; are replaced by their equivalent reserved characters (< and >), producing a corrupted output HTML file.
Steps
Launching the executable with the parameters -o index_minified.html index.html, with the index.html content below:
produces the following:
in which you can see those two errors: <a class=disabled href><</a> and <a href=http://192.168.1.38:1111/page/2>></a>, corresponding to lines 11 and 13 of the original file (as attachment: index.zip).
I haven't tested other HTML entities.
Environment
CLI version 0.10.8 (Windows 10 x64), or with Zola as Rust crate version 0.10.8.