HTML entities replaced by reserved characters in output HTML #139

Closed
blueglyph opened this issue Apr 11, 2023 · 6 comments
Labels
enhancement New feature or request

Comments

@blueglyph

Problem

HTML entities like &lt; and &gt; are replaced by their equivalent reserved characters (< and >), producing a corrupted output HTML file.

Steps

Launching the executable with the parameters -o index_minified.html index.html, with the index.html content below:

<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width,initial-scale=1,maximum-scale=1">
    <title>Test</title>
</head>
<body>
    
<p>
    Pages: <a class="disabled" href=''>&lt;</a>
    <a class="disabled" href='http://192.168.1.38:1111'>1</a>
    <a href='http://192.168.1.38:1111/page/2'>&gt;</a>
</p>

    <footer>
        <span class="copyright">Copyright 2023 by AUTHOR.</span><br>
        <span class="zola">Made with <a href="https://www.getzola.org/">Zola</a></span>
    </footer>
</body>
</html>

produces the following:

<!doctypehtml><html lang=en><meta charset=UTF-8><meta content=width=device-width,initial-scale=1,maximum-scale=1 name=viewport><title>Test</title><body><p>Pages: <a class=disabled href><</a> <a class=disabled href=http://192.168.1.38:1111>1</a> <a href=http://192.168.1.38:1111/page/2>></a><footer><span class=copyright>Copyright 2023 by AUTHOR.</span><br><span class=zola>Made with <a href=https://www.getzola.org/>Zola</a></span></footer>

The two errors are <a class=disabled href><</a> and <a href=http://192.168.1.38:1111/page/2>></a>, corresponding to lines 11 and 13 of the original file (attached as index.zip).

I haven't tested other HTML entities.

Environment

CLI version 0.10.8 (Windows 10 x64), or with Zola as Rust crate version 0.10.8.

@wilsonzlin
Owner

This is intentional, as mentioned in the README. It will still be parsed correctly by the browser, which allows some extra compression. This is only done when safe, so there shouldn't be any misinterpretation.

@blueglyph
Author

I missed that part, "will not pass validation". Isn't that a gamble on how the different engines parse HTML (and whether they'll continue to do so)?

I can't reproduce the corrupted pages I got, so maybe I had another issue, but I'm definitely not comfortable with it. I tried the --ensure-spec-compliant-unquoted-attribute-values option (in the latest Windows binary), but it didn't make any difference. Is there any other way to avoid it?

@wilsonzlin
Owner

The behaviour is defined by the WHATWG HTML spec, in the "Tag open state" rules (section 13.2.5.6), so browsers and any other spec-compliant parser handle it correctly; it's not undefined behaviour. Browsers have well-defined behaviour for consistency and compatibility reasons, including for handling malformed HTML. I may look into adding an option to disable this specific minification.
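
A quick way to check the claim above is to parse both forms and compare the text that ends up inside each link. The sketch below uses Python's standard-library html.parser as a stand-in; it is a tolerant parser rather than a full implementation of the WHATWG tokenizer, so treat it as an illustration, not a conformance test. The `AnchorText` class and `anchor_texts` helper are hypothetical names introduced for this example, and the two snippets are shortened from the input and output shown in the issue.

```python
from html.parser import HTMLParser


class AnchorText(HTMLParser):
    """Collect the text content of every <a> element, in document order."""

    def __init__(self):
        # convert_charrefs defaults to True, so &lt; and &gt; arrive decoded.
        super().__init__()
        self.texts = []
        self._inside_a = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._inside_a = True
            self.texts.append("")

    def handle_endtag(self, tag):
        if tag == "a":
            self._inside_a = False

    def handle_data(self, data):
        # Text may arrive in several chunks, so append rather than assign.
        if self._inside_a:
            self.texts[-1] += data


def anchor_texts(markup):
    """Return the list of link texts found in the given markup."""
    parser = AnchorText()
    parser.feed(markup)
    parser.close()
    return parser.texts


# Shortened from the issue: entity-encoded input vs. minified output.
original = (
    "<p><a class=\"disabled\" href=''>&lt;</a> "
    "<a href='http://192.168.1.38:1111/page/2'>&gt;</a></p>"
)
minified = (
    "<p><a class=disabled href><</a> "
    "<a href=http://192.168.1.38:1111/page/2>></a>"
)

print(anchor_texts(original))
print(anchor_texts(minified))
```

Both calls should report the same link texts, a literal < and a literal >. A bare < that is not followed by a letter, /, ! or ? cannot begin a tag, so it is kept as text, which matches the behaviour described above.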

@blueglyph
Author

That's interesting, thanks!

If it's safe, no need for an additional option. It's easy enough to filter out those "issues" in the validator logs. 🙂

@rtasarz

rtasarz commented Nov 14, 2023

It may still cause issues for projects that require clean validation results to pass external audits. I maintain some where it is a client's requirement. To tell the truth, I'd rather switch off minification than fight for an exception (been there, done that, gave up as it was not worth my time).

@wilsonzlin added the enhancement label on Dec 24, 2023
@wilsonzlin
Owner

In version 0.16.0 (soon to be released), these entity minifications will no longer be done by default, but they can be enabled with the allow_optimal_entities option. If you test it out, let me know whether it works for you.
