Escaped `<` characters (`&lt;`) are processed incorrectly #191

chrispy-snps · 2024-06-07T12:18:07Z

This is a more specific follow-up to #182.

When the `&lt;` escape sequence is processed, it is incorrectly converted to `&LT` instead of being kept as-is:

>>> import minify_html_onepass
>>> print(minify_html_onepass.minify("&lt;"))
<

>>> print(minify_html_onepass.minify("&lt;faketag"))
&LTfaketag

>>> print(minify_html_onepass.minify("&lt;faketag&gt;"))
&LTfaketag>

Strangely, a bare `&lt;` by itself is processed correctly. It breaks only when it is followed by other content.

The issue occurs in both minify_html and minify_html_onepass.

We are able to work around it as follows:

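# Hide every &lt; behind a placeholder before minifying.
html = html.replace("&lt;", "AMP_LT_WORKAROUND")
html_minified = minify_html.minify(html)
# Restore &lt; in the minified output (not in the original input).
html_minified = html_minified.replace("AMP_LT_WORKAROUND", "&lt;")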

but a proper fix would be better (and more efficient, as we process tens of thousands of HTML files at a time).
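For anyone reusing the workaround above, here is a minimal sketch of it wrapped in a helper. The function name `minify_preserving_lt` and the `LT` + random hex placeholder are made up for illustration and are not part of minify-html. The sketch assumes the minifier passes plain alphanumeric text through unchanged.

```python
import uuid
from typing import Callable


def minify_preserving_lt(html: str, minify: Callable[[str], str]) -> str:
    """Run `minify` on `html` while shielding every `&lt;` from it.

    Hypothetical helper: `minify` is any str -> str minifier, for
    example `minify_html.minify`. Assumes the minifier leaves plain
    alphanumeric text (the placeholder) untouched.
    """
    # Pick a placeholder that does not already occur in the input,
    # so restoring it cannot clobber real content.
    placeholder = f"LT{uuid.uuid4().hex}"
    while placeholder in html:
        placeholder = f"LT{uuid.uuid4().hex}"

    protected = html.replace("&lt;", placeholder)
    minified = minify(protected)

    # Restore on the minified string, not on the original input.
    return minified.replace(placeholder, "&lt;")
```

Usage would look like `minify_preserving_lt(html, minify_html.minify)`. Generating the placeholder per call avoids collisions with a fixed marker such as `AMP_LT_WORKAROUND` that might already appear in a document. It still costs two extra passes over each file, so it does not replace a proper fix.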


codingjerk · 2024-07-04T13:03:44Z

Hi @chrispy-snps, thank you for the workaround.

Rongronggg9 · 2024-08-09T14:53:30Z
