HTMLQ "purifies" incorrect HTML, even when that isn't desirable.
Example input:
<h3 class=subhead>Some Heading</h3>
When selecting the .subhead class as desired output, the heading is returned as:
<h3 class="subhead">Some Heading</h3>
That's fine if you want to render the result in a browser, but if you're using it as a search-and-replace pattern for awk, sed, or fsed, as I am, the pattern will fail to match because of the quotes that htmlq added.
In short, HTMLQ is reconstructing the HTML to be more spec-correct, and in doing so it breaks character-for-character matches between the input and the parts of the output that should have passed through unchanged.
N.B. While it would still be a problem, I wouldn't care about this so much if #36 were implemented.
Maybe htmlq needs a --purify or --no-purify option?
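To illustrate the mismatch described above, here is a minimal Python sketch. The two strings are modeled on the example in this report rather than captured from a real run, and `quote_tolerant_pattern` is a hypothetical helper, not part of htmlq. It shows that a literal search for the output fails against the source, and that one possible workaround on the matching side is to treat every double quote in the output as optional.

```python
import re

# Assumed strings, modeled on the example in this report (not captured from htmlq).
source = '<div><h3 class=subhead>Some Heading</h3></div>'
htmlq_output = '<h3 class="subhead">Some Heading</h3>'

# A literal search fails because the quotes exist only in the output.
literal_match = htmlq_output in source


def quote_tolerant_pattern(fragment: str) -> re.Pattern:
    """Hypothetical helper: escape the fragment, but make each double quote optional."""
    parts = [re.escape(part) for part in fragment.split('"')]
    return re.compile('"?'.join(parts))


match = quote_tolerant_pattern(htmlq_output).search(source)

print(literal_match)                      # False
print(match.group(0) if match else None)  # <h3 class=subhead>Some Heading</h3>
```

This only papers over added attribute quotes. Other changes a re-serializer might make (inserted elements, reordered attributes, changed entities) would still break the match, which is why an option on the htmlq side would be the cleaner fix.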
I'm using this tool to look at source documents (think sloppy HTML in the tens or hundreds of thousands of characters, with no whitespace, line breaks, or indentation) in order to figure out the structure and extract contents in a reasonable way. More than once now I've been pulling my hair out, wondering why I can't find the tbody element, only to find out that these elements aren't in the source at all.
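A quick way to catch this kind of surprise is to compare tag names in the raw source against tag names in the tool's output. The sketch below is a rough check, assuming the output resembles what a spec-following HTML parser would produce for a table. Both strings are made up for illustration, and `tags_not_in_source` is a hypothetical helper.

```python
import re

# Captures the name of every opening tag, e.g. "table" in "<table class=x>".
# Closing tags such as "</table>" are skipped because "/" is not a letter.
OPEN_TAG = re.compile(r"<([a-zA-Z][a-zA-Z0-9]*)")


def tags_not_in_source(source: str, output: str) -> set:
    """Hypothetical helper: tag names found in the output but absent from the raw source."""
    source_tags = {name.lower() for name in OPEN_TAG.findall(source)}
    output_tags = {name.lower() for name in OPEN_TAG.findall(output)}
    return output_tags - source_tags


# Assumed example strings (not captured from a real htmlq run).
raw_source = "<table><tr><td>1</td></tr></table>"
parsed_output = "<table><tbody><tr><td>1</td></tr></tbody></table>"

added = tags_not_in_source(raw_source, parsed_output)
print(added)  # {'tbody'}
```

The regex is deliberately crude (it would also match tag-like text inside comments or scripts), so treat a hit as a hint to go look at the source, not as proof.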