option ignoreWhitespace #90

Closed
eGavr opened this issue Jul 23, 2014 · 3 comments

eGavr commented Jul 23, 2014

Is it possible to make this option work the same way as in node-htmlparser?

Owner

fb55 commented Jul 23, 2014

It's hard to do this right when streaming. Also, I don't really see the point of it (the meaning of whitespace depends on the location in the document).
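
One possible reading of the streaming concern, offered as an assumption rather than something stated in this thread: a streaming parser may hand over a single text node in several pieces, so a piece that contains only whitespace cannot be classified on its own. The sketch below uses a hypothetical `dropWhitespaceChunks` helper (not part of any library) to show how a naive per-chunk filter gives different results for the same text depending on how it was split.

```javascript
// Hypothetical per-chunk filter: drops any text chunk that is only whitespace.
function dropWhitespaceChunks(chunks) {
  return chunks.filter((chunk) => chunk.trim() !== "");
}

// The same text, delivered whole and delivered in three pieces.
const whole = ["foo bar"];
const split = ["foo", " ", "bar"];

console.log(dropWhitespaceChunks(whole).join("")); // foo bar
console.log(dropWhitespaceChunks(split).join("")); // foobar
```

Doing this correctly would presumably require buffering text until the next tag arrives, which works against the point of streaming.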

Author

eGavr commented Jul 23, 2014

In general, the idea of this option is to ignore spaces, tabs, etc. when they are located between tags, rather than inside a tag's attributes or a tag's content.

For example, if this option is true, this code

<font>
    <br>this is the text
<font>

should be parsed as

<font><br>this is the text<font>

but now it is parsed as

<font> <br>this is the text <font> </font></font>
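
A minimal sketch of the requested behavior, assuming "between tags" means a run of whitespace that sits directly between a `>` and the next `<`. The `stripWhitespaceBetweenTags` helper and the sample markup are hypothetical and work on the raw string; this is not how node-htmlparser or this parser implements anything.

```javascript
// Hypothetical helper: removes whitespace-only runs that sit directly
// between the end of one tag and the start of the next.
function stripWhitespaceBetweenTags(html) {
  return html.replace(/>\s+</g, "><");
}

// Illustrative input, not taken from the thread.
const input = "<ul>\n  <li>one</li>\n  <li>two</li>\n</ul>";

console.log(stripWhitespaceBetweenTags(input));
// <ul><li>one</li><li>two</li></ul>
```

Under this reading, the newline after `this is the text` in the snippet above would survive, because it belongs to a text run that also contains non-whitespace characters.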

Owner

fb55 commented Jul 24, 2014

As an example of why this is a bad idea:

<p>foo<b> <i>bar</i></b></p>

is usually rendered as

foo bar

but with the change you're proposing, it becomes

<p>foo<b><i>bar</i></b></p>

which is rendered as

foobar

(Note the missing whitespace.)
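
The example above can be reproduced with a rough sketch. Both helpers are hypothetical: `stripWhitespaceBetweenTags` approximates the proposed option with a regular expression, and `visibleText` is a crude stand-in for rendering that drops every tag and keeps the text.

```javascript
// Hypothetical approximation of the proposed option.
function stripWhitespaceBetweenTags(html) {
  return html.replace(/>\s+</g, "><");
}

// Crude stand-in for rendering: remove all tags, keep the text.
function visibleText(html) {
  return html.replace(/<[^>]*>/g, "");
}

const original = "<p>foo<b> <i>bar</i></b></p>";
const stripped = stripWhitespaceBetweenTags(original);

console.log(visibleText(original)); // foo bar
console.log(visibleText(stripped)); // foobar
```

The space between `<b>` and `<i>` is whitespace-only and sits between two tags, yet it is the only thing separating the two words.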

The current behavior of normalizing whitespace leads to problems in <pre>-tags, which is still bad enough, but won't blow up your document on a regular basis.
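
To illustrate the `<pre>` problem, here is a sketch that assumes normalizing means collapsing every run of whitespace into a single space. The `collapseWhitespace` helper is hypothetical and only approximates the behavior described; the sample strings are illustrative.

```javascript
// Hypothetical approximation: every run of whitespace becomes one space,
// with no knowledge of the surrounding element.
function collapseWhitespace(text) {
  return text.replace(/\s+/g, " ");
}

const paragraphText = "foo\n    bar";              // e.g. inside <p>
const preformattedText = "if (x) {\n    y();\n}";  // e.g. inside <pre>

console.log(JSON.stringify(collapseWhitespace(paragraphText)));
// "foo bar"
console.log(JSON.stringify(collapseWhitespace(preformattedText)));
// "if (x) { y(); }"
```

In ordinary flow content the collapsed result usually renders the same, while in preformatted text the line breaks and indentation are lost.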
