Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

The html rendering result is different from the html output result #529

Closed
colindcli opened this issue Dec 17, 2023 · 5 comments
Closed
Assignees

Comments

@colindcli
Copy link

When the p span does not close the tag, it is followed by the div tag.
The HAP output is: <p><span><span><div></div></span></span></p>
The Chrome output is: <p><span><span></span></span></p><div></div>

chrome

var html = @"<p><span><span><div></div>";
var doc = new HtmlDocument();
doc.LoadHtml(html);
var newHtml = doc.DocumentNode.OuterHtml;  //<p><span><span><div></div></span></span></p>
var res = @"<p><span><span></span></span></p><div></div>"; //chrome or edge rendering results
var b = newHtml == res; //false
@JonathanMagnan JonathanMagnan self-assigned this Dec 18, 2023
@JonathanMagnan
Copy link
Member

Hello @colindcli ,

Thank you for reporting. We will look at it.

Best Regards,

Jon

@JonathanMagnan
Copy link
Member

Hello @colindcli,

The v1.11.57 has been released

The "span" is not being closed before a "div" has been fixed." However, there is a slight difference with your result as the "p" currently has an implicit ending (before the "div") and not an explicit ending.

By default, the "p" tag behavior is really bad. I recommend you to use of of the 2 followings options:

HtmlDocument.DisableBehaviorTagP = true;

// or

doc.BackwardCompatibility = false;

Here is the current result with your code:

var html = @"<p><span><span><div></div>";
var doc = new HtmlDocument();
doc.LoadHtml(html);
//HtmlDocument.DisableBehaviorTagP = true;
doc.BackwardCompatibility = false;

var newHtml = doc.DocumentNode.OuterHtml;
var res = @"<p><span><span></span></span><div></div>";
var b = newHtml == res; // true

As mentioned, the "p" tag is not closed in this example as this is an implicit close, which is valid in HTML.

Adding an option to force an "Explicit" end could be possible when an "Implicit" end is found if you really require this.

Let me know if everything is working correctly for you now.

Best Regards,

Jon

@colindcli
Copy link
Author

@JonathanMagnan There is no problem at all in the test, very good!
Thank you so much!

@JonathanMagnan
Copy link
Member

JonathanMagnan commented Jan 29, 2024

Hello @colindcli ,

Unfortunately, we had to undo our change.

As a few people reported, our code was breaking some valid rules.

I tried hard to make it work with the right behavior (finding the parent recursively), but unfortunately, it was leading to some other potential issue that I could already easily find.

Making the fix correctly for this specific case is easy, but making the fix correctly to make it work in general (with more complex HTML) is just too complicated as too many rules are missing to make it work correctly.

Unfortunately, with all the time and effort I put tried to make it happen, I failed, and the only option possible is to abandon it or wait for someone that make a pull request that could fix it without impacting anything.

So, the change has been reverted in v1.11.58

Best Regards,

Jon

@colindcli
Copy link
Author

Hello @JonathanMagnan,

I'm very sorry because an update-wide error occurred due to a bug I mentioned. This is not what I want to see. If the impact is really large, version rollback is also acceptable.

Best Regards

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

2 participants