Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Inconsistent comment rendering #525

Closed
mems opened this issue Dec 7, 2023 · 6 comments
Closed

Inconsistent comment rendering #525

mems opened this issue Dec 7, 2023 · 6 comments
Assignees

Comments

@mems
Copy link

mems commented Dec 7, 2023

1. Description

It's not clear if HtmlDocument.CreateComment() need to include <!-- and -->. If it's included, OuterHtml include it twice.
If it's not, WriteTo() with OptionOutputAsXml = true could raise an exception.

Note: the HTML Standard indicate HTML comments can't contains some sequences:

text [inside comments] must not start with the string ">", nor start with the string "->", nor contain the strings "", or "--!>", nor end with the string "<!-"

2. Exception

[System.ArgumentOutOfRangeException: Length cannot be less than zero.
Parameter name: length]
   at System.String.Substring(Int32 startIndex, Int32 length)
   at HtmlAgilityPack.HtmlNode.GetXmlComment(HtmlCommentNode comment)
   at HtmlAgilityPack.HtmlNode.WriteTo(TextWriter outText, Int32 level)
   at HtmlAgilityPack.HtmlNode.WriteTo()
   at Program.Test(Boolean outputAsXml) :line 21
   at Program.Main() :line 10

3. Fiddle or Project

A Fiddle that reproduce the issue: https://dotnetfiddle.net/vARPlD

// @nuget: HtmlAgilityPack -Version 1.11.54
using System;
using HtmlAgilityPack;
					
public class Program
{
	public static void Main()
	{
		Test();
		Test(true);
	}
	
	static void Test(bool outputAsXml = false)
	{
		Console.WriteLine(string.Format("---- Test with outputAsXml = {0} ----", outputAsXml));
		var doc = new HtmlDocument();
		doc.OptionOutputAsXml = outputAsXml;
		doc.LoadHtml("<!-- abc -->");
		var comment = doc.CreateComment(" cde ");
		Console.WriteLine(string.Format("comment.OuterHtml = {0}", comment.OuterHtml));
		Console.WriteLine(string.Format("comment.WriteTo() = {0}", comment.WriteTo()));
		doc.DocumentNode.AppendChild(comment);
		var comment2 = doc.CreateComment("<!-- fgh -->");
		Console.WriteLine(string.Format("comment2.OuterHtml = {0}", comment2.OuterHtml));
		Console.WriteLine(string.Format("commen2t.WriteTo() = {0}", comment2.WriteTo()));
		doc.DocumentNode.AppendChild(comment2);
		Console.WriteLine(string.Format("doc.DocumentNode.OuterHtml = {0}", doc.DocumentNode.OuterHtml));
		Console.WriteLine(string.Format("doc.DocumentNode.WriteTo() = {0}", doc.DocumentNode.WriteTo()));
	}
}

Will outuput:

---- Test with outputAsXml = False ----
comment.OuterHtml = <!-- cde -->
comment.WriteTo() =  cde 
comment2.OuterHtml = <!--<!-- fgh -->-->
commen2t.WriteTo() = <!-- fgh -->
doc.DocumentNode.OuterHtml = <!-- abc --> cde <!-- fgh -->
doc.DocumentNode.WriteTo() = <!-- abc --> cde <!-- fgh -->
---- Test with outputAsXml = True ----
comment.OuterHtml = <!-- cde -->
[Run-time exception]
@JonathanMagnan JonathanMagnan self-assigned this Dec 7, 2023
@JonathanMagnan
Copy link
Member

Hello @mems ,

Thank you for reporting. Indeed something doesn't work in this logic.

We will look at it.

Best Regards,

Jon

@JonathanMagnan
Copy link
Member

Hello @mems ,

We have released today the v1.11.55: https://github.com/zzzprojects/html-agility-pack/releases/tag/v1.11.55

You can see this commit that will explain what we did:

In this version:

  • If a comment is provided without being enclosed with <-- and -->, we will add it to the comment (with a space before and after. So doc.CreateComment(" cde "); will create the following comment: <-- cde --> as you explicitly specified to add a additional space before and after the text.
  • To be somewhat backward compatible, if the comment stored is already <-- and -->, we will no longer enclose it with an extra <!-- and -->.
  • When writing a comment with the WriteTo method, we will no longer add an extra space at the end before the -->

You can see the following Fiddle with your case on this new version: https://dotnetfiddle.net/6LvV3S

Which has the following output:

---- Test with outputAsXml = False ----
comment.OuterHtml = <!--  cde  -->
comment.WriteTo() = <!--  cde  -->
comment2.OuterHtml = <!-- fgh -->
commen2t.WriteTo() = <!-- fgh -->
doc.DocumentNode.OuterHtml = <!-- abc --><!--  cde  --><!-- fgh -->
doc.DocumentNode.WriteTo() = <!-- abc --><!--  cde  --><!-- fgh -->
---- Test with outputAsXml = True ----
comment.OuterHtml = <!--  cde  -->
comment.WriteTo() = <!--  cde  -->
comment2.OuterHtml = <!-- fgh -->
commen2t.WriteTo() = <!-- fgh -->
doc.DocumentNode.OuterHtml = <?xml version="1.0" encoding="iso-8859-1"?><span><!-- abc --><!--  cde  --><!-- fgh --></span>
doc.DocumentNode.WriteTo() = <?xml version="1.0" encoding="iso-8859-1"?><span><!-- abc --><!--  cde  --><!-- fgh --></span>

Let me know if we missing something.

Best Regards,

Jon

@elgonzo
Copy link
Contributor

elgonzo commented Dec 14, 2023

@JonathanMagnan,

please note that the code in HtmlDocument.cs adds surrounding spaces to the comment, something it should not do. Instead of "<!-- " and " -->", it should use "<!--" and "-->" without the spaces. (The code in HtmlNode.cs and HtmlCommentNode.cs appear to do it correctly.)

EDIT: I made a pull request for this: #528

@JonathanMagnan
Copy link
Member

Thank you @elgonzo ,

I understand your point.

Adding a space before and after is usually a good practice. It is our responsibility? Probably not and like @mems did in his example, he added the space on his side.

So we will probably go ahead with your pull ;)

@mems
Copy link
Author

mems commented Dec 18, 2023

Since there is a release version that fix my issue, I close it now.

Thank you @JonathanMagnan

@mems mems closed this as completed Dec 18, 2023
@JonathanMagnan
Copy link
Member

Hello @mems ,

Just to let you know that the merge from @elgonzo has been merged (thank again!). So, no additional space will be added anymore when creating the comment tag manually.

Best Regards,

Jon

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Development

No branches or pull requests

3 participants