When I tried to load a page from https://www.jamieoliver.com/ with the HtmlWeb.Load method, it failed with an ArgumentException.
It turned out that this happens because the site's response headers contain "content-encoding: identity". Per HTTP RFC 2616, "identity" is used only in the Accept-Encoding header and SHOULD NOT be used in the Content-Encoding header, so it is no surprise that the Encoding class does not support "identity".
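The stack trace below suggests that HtmlWeb.Get passes the header value straight to Encoding.GetEncoding. In HTTP, the character set is carried in the "charset" parameter of Content-Type, whereas Content-Encoding names a content coding such as gzip, deflate or identity. The sketch below illustrates that distinction using only the .NET base class library. HeaderDemo and its methods are hypothetical names for illustration; they are not part of HtmlAgilityPack.

```csharp
using System;
using System.Net.Http.Headers;
using System.Text;

// Hypothetical helper class for illustration; not part of HtmlAgilityPack.
public static class HeaderDemo
{
    // The character set travels in the "charset" parameter of Content-Type,
    // e.g. "text/html; charset=utf-8". Returns null when the header has no
    // charset parameter or cannot be parsed.
    public static string? CharsetFromContentType(string contentType)
    {
        return MediaTypeHeaderValue.TryParse(contentType, out var mediaType)
            ? mediaType.CharSet
            : null;
    }

    // Content-Encoding names a content coding (gzip, deflate, identity), not a
    // character set, so handing its value to Encoding.GetEncoding can throw
    // ArgumentException. Returns true only when GetEncoding accepts the name.
    public static bool IsKnownCharset(string name)
    {
        try
        {
            Encoding.GetEncoding(name);
            return true;
        }
        catch (ArgumentException)
        {
            return false;
        }
    }
}
```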
Next, I set the OverrideEncoding property to Encoding.UTF8 and called the HtmlDocument.Load method. However, it made no difference, and I got the same ArgumentException.
I expected the OverrideEncoding property to make the HtmlWeb class ignore the Content-Encoding header in the server's response and decode the content with the encoding specified in OverrideEncoding, but that was not the case.
The property does override the server-specified encoding when that encoding name is valid; ideally, it would also work when the encoding name specified by the server is invalid.
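A minimal sketch of the resolution order described above, assuming the override is checked before any server-supplied name is parsed. ExpectedEncodingResolution and Resolve are hypothetical names; this is not HtmlAgilityPack's actual code.

```csharp
using System;
using System.Text;

// Hypothetical sketch of the expected OverrideEncoding behaviour;
// not HtmlAgilityPack's implementation.
public static class ExpectedEncodingResolution
{
    public static Encoding Resolve(Encoding? overrideEncoding, string? serverCharset, Encoding fallback)
    {
        // An explicit override wins before the server-supplied name is parsed,
        // so an invalid name such as "identity" is never looked up.
        if (overrideEncoding != null)
            return overrideEncoding;

        if (!string.IsNullOrWhiteSpace(serverCharset))
        {
            try
            {
                return Encoding.GetEncoding(serverCharset.Trim());
            }
            catch (ArgumentException)
            {
                // Unknown encoding name: ignore it instead of failing the whole load.
            }
        }

        // No override and no usable server-supplied name.
        return fallback;
    }
}
```

Checking the override first means a bad header value can never reach Encoding.GetEncoding when the caller has already chosen an encoding, and the catch keeps an unknown name from aborting the load when no override is set.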
Exception
Exception message:
System.ArgumentException : 'identity' is not a supported encoding name. For information on defining a custom encoding, see the documentation for the Encoding.RegisterProvider method.
Parameter name: name
Stack trace:
at System.Text.EncodingTable.GetCodePageFromName(String name)
at System.Text.Encoding.GetEncoding(String name)
at HtmlAgilityPack.HtmlWeb.Get(Uri uri, String method, String path, HtmlDocument doc, IWebProxy proxy, ICredentials creds) in /Users/yas/Projects/happyfl/html-agility-pack/src/HtmlAgilityPack.Shared/HtmlWeb.cs:line 1680
at HtmlAgilityPack.HtmlWeb.LoadUrl(Uri uri, String method, WebProxy proxy, NetworkCredential creds) in /Users/yas/Projects/happyfl/html-agility-pack/src/HtmlAgilityPack.Shared/HtmlWeb.cs:line 2068
at HtmlAgilityPack.HtmlWeb.Load(Uri uri, String method) in /Users/yas/Projects/happyfl/html-agility-pack/src/HtmlAgilityPack.Shared/HtmlWeb.cs:line 1290
at HtmlAgilityPack.HtmlWeb.Load(Uri uri) in /Users/yas/Projects/happyfl/html-agility-pack/src/HtmlAgilityPack.Shared/HtmlWeb.cs:line 1189
at HappyFL.Services.WebSeekers.RecipeSeeker.Scan() in /Users/yas/Projects/happyfl/HappyFL/Services/WebSeekers/RecipeSeeker.cs:line 34
at HappyFL.Services.WebSeekerService.FindRecipes(Uri url, Nullable`1 cancel, Encoding encode) in /Users/yas/Projects/happyfl/HappyFL/Services/WebSeekerService.cs:line 159
at HappyFL.Test.WebSeekerServiceTest.TestFindRecipe(String url, ExpectedResultForTestFindRecipe expected) in /Users/yas/Projects/happyfl/HappyFLTest/WebSeekerServiceTest.cs:line 167
This issue is affecting a large number of websites (facebook.com is another example). While wrapping my code in try { } catch { } works around it, that is not ideal.
Project to reproduce issue
https://github.com/y-code/repro-bug-in-html-agility-pack