additional meta Content-Type is added to HTML5 #1008
Comments
It looks like the following commit (86d1bfb) might have fixed the issue. Can you still reproduce this on master?
I have the same issue.

require "nokogiri"
doc = Nokogiri::HTML::Document.parse <<-EOHTML
<!DOCTYPE html>
<html>
<head>
<title>Test</title>
<meta charset="UTF-8">
</head>
<body>
</body>
</html>
EOHTML
puts doc.to_s

result

<!DOCTYPE html>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>Test</title>
<meta charset="UTF-8">
</head>
<body>
</body>
</html>

my environment
Also having this issue.
I also have the same issue.
rgrove added a commit to rgrove/sanitize that referenced this issue on May 20, 2014
The version of libxml2 used by Nokogiri forcibly adds a content-type meta tag to all documents with a <head> element during serialization, which is stupid. See also: sparklemotion/nokogiri#1008
CaseyLeask added a commit to CaseyLeask/developers.whatwg.org that referenced this issue on Jan 16, 2016
We need to remove the extra charset specification, since we're adding our own from html/head.html. We can't remove the <meta http-equiv="Content-Type">, since there's a nokogiri bug coming from libxml2 that just re-adds it, even when we specify a valid <meta charset> sparklemotion/nokogiri#1008
This was referenced Jul 20, 2016
Any news on this?
mysociety-pusher pushed a commit to mysociety/alaveteli that referenced this issue on Aug 3, 2017

`:prune` removes unknown/unsafe tags and their contents (including their subtrees):

    unsafe_html = "ohai! <div>div is safe</div> <foo>but foo is <b>not</b></foo>"
    Loofah.fragment(unsafe_html).scrub!(:prune)
    # => "ohai! <div>div is safe</div> "

* Adds a `DOCTYPE` to the fixture file so that Nokogiri doesn't insert a HTML 4 `DOCTYPE` automatically, making comparison in the spec uglier
* Nokogiri also adds a `meta` tag to the output. Not much we can do about this: sparklemotion/nokogiri#1008
mysociety-pusher pushed a commit to mysociety/alaveteli that referenced this issue on Aug 3, 2017

We can't use `#sanitize` here because it operates on a Loofah fragment instead of a loofah document [1]. This results in the `<head>` and `<body>` tags getting stripped and returning an invalid HTML page.

With Loofah's built in `:prune` scrubber we retain the old behaviour of stripping out script tags.

`:prune` removes unknown/unsafe tags and their contents (including their subtrees):

    unsafe_html = "ohai! <div>div is safe</div> <foo>but foo is <b>not</b></foo>"
    Loofah.fragment(unsafe_html).scrub!(:prune)
    # => "ohai! <div>div is safe</div> "

* Adds a `DOCTYPE` to the fixture file so that Nokogiri doesn't insert a HTML 4 `DOCTYPE` automatically, making comparison in the spec uglier
* Nokogiri also adds a `meta` tag to the output. Not much we can do about this: sparklemotion/nokogiri#1008

[1] https://github.com/flavorjones/loofah#side-note-fragments-vs-documents
lizconlan pushed a commit to mysociety/alaveteli that referenced this issue on Aug 4, 2017

We can't use `#sanitize` here because it operates on a Loofah fragment instead of a loofah document [1]. This results in the `<head>` and `<body>` tags getting stripped and returning an invalid HTML page.

With Loofah's built in `:prune` scrubber we retain the old behaviour of stripping out script tags.

`:prune` removes unknown/unsafe tags and their contents (including their subtrees):

    unsafe_html = "ohai! <div>div is safe</div> <foo>but foo is <b>not</b></foo>"
    Loofah.fragment(unsafe_html).scrub!(:prune)
    # => "ohai! <div>div is safe</div> "

* Adds a `DOCTYPE` to the fixture file so that Nokogiri doesn't insert a HTML 4 `DOCTYPE` automatically, making comparison in the spec uglier
* Nokogiri also adds a `meta` tag to the output. Not much we can do about this: sparklemotion/nokogiri#1008

[1] https://github.com/flavorjones/loofah#side-note-fragments-vs-documents
adunkman added a commit to adunkman/dctech.tv that referenced this issue on Dec 9, 2017
This is a little gross; the tag is forcibly added by nokogiri now: sparklemotion/nokogiri#1008 If it’s going to be forcibly added, at least it should be indented properly. :'( Let’s add it to the layout for now, but if we can remove it in the future and just rely on the modern standard (<meta charset="utf-8">), let’s do it!
Hello guys,
I have the following HTML file test.html.
I open the file with Nokogiri and then write it back to another file.
The output file contains
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
although the HTML5 meta charset is set. Any idea how I can stop it from adding that meta tag, which is for HTML4 and not for HTML5?
I am using nokogiri-1.6.0.
Any help is appreciated. Thanks!
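
One possible stopgap, going by the output posted in this thread, is to strip the extra tag from the serialized string after the fact. The sketch below is only that: `strip_injected_meta` and `INJECTED_META` are hypothetical names, not part of Nokogiri, and the pattern assumes the tag is serialized exactly in the form shown in the comments here. It does not stop the tag from being added in the first place, and it would also remove a tag of the same form if the input document already had one.

```ruby
# Hypothetical helper, not a Nokogiri API: removes the first HTML4-style
# Content-Type meta tag (and its line break) from serialized HTML.
# Assumption: the tag is written exactly as in the output reported above.
INJECTED_META = /^[ \t]*<meta http-equiv="Content-Type" content="text\/html; charset=[^"]*">[ \t]*\r?\n?/i

def strip_injected_meta(html)
  # sub (not gsub) touches only the first match, which is where the
  # reported output places the extra tag.
  html.sub(INJECTED_META, "")
end

# Stand-in for the string returned by doc.to_s in the report above,
# so the example runs without Nokogiri installed.
serialized = <<~HTML
  <!DOCTYPE html>
  <html>
  <head>
  <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
  <title>Test</title>
  <meta charset="UTF-8">
  </head>
  <body>
  </body>
  </html>
HTML

cleaned = strip_injected_meta(serialized)
puts cleaned
```

With a real document this would presumably be called as `strip_injected_meta(doc.to_s)` before writing the file. Post-processing the string is the least invasive choice here because removing the node from the parsed document does not help: as noted in the commit messages above, the tag gets re-added during serialization.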