Nokogiri

Description

Nokogiri (鋸) is an HTML, XML, SAX, and Reader parser. Among Nokogiri's many features is the ability to search documents via XPath or CSS3 selectors.

Links

Status

Features

XML/HTML DOM parser which handles broken HTML
XML/HTML SAX parser
XML/HTML Push parser
XPath 1.0 support for document searching
CSS3 selector support for document searching
XML/HTML builder
XSLT transformer

Nokogiri parses and searches XML/HTML using native libraries (either C or Java, depending on your Ruby), which means it's fast and standards-compliant.

Installation

If this doesn't work:

gem install nokogiri

then please start troubleshooting here:

https://nokogiri.org/tutorials/installing_nokogiri.html

There are currently 1,237 Stack Overflow questions about Nokogiri installation. The vast majority of them are out of date and therefore incorrect. Please do not use Stack Overflow.

Instead, tell us when the above instructions don't work for you. This allows us to both help you directly and improve the documentation.

Binary packages

Binary packages are available for some distributions.

Support

All official documentation is posted at https://nokogiri.org (the source for which is at https://github.com/sparklemotion/nokogiri.org/, and we welcome contributions).

The Nokogiri mailing list is active: https://groups.google.com/group/nokogiri-talk
The Nokogiri bug tracker is here: https://github.com/sparklemotion/nokogiri/issues
Before filing a bug report, please read our submission guidelines: http://nokogiri.org/tutorials/getting_help.html
The IRC channel is #nokogiri on freenode.
The project's GitHub wiki has an excellent community-maintained Cheat Sheet which might be useful.

Consider subscribing to Tidelift which provides license assurances and timely security notifications for your open source dependencies, including Nokogiri. Tidelift subscriptions also help the Nokogiri maintainers fund our automated testing which in turn allows us to ship releases, bugfixes, and security updates more often.

Security and Vulnerability Reporting

Please report vulnerabilities at https://hackerone.com/nokogiri

Full information and description of our security policy is in SECURITY.md

Synopsis

Nokogiri is a large library, but here is example usage for parsing and examining a document:

#! /usr/bin/env ruby

require 'nokogiri'
require 'open-uri'

# Fetch and parse HTML document
doc = Nokogiri::HTML(open('https://nokogiri.org/tutorials/installing_nokogiri.html'))

puts "### Search for nodes by css"
doc.css('nav ul.menu li a', 'article h2').each do |link|
  puts link.content
end

puts "### Search for nodes by xpath"
doc.xpath('//nav//ul//li/a', '//article//h2').each do |link|
  puts link.content
end

puts "### Or mix and match."
doc.search('nav ul.menu li a', '//article//h2').each do |link|
  puts link.content
end

Requirements

Ruby 2.3.0 or higher, including any development packages necessary to compile native extensions.
In Nokogiri 1.6.0 and later libxml2 and libxslt are bundled with the gem, but if you want to use the system versions:
- First, check out the long list of fixes and changes between releases before deciding to use any version older than is bundled with Nokogiri.
- At install time, set the environment variable NOKOGIRI_USE_SYSTEM_LIBRARIES or else use the --use-system-libraries argument. (See https://nokogiri.org/tutorials/installing_nokogiri.html#install-with-system-libraries for specifics.)
- libxml2 >=2.6.21 with iconv support (libxml2-dev/-devel is also required)
- libxslt, built with and supported by the given libxml2 (libxslt-dev/-devel is also required)

Encoding

Strings are always stored as UTF-8 internally. Methods that return text values will always return UTF-8 encoded strings. Methods that return a string containing markup (like to_xml, to_html and inner_html) will return a string encoded like the source document.

WARNING

Some documents declare one encoding, but actually use a different one. In these cases, which encoding should the parser choose?

Data is just a stream of bytes. Humans add meaning to that stream. Any particular set of bytes could be valid characters in multiple encodings, so detecting encoding with 100% accuracy is not possible. libxml2 does its best, but it can't be right all the time.

If you want Nokogiri to handle the document encoding properly, your best bet is to explicitly set the encoding. Here is an example of explicitly setting the encoding to EUC-JP on the parser:

  doc = Nokogiri.XML('<foo><bar /></foo>', nil, 'EUC-JP')

Development

  bundle install
  bundle exec rake compile test

Code of Conduct

We've adopted the Contributor Covenant code of conduct, which you can read in full in CODE_OF_CONDUCT.md.

License

This project is licensed under the terms of the MIT license.

See this license at LICENSE.md.

Name		Name	Last commit message	Last commit date
Latest commit History 4,512 Commits
.github		.github
bin		bin
concourse		concourse
ext		ext
lib		lib
patches		patches
suppressions		suppressions
tasks		tasks
test		test
.autotest		.autotest
.cross_rubies		.cross_rubies
.editorconfig		.editorconfig
.gemtest		.gemtest
.gitignore		.gitignore
.hoerc		.hoerc
CHANGELOG.md		CHANGELOG.md
CODE_OF_CONDUCT.md		CODE_OF_CONDUCT.md
CONTRIBUTING.md		CONTRIBUTING.md
C_CODING_STYLE.rdoc		C_CODING_STYLE.rdoc
Gemfile		Gemfile
Gemfile-libxml-ruby		Gemfile-libxml-ruby
LICENSE-DEPENDENCIES.md		LICENSE-DEPENDENCIES.md
LICENSE.md		LICENSE.md
Manifest.txt		Manifest.txt
README.md		README.md
ROADMAP.md		ROADMAP.md
Rakefile		Rakefile
SECURITY.md		SECURITY.md
STANDARD_RESPONSES.md		STANDARD_RESPONSES.md
Y_U_NO_GEMSPEC.md		Y_U_NO_GEMSPEC.md
appveyor.yml		appveyor.yml
build_all		build_all
dependencies.yml		dependencies.yml

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Licenses found

Repository files navigation

Nokogiri

Description

Links

Status

Features

Installation

Binary packages

Support

Security and Vulnerability Reporting

Synopsis

Requirements

Encoding

Development

Code of Conduct

License

About

Licenses found

Releases

Packages

Languages

License

Licenses found

headius/nokogiri

Folders and files

Latest commit

History

Repository files navigation

Nokogiri

Description

Links

Status

Features

Installation

Binary packages

Support

Security and Vulnerability Reporting

Synopsis

Requirements

Encoding

Development

Code of Conduct

License

About

Resources

License

Licenses found

Code of conduct

Security policy

Stars

Watchers

Forks

Releases

Packages 0

Languages

Packages