Jsoup Extension
Dmitriy Zayceff edited this page Apr 24, 2015 · 13 revisions
- Since: 0.6.4
- Dependency: org.develnext:jphp-jsoup-ext
- API: http://jphp-docs.readthedocs.org/en/latest/api_en/php/jsoup/
Jsoup is a library for working with real-world HTML. It provides a convenient API for extracting and manipulating data, using the best of DOM, CSS, and jQuery-like methods. The extension is based on the jsoup Java library (jsoup.org).
Fetch a URL and parse its contents:
use php\jsoup\Jsoup;
$doc = Jsoup::connect("http://en.wikipedia.org/")->get();
$newsHeadlines = $doc->select("#mp-itn b a");
foreach ($newsHeadlines as $element) {
    echo "- {$element->text()}\n";
}
To parse HTML held in a string, use the static Jsoup::parseText(string $html) method, or Jsoup::parseText(string $html, string $baseUri) if the page came from the web, so that relative URLs can be resolved against the base URI.
$html = "<html><head><title>First parse</title></head>
<body><p>Parsed HTML into a doc.</p></body></html>";
$doc = Jsoup::parseText($html);
echo $doc->title();
Load a document from a URL with request parameters:
$doc = Jsoup::connect("http://example.com")
    ->data("query", "Java")
    ->userAgent("Mozilla")
    ->cookie("auth", "token")
    ->timeout(3000) // 3 sec
    ->post();
Load a Document from a File:
$doc = Jsoup::parse('path/to/file.html', 'UTF-8', 'http://mysite.com/');
Use DOM methods to navigate a document:
$input = "/tmp/input.html";
$doc = Jsoup::parse($input, "UTF-8", "http://example.com/");
$content = $doc->select("#content");
$links = $content->select("a");
foreach ($links as $link) {
    $linkHref = $link->attr("href");
    $linkText = $link->text();
}
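The scoped lookup above can also be tried without a file on disk. The following is a minimal sketch that uses only calls already shown on this page (Jsoup::parseText, select, attr, text); the HTML snippet and the $hrefs variable are made up for illustration.

```php
<?php
use php\jsoup\Jsoup;

// Hypothetical markup: one link outside #content, two inside it.
$html = '<div id="nav"><a href="/home">Home</a></div>'
      . '<div id="content">'
      . '<a href="/docs">Docs</a> <a href="/faq">FAQ</a>'
      . '</div>';

$doc = Jsoup::parseText($html);

// Narrow the search to #content first, then look for links inside it.
$content = $doc->select("#content");
$links = $content->select("a");

$hrefs = [];
foreach ($links as $link) {
    // Only the links nested under #content are visited here.
    $hrefs[] = $link->attr("href");
    echo $link->text(), " -> ", $link->attr("href"), "\n";
}
```

Selecting on the result of an earlier select keeps the second query confined to the matched elements, which is why the navigation link is skipped.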
Use selector syntax to find elements:
$input = "/tmp/input.html";
$doc = Jsoup::parse($input, "UTF-8", "http://example.com/");
$links = $doc->select("a[href]"); // a with href
$pngs = $doc->select("img[src$=.png]"); // img with src ending .png
$masthead = $doc->select("div.masthead")->first(); // div with class=masthead
$resultLinks = $doc->select("h3.r > a"); // a that is a direct child of h3 with class=r
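To see what these selectors match, here is a small self-contained sketch run against an inline HTML string instead of a file. It relies only on methods that appear elsewhere on this page; the markup and the variable names $withHref and $direct are assumptions for the example.

```php
<?php
use php\jsoup\Jsoup;

// Hypothetical markup, written to exercise each selector once.
$html = '<div class="masthead">Site title</div>'
      . '<h3 class="r"><a href="/first">First</a>'
      . '<span><a href="/nested">Nested</a></span></h3>'
      . '<p><a>No href</a> <img src="logo.png"> <img src="photo.jpg"></p>';

$doc = Jsoup::parseText($html);

// a[href]: anchors that carry an href attribute ("No href" is skipped).
$withHref = [];
foreach ($doc->select("a[href]") as $a) {
    $withHref[] = $a->attr("href");
}

// h3.r > a: only anchors that are direct children of the h3,
// so the anchor wrapped in <span> is not matched.
$direct = [];
foreach ($doc->select("h3.r > a") as $a) {
    $direct[] = $a->attr("href");
}

// first() picks the first element of the matched set.
$masthead = $doc->select("div.masthead")->first();
echo $masthead->text(), "\n";
echo implode(", ", $withHref), "\n";
echo implode(", ", $direct), "\n";
```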
JPHP Group 2015