See the release notes for information about changes made in v2.0.
This library recursively calls needle's .get method as long as the user-provided next() function returns a string (the next url to get). See an example.
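The behavior described above can be sketched as follows. This is a minimal illustration, not the library's actual source: it is written as a loop rather than literal recursion, `pagedLoop` is a hypothetical name, and the HTTP call is replaced by an injected `get` function (with a fake `fakeGet` client and a made-up `nextUrl` response field) so the sketch runs without a network.

```js
// Sketch of the pagination loop. `get` is a hypothetical stand-in for the
// HTTP client call; it is injected so the loop can run offline.
async function pagedLoop(url, options, next, get) {
  const acc = { options, pages: [], urls: [] };
  let current = url;

  // keep requesting only while we have a non-empty string url
  while (typeof current === 'string' && current !== '') {
    acc.urls.push(current);
    const resp = await get(current, options);
    acc.pages.push(resp);
    // next() may return a string, a promise, or undefined;
    // it receives the original url, the response, and the accumulator
    current = await next(url, resp, acc);
  }
  return acc;
}

// Fake client (assumption): three pages, each pointing to the next one
const fakePages = {
  '/page/1': { data: 'first', nextUrl: '/page/2' },
  '/page/2': { data: 'second', nextUrl: '/page/3' },
  '/page/3': { data: 'third' }
};
const fakeGet = async url => fakePages[url];

pagedLoop('/page/1', {}, (url, resp) => resp.nextUrl, fakeGet)
  .then(acc => console.log(acc.urls))
  .catch(console.error);
```

The loop ends as soon as `next` yields anything other than a non-empty string, which is why returning undefined is enough to stop paging.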
Example
```js
const request = require('{%= name %}');

request(url, options, next)
  .then(acc => console.log(acc.pages.length))
  .catch(console.error);
```
- `url` {string} - (required) the initial url to get
- `options` {object} - (optional) options object to pass to [needle][]
- `next` {function} - (required) function that returns the next url to get, a promise or undefined.

The next function is called with the following arguments:

- `url` {string} - the original (base) user-provided url
- `resp` {object} - [needle][] response object
- `acc` {object} - accumulator object with the following properties:
  - `options` {object} - user-provided options object
  - `pages` {array} - array of responses
  - `urls` {array} - array of requested urls
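
As an illustration of how the accumulator might be used, the sketch below wraps a next function with two safety checks. `withLimits` and `maxPages` are hypothetical names, not part of this library, and the sketch assumes that `acc.urls` already contains every url requested so far when next is called.

```js
// Hypothetical helper: wraps a next() function with two accumulator-based
// checks, a page cap and a guard against requesting the same url twice.
function withLimits(next, maxPages) {
  return async function(url, resp, acc) {
    // stop once the cap has been reached
    if (acc.urls.length >= maxPages) return undefined;

    const candidate = await next(url, resp, acc);

    // stop if the candidate url was already requested
    if (typeof candidate === 'string' && acc.urls.includes(candidate)) {
      return undefined;
    }
    return candidate;
  };
}

// Usage: never request more than 10 pages
const limitedNext = withLimits(async (url, resp) => resp.nextUrl, 10);
```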
The next function should return a string (the next url to get), a promise, or undefined.
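
The return types can be sketched with two small functions. Both are illustrative only: the `resp.data.next` field is an assumed response shape, and the promise is assumed to resolve to a string or undefined.

```js
// 1. Synchronous: return a string to continue, or undefined to stop.
function nextSync(url, resp, acc) {
  const link = resp.data && resp.data.next; // hypothetical response shape
  return typeof link === 'string' ? link : undefined;
}

// 2. Asynchronous: return a promise that resolves to a string or undefined,
//    for example when the next url needs extra work to compute.
function nextAsync(url, resp, acc) {
  return new Promise(resolve => {
    setTimeout(() => resolve(nextSync(url, resp, acc)), 10);
  });
}

console.log(nextSync('/a', { data: { next: '/b' } }, { urls: ['/a'] }));
nextAsync('/a', { data: {} }, { urls: ['/a'] }).then(console.log);
```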
The following example shows how to loop over pages of CSS posts on smashingmagazine.com (an arbitrary example, but they have great content!).
```js
const request = require('{%= name %}');

// maximum page number to request (arbitrary limit for this example)
const n = 5;

async function next(url, resp, acc) {
  // do stuff to check response first if necessary
  const regex = /href="\/.*?\/(\d+)\/"/;
  const num = (regex.exec(resp.data) || [])[1];

  if (num && /^[0-9]+$/.test(num) && +num <= n) {
    // use the "original" url to avoid having to reparse
    // and recreate the url each time
    return `${acc.orig}/page/${num}/`;
  }
}

request('https://www.smashingmagazine.com/category/css', {}, next)
  .then(acc => console.log(acc.pages.length))
  .catch(console.error);
```
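
To see what the regular expression in that example captures, the page-number extraction can be tried on its own against a sample string. `nextPageNumber` and the sample markup are hypothetical; only the pattern itself comes from the example.

```js
// Same pattern as in the example: captures the digits of the first
// root-relative href that ends in "/<number>/".
const regex = /href="\/.*?\/(\d+)\/"/;

// Hypothetical helper: returns the captured page number, or undefined
// when nothing matches or the number exceeds `max`.
function nextPageNumber(html, max) {
  const num = (regex.exec(html) || [])[1];
  return num && Number(num) <= max ? Number(num) : undefined;
}

// Sample markup (made up for illustration)
const html = '<a href="/category/css/page/2/">Next</a>';
console.log(nextPageNumber(html, 5));
```

Because `.*?` is not limited to the href value, a match can extend past the closing quote on the same line, so real markup may need a stricter pattern or an HTML parser.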
- renamed `.hrefs` to `.urls` in response object
- now using [axios][] instead of [needle][]. Please see the axios documentation for API information.