scrpr

scrpr is a lightweight scraper multitool. It can fetch data via HTTPS, detect changes, and parse the most common formats.

Usage Example

const scrpr = require("scrpr");

const scraper = scrpr({
	concurrency: 5,
	cachedir: '/tmp/scraper-cache',
});


scraper("https://example.org/data.csv", {
	parse: "csv",
}, function(err, change, data){

	// stop on errors; data is only logged when the resource has changed
	if (err) return console.error(err);
	if (change) console.log(data);

});

`scrpr(opts)` → function scraper

Constructor, returns scraper function

Opts:

concurrency — number of parallel requests; default: 1
cachedir — directory to save cache data in; default: <root module>/.scrpr-cache

`scraper([url], [opts], [callback(err, change, data)])`

Scraper, delivers data

Opts:

method — http method; default: get
url — URL, alternative to url parameter
headers — additional http request headers, default: {}
data — http data to be sent, default: null
cache — use cache, default: true
cacheid — override cache id, default: hash(url, opts)
parse — format to parse, default: null (raw data)
successCodes — array of http status codes considered successful, default: [ 200 ]
needle — options passed on to needle, default: {}
xlsx — options passed on to xlsx, default: {}
xsv — options passed on to xsv, default: {}
pdf — options passed on to pdf.js-extract, default: {}
preprocess(data, callback(err, data)) — modify data before parsing
postprocess(data, callback(err, data)) — modify data after parsing
stream — deliver data as ReadableStream — no parsing or processing, default: false
metaredirects — follow <meta http-equiv="refresh"> style redirects, default: false
iconv — decode stream or data as this charset with iconv-lite before parsing, default: false
cooldown — microseconds since last fetch before a resource is fetched again, default: false
sizechange — treat unchanged content-length as same file, default: false
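
The preprocess and postprocess hooks both receive the data and a Node-style callback. A minimal sketch of what such hooks could look like, assuming the raw data arrives as a Buffer or string and that the parsed JSON is an array of records (neither is stated above); `stripBom` and `dropRowsWithoutId` are names made up for illustration:

```javascript
// Hypothetical preprocess hook: remove a leading byte order mark before parsing.
// Assumption: the raw data is a Buffer or a string.
function stripBom(data, callback) {
	try {
		const text = Buffer.isBuffer(data) ? data.toString("utf8") : String(data);
		callback(null, text.replace(/^\uFEFF/, ""));
	} catch (err) {
		callback(err);
	}
}

// Hypothetical postprocess hook: keep only records that carry an id.
// Assumption: the parsed data is an array of objects.
function dropRowsWithoutId(data, callback) {
	if (!Array.isArray(data)) return callback(new Error("expected an array"));
	callback(null, data.filter(function (row) {
		return row != null && row.id != null;
	}));
}

// Options object as it could be passed to scraper(url, opts, callback).
const opts = {
	parse: "json",
	preprocess: stripBom,
	postprocess: dropRowsWithoutId,
};
```

Both hooks report failures through the first callback argument rather than throwing, which matches the `callback(err, data)` signature listed above.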

Callback:

err — contains Error or null
change — true if data changed
data — raw or parsed data when changed, otherwise status string
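
The three callback arguments lead to three outcomes: an error, changed data, or no change with a status string in place of the data. A sketch of a handler that tells them apart; `describeResult` is an illustrative helper and not part of scrpr:

```javascript
// Illustrative helper: summarise the arguments the scraper callback receives.
function describeResult(err, change, data) {
	if (err) return { status: "error", message: err.message };
	if (change) return { status: "changed", data: data };
	// When nothing changed, data holds a status string instead of content.
	return { status: "unchanged", reason: String(data) };
}

// Possible usage with a scraper created by scrpr(opts):
// scraper("https://example.org/data.csv", { parse: "csv" }, function (err, change, data) {
// 	console.log(describeResult(err, change, data));
// });
```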

Parsers

csv — Comma Separated Values; data is an Object, parsed with xsv
tsv — Tab Separated Values; data is an Object, parsed with xsv
ssv — Semicolon Separated Values (as exported "as csv" by some localizations of Microsoft Excel); data is an Object, parsed with xsv
xml — eXtensible Markup Language; data is an Object, parsed with xml2js
json — JavaScript Object Notation; data is an Object, parsed natively
html — HyperText Markup Language; data is an instance of cheerio
yaml — YAML Ain't Markup Language; data is an Object, parsed with yaml
xlsx — Office Open XML Workbook; data is an Object, parsed with xlsx; { "<sheetname>": [ [ cell, cell, cell, ... ], ... ] }
pdf — Portable Document Format; data is an Object, parsed with pdf.js-extract
kdl — KDL Document Language; data is an Object, parsed with kdljs
dw — Datawrapper Visualisation; data is an Object, extracted with dataunwrapper
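
The xlsx parser returns each sheet as an array of rows, keyed by sheet name. A short sketch of turning one sheet into records, assuming the first row holds the column names (which the format itself does not guarantee); `sheetToRecords` and the sample workbook are made up for illustration:

```javascript
// Convert one sheet (an array of rows) into an array of records.
// Assumption: the first row contains the column names.
function sheetToRecords(rows) {
	if (!Array.isArray(rows) || rows.length === 0) return [];
	const header = rows[0];
	return rows.slice(1).map(function (row) {
		const record = {};
		header.forEach(function (name, i) {
			// Short rows are padded with null so every record has all columns.
			record[String(name)] = row[i] === undefined ? null : row[i];
		});
		return record;
	});
}

// Sample data in the documented shape; sheet name and cells are invented.
const workbook = {
	"Sheet1": [
		["name", "count"],
		["a", 1],
		["b"],
	],
};

const records = sheetToRecords(workbook["Sheet1"]);
```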

FTP

Rudimentary handling for ftp URLs is available if the optional get-uri dependency is installed.

Local Files

Rudimentary handling for local files is available with the file:/ pseudo-protocol.

Optional dependencies

xsv, xlsx, xml2js, yaml, cheerio, dataunwrapper, iconv-lite, kdljs, pdf.js-extract and get-uri are optional dependencies. They should only be installed if their use is required.
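
Since these modules are optional, it can help to check up front whether the one a parser needs is present. A sketch using Node's `require.resolve`, with the parser-to-module mapping taken from the Parsers section; `isInstalled` and `missingModuleFor` are hypothetical helpers, and scrpr's own behaviour when a module is missing is not described here:

```javascript
// Parser name to optional module, as listed in the Parsers section.
const parserModules = {
	csv: "xsv", tsv: "xsv", ssv: "xsv",
	xml: "xml2js",
	html: "cheerio",
	yaml: "yaml",
	xlsx: "xlsx",
	pdf: "pdf.js-extract",
	kdl: "kdljs",
	dw: "dataunwrapper",
};

// Returns true if the module can be resolved from the current project.
function isInstalled(name) {
	try {
		require.resolve(name);
		return true;
	} catch (err) {
		return false;
	}
}

// Returns the name of the module missing for a parser, or null if none is missing.
function missingModuleFor(parser) {
	const name = parserModules[parser];
	if (!name) return null; // json is parsed natively; unknown parsers are not checked
	return isInstalled(name) ? null : name;
}
```

Calling `missingModuleFor("csv")` before scraping would return `"xsv"` if that module still needs to be installed.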

License

UNLICENSE
