
jtiala/wpdl


wpdl


Scrape pages, posts, images and other data from a WordPress instance using the WordPress REST API. Use simple command line arguments to clean up the scraped data.

Screenshot of example usage of the tool in a terminal emulator.

Prerequisites

Node.js v19 or newer (for native fetch support).

Usage examples

The following commands use the latest version of wpdl published on npm. To run the script locally, clone this repo and replace npx wpdl with npx . (npx followed by a single dot).

Scrape pages and posts:

npx wpdl --url https://your-wp-instance.com --pages --posts
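For context, the WordPress REST API serves pages and posts as paginated JSON collections under /wp-json/wp/v2/, with at most 100 items per request and the page count reported in the X-WP-TotalPages response header. The sketch below shows roughly what collecting such a collection involves. It is not wpdl's implementation: the fetchAll helper and the FetchLike type are hypothetical names introduced here for illustration, and the fetch function is passed in so the sketch can be run without a live WordPress site.

```typescript
// Hypothetical sketch, not wpdl's actual code.
// Minimal shape of the fetch function this sketch relies on.
type FetchLike = (url: string) => Promise<{
  ok: boolean;
  status: number;
  headers: { get(name: string): string | null };
  json(): Promise<unknown>;
}>;

// Collect every item of a WordPress REST API collection, page by page.
async function fetchAll(
  baseUrl: string,
  resource: "pages" | "posts",
  fetchFn: FetchLike,
): Promise<unknown[]> {
  const root = baseUrl.replace(/\/+$/, ""); // drop trailing slashes
  const items: unknown[] = [];
  let page = 1;
  let totalPages = 1;
  do {
    // per_page=100 is the largest page size the REST API accepts.
    const url = `${root}/wp-json/wp/v2/${resource}?per_page=100&page=${page}`;
    const res = await fetchFn(url);
    if (!res.ok) throw new Error(`Request failed: ${res.status} ${url}`);
    // WordPress reports the number of available pages in this header.
    totalPages = Number(res.headers.get("X-WP-TotalPages") ?? "1");
    items.push(...((await res.json()) as unknown[]));
    page += 1;
  } while (page <= totalPages);
  return items;
}
```

With Node.js v19 or newer, the built-in fetch can be passed as fetchFn, for example fetchAll("https://your-wp-instance.com", "pages", fetch).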

Scrape pages and clean up the HTML by filtering out all img elements, all elements with the class foo, and all elements without text content. Also remove all Jetpack and Yoast SEO data from the JSON files:

npx wpdl --url https://your-wp-instance.com --pages --elementFilter img --classFilter foo --jsonFilter "jetpack_*" --jsonFilter "yoast_*" --removeEmptyElements
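The --jsonFilter values above look like glob patterns matched against JSON property names. As an illustration of that idea only, the sketch below removes every key matching a pattern, at any nesting depth. How wpdl itself matches patterns, and whether it descends into nested objects, is an assumption here; globToRegExp and removeMatchingKeys are hypothetical helpers, not part of wpdl.

```typescript
// Hypothetical sketch, not wpdl's actual code.
// Turn a simple glob such as "jetpack_*" into an anchored regular expression.
function globToRegExp(pattern: string): RegExp {
  // Escape regex metacharacters, leaving "*" to be expanded afterwards.
  const escaped = pattern.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
  return new RegExp(`^${escaped.replace(/\*/g, ".*")}$`);
}

// Return a copy of `value` without any object keys matching one of the patterns.
function removeMatchingKeys(value: unknown, patterns: string[]): unknown {
  const matchers = patterns.map(globToRegExp);
  const walk = (node: unknown): unknown => {
    if (Array.isArray(node)) return node.map(walk);
    if (node !== null && typeof node === "object") {
      const out: Record<string, unknown> = {};
      for (const [key, child] of Object.entries(node as Record<string, unknown>)) {
        // Keep the key only if no pattern matches it (assumed: nested keys too).
        if (!matchers.some((re) => re.test(key))) out[key] = walk(child);
      }
      return out;
    }
    return node; // primitives are returned unchanged
  };
  return walk(value);
}

// Sample data made up for this example.
const cleaned = removeMatchingKeys(
  { title: "Hello", jetpack_sharing_enabled: true, yoast_head: "<meta>", meta: { jetpack_x: 1, keep: 2 } },
  ["jetpack_*", "yoast_*"],
);
```

Here cleaned keeps only title and meta.keep; the input object is left unmodified.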

To see the full list of options, run:

npx wpdl -h