Gather: making a content bouquet.

Gather ye rosebuds while ye may,
Old Time is still a-flying;
And this same flower that smiles today
To-morrow will be dying.
- Robert Herrick

Overview

This utility takes a CSV that is formatted like an XML sitemap, crawls the URLs, and outputs a parsed CSV that has the content prepped for CMS import.

Gather was born from the need to create content that could be imported to a CMS as part of a web redesign. There was no way to get the content except by scraping the site.

Using a sitemap generator, we were able to get a list of URLs to crawl, parse, and port to a format we could then use with a pre-built CMS importer.

Requirements

Node.js and npm. That's it.

Installation & Setup

Clone this repository. Then do the following (assuming you cloned to the directory gather):

$ cd gather
$ npm install

Getting Ready

You need two files to run this script: a source CSV containing, at minimum, a column named URL, and a configured YAML file. Take a look at sample.yml in the repository for an example configuration.
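For reference, a minimal source CSV needs nothing more than the URL column. The rows below are only a sketch with placeholder addresses; extra columns from your sitemap export are fine since URL is the only required one:

URL
https://www.example.com/about
https://www.example.com/news/some-article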

All paths in the YAML file are relative to the location of the main gather.js script.

Ready... set... go!

Run node gather, give it the path to your YAML file, and watch the magic happen.

If you like the shortest route possible, you can use node gather -y path/to/sample.yml instead.
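Putting it together, a typical run from the gather directory looks like this, where the YAML path points at whichever config you prepared in the previous step:

$ node gather -y path/to/sample.yml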

The script will output a [post-type]-content.csv file in the script directory if it runs successfully.
