forked from tcr/skim

Scrape websites simply in Node.js. Streaming HTML parser combined with a flexible HTTP client.


harshabhat86/scrapi

 
 

scrapi

Website scraping in Node.js!

(I don't encourage violating the TOS of a target site that prohibits scraping.)


Define your scraping parameters in a JSON manifest:

var manifest = {
  "base": "http://news.ycombinator.com/",
  "spec": {
    "/": {
      "$query": "td.title ~ td ~ td.title > a",
      "$each": {
        "title": "(text)",
        "link": "(attr href)"
      }
    }
  }
};
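To make the value strings in "$each" easier to read, here is a minimal sketch of how directives such as "(text)" and "(attr href)" could be split into an operation and an optional argument. The helper name parseDirective and the returned object shape are hypothetical and are not taken from scrapi's source; the library may interpret these strings differently.

```javascript
// Hypothetical helper, NOT part of scrapi: splits a directive string such as
// "(text)" or "(attr href)" into an operation name and an optional argument.
function parseDirective(directive) {
  var match = /^\((\w+)(?:\s+([^)]+))?\)$/.exec(directive.trim());
  if (!match) {
    throw new Error('Unrecognized directive: ' + directive);
  }
  return { type: match[1], arg: match[2] || null };
}

// The "$each" block from the manifest above.
var each = { title: '(text)', link: '(attr href)' };

// Parse every field's directive into a descriptor object.
var parsed = {};
Object.keys(each).forEach(function (key) {
  parsed[key] = parseDirective(each[key]);
});

console.log(parsed);
```

Read this way, "title" takes the text of each matched element and "link" takes its href attribute, which matches the fields in the result.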

Create your API:

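// Assumes the module is installed and exported under the name `scrapi`.
var scrapi = require('scrapi');

var api = scrapi(manifest);
api('/').get(function (err, json) {
  if (err) throw err; // surface request or parse failures
  console.log(json);
});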

Result:

[ { link: 'https://www.hackerschool.com/blog/5-learning-c-with-gdb',
    title: 'Learning C with gdb' },
  { link: 'http://blogs.scientificamerican.com/guest-blog/2012/08/27/the-hidden-truths-about-calories/',
    title: 'Hidden Truths about Calories' },
  { link: 'http://cantada.ca/',
    title: 'Can\'tada - Tracking the stuff you can\'t use in Canada' },
  { link: 'https://blog.gregbrockman.com/2012/08/system-design-stripe-capture-the-flag/',
    title: 'Securing Stripe\'s Capture the Flag' },
  { link: 'http://swanson.github.com/blog/2012/08/27/move-your-feet.html',
    title: 'Move your feet' },
    ... ]
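Scraped href values are not always absolute. As a sketch of one possible post-processing step, the snippet below resolves each link against the manifest's "base" using Node's built-in URL class. The function name absolutize and the second sample row (a relative link) are assumptions made for illustration; they do not come from scrapi or from the output above.

```javascript
// Base URL taken from the manifest above.
var base = 'http://news.ycombinator.com/';

// Rows shaped like the result above. The second row is a made-up example
// of a relative link, included only to show the resolution step.
var rows = [
  { link: 'http://cantada.ca/',
    title: 'Can\'tada - Tracking the stuff you can\'t use in Canada' },
  { link: 'item?id=1', title: 'Hypothetical relative link' }
];

// Hypothetical helper: returns new rows whose links are absolute URLs.
function absolutize(rows, base) {
  return rows.map(function (row) {
    return { title: row.title, link: new URL(row.link, base).href };
  });
}

var resolved = absolutize(rows, base);
console.log(resolved);
```

Links that are already absolute pass through unchanged, so the step can be applied to every row without checking first.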
