hackernews-scraper

Scrape Hacker News comments and posts using the Algolia API.

Usage

from hackernews_scraper import CommentScraper

CommentScraper.getComments(since=1394039447)

The call above returns a generator that yields one comment at a time. It keeps going until there are no more comments to fetch or until it reaches the 50-page limit set by Hacker News. In the latter case, a TooManyItemsException is raised.

If the Hacker News API response is missing any required field, the scraper raises a KeyError.
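
Because the exception is raised from inside the generator, comments that were already yielded can be kept when the limit is hit. The following is a minimal sketch of that pattern, not the package's own code: `fake_get_comments` and the local `TooManyItemsException` class are hypothetical stand-ins so the example runs on its own, and `collect_comments` is an illustrative helper. With the real package you would pass in the generator returned by `CommentScraper.getComments(...)` and catch the package's exception instead; its import path is not shown in this README.

```python
# Stand-ins for illustration only; the real package's import paths and
# behaviour may differ.
class TooManyItemsException(Exception):
    """Hypothetical stand-in for the scraper's page-limit exception."""


def fake_get_comments(since, limit=3):
    """Hypothetical stand-in for CommentScraper.getComments(since=...).

    Yields `limit` comment dicts, then raises as if the page limit was hit.
    """
    for i in range(limit):
        yield {"comment_id": str(i), "timestamp": since + i + 1}
    raise TooManyItemsException("page limit reached")


def collect_comments(comment_iter):
    """Drain a comment generator, keeping whatever arrived before the limit.

    Returns a (comments, hit_limit) tuple.
    """
    comments = []
    hit_limit = False
    try:
        for comment in comment_iter:
            comments.append(comment)
    except TooManyItemsException:
        # Everything yielded before the exception is still in `comments`.
        hit_limit = True
    return comments, hit_limit


comments, hit_limit = collect_comments(fake_get_comments(since=1394039447))
print(len(comments), hit_limit)
```

When `hit_limit` is true, one option is to issue a new request with a different `since` value. Whether that resumes cleanly depends on the order in which comments are returned, which this README does not state.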

Response format

Comments:

{
 'author': u'dhmholley',
 'comment_id': u'7531026',
 'comment_text': u'Are people still blowing this whistle?...',
 'created_at': u'2014-04-04T12:57:38.000Z',
 'parent_id': 7530853,
 'points': 1,
 'story_id': None,
 'story_title': None,
 'story_url': None,
 'timestamp': 1396616258,
 'title': None,
 'url': None
}

Stories:

{
 'author': u'sethco',
 'created_at': u'2014-04-04T12:56:23.000Z',
 'objectID': None,
 'points': 1,
 'story_text': 1,
 'timestamp': 1396616183,
 'title': u'Opower IPO today',
 'url': u'http://www.businesswire.com/news/home/20140403006541/en#.Uz4cbq1dVih'
}
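
In the samples above, `timestamp` is the Unix-epoch equivalent of `created_at`, and several fields may be `None`. As a small sketch of working with one item (written for Python 3, using a trimmed copy of the sample comment; `created_datetime` is an illustrative helper, not part of the package):

```python
from datetime import datetime, timezone

# Trimmed copy of the sample comment shown above.
comment = {
    "author": "dhmholley",
    "created_at": "2014-04-04T12:57:38.000Z",
    "story_title": None,
    "timestamp": 1396616258,
}


def created_datetime(item):
    """Convert an item's Unix `timestamp` to a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(item["timestamp"], tz=timezone.utc)


created = created_datetime(comment)

# Fields such as story_title can be None, so supply a fallback before display.
title = comment["story_title"] or "(no story title)"

print(created.isoformat(), title)
```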

Testing

You need to have httpretty and factory-boy installed.

Run nosetests in the root folder or the tests folder.
