This is a simple tool that lets you search the top 100-ish Project Gutenberg ebooks for text.

dariusk/gutencorpus

gutencorpus

This module contains the top 100-ish ebooks on Project Gutenberg, and lets you search them for all sentences containing a particular substring.

Getting Started

Install the module with: npm install gutencorpus

var gutencorpus = require('gutencorpus');

// Case-sensitive search; the promise resolves with an array of matching sentences.
gutencorpus.search('misled', {caseSensitive: true})
  .done(function(results) {
    console.log(results);
  });

Here's an example of using search results to create new-ish sentences:

gutencorpus.search(' believe he ')
  .done(function(results) {
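    var newResults = results.map(function(sentence) {
      // Drop everything up to and including "believe he "; the i flag mirrors the default case-insensitive search.
      return 'He\'s the type of guy who ' + sentence.replace(/.*believe he /i,'');
    });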
    console.log(newResults);
    /*
      [ 'He\'s the type of guy who _did_--I heard something about it--but I hardly know what--something about Mr.',
        'He\'s the type of guy who chiefly lived, but his studying the law was a mere pretence, and being now free from all restraint, his life was a life of idleness and dissipation.',
        'He\'s the type of guy who will ever live at Netherfield any more.',
        'He\'s the type of guy who could go any further so scared he hadn\'t hardly any strength left, he said.',
        ... ]
    */
  });

Documentation

The gutencorpus object exposes a single function, search, which returns a promise (implemented with Underscore.Deferred, which follows jQuery's Deferred API).
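
Because search returns a jQuery-style promise rather than a native one, callers who prefer then/await may want to adapt it. The following is a minimal sketch, assuming the returned object exposes done and fail the way jQuery's Deferred does; toNativePromise is a hypothetical helper and not part of gutencorpus.

```javascript
// Hypothetical helper (not part of gutencorpus): adapts a jQuery-style
// deferred/promise to a native Promise.
// Assumption: the object exposes done() and, usually, fail(), as jQuery's Deferred does.
function toNativePromise(deferred) {
  return new Promise(function (resolve, reject) {
    deferred.done(resolve);
    // Guarded in case an implementation does not provide fail().
    if (typeof deferred.fail === 'function') {
      deferred.fail(reject);
    }
  });
}

// Usage sketch (requires gutencorpus to be installed):
// var gutencorpus = require('gutencorpus');
// toNativePromise(gutencorpus.search('misled'))
//   .then(function (sentences) { console.log(sentences.length); });
```

Wrapping at the call site leaves the module untouched, and existing .done() callers keep working.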

search takes the substring to look for and an optional options object, which currently supports one option: caseSensitive. It defaults to false (searches are case-insensitive); set it to true for case-sensitive matching, as in the first example above.
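
To make the effect of caseSensitive concrete, here is an illustrative sketch of substring matching over a list of sentences. It is not gutencorpus's actual implementation: filterSentences and the sample sentences are hypothetical stand-ins that only mimic the documented behavior of the option.

```javascript
// Illustrative sketch only: NOT the module's real code.
// filterSentences is a hypothetical stand-in that mimics the caseSensitive option.
function filterSentences(sentences, query, options) {
  var caseSensitive = Boolean(options && options.caseSensitive);
  // For case-insensitive matching, compare lowercased copies of both strings.
  var needle = caseSensitive ? query : query.toLowerCase();
  return sentences.filter(function (sentence) {
    var haystack = caseSensitive ? sentence : sentence.toLowerCase();
    return haystack.indexOf(needle) !== -1;
  });
}

var sample = ['Misled by fancy.', 'He was misled.', 'Nothing here.'];

// Default (case-insensitive): matches 'Misled by fancy.' and 'He was misled.'
console.log(filterSentences(sample, 'misled'));

// Case-sensitive: matches only 'He was misled.'
console.log(filterSentences(sample, 'misled', { caseSensitive: true }));
```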

This is essentially a lightweight version of hugovk's gutengrep. Check that out if you want a comprehensive implementation that works with the entire Gutenberg corpus (it requires more setup and is written in Python).

Contributing

In lieu of a formal styleguide, take care to maintain the existing coding style. Add unit tests for any new or changed functionality. Lint and test your code using Grunt.

License

Copyright (c) 2014 Darius Kazemi
Licensed under the MIT license.
