Sentence Cow 🐄

Sentence Cow is a Flask-based web application that extracts sentences from a text and counts the number of words in each sentence, as best as possible. Any string of alphanumeric characters that ends with a terminating punctuation mark such as . ! ? and the like is considered to be a sentence, regardless of grammar or syntax.

Sentence Cow doesn't use any NLP algorithms to detect sentences, just a lot of regular expressions and checking.
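To make the definition above concrete, here is a minimal sketch of regex-based sentence extraction and word counting. It is not the project's actual code: the pattern `SENTENCE_RE` and the functions `extract_sentences` and `count_words` are hypothetical names chosen for illustration, and the real application does a good deal more checking.

```python
import re

# Hypothetical pattern: a run of text containing at least one word character,
# followed by one or more terminating punctuation marks.
# Note: \w also matches the underscore, not only letters and digits.
SENTENCE_RE = re.compile(r"[^.!?]*\w[^.!?]*[.!?]+")


def extract_sentences(text):
    """Return the sentences found in text, stripped of surrounding whitespace."""
    return [match.group().strip() for match in SENTENCE_RE.finditer(text)]


def count_words(sentence):
    """Count whitespace-separated tokens that contain a word character."""
    return sum(1 for token in sentence.split() if re.search(r"\w", token))


text = "Moo! The cow jumped over the moon. Did it land?"
for sentence in extract_sentences(text):
    print(count_words(sentence), sentence)
```

Text with no terminating punctuation mark yields no sentences in this sketch, which matches the definition given above.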

End-users can use tools to 'split' or 'merge' sentences where the application fails to extract a sentence in a way that they might expect.

A live demo of the project is available here.

Setting up the development environment

It is highly recommended that you work inside a virtual environment before doing any of the following.

After cloning the repository, perhaps the easiest way to get Sentence Cow's dependencies is to type the following on the console:

pip install -r requirements.txt

This will install everything you need, including the nose testing framework.

To run the tests, make sure you are in the project's root folder and then type the following on the console:

nosetests

Alternatively, you can rely on setup.py to download the latest versions of the dependencies via

pip install -e .

The testing framework won't be there, however. You'll have to download it separately:

pip install nose

Running Sentence Cow

From the project root, type

python sentencecow/app.py

You can then open your favourite browser to http://localhost:5000/sentencecow to see the program in action.

Updating `abbreviations.txt`

Sentence Cow relies on a data file to handle sentences that contain abbreviations that end with a period (e.g. 'Mr.', 'etc.', 'i.e.'). Essentially, the program will 'skip' any abbreviation listed in data/abbreviations.txt. That is, a listed abbreviation won't be taken as the end of a sentence, except at the end of a text.
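The skipping rule can be sketched as follows. This is a simplified, token-based illustration and not the project's regex-based implementation: the `ABBREVIATIONS` set stands in for the contents of data/abbreviations.txt (whose file format is not described here), and `split_sentences` is a hypothetical name.

```python
# Stand-in for the abbreviations listed in data/abbreviations.txt (assumed).
ABBREVIATIONS = {"Mr.", "etc.", "i.e."}
TERMINATORS = ".!?"


def split_sentences(text, abbreviations=ABBREVIATIONS):
    """Split text into sentences, skipping listed abbreviations.

    A token ends a sentence when it ends with a terminating punctuation
    mark, unless it is a listed abbreviation that is not the last token
    of the text. Whitespace between words is normalised to single spaces.
    """
    tokens = text.split()
    sentences, current = [], []
    for index, token in enumerate(tokens):
        current.append(token)
        is_last = index == len(tokens) - 1
        ends_with_terminator = token[-1] in TERMINATORS
        is_skipped = token in abbreviations and not is_last
        if ends_with_terminator and not is_skipped:
            sentences.append(" ".join(current))
            current = []
    # Trailing text without a terminating mark is not counted as a sentence.
    return sentences


sample = "Mr. Brown bought milk, eggs, etc. for the farm, i.e. the barn. The cow said moo, etc."
print(split_sentences(sample))
```

In the sample, 'Mr.', the first 'etc.' and 'i.e.' are skipped, while the final 'etc.' ends a sentence because it sits at the end of the text.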

abbreviations.txt is generated by a separate script, abbrevscrape, whose repository can be found here. Please read the instructions about running the script and editing the text file.

Should you wish to replace abbreviations.txt with an updated version, simply copy the new file to the data folder. It's always a good idea to back up the old abbreviations file in case there's something wonky with the new one.
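If you would rather script the backup-and-replace step, something along these lines might do. It is only a sketch: the function name `replace_abbreviations` and the `.bak` backup suffix are assumptions made for illustration, and you need to pass in the actual location of the data folder in your checkout.

```python
import shutil
from pathlib import Path


def replace_abbreviations(new_file, data_dir):
    """Back up the current abbreviations.txt in data_dir, then install new_file.

    Returns the path of the backup file (hypothetical name:
    abbreviations.txt.bak). No backup is made if there is no existing file.
    """
    target = Path(data_dir) / "abbreviations.txt"
    backup = target.with_name(target.name + ".bak")
    if target.exists():
        shutil.copy2(target, backup)  # keep the old list in case the new one is wonky
    shutil.copy2(new_file, target)
    return backup
```

To roll back, copy the `.bak` file over abbreviations.txt again.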

Feedback

Any positive, constructive feedback is welcome. I am a novice programmer who knows he has many, many things to learn.
