This collection of "Useful Python Scripts for Texts" (UPST) was originally constructed for teaching students in an honors academic writing course. Like a lot of open source software, it has two characteristics:
- it is free to share with others, and
- it is indebted to more people than I can thank here.
The latter is especially true since I am entirely self-taught when it comes to coding in Python; each of these scripts is the product of a lot of scrutinizing other scripts and copying code until I understood how things worked and could write it myself.
As a way to thank my many teachers, I have tried here to comment the scripts as thoroughly as I could, in the hope that I can help others in the same way I was helped.
All of this work is hereby in the public domain.
- Copy `settings_example.cfg` to `settings.cfg` and set `full_text` to the right path for your environment.
- Run `conda env create -f environment.yml` to make sure you have all the modules you'll need.
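For example, a fresh setup might look like the following. The environment name `upst` here is just an assumption; use whatever name the `name:` field in `environment.yml` actually declares.

```
# build the environment from the repo's environment.yml
conda env create -f environment.yml

# activate it (on older conda versions: source activate upst)
conda activate upst
```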
If this is your first time using `nltk`, you'll probably need to download additional data. See the NLTK docs for details.
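From a Python prompt, something like the following will fetch commonly needed packages. Which packages you actually need depends on the script, so treat these names as examples rather than a definitive list:

```python
import nltk

# Tokenizer models and stopword lists are common requirements;
# check the NLTK docs or any error messages for what a given script needs.
nltk.download('punkt')
nltk.download('stopwords')
```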
In order to make these scripts as easy to use as possible, they are designed to be run from the command line. Output can be captured from `stdout` by simply redirecting the results to a text file of the user's own naming:
python pythonscript.py > output.txt
Yup, it's just that easy.
- updated to use Python 3.5 (anything lower is no longer supported)
- fixed `dispersions.py` so that it displays graphs for a single word
Changes from johnlaudun/upst
- I'm working my way through each file, editing them to use `main()` functions and `if __name__ == "__main__": main()` calls. StackOverflow has a few good posts about why to do this. See "What does `if __name__ == "__main__":` do?", for instance. The gist is that declaring and then calling a main function separates the functions from the code that should execute. It also means that stuff in the `main()` function happens only when you call it as a standalone script (i.e., not when you use it in other programs). A minimal sketch of this pattern appears after this list.
- I'm generally dividing the scripts into sets of functions. Why? Functions run faster. Again, StackOverflow has more info on why. Also, functions are cleaner than scripts and can be reused in other programs. If you've looked at any of my older code on GitHub, you know I used to write straight scripts all the time too. I've seen the light.
- I'm changing the way `stats.py` counts lines, paragraphs, words, etc., to accommodate Project Gutenberg texts. In Laudun's original code, each line was a paragraph, but Project Gutenberg texts have blank lines between paragraphs and paragraphs that span multiple lines. A sketch of counting paragraphs this way appears after this list.
- I added an `environment.yml` file. Each script requires a different set of modules, and it was getting frustrating to have to interrupt my analysis workflow to install them. Now you can install them all at once as soon as you clone the repo. Then you can get to work.
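Here is a minimal sketch of the `main()` pattern described in the first item above. The file and function names are placeholders of my own, not scripts from this repo:

```python
# wordcount.py -- a hypothetical example of the main() pattern
import sys


def count_words(text):
    """Return the number of whitespace-separated words in a string."""
    return len(text.split())


def main():
    # Everything here runs only when the file is executed directly,
    # e.g. `python wordcount.py some_text.txt`.
    with open(sys.argv[1], encoding="utf-8") as f:
        print(count_words(f.read()))


if __name__ == "__main__":
    main()
```

Importing `wordcount` from another program exposes `count_words()` without opening any file or printing anything, which is exactly the separation the pattern is for.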
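And here is one way to count paragraphs separated by blank lines, as in Project Gutenberg texts (third item above). It is only an illustrative sketch, not the actual code in `stats.py`:

```python
import re


def count_paragraphs(text):
    """Count paragraphs separated by one or more blank (or whitespace-only) lines."""
    chunks = re.split(r"\n\s*\n", text)
    return sum(1 for chunk in chunks if chunk.strip())


sample = "A paragraph\nthat spans two lines.\n\nA second paragraph.\n"
print(count_paragraphs(sample))  # prints 2
```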