Wishful Thinking

Text analysis for similar book recommendations and automatic genre determination

Unsupervised machine learning

Obtaining the data

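The list of Project Gutenberg files was obtained by mirroring the robot harvest pages. The URL is quoted so that the shell does not interpret the ?, [] and & characters:

wget -w 2 -m 'http://www.gutenberg.org/robot/harvest?filetypes[]=txt&langs[]=en'

The archive URLs were then extracted from the mirrored pages, dropping any that end in a hyphen, a digit and .zip:

cat www.gutenberg.org/robot/* | urlscan -d -n | sed '/-[0-9]\.zip/ d' > urls

Finally, the archives were downloaded. Because the command changes into zip/ first, the URL list is read from the parent directory:

mkdir zip && cd zip && xargs -P 100 wget <../urls

This leaves the zipped text files in zip/, which is where our text analysis scripts expect them.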
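
The filtering step can also be sketched in Python to make its intent explicit. This assumes the goal of the sed expression is to drop archives whose names end in a hyphen, one digit and .zip (on Project Gutenberg these are typically alternate encodings of a text that is also available under its plain number). The function name and the sample URLs are hypothetical and are not taken from the repository.

```python
import re

# Matches names such as "1234-8.zip": a hyphen, one digit, then ".zip".
# Assumption: this mirrors the intent of the sed filter in the pipeline above.
SUFFIXED_ZIP = re.compile(r"-[0-9]\.zip")


def filter_urls(urls):
    """Return the URLs that do not contain a '-<digit>.zip' suffix."""
    return [url for url in urls if not SUFFIXED_ZIP.search(url)]


if __name__ == "__main__":
    # Hypothetical sample URLs, for illustration only.
    sample = [
        "http://example.org/1/2/3/1234/1234.zip",
        "http://example.org/1/2/3/1234/1234-8.zip",
        "http://example.org/1/2/3/1234/1234-0.zip",
    ]
    for url in filter_urls(sample):
        print(url)
```

Only the first sample URL survives the filter, since the other two carry a digit suffix before .zip.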

Our original code is licensed under the Affero GPL. That license does not extend to the data used by this program, such as the Project Gutenberg corpus, and we make no claims about how that data is licensed.

Manual changes made to the corpus:

Removed zip/1126.zip # uhhhhhhhhhhhhhhhhhh
Removed zip/comed10.zip
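
As an alternative to deleting files by hand, the same exclusions could be applied in code when listing the archives. The following is a minimal sketch, assuming the archives sit in zip/ as described above. The names usable_archives and EXCLUDED are hypothetical and are not part of the repository's scripts.

```python
from pathlib import Path

# The two archives listed above as manually removed from the corpus.
EXCLUDED = {"1126.zip", "comed10.zip"}


def usable_archives(zip_dir, excluded=EXCLUDED):
    """Return sorted .zip paths in zip_dir whose file names are not excluded."""
    return sorted(
        path for path in Path(zip_dir).glob("*.zip") if path.name not in excluded
    )


if __name__ == "__main__":
    for archive in usable_archives("zip"):
        print(archive)
```

Keeping the exclusion list in one place means a freshly downloaded corpus does not need the manual step repeated.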

Setup

Download the Gutenberg files as described above

python3 -m venv env
source env/bin/activate
pip install -r requirements.txt
python3 analyze.py
python3 run.py
deactivate


holdenrohrer/wishfulthinking
