Classifying Concord: Machine Learning Meets Transcendentalism

This project demonstrates how machine learning can distinguish between the writing styles of Ralph Waldo Emerson and Henry David Thoreau, two central figures of American Transcendentalism. It shows how to use NLP techniques to construct a novel dataset from their public-domain works and provides a practical example of ML classification techniques.

Project Overview

The dataset consists of passages of 3-5 sentences each, extracted from Emerson's Essays: First Series and Thoreau's Walden, and On The Duty Of Civil Disobedience, yielding 1,911 labeled text segments.
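
As a rough illustration of how labeled passages of 3-5 sentences might be built, here is a minimal sketch. It is not the project's code: the regex sentence splitter stands in for spaCy's segmentation, and the helper names (`split_sentences`, `make_passages`), the fixed chunking strategy, and the choice to drop short trailing chunks are all assumptions made for this example.

```python
import re

def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace.

    Assumption: a stand-in for spaCy sentence segmentation.
    """
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]

def make_passages(text, author, min_len=3, max_len=5):
    """Group consecutive sentences into passages of min_len..max_len sentences."""
    sentences = split_sentences(text)
    passages = []
    for start in range(0, len(sentences), max_len):
        chunk = sentences[start:start + max_len]
        if len(chunk) < min_len:
            # Drop a trailing chunk that is too short to form a passage.
            break
        passages.append({"text": " ".join(chunk), "author": author})
    return passages

# Placeholder text, not taken from the dataset.
sample = "A one. B two! C three? D four. E five. F six. G seven."
for passage in make_passages(sample, "thoreau"):
    print(passage)
```

With seven sentences, the sketch keeps one five-sentence passage and discards the two-sentence remainder. Short segments are one of the misclassification sources noted under Results, so how the remainder is handled is a real design choice.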

Each passage was then attributed to an author by a range of machine learning models, from traditional algorithms to modern transformer-based approaches.

Methods

Preprocessing: Texts were cleaned, segmented, and lemmatized using spaCy. Stopwords and boilerplate were removed.
Feature Engineering: Both TF-IDF vectorization and transformer-based embeddings (DistilBERT) were used to represent text.
Models Compared:
- Logistic Regression
- Random Forest
- Support Vector Machine (SVM)
- DistilBERT, used in two ways:
  - As a feature extractor for the other models.
  - Directly as a classifier after fine-tuning.
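
To make the TF-IDF representation concrete, the sketch below computes TF-IDF weights from scratch using only the standard library. It is an illustration rather than the project's pipeline: the smoothed IDF formula `ln((1 + N) / (1 + df)) + 1`, the L2 normalization, the tiny stopword list, and the helper names (`tokenize`, `tfidf_vectors`) are assumptions chosen for this example, and no lemmatization is performed.

```python
import math
from collections import Counter

# Tiny illustrative stopword list; a real pipeline would use a fuller one.
STOPWORDS = {"the", "a", "an", "of", "and", "to", "in", "is", "i"}

def tokenize(text):
    """Lowercase, keep alphabetic tokens, drop stopwords."""
    words = "".join(c.lower() if c.isalpha() else " " for c in text).split()
    return [w for w in words if w not in STOPWORDS]

def tfidf_vectors(passages):
    """Return one L2-normalized {term: weight} dict per passage."""
    docs = [Counter(tokenize(p)) for p in passages]
    n_docs = len(docs)
    # Document frequency: the number of passages containing each term.
    doc_freq = Counter(term for doc in docs for term in doc)
    vectors = []
    for doc in docs:
        weights = {
            term: count * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, count in doc.items()
        }
        norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0
        vectors.append({t: w / norm for t, w in weights.items()})
    return vectors

# Toy passages for illustration; not rows from the project's dataset.
passages = [
    "I went to the woods because I wished to live deliberately.",
    "Trust thyself: every heart vibrates to that iron string.",
    "The woods and the pond were still in the morning.",
]
vectors = tfidf_vectors(passages)
print(sorted(vectors[0].items(), key=lambda kv: -kv[1]))
```

In the first passage, "deliberately" outweighs "woods" because "woods" also appears in the third passage. Terms that are distinctive to a passage get more weight than terms shared across passages, and that gives a classifier a usable signal for telling the two authors apart.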

Results

Traditional ML (TF-IDF + classifiers): 83–86% accuracy
DistilBERT features + classifiers: 89–90% accuracy
Fine-tuned DistilBERT: 92% accuracy (best)

Analysis showed that misclassifications often involved boilerplate, short segments, or philosophical passages where both authors' styles converged. Thoreau's concrete nature descriptions were rarely confused for Emerson's more abstract prose.
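
For readers unfamiliar with how such figures are derived, the following sketch computes accuracy and confusion counts from true and predicted labels. The labels are made up for illustration and are not the project's results; the helper names (`confusion_counts`, `accuracy`) are hypothetical.

```python
from collections import Counter

def confusion_counts(y_true, y_pred):
    """Count (true_label, predicted_label) pairs."""
    return Counter(zip(y_true, y_pred))

def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)

# Made-up labels for illustration only; these are not the project's results.
y_true = ["emerson", "emerson", "thoreau", "thoreau", "thoreau"]
y_pred = ["emerson", "thoreau", "thoreau", "thoreau", "emerson"]

counts = confusion_counts(y_true, y_pred)
for (true_label, pred_label), n in sorted(counts.items()):
    print(f"true={true_label:8} predicted={pred_label:8} count={n}")
print(f"accuracy = {accuracy(y_true, y_pred):.2f}")
```

The off-diagonal pairs, where the true and predicted labels differ, are the misclassifications. These are the passages to inspect for boilerplate, short length, or converging styles.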

Why This Matters

This project bridges machine learning and literary analysis, providing a reproducible case study in authorship attribution and stylistic quantification. It demonstrates the progression from classic ML to state-of-the-art NLP, and offers insights for both technical and humanities audiences.

Getting Started

Clone the repository:

git clone https://github.com/ranton256/classifying_concord.git
cd classifying_concord

Install dependencies:
```
pip install -r requirements.txt
```
Download spaCy English model:
```
python -m spacy download en_core_web_sm
```

Run the Jupyter notebook:

jupyter notebook supervised_ML_identify_author.ipynb

For more details, see the full article in classifying_concord.md and the code in this repository.
