Description: this repo is a text clustering project using the Presidential State of the Union addresses as the document corpus. The project uses the R language, with R Markdown as the publishing format. The project exists in 2 parallel iterations.
- Script: the R script version consists of 3 files: `preprocess.R`, `eda.R`, and `cluster.R`. They contain only the code necessary to replicate the analysis.
- Publication: the R Markdown file `sotu_cluster.RMD` contains the entire end-to-end analysis. It combines the analysis write-up with code that is also available in the script files. The published version of the final product is publicly available at: RPubs.
Environment Notes: Though built to be platform agnostic via R, this analysis was developed on R 3.2.1 ("World-Famous Astronaut"), RStudio 0.99.442, and Windows 7.
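
For readers new to text clustering, the following is a minimal base-R sketch of the general idea: turn documents into word counts, measure distances between them, and group similar ones. It is not taken from `preprocess.R`, `eda.R`, or `cluster.R`; the toy documents, the variable names, and the choice of Euclidean distance with Ward hierarchical clustering are all assumptions made for illustration, and the project's own scripts may use different packages and methods.

```r
# Toy stand-ins for speeches (hypothetical; NOT the State of the Union corpus).
docs <- c(
  a = "the economy and jobs and taxes",
  b = "jobs and taxes and the budget",
  c = "war and peace and the army",
  d = "the army and the navy at war"
)

# Tokenise: lower-case, split on anything that is not a letter, drop empties.
tokens <- lapply(tolower(docs), function(x) {
  words <- strsplit(x, "[^a-z]+")[[1]]
  words[nzchar(words)]
})

# Shared vocabulary across all documents.
vocab <- sort(unique(unlist(tokens)))

# Document-term matrix: one row per document, one column per word.
dtm <- t(vapply(
  tokens,
  function(w) as.numeric(table(factor(w, levels = vocab))),
  numeric(length(vocab))
))
colnames(dtm) <- vocab

# Pairwise distances between documents, then hierarchical clustering.
# Distance and linkage are illustrative choices, not the project's.
d  <- dist(dtm, method = "euclidean")
hc <- hclust(d, method = "ward.D2")

# Cut the tree into 2 groups and show each document's cluster label.
clusters <- cutree(hc, k = 2)
print(clusters)
```

On a real corpus, raw counts are usually replaced by a weighted representation and common stop words are removed before distances are computed; the sketch skips both steps to stay short.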