From 16b64b68c516d8783f36eaa0de7cde636383abc0 Mon Sep 17 00:00:00 2001
From: Dean Wampler
Date: Wed, 4 Feb 2015 20:42:43 -0600
Subject: [PATCH] Tweaked the data/README.md

---
 data/README.md | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/data/README.md b/data/README.md
index e486763..14a6e7d 100644
--- a/data/README.md
+++ b/data/README.md
@@ -23,7 +23,6 @@ File | Description
 `ugntdat.txt` | The Greek New Testament.
 `apodat.txt` | The Apocrypha (in English).
 `abbrevs-to-names.tsv` | A map from the book abbreviations used in these texts to the full book names. Derived using data from the sacred-texts.com site.
-`gallic-mb-txt` | An English translation of Julius Caesar's famous memoir, _Gallic Wars_.
 
 There are many other texts from the world's religious traditions at the [www.sacred-texts.com](http://www.sacred-texts.com) site, but most of the others aren't formatted into one convenient file like these examples.
 
@@ -55,6 +54,10 @@ LOCATION 'hdfs://server/user//data/abbrevs_to_names';
 
 Note that the field delimiter is tab, not "|".
 
+## Julius Caesar's "Gallic Wars"
+
+`gallic-mb-txt` is an English translation of Julius Caesar's famous memoir, _Gallic Wars_ about his conquest of Gaul (roughly modern France, the French part of Switzerland, and parts of Germany).
+
 ## Email Classified as SPAM and HAM
 
 A sample of SPAM/HAM classified emails from the well-known Enron email data set was adapted from [this research project](http://www.aueb.gr/users/ion/data/enron-spam/). Each file is plain text, partially formatted (i.e., with `name:value` headers) as used in email servers and clients.