Tweaked the data/README.md
Dean Wampler committed Feb 5, 2015
1 parent 35d056b commit 16b64b6
Showing 1 changed file with 4 additions and 1 deletion.
data/README.md: 5 changes (4 additions, 1 deletion)
@@ -23,7 +23,6 @@ File | Description
`ugntdat.txt` | The Greek New Testament.
`apodat.txt` | The Apocrypha (in English).
`abbrevs-to-names.tsv` | A map from the book abbreviations used in these texts to the full book names. Derived using data from the sacred-texts.com site.
`gallic-mb-txt` | An English translation of Julius Caesar's famous memoir, _Gallic Wars_.

There are many other texts from the world's religious traditions at the [www.sacred-texts.com](http://www.sacred-texts.com) site, but most of the others aren't formatted into one convenient file like these examples.

@@ -55,6 +54,10 @@ LOCATION 'hdfs://server/user/<USER>/data/abbrevs_to_names';

Note that the field delimiter is tab, not "|".

## Julius Caesar's "Gallic Wars"

`gallic-mb-txt` is an English translation of Julius Caesar's famous memoir, _Gallic Wars_ about his conquest of Gaul (roughly modern France, the French part of Switzerland, and parts of Germany).

## Email Classified as SPAM and HAM

A sample of SPAM/HAM classified emails from the well-known Enron email data set was adapted from [this research project](http://www.aueb.gr/users/ion/data/enron-spam/). Each file is plain text, partially formatted (i.e., with `name:value` headers) as used in email servers and clients.