A benchmark corpus of 100 English novels from the 19th and early 20th centuries. It contains novels by 33 authors (one third women, two thirds men), plus one anonymous (well, not so much...) novel entitled "Clara Vaughan".
The corpus is intended for stylometric benchmarking. See https://computationalstylistics.github.io/resources/ for further details.
Additionally, the folder 'word_embedding_models' contains two vector representations of the benchmark novels. Both models were produced using the GloVe algorithm via the 'text2vec' package for R: one represents each word as a 50-dimensional vector, the other as a 100-dimensional vector.
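
As a rough illustration of how such word vectors are typically used, the Python sketch below parses vectors and compares words by cosine similarity. It assumes a plain-text layout with one word per line followed by its space-separated values. That layout is an assumption, not something stated here, so check the actual files in 'word_embedding_models' first (they may be stored in an R-specific format instead). The function names `load_vectors` and `cosine` and the toy 3-dimensional values are made up for the example.

```python
import io
import math


def load_vectors(handle):
    """Parse lines of the form 'word v1 v2 ... vn' into a dict of lists.

    The line format is an assumption; adapt it to the real model files.
    """
    vectors = {}
    for line in handle:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed lines
        vectors[parts[0]] = [float(x) for x in parts[1:]]
    return vectors


def cosine(u, v):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


# Toy 3-dimensional vectors standing in for a real model file (made-up values).
toy_file = io.StringIO(
    "king 1.0 0.0 1.0\n"
    "queen 1.0 0.0 0.9\n"
    "river 0.0 1.0 0.0\n"
)

vectors = load_vectors(toy_file)
sim_close = cosine(vectors["king"], vectors["queen"])
sim_far = cosine(vectors["king"], vectors["river"])
print(sim_close > sim_far)
```

With a real model, the `io.StringIO` object would be replaced by an opened file, and each vector would have 50 or 100 values rather than 3.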