Review: corpora without dates #15

kgjerde · 2019-05-06T11:05:17Z

A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?

Yes, although in its current form it is very oriented toward documents that span dates, whereas the statement of need in the paper speaks to digital humanities and other fields where this may not be the case. See my comments on this below.

Rather than require a date to be attached to each document, I think it would be better to replace this with an optional sequence variable, and to assign one in the document order if none is given. If this is a date, great, and the package can use dates as is. Otherwise the sequence items would simply be serial numbers.
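To make the suggestion concrete, here is a minimal base R sketch of the fallback logic. The helper `add_sequence()`, its `sequence_col` argument, and the `Seq` column name are hypothetical and chosen for illustration only; they are not part of corporaexplorer's API.

```r
# Hypothetical helper (not part of corporaexplorer): ensure every document
# has a sequence value, falling back to document order when none is supplied.
add_sequence <- function(docs, sequence_col = NULL) {
  if (is.null(sequence_col)) {
    # No sequence supplied: use serial numbers in document order.
    docs$Seq <- seq_len(nrow(docs))
  } else {
    stopifnot(sequence_col %in% names(docs))
    # A sequence was supplied (dates or any other sortable values):
    # sort the documents by it and keep the values as they are.
    docs <- docs[order(docs[[sequence_col]]), , drop = FALSE]
    docs$Seq <- docs[[sequence_col]]
  }
  rownames(docs) <- NULL
  docs
}

# Toy data, invented for this sketch.
undated <- data.frame(
  Text = c("First text", "Second text", "Third text"),
  stringsAsFactors = FALSE
)
dated <- data.frame(
  Text = c("Later text", "Earlier text"),
  Date = as.Date(c("2019-05-06", "2019-01-15")),
  stringsAsFactors = FALSE
)

undated_seq <- add_sequence(undated)                     # Seq is 1, 2, 3
dated_seq <- add_sequence(dated, sequence_col = "Date")  # sorted by Date
```

With this shape, date-based corpora keep their dates, while everything else gets plain serial numbers, so downstream code only needs to rely on one ordering column.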

Example usage: Do the authors include examples of how to use the software (ideally to solve real-world analysis problems)?

The Russia example is shown nicely in the GitHub README. But I think another example that could excite digital humanities scholars would be to apply the package to a corpus whose documents are the chapters of a novel, such as Moby Dick as it is analyzed in Jockers, M. L. (2014). Text analysis with R for students of literature. New York: Springer. (We replicate this for quanteda here.) I think there are far more corpora that lack dates than corpora that have them, so generalizing the package and demonstrating this with an example would greatly broaden its user base. Moby Dick would be a great application, and it is easy to access that dataset online or to bundle it with the package. (You would need to segment it by chapter first, but this is not difficult.)
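As an illustration of the chapter segmentation step, here is a minimal base R sketch. It assumes the novel is available as a character vector of lines and that chapter headings look like "CHAPTER 1. Loomings."; the helper `split_into_chapters()`, its heading pattern, and the output column names are hypothetical, and the body lines in the sample are placeholders rather than the novel's text.

```r
# Hypothetical helper: turn the lines of a novel into one row per chapter.
split_into_chapters <- function(lines, heading_pattern = "^CHAPTER [0-9]+\\.") {
  is_heading <- grepl(heading_pattern, lines)
  # Each heading starts a new chapter; lines before the first heading get 0.
  chapter_id <- cumsum(is_heading)
  # Keep non-empty body lines that belong to some chapter.
  is_body <- chapter_id > 0 & !is_heading & nzchar(trimws(lines))
  chapters <- split(
    lines[is_body],
    factor(chapter_id[is_body], levels = seq_len(sum(is_heading)))
  )
  data.frame(
    Chapter = seq_along(chapters),
    Title = lines[is_heading],
    Text = vapply(chapters, paste, character(1), collapse = " ",
                  USE.NAMES = FALSE),
    stringsAsFactors = FALSE
  )
}

# Toy input: real chapter headings, placeholder body lines.
sample_lines <- c(
  "MOBY-DICK; or, THE WHALE.",
  "",
  "CHAPTER 1. Loomings.",
  "",
  "First line of chapter one.",
  "Second line of chapter one.",
  "",
  "CHAPTER 2. The Carpet-Bag.",
  "",
  "Only line of chapter two."
)

chapters <- split_into_chapters(sample_lines)
```

The resulting data frame has one document per chapter, in reading order, so the chapter number can serve as the sequence in place of a date.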

I have now added support for exploring corpora without dates (in a number of commits over the last week or so). The changes affect prepare_data (including a new grouping_variable parameter for corpora without dates) and the README.

I have also added two such example cases: the Bible and Jane Austen books, and linked to them in the README. (I agree that Moby Dick would be nice, but I hope those two cases also provide a good demonstration.)

kgjerde mentioned this issue May 6, 2019

[REVIEW]: corporaexplorer: an R package for dynamic exploration of text collections openjournals/joss-reviews#1342


kgjerde closed this as completed May 10, 2019
