Review: corpora without dates #15

Closed
kgjerde opened this issue May 6, 2019 · 0 comments

kgjerde commented May 6, 2019

A statement of need: Do the authors clearly state what problems the software is designed to solve and who the target audience is?

Yes, although in its current form the package is heavily oriented toward documents that span dates, while the statement of need in the paper speaks to digital humanities and other fields where this may not be the case. See my comments on this below.

Rather than requiring a date to be attached to each document, I think it would be better to replace this with an optional sequence variable, and to assign one in document order if none is given. If the sequence is a date, great, and the package can use dates as it does now. Otherwise the sequence items would simply be serial numbers.
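To make the suggestion concrete, here is a minimal, language-neutral sketch in Python of the fallback behaviour described above. It is not the package's API: the function name `attach_sequence` and its parameters are hypothetical and exist only for illustration.

```python
from datetime import date


def attach_sequence(texts, sequence=None):
    """Pair each document with a sequence value.

    Hypothetical helper, not part of the package under review.
    If no sequence is supplied, documents get serial numbers
    1..n in the order they were given.
    """
    if sequence is None:
        # No dates (or other ordering) supplied: fall back to document order.
        sequence = range(1, len(texts) + 1)
    elif len(sequence) != len(texts):
        raise ValueError("sequence must have one entry per document")
    return list(zip(sequence, texts))


# With dates, the dates are used as given.
dated = attach_sequence(
    ["first text", "second text"],
    sequence=[date(2019, 5, 1), date(2019, 5, 6)],
)

# Without dates, serial numbers are assigned in document order.
undated = attach_sequence(["chapter one", "chapter two", "chapter three"])
```

In this sketch the rest of the pipeline only ever sees a sequence value, so dated and undated corpora can share the same code path.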

Example usage: Do the authors include examples of how to use the software (ideally to solve real-world analysis problems)?

The Russia example is shown nicely in the GitHub README. But I think that another example that could excite digital humanities scholars would be to apply the package to a corpus of documents without dates, such as the chapters of a novel, for example Moby Dick as it is analyzed in Jockers, M. L. (2014). Text analysis with R for students of literature. New York: Springer. (We replicate this for quanteda here.) I think that there are far more corpora that lack dates than that have them, so generalizing the package and demonstrating this with an example would greatly broaden its user base. Moby Dick would be a great application, and the dataset is easy to access online or to bundle with the package. (You would need to segment it by chapter first, but this is not difficult.)
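The chapter segmentation mentioned above could look roughly like the following Python sketch. It assumes that each chapter starts on its own line with a heading of the form `CHAPTER <number>`, which has to be checked against the actual text file being used. The function name `split_into_chapters` and the sample text are made up for illustration.

```python
import re

# Assumed heading format: a line beginning with "CHAPTER" and a number.
CHAPTER_HEADING = re.compile(r"^CHAPTER \d+\b.*$", flags=re.MULTILINE)


def split_into_chapters(text):
    """Return a list of (heading, body) pairs, one per chapter.

    Hypothetical helper: any text before the first heading is ignored.
    """
    matches = list(CHAPTER_HEADING.finditer(text))
    chapters = []
    for i, match in enumerate(matches):
        # A chapter body runs from the end of its heading to the next heading.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(text)
        chapters.append((match.group(0).strip(), text[match.end():end].strip()))
    return chapters


# Placeholder text standing in for a real novel.
sample = "CHAPTER 1. One\nfirst body\nCHAPTER 2. Two\nsecond body\n"
chapters = split_into_chapters(sample)
```

Each resulting chapter can then be treated as one document, with its position in the list serving as the sequence value.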

I have now added support for exploring corpora without dates (in a number of commits over the last week or so). This includes changes to prepare_data (with a new grouping_variable parameter for corpora without dates) and to the README.

I have also added two such example cases, the Bible and Jane Austen's books, and linked to them in the README. (I agree that Moby Dick would be nice, but I hope those two cases also provide a good demonstration.)
