Summarise output from The Content Mine's AMI plug-in into a format that can be displayed in ami-viz.
Before running this code, run the following Content Mine pipeline (see `mine.sh` for example commands):
- `getpapers` to download articles
- `norma` to create scholarly.html for all articles
- `ami2-species` or `ami2-word` to generate the results.xml files
The data directory (stored in the `data.dir` variable in `main.r`) should contain one subfolder for each article, named after the article's ID (`getpapers` does this automatically). After processing with AMI, each subfolder should contain a `results` folder, which in turn contains:
- species/binomial/results.xml
- word/frequencies/results.xml
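The layout above lends itself to a simple directory walk. The sketch below is written in Python purely for illustration (the project itself is R, in `main.r`); `find_results` is a hypothetical helper, not part of this repository. It maps each article ID to the results.xml file for one plugin and skips articles that lack that file.

```python
from pathlib import Path


def find_results(data_dir, plugin_path):
    """Map each article ID (subfolder name) to its AMI results.xml file.

    plugin_path is the part of the path below 'results', for example
    'species/binomial' or 'word/frequencies'. Subfolders that do not
    contain the expected file are skipped, as are plain files.
    """
    found = {}
    for article_dir in sorted(Path(data_dir).iterdir()):
        if not article_dir.is_dir():
            continue  # ignore stray files in the data directory
        xml_file = article_dir / "results" / plugin_path / "results.xml"
        if xml_file.is_file():
            found[article_dir.name] = xml_file
    return found
```

For example, `find_results("data", "species/binomial")` would return something like `{"PMC1234567": Path("data/PMC1234567/results/species/binomial/results.xml"), ...}`, assuming the articles were saved under a directory named `data`.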
This program will take these files, summarise them across all articles and output JSON files ready to be visualised in an interactive network graph (for example: https://github.com/matthewgthomas/ami-viz).
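To illustrate the summarising step, here is a minimal Python sketch of turning per-article term counts into a node and edge list. It assumes the counts have already been read out of the results.xml files, and it assumes a d3-style `{"nodes": [...], "links": [...]}` layout; the exact JSON schema that ami-viz expects, and the `build_graph` name, are hypothetical and not taken from this project.

```python
import json
from collections import Counter


def build_graph(article_terms, top_n=None):
    """Build a bipartite article-term network.

    article_terms maps an article ID to a Counter of term -> count within
    that article. Terms are ranked by their total count across all
    articles; if top_n is given, only the top_n terms are kept.
    """
    totals = Counter()
    for counts in article_terms.values():
        totals.update(counts)
    ranked = totals.most_common(top_n)  # most_common(None) returns every term
    keep = {term for term, _ in ranked}

    # One node per kept term, then one node per article that mentions a kept term.
    nodes = [{"id": term, "group": "term", "count": total} for term, total in ranked]
    links = []
    for article in sorted(article_terms):
        terms = [t for t in article_terms[article] if t in keep]
        if not terms:
            continue  # article mentions none of the kept terms
        nodes.append({"id": article, "group": "article"})
        for term in sorted(terms):
            links.append({"source": article, "target": term,
                          "value": article_terms[article][term]})
    return {"nodes": nodes, "links": links}


def write_graph(graph, path):
    """Write the graph as pretty-printed JSON."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(graph, f, indent=2)
```

Keeping articles and terms as two node groups makes the network bipartite: every edge joins an article to a term, and its `value` is how often that term occurs in that article.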
To use it, run the code in `main.r`.
The program writes three JSON files, each containing a node list and an edge list:
- `words.json`: the top X most frequent words and the articles in which they appear
- `words_tdidf.json`: same as above, but calculated using term frequency-inverse document frequency (TF-IDF)
- `species.json`: occurrences of binomial species names and the articles in which they appear
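TF-IDF downweights words that appear in almost every article (such as "the"), so the TF-IDF word list favours terms that are distinctive to a few articles. The Python sketch below uses one common variant, tf = count / total terms in the article and idf = log(N / df); which variant this project's R code uses is not stated here, so treat the formula and the `tf_idf` name as assumptions.

```python
import math
from collections import Counter


def tf_idf(article_terms):
    """Return {article: {term: score}} for per-article term Counters.

    tf  = count of the term in the article / total terms in the article
    idf = log(N / df), where N is the number of articles and df is the
          number of articles that contain the term at least once.
    """
    n_docs = len(article_terms)
    df = Counter()
    for counts in article_terms.values():
        df.update(set(counts))  # count each term once per article
    scores = {}
    for article, counts in article_terms.items():
        total = sum(counts.values())
        scores[article] = {
            term: (c / total) * math.log(n_docs / df[term])
            for term, c in counts.items()
        }
    return scores
```

A word that occurs in every article gets idf = log(1) = 0 and therefore a score of 0, however frequent it is. To pick a "top X" list from these scores, one option is to sum each term's score across articles and rank by that total.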
To do:
- Add support for more AMI plugins
- Make it faster: the code is slow when run over hundreds of articles