This folder contains the code used to extract information from the review table.
To extract information from the review table, which is saved under data/in, and before running any clustering, you need to run pre_processing.py. When you run the script, you can select which column to extract and where to save the results.
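The extraction step can be pictured with the minimal sketch below. It is not the code of pre_processing.py: it assumes, hypothetically, that the review table is a CSV file and that each cell of the chosen column holds keywords separated by ";" or ",". The function name extract_column is illustrative only.

```python
import csv
import re
from pathlib import Path


def extract_column(table_path, column, output_path):
    """Hypothetical sketch: pull one column out of a CSV review table.

    Assumes each cell holds keywords separated by ';' or ','. These
    separators are an assumption, not taken from pre_processing.py.
    Returns the cleaned, lower-cased keywords and writes one per line.
    """
    keywords = []
    with open(table_path, newline="", encoding="utf-8") as f:
        for row in csv.DictReader(f):
            cell = row.get(column) or ""  # missing column -> empty cell
            for kw in re.split(r"[;,]", cell):
                kw = kw.strip().lower()
                if kw:
                    keywords.append(kw)
    Path(output_path).write_text("\n".join(keywords) + "\n", encoding="utf-8")
    return keywords
```

For example, extract_column("data/in/<table>.csv", "keywords", "<output>.txt") would write the keywords of a (hypothetical) "keywords" column to the chosen output file; the file and column names here are placeholders.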
Once the data have been processed, you can also generate your own word cloud for the column you are interested in by running the script create_wordcloud.py.
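A word cloud is driven by how often each term occurs. The sketch below shows that counting step with the standard library only; the function name keyword_frequencies and its min_count filter are hypothetical and not taken from create_wordcloud.py.

```python
from collections import Counter


def keyword_frequencies(keywords, min_count=1):
    """Count how often each keyword occurs (the input a word cloud needs).

    Keywords are normalised to lower case; empty strings are skipped.
    `min_count` is a hypothetical filter that drops rare terms.
    """
    counts = Counter(kw.strip().lower() for kw in keywords if kw.strip())
    return {kw: n for kw, n in counts.items() if n >= min_count}
```

With the `wordcloud` package installed, such a dictionary can be rendered with `WordCloud().generate_from_frequencies(freqs)` and saved with `.to_file("cloud.png")`; whether create_wordcloud.py uses this package is an assumption.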
To generate the BibTeX entries of the reviewed works, you can use the script generate_citations.py.
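For reference, each BibTeX entry has the shape `@type{key, field = {value}, ...}`. The sketch below formats one record in that shape; the helper to_bibtex and the field names in the usage line are hypothetical, and generate_citations.py may build its entries differently.

```python
def to_bibtex(key, entry_type="article", **fields):
    """Format one (hypothetical) record as a BibTeX entry string.

    Each keyword argument becomes a `name = {value}` field, in the
    order given. Values are not escaped, so special characters such
    as '{' or '%' must already be valid BibTeX.
    """
    body = ",\n".join(f"  {name} = {{{value}}}" for name, value in fields.items())
    return f"@{entry_type}{{{key},\n{body}\n}}"
```

For example, `to_bibtex("smith2020", title="A Survey", year=2020)` produces an `@article{smith2020, ...}` entry with `title` and `year` fields.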
If you want to cluster the extracted keywords, you can use the supporting Jupyter notebooks. Use those under the FaissSearch folder if you would like to perform searches and clustering with faiss; otherwise, use those under taxonomySearch if you would like to cluster with scikit-learn. The results reported in the paper were obtained with scikit-learn. Within this folder, three clustering approaches are available: HDBSCAN, k-means, and hierarchical clustering. Feel free to choose the one you prefer. Please remember that pre_processing.py must be run before applying any clustering.
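To illustrate what the k-means option does, here is a minimal standard-library sketch (not the notebooks' code, which relies on scikit-learn). It assumes, hypothetically, that each reviewed work is represented by its set of extracted keywords, turns those sets into binary vectors, and clusters them with Lloyd's algorithm. The helpers one_hot and kmeans are illustrative names.

```python
import math
import random


def one_hot(docs):
    """Turn keyword sets into binary vectors over a shared, sorted vocabulary."""
    vocab = sorted({w for doc in docs for w in doc})
    return vocab, [[1.0 if w in doc else 0.0 for w in vocab] for doc in docs]


def kmeans(points, k, iterations=100, seed=0):
    """Plain Lloyd's k-means on a list of equal-length float vectors.

    Returns (labels, centroids). Centroids start at k distinct points
    drawn with a seeded RNG, so runs are repeatable.
    """
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    labels = None
    for _ in range(iterations):
        # Assignment step: each point goes to its nearest centroid (Euclidean).
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster is empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels, centroids
```

In scikit-learn, the equivalent call would be `sklearn.cluster.KMeans(n_clusters=k, n_init=10).fit_predict(X)`; `HDBSCAN` and `AgglomerativeClustering` from the same module cover the other two approaches.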