Code for extraction of clusters and information

This folder contains the code used to extract information from the review table.

Preprocessing

To run the clustering and extract information from the review table, which is saved under data/in, you first need to run pre_processing.py. When you run the script, you can select which column to extract and where to save the extracted information.
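As a rough illustration of this step, the sketch below reads a table, pulls out one column, and writes its values to a file. It is not the code in pre_processing.py: the helper `extract_column`, the column name, and the file paths are hypothetical, and the review table is assumed here to be a CSV file.

```python
import csv
from pathlib import Path


def extract_column(table_path, column, out_path):
    """Write the non-empty values of one column to a text file, one per line."""
    with open(table_path, newline="", encoding="utf-8") as handle:
        reader = csv.DictReader(handle)
        if column not in (reader.fieldnames or []):
            raise KeyError(f"column {column!r} not found in {table_path}")
        # Strip whitespace and drop empty or missing cells.
        values = [(row[column] or "").strip() for row in reader]
        values = [value for value in values if value]
    out = Path(out_path)
    out.parent.mkdir(parents=True, exist_ok=True)  # create the output folder if needed
    out.write_text("\n".join(values) + "\n", encoding="utf-8")
    return values
```

A call such as `extract_column("data/in/table.csv", "keywords", "data/out/keywords.txt")` would then mirror the two choices described above (which column, and where to save it); both paths and the column name are placeholders.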

Wordcloud generation

After the data have been processed, you can also generate your own wordcloud for the column you are interested in. To do so, run the script create_wordcloud.py.
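A wordcloud is usually drawn from word frequencies, so the sketch below shows one way to count words across the entries of a column. It is only an illustration under assumptions: the helper `word_frequencies` and the small stop-word list are hypothetical and are not taken from create_wordcloud.py.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list for illustration only.
STOP_WORDS = {"a", "an", "and", "for", "in", "of", "the", "to", "with"}


def word_frequencies(entries):
    """Count lowercase word occurrences across the entries of one column."""
    counts = Counter()
    for entry in entries:
        # Keep runs of letters and digits; punctuation acts as a separator.
        words = re.findall(r"[a-z0-9]+", entry.lower())
        counts.update(word for word in words if word not in STOP_WORDS)
    return counts
```

If the third-party `wordcloud` package is used for rendering (an assumption), a mapping like this can be passed to `WordCloud().generate_from_frequencies(...)`.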

Citations generation

To generate the BibTeX entries for the reviewed works, use the script generate_citations.py.
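For readers unfamiliar with the format, the sketch below shows how one record could be rendered as a BibTeX entry. It does not reproduce generate_citations.py: the helper `to_bibtex`, the field names, and the citation key are hypothetical placeholders.

```python
def to_bibtex(key, fields, entry_type="article"):
    """Render one record as a BibTeX entry string, skipping empty fields."""
    lines = [f"@{entry_type}{{{key},"]
    for name, value in fields.items():
        if value:  # leave out fields with no value
            lines.append(f"  {name} = {{{value}}},")
    lines.append("}")
    return "\n".join(lines)


# Placeholder record, not a real reference.
print(to_bibtex("key2020", {"author": "Surname, Name", "title": "A Title", "year": 2020}))
```

The trailing comma after the last field is accepted by BibTeX, which keeps the loop simple.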

Clustering

To cluster the extracted keywords, use the supporting Jupyter notebooks. The notebooks under the FaissSearch folder perform searches and clustering with faiss; alternatively, use taxonomySearch to cluster with scikit-learn. The results reported in the paper were obtained with scikit-learn. Within this folder three clustering approaches are available: one with HDBSCAN, one with K-means, and one with hierarchical clustering. Feel free to choose the one you prefer. Remember that you must run pre_processing.py before applying any clustering.
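To give an idea of what the scikit-learn route looks like, the sketch below clusters a list of keywords with K-means and with hierarchical (agglomerative) clustering. It is a minimal example under stated assumptions, not the notebooks' code: the helper `cluster_keywords`, the TF-IDF features, and the parameter values are all illustrative choices.

```python
from sklearn.cluster import AgglomerativeClustering, KMeans
from sklearn.feature_extraction.text import TfidfVectorizer


def cluster_keywords(keywords, n_clusters=2, seed=0):
    """Return K-means and hierarchical cluster labels for a list of keywords."""
    # TF-IDF is an assumed featurisation; the notebooks may represent keywords differently.
    features = TfidfVectorizer().fit_transform(keywords)

    kmeans = KMeans(n_clusters=n_clusters, n_init=10, random_state=seed)
    kmeans_labels = kmeans.fit_predict(features)

    # AgglomerativeClustering needs a dense array rather than a sparse matrix.
    hierarchical = AgglomerativeClustering(n_clusters=n_clusters)
    hierarchical_labels = hierarchical.fit_predict(features.toarray())

    return kmeans_labels, hierarchical_labels
```

Recent scikit-learn releases also provide `sklearn.cluster.HDBSCAN`, which follows the same `fit_predict` pattern but chooses the number of clusters itself and marks noise points with the label -1.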
