Python
code for the paper
Clustering in Pure-Attention Hardmax Transformers and its Role in Sentiment Analysis by Albert Alcalde, Giovanni Fantuzzi, and Enrique Zuazua.
Main files: The two main files are:
imdb_classifier.ipynb
, where we conduct the numerical experiments for the sentiment analysis of IMDb movie reviews. We read and batch the preprocessed data, define the hyperparameters, train the model and plot the tokens of reviews evolving through the layers of the transformer.dynamics.ipynb
, where we simulate the dynamics of our transformer by trying different initial configurations. Used to generate Figure 2.
data folder: contains the original reviews from the IMDb dataset (IMDB Dataset.csv
), and the preprocessed reviews (imdb_preprocessed.csv
), which are generated with preprocess.py
.
plots folder: contains the plots generated by the different files of the repository.
saved folder: contains the weights and parameters of the trained models, as well as the history of the training curves (loss and accuracy).
utils folder: contains the definition of the functions needed in the main files:
graphics.py
, includes the functions needed for plotting the results.models.py
, includes the definition of our transformer-based classifier, the encoder, hardmax dynamics and decoder.training.py
, includes the functions needed for the training process.
@article{alcalde2024clustering,
title={Clustering in Pure-Attention Hardmax Transformers and its Role in Sentiment Analysis},
author={Albert Alcalde and Giovanni Fantuzzi and Enrique Zuazua},
year={2024},
eprint={2407.01602},
archivePrefix={arXiv},
primaryClass={cs.CL}
}