This repository contains the code and data to test pre-trained language models for their common sense on Allen's interval algebra. The associated research report can be found in `./report/report.pdf`. The dataset can be found in `./data/claude_examples.json`.
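For context, Allen's interval algebra distinguishes 13 qualitative relations (before, meets, overlaps, during, etc.) between two time intervals. The following minimal Python sketch is illustrative only — it is not the repository's code — and shows how such a relation can be determined from two `(start, end)` pairs:

```python
def allen_relation(a, b):
    """Return the Allen relation between intervals a and b.

    Each interval is a (start, end) tuple with start < end.
    """
    (a1, a2), (b1, b2) = a, b
    if a2 < b1: return "before"
    if b2 < a1: return "after"
    if a2 == b1: return "meets"
    if b2 == a1: return "met-by"
    if a1 == b1 and a2 == b2: return "equals"
    if a1 == b1: return "starts" if a2 < b2 else "started-by"
    if a2 == b2: return "finishes" if a1 > b1 else "finished-by"
    if b1 < a1 and a2 < b2: return "during"
    if a1 < b1 and b2 < a2: return "contains"
    return "overlaps" if a1 < b1 else "overlapped-by"

print(allen_relation((1, 3), (4, 6)))  # before
print(allen_relation((1, 5), (3, 8)))  # overlaps
```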
- `--model_id`: Set to the Hugging Face model ID of the desired model. Tested examples:
  - `meta-llama/Meta-Llama-3.1-8B`
  - `roberta-base`
  - `gpt2`
- `--lm_mode`: Make sure to set this to the kind of language modeling that matches the model specified with `--model_id`. Can be one of the following:
  - `causal`
  - `masked`
- `--quantization`: Add this flag to enable 4-bit quantization.
- `--normalize`: Add this flag to normalize the metric scores by subtracting the generic scores of the verbalizations. This aims to cancel out the effect that verbalizations with an inherently low metric score would otherwise drag down the verbalized scores as well.
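The idea behind `--normalize` can be sketched as follows. This is a hypothetical example with made-up scores — the repository's actual scoring code may differ — but it shows the subtraction of each verbalization's standalone ("generic") score from its in-context score:

```python
# Hypothetical log-probability-style scores (higher = better).
verbalized_scores = {"A happens before B": -4.5, "A overlaps B": -6.0}
generic_scores    = {"A happens before B": -3.0, "A overlaps B": -5.5}

# Subtract the generic score so that wordings that score low on their
# own do not drag down the verbalized scores as well.
normalized = {
    verbalization: score - generic_scores[verbalization]
    for verbalization, score in verbalized_scores.items()
}
print(normalized)  # {'A happens before B': -1.5, 'A overlaps B': -0.5}
```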
This repository does not use Docker Compose, since it is not available on Google Cloud, which is what I was using to run this. Instead, the container is started with a `docker run` command saved in `./start_script.sh`.
- Build the image
$ docker build -t tcs .
- Make the run script executable
$ chmod u+x ./start_script.sh
- Edit `./start_script.sh`
  - Change `--model_id` to the desired model to test
  - Make sure to set `--lm_mode` to the correct kind of language modeling. Can be `causal` or `masked`.
  - Add the `--quantization` flag to enable 4-bit quantization
- Optional: Create an `.env` file and add the following line with your Hugging Face token. This is needed for gated models such as the Llama model family:
  `HF_TOKEN=<your_token>`
- Docker run
$ ./start_script.sh
- Install dependencies
$ pip install -r requirements.txt
- Optional: Create an `.env` file and add the following line with your Hugging Face token. This is needed for gated models such as the Llama model family:
  `HF_TOKEN=<your_token>`
- Run the script
$ python measure_common_sense.py --lm_mode causal --model_id meta-llama/Meta-Llama-3.1-8B
- Or change the parameters as described above
- Plot the confusion matrix heatmaps
$ python plot_confusion_matrices.py path/to/<confusion_matrix_name>.json
  - The PDF plot will be saved to `confusion_matrices/plots/<confusion_matrix_name>.pdf`
- Calculate the correlation coefficients between graph hops and perplexity
$ python correlation.py path/to/<confusion_matrix_name>.json
- `--deltas`: Use perplexity deltas instead of absolute perplexity values
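In principle, the correlation computed here is a standard correlation coefficient between graph hops and perplexity. The sketch below uses Pearson's r with made-up numbers purely for illustration; `correlation.py`'s actual implementation and choice of coefficient may differ:

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

hops = [1, 2, 3, 4]                    # graph distance between relations
perplexity = [10.0, 12.5, 15.0, 17.5]  # hypothetical model perplexities

print(round(pearson(hops, perplexity), 4))  # 1.0 (perfectly linear toy data)
```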