
Temporal Common Sense Testing of Pre-Trained Language Models

This repository contains the code and data to test pre-trained language models for their common-sense knowledge of Allen's interval algebra. The associated research report can be found in ./report/report.pdf, and the dataset in ./data/claude_examples.json.
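For reference, Allen's interval algebra defines 13 basic relations between two time intervals: before, meets, overlaps, starts, during, finishes, their six inverses, and equals. The following is a minimal, self-contained illustration of these relations (not code from this repository):

```python
# Illustrative sketch of Allen's 13 interval relations (not repo code).
from dataclasses import dataclass

@dataclass
class Interval:
    start: float
    end: float  # assumes start < end

def allen_relation(x: Interval, y: Interval) -> str:
    """Return the Allen relation that holds between intervals x and y."""
    if x.end < y.start:
        return "before"
    if x.end == y.start:
        return "meets"
    if x.start < y.start and y.start < x.end < y.end:
        return "overlaps"
    if x.start == y.start and x.end < y.end:
        return "starts"
    if x.start > y.start and x.end < y.end:
        return "during"
    if x.start > y.start and x.end == y.end:
        return "finishes"
    if x.start == y.start and x.end == y.end:
        return "equals"
    # The remaining six relations are the inverses of the first six.
    return allen_relation(y, x) + "-inverse"

print(allen_relation(Interval(1, 2), Interval(3, 4)))  # before
print(allen_relation(Interval(1, 3), Interval(2, 4)))  # overlaps
```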

Options

  • --model_id: The Hugging Face model ID of the model to test. Tested examples:
    • meta-llama/Meta-Llama-3.1-8B
    • roberta-base
    • gpt2
  • --lm_mode: Set this to the kind of language modeling that matches the model specified with --model_id. One of:
    • causal
    • masked
  • --quantization: Add this flag to enable 4-bit quantization
  • --normalize: Add this flag to normalize the metric scores by subtracting the generic scores of the verbalizations. This counteracts the effect that a verbalization with an inherently low metric score would otherwise drag its verbalized score down as well (see the sketch after this list)
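To make the last two options concrete, here is a minimal sketch of the scoring and normalization idea. The actual logic lives in measure_common_sense.py; the prompt texts and helper names below are illustrative assumptions, not the repository's own code:

```python
# Sketch of scoring with optional 4-bit quantization and normalization.
# The real implementation is measure_common_sense.py; prompts and names
# here are assumptions for illustration only.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "gpt2"
quantize = False  # True enables 4-bit quantization (needs bitsandbytes + GPU)

bnb_config = BitsAndBytesConfig(load_in_4bit=True) if quantize else None
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, quantization_config=bnb_config)
model.eval()

def mean_nll(text: str, context: str = "") -> float:
    """Mean negative log-likelihood of `text`, optionally conditioned on `context`."""
    ids = tokenizer(context + text, return_tensors="pt").input_ids
    labels = ids.clone()
    n_ctx = tokenizer(context, return_tensors="pt").input_ids.shape[1]
    labels[:, :n_ctx] = -100  # don't score the context tokens themselves
    with torch.no_grad():
        return model(ids, labels=labels).loss.item()

context = "The meeting happened before the lunch break. "
verbalization = "The meeting ended before the lunch break started."

verbalized = mean_nll(verbalization, context)  # score of the verbalization in context
generic = mean_nll(verbalization)              # inherent score of the sentence alone
print("normalized score:", verbalized - generic)  # roughly what --normalize does
```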

Usage

Docker

This repository does not use Docker Compose since it is not available on Google Cloud, which is what I used to run the experiments. Instead, the container is started with a docker run command saved in ./start_script.sh.

  1. Build the image
$ docker build -t tcs .
  2. Make the run script executable
$ chmod u+x ./start_script.sh
  3. Edit ./start_script.sh
  • Change --model_id to the model you want to test
  • Set --lm_mode to the correct kind of language modeling: causal or masked
  • Add the --quantization flag to enable 4-bit quantization
  4. Optional: Create a .env file and add the following line with your Hugging Face token. This is needed for gated models such as the Llama model family
HF_TOKEN=<your_token>
  5. Run the container
$ ./start_script.sh

Manually

  1. Install dependencies
$ pip install -r requirements.txt
  2. Optional: Create a .env file and add the following line with your Hugging Face token. This is needed for gated models such as the Llama model family (see the sketch after these steps for how the token might be loaded)
HF_TOKEN=<your_token>
  3. Run the script
$ python measure_common_sense.py --lm_mode causal --model_id meta-llama/Meta-Llama-3.1-8B
  • Or adjust the options as described above, e.g. for a masked language model:
$ python measure_common_sense.py --lm_mode masked --model_id roberta-base
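As a hedged sketch of how the token from .env might be consumed (the repository's actual loading code may differ), python-dotenv can export it into the environment and pass it to from_pretrained:

```python
# Sketch of loading HF_TOKEN from .env; the repo's actual code may differ.
import os
from dotenv import load_dotenv
from transformers import AutoModelForCausalLM

load_dotenv()  # reads .env in the current directory into os.environ
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Meta-Llama-3.1-8B",
    token=os.environ.get("HF_TOKEN"),  # required for gated models
)
```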

Evaluation

  • Plot the confusion matrix heatmaps
    $ python plot_confusion_matrices.py path/to/<confusion_matrix_name>.json
    • The PDF plot will be saved to confusion_matrices/plots/<confusion_matrix_name>.pdf
  • Calculate the correlation coefficients between graph hops and perplexity
    $ python correlation.py path/to/<confusion_matrix_name>.json
    • --deltas: Use perplexity deltas instead of absolute perplexity values
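As a rough sketch of what the plotting step does, assuming the JSON holds a square matrix plus row/column labels (the actual file layout may differ; plot_confusion_matrices.py is authoritative):

```python
# Sketch of plotting a confusion-matrix heatmap from JSON. The assumed
# layout ({"labels": [...], "matrix": [[...]]}) may differ from the
# repo's actual format -- plot_confusion_matrices.py is authoritative.
import json
import sys
import matplotlib.pyplot as plt

with open(sys.argv[1]) as f:
    data = json.load(f)
labels, matrix = data["labels"], data["matrix"]

fig, ax = plt.subplots()
im = ax.imshow(matrix, cmap="viridis")  # one cell per relation pair
ax.set_xticks(range(len(labels)), labels=labels, rotation=90)
ax.set_yticks(range(len(labels)), labels=labels)
fig.colorbar(im, ax=ax)
fig.tight_layout()
fig.savefig("heatmap.pdf")
```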
