Official implementation of "Accurate and Nuanced Open-QA Evaluation Through Textual Entailment"
If you find this code useful in your research, please consider citing:
@inproceedings{2024yao_qaeval,
title = "Accurate and Nuanced Open-{QA} Evaluation Through Textual Entailment",
author = "Peiran Yao and
Barbosa, Denilson",
booktitle = "Findings of the Association for Computational Linguistics: ACL 2024",
month = aug,
year = "2024",
address = "Bangkok, Thailand",
url = "https://arxiv.org/abs/2405.16702",
}
A valid OpenAI API key is required.
export OPENAI_API_KEY=your_openai_api_key
pip install openai backoff pandas tqdm
# If you are interested in finetuning
pip install torch transformers datasets peft trl
Two steps are needed to evaluate the correctness of question-answer pairs:
# Convert QA pairs to statements
python3 qa2s-gpt.py --dataset [NQ|TQ]
# Run entailment test between system answer and reference answer
python3 nli-gpt.py --dataset [NQ|TQ]
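For intuition, here is a minimal sketch of what the two steps do, using the OpenAI chat API with backoff-based retries. The prompts, model choice, and function names are illustrative assumptions, not the exact ones used by qa2s-gpt.py and nli-gpt.py:

```python
# Hedged sketch of the two-step pipeline; prompts and model are assumptions.
import backoff
import openai
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


@backoff.on_exception(backoff.expo, openai.RateLimitError)
def chat(prompt: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-3.5-turbo",
        messages=[{"role": "user", "content": prompt}],
        temperature=0,
    )
    return resp.choices[0].message.content.strip()


def qa_to_statement(question: str, answer: str) -> str:
    # Step 1: rewrite a question-answer pair as a declarative statement.
    return chat(
        "Combine the question and answer into a single declarative statement.\n"
        f"Question: {question}\nAnswer: {answer}\nStatement:"
    )


def entails(premise: str, hypothesis: str) -> bool:
    # Step 2: textual entailment between the reference statement (premise)
    # and the system statement (hypothesis) as the correctness test.
    verdict = chat(
        f"Premise: {premise}\nHypothesis: {hypothesis}\n"
        "Does the premise entail the hypothesis? Answer yes or no."
    )
    return verdict.lower().startswith("yes")


question = "Where was ACL 2024 held?"
reference = qa_to_statement(question, "Bangkok, Thailand")
system = qa_to_statement(question, "Bangkok")
print(entails(reference, system))
```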
The notebook eval-eval.ipynb evaluates the quality of the QA evaluation results.
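In essence, this means comparing the automatic judgments against human labels, roughly as in this sketch (the file name and column names are assumptions, not the notebook's actual ones):

```python
# Hedged sketch: agreement between automatic judgments and human labels.
import pandas as pd

df = pd.read_csv("nli_results.csv")  # hypothetical output of nli-gpt.py
agreement = (df["entailment_judgment"] == df["human_label"]).mean()
print(f"Agreement with human judgments: {agreement:.3f}")
```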
Scripts for training baselines that involve finetuning are in the trained-eval and rev folders.
Run python3 cot.py to use GPT-3.5 to verbalize the entailment reasoning. The notebook cot_vs_pr.ipynb computes a score from the verbalized reasoning and evaluates that score by AUROC.
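The scoring-plus-AUROC step might look like the following sketch; the scoring heuristic, file name, and column names are assumptions, and scikit-learn is an extra dependency not listed above:

```python
# Hedged sketch: score verbalized reasoning, then measure AUROC
# against human labels. The heuristic below is a toy assumption.
import pandas as pd
from sklearn.metrics import roc_auc_score

df = pd.read_csv("cot_output.csv")  # hypothetical output of cot.py


def reasoning_score(text: str) -> float:
    # Toy heuristic: fraction of reasoning sentences that sound affirmative.
    sentences = [s for s in text.split(".") if s.strip()]
    positive = sum("yes" in s.lower() or "entail" in s.lower() for s in sentences)
    return positive / max(len(sentences), 1)


scores = df["reasoning"].map(reasoning_score)
print("AUROC:", roc_auc_score(df["human_label"], scores))
```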