
multidoc_qa_eval

Update

Aug 16 2024: Incorporated the snippets/quotes from the annotations, along with the rubrics, for fine-grained eval. Changed the parameter --rubrics to --test-config in scripts/llm_eval.py. Also added new test_configs_snippets.json and output_snippets.jsonl files.

Setup

conda create -n qa_eval python=3.10.0
conda activate qa_eval
pip install -r requirements.txt

Data Files

  • output_snippets.jsonl: Contains the questions and system responses for eval (plus some other metadata that was used to generate the test cases but is not required for subsequent runs). Should not require further modification.
  • test_configs_snippets.json: A collection of test cases in JSON format, with associated rubrics for each question. Each question has its own test case and rubrics. Should not require further modification.
  • qa_metadata_all.jsonl: Metadata file that was used to bootstrap this utility. Should not require further modification.
  • src_answers: Directory containing sample system responses from 4 systems.
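
If you want a quick look at these files before running eval, a minimal sketch like the one below works. It only prints record counts and field names, so it makes no assumptions about the record layout; the data/ location of output_snippets.jsonl is an assumption based on the eval command and license note further down.

import json

# Minimal sketch: peek at the provided data files.
# Paths assume the repository root as the working directory and that both
# files live under data/ (an assumption for output_snippets.jsonl).
with open("data/output_snippets.jsonl") as f:
    responses = [json.loads(line) for line in f]
print(f"{len(responses)} response records; fields: {sorted(responses[0])}")

with open("data/test_configs_snippets.json") as f:
    test_configs = json.load(f)
print(f"test configs loaded, top-level type: {type(test_configs).__name__}")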

Eval

To run eval for your system, first set up the prediction file with the system answers to be evaluated, as per the following requirements (a minimal sketch for producing such a file follows the list):

A jsonl file with fields case_id and answer_text (see example file):

  • case_id corresponds to the identifier of the question for which the response is to be evaluated; map the question text to the case_id in test_configs_snippets.json
  • answer_text is the system answer (along with citations and excerpts, if applicable) in plain text
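
For illustration only, here is a sketch of producing such a prediction file; the case_id values and answer texts are hypothetical placeholders, and real case_id values must be taken from test_configs_snippets.json:

import json
import os

# Hypothetical system answers keyed by case_id (placeholder values only).
answers = {
    "case_001": "Prior work reports mixed results [1][2] ...",
    "case_002": "The ablation in [3] suggests the encoder matters most ...",
}

# One prediction file per system, saved under a common directory.
os.makedirs("predictions", exist_ok=True)
with open("predictions/my_system.jsonl", "w") as f:
    for case_id, answer_text in answers.items():
        f.write(json.dumps({"case_id": case_id, "answer_text": answer_text}) + "\n")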

Once the prediction jsonl file is ready, save it in a new directory and run the eval script as follows. (You can save as many system response files under a directory as you like; they will all be picked up together for eval.)

export OPENAI_API_KEY=<openai key>
python scripts/llm_eval.py --qa-dir <prediction jsonl file directory> --test-config data/test_configs_snippets.json --rubrics --snippets --src-names <optional comma-separated name prefixes of the prediction files, with .jsonl; if not given, all files will be picked up>

Note: To evaluate using rubrics only, remove the --snippets parameter; conversely, to use snippets only, remove --rubrics.
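
For example, assuming the prediction file from the sketch above was saved as predictions/my_system.jsonl (hypothetical names), a full invocation would look like:

export OPENAI_API_KEY=<openai key>
python scripts/llm_eval.py --qa-dir predictions --test-config data/test_configs_snippets.json --rubrics --snippets --src-names my_system.jsonl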

License

The aggregate test cases, the sample system answers under data/src_answers, and other files under the data directory are released under the ODC-BY license. By downloading this data you acknowledge that you have read and agreed to all the terms in this license. For constituent datasets, also review the individual licensing requirements, as applicable.
