AMBROSIA is a new benchmark for recognizing and interpreting ambiguous requests in text-to-SQL. The dataset contains questions with three types of ambiguity (scope ambiguity, attachment ambiguity, and vagueness), their interpretations, and corresponding SQL queries. AMBROSIA and all released software are distributed under CC BY 4.0.
This repository contains data and code used for data collection and evaluation.
The dataset is available at ambrosia-benchmark.github.io. Please download it and extract it into the data directory.
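For example, assuming the dataset is released as a zip archive (the file name below is hypothetical), the extraction step looks like this:
mkdir -p data
unzip ambrosia.zip -d data/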
To speed up inference, we use several serving toolkits: TGI, VLLM, and the OpenChat server. Dockerfiles for TGI, VLLM, and OpenChat inference are available in the docker directory. Each Docker image starts the server with the model pre-loaded and contains all necessary data and code.
Pull and run the customized TGI image (for Llama3 and CodeLlama Prompt):
docker pull irisaparina/ambrosia-eval-tgi
docker run -it --gpus all \
--shm-size 1g \
--env "HUGGING_FACE_HUB_TOKEN=<secret>" \
irisaparina/ambrosia-eval-tgi \
--model-id $model \
--max-best-of '1' \
/bin/bash
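Once inside the container, you can check that the TGI server is up before launching evaluation (this assumes TGI listens on its default port 80, matching the --api_url used in the evaluation commands below):
curl http://0.0.0.0/generate \
-X POST \
-H 'Content-Type: application/json' \
-d '{"inputs": "SELECT", "parameters": {"max_new_tokens": 16}}'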
Pull and run the customized VLLM image (for Llama3, CodeLlama and OpenChat Beam):
docker pull irisaparina/ambrosia-eval-vllm
docker run -it --runtime nvidia --gpus all \
--shm-size 1g \
--env "HUGGING_FACE_HUB_TOKEN=<secret>" \
irisaparina/ambrosia-eval-vllm \
--model $model \
--max-model-len 5120 \
--seed 1 \
/bin/bash
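Once inside the container, you can verify that the vLLM OpenAI-compatible server is reachable (this assumes the default port 8000, matching the --api_url used in the evaluation commands below):
curl http://localhost:8000/v1/models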
Pull and run the customized OpenChat image (for OpenChat Prompt and database generation):
docker pull irisaparina/ambrosia-eval-openchat
docker run -it --gpus all irisaparina/ambrosia-eval-openchat /bin/bash
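Once inside the container, you can verify that the OpenChat server is reachable (this assumes the default port 18888 used in the generation commands below; the model name in the request is the standard OpenChat identifier and may differ in your setup):
curl http://localhost:18888/v1/chat/completions \
-H 'Content-Type: application/json' \
-d '{"model": "openchat_3.5", "messages": [{"role": "user", "content": "Hello"}]}'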
All evaluation functions are located in the src/evaluation directory. Prompts for evaluation can be found in src/prompts/evaluation.
Scripts for the experiments are provided in the src/scripts directory. For example, to perform zero-shot inference with Llama3-70B using the Prompt method (requires TGI):
./src/scripts/zero_shot_prompt.sh "meta-llama/Meta-Llama-3-70B-Instruct" --tgi
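The same script works for the other models served with TGI, e.g. CodeLlama (the exact Hugging Face model identifier here is an assumption; use the checkpoint you serve):
./src/scripts/zero_shot_prompt.sh "codellama/CodeLlama-70b-Instruct-hf" --tgi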
You can also run inference directly (with a fixed random seed and only ambiguous questions):
python src/evaluation/evaluate_model_tgi.py \
--prompt_file src/prompts/evaluation/prompt \
--use_tgi \
--api_url 'http://0.0.0.0/' \
--model_name "meta-llama/Meta-Llama-3-70B-Instruct" \
--type_of_questions ambig
Similarly, to perform zero-shot inference with Llama3-70B using the Beam method (requires VLLM):
./src/scripts/zero_shot_beam_vllm_server.sh "meta-llama/Meta-Llama-3-70B-Instruct"
Or to run inference directly (with a fixed random seed and only ambiguous questions):
python3 src/evaluation/evaluate_model_openai_api.py \
--prompt_file src/prompts/evaluation/beam \
--use_vllm \
--api_url 'http://localhost:8000/v1' \
--api_key 'EMPTY' \
--model_name "meta-llama/Meta-Llama-3-70B-Instruct" \
--type_of_questions ambig
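The same script also covers the other models served with VLLM, e.g. OpenChat (the exact Hugging Face model identifier here is an assumption):
./src/scripts/zero_shot_beam_vllm_server.sh "openchat/openchat-3.5-0106"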
All database generation functions are located in the src/db_generation directory. Prompts for database generation can be found in src/prompts/db_generation. Domains are specified in the data directory. We use the OpenChat model for database generation.
Once the OpenChat Docker container is running, you can generate key concepts and relations:
./src/scripts/generate_key_concepts_relations.sh
It will run the following command for all ambiguity types:
python src/db_generation/generate_key_concepts_relations.py \
--prompt_file PROMPT \
--ambig_type AMBIG_TYPE \
--data_dir DIR_FOR_CONCEPTS \
--domain_file DOMAIN_FILE \
--api_url "http://localhost:18888/v1/"
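As a concrete illustration, a single run for scope ambiguity could look like the following (the --ambig_type value and the remaining placeholders are assumptions; check generate_key_concepts_relations.sh for the exact arguments):
python src/db_generation/generate_key_concepts_relations.py \
--prompt_file src/prompts/db_generation/PROMPT \
--ambig_type scope \
--data_dir DIR_FOR_CONCEPTS \
--domain_file DOMAIN_FILE \
--api_url "http://localhost:18888/v1/"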
Run database generation:
./src/scripts/generate_databases.sh
It will run the following command for all ambiguity types:
python src/db_generation/generate_databases.py \
--data_dir data/ \
--ambig_type AMBIG_TYPE \
--api_url "http://localhost:18888/v1/"
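Assuming the generated databases are stored as SQLite files (the path below is a placeholder), you can inspect a resulting schema with the sqlite3 CLI:
sqlite3 data/GENERATED_DATABASE.sqlite ".tables"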
You can find the Potato interface for data collection in the annotation directory. To install Potato, follow the instructions in its README. We use the modified version from Hosking et al. (2024).
To start Potato, run the following command:
cd annotation/
python potato/flask_server.py start ambrosia_data_collection/configs/CONFIG_FILE -p 8000
The config options are:
- scope_questions_interpretations.yaml: annotation of questions with scope ambiguity and their interpretations;
- attachment_questions_interpretations.yaml: annotation of questions with attachment ambiguity and their interpretations;
- vague_questions_sql.yaml: annotation of vague questions and SQL queries (provided for databases with general concepts, see paper for more details);
- vague_interpretations.yaml: annotation of interpretations for vague questions;
- database_review.yaml: validation of generated concepts and databases.
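For example, to launch the annotation task for scope ambiguity on port 8000:
cd annotation/
python potato/flask_server.py start ambrosia_data_collection/configs/scope_questions_interpretations.yaml -p 8000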
We provide examples of raw data in annotation/ambrosia_data_collection/data_files. Instructions for annotators can be found in annotation/ambrosia_data_collection/instructions.
More details on data collection and evaluation results are provided in the paper:
AMBROSIA: A Benchmark for Parsing Ambiguous Questions into Database Queries
Irina Saparina and Mirella Lapata