AMBROSIA

AMBROSIA is a new benchmark for recognizing and interpreting ambiguous requests in text-to-SQL. The dataset contains questions with three types of ambiguity (scope ambiguity, attachment ambiguity, and vagueness), their interpretations, and corresponding SQL queries. AMBROSIA and all released software are distributed under CC BY 4.0.
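
To make this concrete, here is a toy sketch of vagueness (our own invented example with an invented schema, not an entry from the dataset): one underspecified question yields two interpretations, each with its own SQL query.

sqlite3 <<'SQL'
CREATE TABLE employees(name TEXT, age INT, years_at_company INT);
INSERT INTO employees VALUES ('Ada', 61, 30), ('Ben', 35, 12);
-- "Show the senior employees" is vague: "senior" is underspecified.
SELECT name FROM employees WHERE years_at_company >= 10; -- interpretation 1: senior by tenure
SELECT name FROM employees WHERE age >= 60;              -- interpretation 2: senior by age
SQL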

This repository contains data and code used for data collection and evaluation.

Data

The dataset is available at ambrosia-benchmark.github.io. Please download and extract it into the data directory.
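
For example (the archive name below is an assumption; use the actual file name from the download page):

mkdir -p data
tar -xzf ambrosia.tar.gz -C data/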

Setup

To speed up inference, we use several serving toolkits: TGI (Text Generation Inference), vLLM, and the OpenChat server. Dockerfiles for TGI, vLLM, and OpenChat inference are available in the docker directory. Each Docker image starts the server with the model pre-loaded and contains all necessary data and code.

Pull and run the customized TGI image (for the Llama3 and CodeLlama Prompt methods):

docker pull irisaparina/ambrosia-eval-tgi
docker run -it --gpus all \
    --shm-size 1g \
    --env "HUGGING_FACE_HUB_TOKEN=<secret>" \
    irisaparina/ambrosia-eval-tgi \
    --model-id $model \
    --max-best-of '1' \
    /bin/bash
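
Once the container is up, you can check that the server has finished loading the model (this assumes TGI's default port is reachable from the host, matching the api_url used in the evaluation commands below; adjust the address if you mapped ports differently):

curl http://0.0.0.0/health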

Pull and run the customized vLLM image (for the Llama3, CodeLlama, and OpenChat Beam methods):

docker pull irisaparina/ambrosia-eval-vllm
docker run -it --runtime nvidia --gpus all \
    --shm-size 1g \
    --env "HUGGING_FACE_HUB_TOKEN=<secret>" \
    irisaparina/ambrosia-eval-vllm \
    --model $model \
    --max-model-len 5120 \
    --seed 1 \
    /bin/bash
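
Once the server is running, you can verify the OpenAI-compatible endpoint; port 8000 is vLLM's default and matches the evaluation commands below:

# The reported model id should match the --model_name passed to the evaluation scripts
curl http://localhost:8000/v1/models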

Pull and run the customized OpenChat image (for the OpenChat Prompt method and database generation):

docker pull irisaparina/ambrosia-eval-openchat
docker run -it --gpus all irisaparina/ambrosia-eval-openchat /bin/bash
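
The database generation scripts below assume the OpenChat server listens on port 18888. A quick connectivity check (assuming the server follows the standard OpenAI-style models endpoint):

curl http://localhost:18888/v1/models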

Evaluation

All evaluation functions are located in the src/evaluation directory, and the corresponding prompts in src/prompts/evaluation. Scripts for the experiments are provided in src/scripts. For example, to perform zero-shot inference with Llama3-70B using the Prompt method (requires TGI):

./src/scripts/zero_shot_prompt.sh "meta-llama/Meta-Llama-3-70B-Instruct" --tgi

You can also run inference directly (with a fixed random seed and only ambiguous questions):

python src/evaluation/evaluate_model_tgi.py \
    --prompt_file src/prompts/evaluation/prompt \
    --use_tgi \
    --api_url 'http://0.0.0.0/' \
    --model_name "meta-llama/Meta-Llama-3-70B-Instruct" \
    --type_of_questions ambig

Similarly, to perform zero-shot inference with Llama3-70B using the Beam method (requires vLLM):

./src/scripts/zero_shot_beam_vllm_server.sh "meta-llama/Meta-Llama-3-70B-Instruct"

Or run inference directly (with a fixed random seed and only ambiguous questions):

python3 src/evaluation/evaluate_model_openai_api.py \
    --prompt_file src/prompts/evaluation/beam \
    --use_vllm \
    --api_url 'http://localhost:8000/v1' \
    --api_key 'EMPTY' \
    --model_name "meta-llama/Meta-Llama-3-70B-Instruct" \
    --type_of_questions ambig

Database Generation

All database generation code is located in the src/db_generation directory, and the corresponding prompts in src/prompts/db_generation. Domains are specified in the data directory. We use the OpenChat model for database generation.

Once the OpenChat Docker container is running, you can generate key concepts and relations:

./src/scripts/generate_key_concepts_relations.sh

It will run the following command for all ambiguity types:

python src/db_generation/generate_key_concepts_relations.py \
    --prompt_file PROMPT \
    --ambig_type AMBIG_TYPE \
    --data_dir DIR_FOR_CONCEPTS \
    --domain_file DOMAIN_FILE \
    --api_url "http://localhost:18888/v1/"
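
A single run might look like the following; all paths and the --ambig_type value here are hypothetical placeholders, so substitute the actual prompt, output, and domain files from the repository:

# All argument values below are assumed for illustration only
python src/db_generation/generate_key_concepts_relations.py \
    --prompt_file src/prompts/db_generation/key_concepts \
    --ambig_type scope \
    --data_dir data/concepts \
    --domain_file data/domains.json \
    --api_url "http://localhost:18888/v1/"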

Run database generation:

./src/scripts/generate_databases.sh

It will run the following command for all ambiguity types:

python src/db_generation/generate_databases.py --data_dir data/ --ambig_type AMBIG_TYPE --api_url "http://localhost:18888/v1/"
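
For a single ambiguity type, the call reduces to, e.g. (the --ambig_type value is assumed from the three ambiguity types above; check the script for the exact accepted names):

python src/db_generation/generate_databases.py --data_dir data/ --ambig_type attachment --api_url "http://localhost:18888/v1/"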

Data Collection Interface

You can find the Potato interface for data collection in the annotation directory. To install Potato, follow the instructions in its README. We use the modified version from Hosking et al. (2024).

To start Potato, run the following command:

cd annotation/
python potato/flask_server.py start ambrosia_data_collection/configs/CONFIG_FILE -p 8000
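
For example (the config file name below is hypothetical; use one of the actual files in the configs directory):

# 'scope_ambiguity.yaml' is an assumed file name for illustration
python potato/flask_server.py start ambrosia_data_collection/configs/scope_ambiguity.yaml -p 8000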

The available config files are located in annotation/ambrosia_data_collection/configs.

We provide examples of raw data in annotation/ambrosia_data_collection/data_files. Instructions for annotators can be found in annotation/ambrosia_data_collection/instructions.

Paper

More details on data collection and evaluation results are provided in the paper:

π”Έπ•„π”Ήβ„π•†π•Šπ•€π”Έ: A Benchmark for Parsing Ambiguous Questions into Database Queries

Irina Saparina and Mirella Lapata
