pip install -r requirements.txt

Please note that requirements.txt may be incomplete. Install any missing packages manually as needed.
All LLMs implement the abstract class llm.LLM. Implementations for GLM-4, Llama 3.1 and Qwen 2.5 are provided in the llm folder. You can add your own LLM class alongside them.
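Adding a backend could look roughly like the sketch below. The real llm.LLM interface is not shown here, so the `LLM` base class is a hypothetical stand-in, and the `generate` method name and the `EchoLLM` class are assumptions for illustration only. Check the classes in the llm folder for the actual abstract methods.

```python
from abc import ABC, abstractmethod


class LLM(ABC):
    """Hypothetical stand-in for llm.LLM; the real abstract methods may differ."""

    @abstractmethod
    def generate(self, prompt: str) -> str:
        """Return the model's completion for the given prompt."""


class EchoLLM(LLM):
    """Toy backend (assumption) that returns a canned SQL string for dry runs."""

    def __init__(self, canned_response: str = "SELECT 1") -> None:
        self.canned_response = canned_response

    def generate(self, prompt: str) -> str:
        # A real backend would load a model and run inference here.
        return self.canned_response


llm = EchoLLM()
print(llm.generate("How many singers are there?"))
```

A canned backend like this can help exercise the rest of the pipeline without loading model weights.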
python template.py \
--sql-json [Path to Spider's train dataset json] \
--table-json [Path to Spider's table json] \
--output [Where to output templates (single pkl file)] \
--limit [How many templates should be reserved]

python generate.py \
--table-json [Path to Spider's table json] \
--db-names [DB names used, sep by space] \
--db-dir [Path to the directory containing databases] \
--templates [Templates pkl file] \
--output [Where to write] \
--maximum-sqls-per-template 256 \
--template-limit 256

You can change the LLM used by modifying the LLM-calling part in narrate.py.
python narrate.py \
--input [Synthed SQLs] \
--table [Path to Spider's table json] \
--base-model [Model path] \
--output-result [Where to write narrated pairs]

python build_pool.py \
--bge-model [BGE model path] \
--result-files [Path to narrated pairs] \
--output-dataset-prefix [Output prefix of pools]

You can change the LLM used by modifying the LLM-calling part in infer.py. Prompts can be replaced by choosing another Jinja2 template.
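As a rough illustration of swapping prompts, the sketch below renders a few-shot prompt from a Jinja2 template. The template text, the variable names `schema`, `shots` and `question`, and the `build_prompt` helper are all assumptions; the templates actually used by infer.py may be laid out differently.

```python
from jinja2 import Template

# Hypothetical prompt template; variable names are assumptions, not the repo's.
PROMPT_TEMPLATE = Template(
    "Database schema:\n{{ schema }}\n"
    "{% for shot in shots %}"
    "Question: {{ shot.question }}\nSQL: {{ shot.sql }}\n"
    "{% endfor %}"
    "Question: {{ question }}\nSQL:"
)


def build_prompt(schema, question, shots=()):
    """Render the prompt; an empty `shots` sequence gives a zero-shot prompt."""
    return PROMPT_TEMPLATE.render(
        schema=schema, shots=list(shots), question=question
    )


zero_shot = build_prompt("singer(id, name)", "How many singers are there?")
one_shot = build_prompt(
    "singer(id, name)",
    "List all singer names.",
    shots=[
        {
            "question": "How many singers are there?",
            "sql": "SELECT count(*) FROM singer",
        }
    ],
)
print(one_shot)
```

Keeping the prompt in a template means the number of reference examples can vary without touching the calling code.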
python infer.py \
--model-type llama \
--test-set [Path to Spider's dev/test set] \
--table [Path to Spider's table json] \
--reference-datasets-prefix [Prefix of pools] \
--reference-shot [Shot, for zero-shot use 0] \
--base-model [LLM path] \
--bge-model [BGE model path] \
--output-result [Where to write result]

python validate.py --result-file [Result] --db-dir [Path to database dir] [--start 0 --end 200]

License: MIT
test-suite-sql-eval is licensed under Apache