layout | permalink |
---|---|
cosql |
cosql |
Nov 12, 2024: We have released Spider 2.0 full paper, data and code. Follow the guideline to submit your scores to the leaderboard!
Aug 28, 2024: The early access version of Spider 2.0 (a more realistic and challenging text-to-SQL task) is now available! We expect to release the whole dataset in 1-2 weeks. As this is a preliminary release, there may be errors. Your feedback would be invaluable in refining the dataset!
Related Works from XLANG Lab: Spider 2.0 Text-to-SQL ('24) Spider2-V ('24) OSWorld ('24) DS-1000 Challenge (ICML'23) Binder Framework (ICLR '23) UnifiedSKG Framework (EMNLP'22) Spider Chanllenge (EMNLP'18) SParC Chanllenge (ACL'19)
- 11/12/2024 We have released Spider 2.0 full paper, data and code. Follow the guideline to submit your scores to the leaderboard.
- 08/28/2024 The early access version of Spider 2.0 (a more realistic and challenging text-to-SQL task) is now available! As this is a preliminary release, there may be errors. Your feedback would be invaluable in refining the dataset!
- 07/15/2024 Spider 2.0-vision (Benchmarking Multimodal Agents on Automating Data Science and Engineering Workflows) is out! Spider 2.0-SQL (much more realistic and challenging than Spider 1.0!) will be released in August.
- 08/10/2023 Please check out XLANG Lab for Building LLM/VLM Agents!
- 11/20/2022 Please check out our recent work DS-1000: A Natural and Reliable Benchmark for Data Science Code Generation. Please check out examples, data, and code on the DS-1000 project site!!
- 10/18/2022 Please check out our recent work Binder: an easy but sota neural-symbolic built on GPT-3 Codex & SQL/Python interpreter. It injects GPT-3 Codex prompt API calls in programming languages! Please check out Binder demo, code, paper, and video on the Binder project site!!
- 02/15/2022 Please check out our recent work UnifiedSKG: Unifying and Multi-Tasking Structured Knowledge Grounding with Text-to-Text Language Models. We open-sourced simple but SOTA/strong models for 21 tasks including text-to-SQL! Please check out our code in the UnifiedSKG repo!!
- 11/15/2020 We will use Test Suite Accuracy as our official evaluation metric for Spider, SParC, and CoSQL. Please find the evaluation code from here.
- 01/16/2020 Added some supplemental files to the CoSQL dataset (No change to the data content for each task).
- 10/30/2019 CoSQL dataset is out, see you at EMNLP 2019, Hong Kong!
- the dialogue states are grounded in domain-independent SQL program instead of domain-specific slot-value pairs.
- because testing is done on unseen databases, success requires generalizing to new domains.
Compared to other semantic parsing/text-to-SQL tasks, CoSQL presents new challenges:
- user questions are not necessarily answerable.
- it involves system responses to clarify ambiguous questions, verify returned results, and notify users of unanswerable or unrelated questions.
- each dialog is obtained via the Wizard-of-Oz setting between a crowd worker and a SQL expert.
CoSQL includes three tasks:
- SQL-grounded dialogue state tracking to map user utterances into SQL queries if possible given the interaction history
- natural language response generation based on an executed SQL and its results for user verification
- user dialogue act prediction to detect and resolve ambiguous and unanswerable questions
INFORM_SQL
label given the interaction context and the DB schema. Comparing to other context-dependent text-to-SQL tasks such as SParC, the DST task in CoSQL also includes the ambiguous questions if the user affirms the system clarification of them. In this case, the system clarification is also given as part of the interaction context to predict the SQL query corresponding to the question. As in Spider and SParC tasks, we report results of Exact Set Match without Values:
Rank | Model | Question Match | Interaction Match |
---|---|---|---|
1 Feb 14, 2022 |
STAR
Alibaba DAMO & SIAT (Cai and Li et al., EMNLP-Findings '22) code demo |
57.8 | 28.2 |
2 Apr 2, 2022 |
CQR-SQL
Tencent Cloud Xiaowei (Xiao et al.,'22) |
58.3 | 27.4 |
3 Jun 4, 2022 |
RASAT + PICARD
SJTU LUMIA & Netmind.AI (Qi et al., EMNLP'22) code |
55.7 | 26.5 |
4 Dec 26, 2022 |
MT Training + N-best List Rerankers + PICARD
Alexa AI (Parthasarathi et al., ICASSP'23) |
55.8 | 24.8 |
5 Oct 5, 2021 |
HIE-SQL + GraPPa
Alibaba DAMO (Zheng et al. ACL-Findings '22) |
53.9 | 24.6 |
6 Jul 14, 2021 |
T5-3B+PICARD
Element AI, a ServiceNow company (Scholak et al., EMNLP'21) code |
54.6 | 23.7 |
7 Jan 7, 2022 |
RATSQL++ + ELECTRA
Anonymous |
53.8 | 22.1 |
8 Sep. 21, 2020 |
RAT-SQL + SCoRe
Yale & Microsoft Research & PSU (Yu et al. ICLR '21) |
51.6 | 21.2 |
9 Aug 24, 2020 |
R²SQL + BERT
Alibaba DAMO (Hui et al. AAAI '21) code |
46.8 | 17.0 |
10 Jan 26, 2021 |
IST-SQL + BERT
University of Science and Technology of China (Wang et al. AAAI '21) code |
41.8 | 15.2 |
11 Nov. 16, 2020 |
WaveSQL + BERT
Anonymous |
46.1 | 15.1 |
12 May 26, 2020 |
IGSQL + BERT
Peking University (Cai et al. EMNLP '20) code |
42.5 | 15.0 |
13 Aug 30, 2019 |
EditSQL + BERT
Yale University & Salesforce Research (Zhang et al. EMNLP '19) code |
40.8 | 13.7 |
14 May 21, 2020 |
GAZP + BERT
University of Washington & Facebook AI Research (Zhong et al., EMNLP '20) |
39.7 | 12.8 |
15 Apr 21, 2021 |
MemCE
UoE (Jain et al., TACL '21) |
28.4 | 6.2 |
16 Aug 30, 2019 |
CD-Seq2Seq
Yale University & Salesforce Research (Yu et al. EMNLP '19) code |
13.9 | 2.6 |
17 Aug 30, 2019 |
SyntaxSQL-con
Yale University (Yu et al. EMNLP '18) code |
14.1 | 2.2 |
Rank | Model | Question Match | Interaction Match |
---|---|---|---|
1 Jun 4, 2022 |
RASAT + PICARD
SJTU LUMIA & Netmind.AI (Qi et al., EMNLP'22) code |
66.3 | 37.4 |
2 May 21, 2020 |
GAZP + BERT
University of Washington & Facebook AI Research (Zhong et al., EMNLP '20) |
35.9 | 8.4 |
INFORM_SQL
. It considers a SQL query, the execution result, and the DB schema. Preserving logical consistency (Logic Correctness Rate (LCR)) between SQL and NL response is crucial in this task, in addition to naturalness and syntactical correctness.
Rank | Model | BLEU | Grammar | LCR (%) |
---|---|---|---|---|
1 Dec 15, 2022 |
Complexity Aware Prompts + T5
Alexa AI |
28.1 | - | - |
2 Aug 30, 2019 |
Template baseline | 9.3 | 4.0 | 41.0 |
3 Aug 30, 2019 |
Pointer-generator baseline | 15.1 | 3.6 | 35.0 |
4 Aug 30, 2019 |
Seq2Seq baseline | 14.1 | 3.5 | 27.0 |
INFORM_SQL
.
Rank | Model | Accuracy |
---|---|---|
1 Dec 20, 2019 |
UTran-SQL | 87.2 |
2 Aug 30, 2019 |
TBCNN-pair baseline | 83.9 |
3 Aug 30, 2019 |
Majority baseline | 62.8 |