RYANSQL

Introduction

Source code for RYANSQL, a text-to-SQL system for complex, cross-domain databases.

Reference Paper: Choi et al., RYANSQL: Recursively Applying Sketch-based Slot Fillings for Complex Text-to-SQL in Cross-Domain Databases, 2020

The system was submitted to the SPIDER leaderboard. The system and its slightly improved version, RYANSQL v2, were ranked second and fourth (as of February 2020).

The system does NOT use any database records, which makes it more suitable for real-world enterprise applications.

Requirements

Python3
TensorFlow 1.14
nltk

Install

Download the pretrained BERT model. You only need to download the model itself, not the whole git repository. The system uses the BERT-large, uncased, Whole Word Masking model. Unzip the downloaded file.

Download the SPIDER dataset from https://yale-lily.github.io/spider. Unzip the downloaded file.

Train

Run:

python src/trainer.py [BERT_DIR] [SPIDER_DATASET_DIR]

An example is:

python src/trainer.py ./wwm_uncased_L-24_H-1024_A-16 ./spider

Training takes about a day on a single Tesla V100 GPU. The dev set performance reported during training is the exact slot matching performance, including ordering; it will range between 55 and 57% for the final model.

The required files of the SPIDER dataset are: tables.json, train_spider.json, train_others.json, plus dev.json for testing.
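Before starting a day-long training run, it can help to confirm that the dataset directory contains the files listed above. The following is a minimal sketch, not part of the RYANSQL source; the helper name `missing_spider_files` and the constant `TRAIN_FILES` are hypothetical, and only the file names come from this README.

```python
import os

# File names as listed in this README; the helper itself is hypothetical
# and is not part of the RYANSQL code base.
TRAIN_FILES = ("tables.json", "train_spider.json", "train_others.json", "dev.json")


def missing_spider_files(dataset_dir, required=TRAIN_FILES):
    """Return the required file names that are not present in dataset_dir."""
    return [
        name
        for name in required
        if not os.path.isfile(os.path.join(dataset_dir, name))
    ]
```

Calling `missing_spider_files("./spider")` returns an empty list when all four files are in place, and otherwise the names that still need to be copied in.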

Evaluate

Clone the Spider repository (https://github.com/taoyds/spider), and add its local directory to Python's sys.path.
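One way to add the clone to sys.path is sketched below. The directory `./spider_repo` is an assumed location for the cloned repository, not a path required by RYANSQL; setting the PYTHONPATH environment variable to the same directory before launching Python should have the same effect.

```python
import os
import sys

# Assumed location of the cloned Spider repository; adjust to your setup.
SPIDER_REPO_DIR = os.path.abspath("./spider_repo")

# Prepend the directory so modules from the clone become importable.
if SPIDER_REPO_DIR not in sys.path:
    sys.path.insert(0, SPIDER_REPO_DIR)
```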

Run:

python src/actual_test.py [MODEL_PATH] [BERT_DIR] [SPIDER_DATASET_DIR] [OUT_FILE]

to generate the SQL statements for the development set. The generated output file can then be evaluated using SPIDER's evaluation script.

The score reported by the evaluation script for the final model will range from 64 to 66%. It is higher than the exact slot matching score shown during training because the ordering of conditions does not matter for an actual SQL statement.

The required files of the SPIDER dataset are: tables.json for the database schema information, and dev.json for the development dataset.
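The effect of condition ordering can be illustrated with a small sketch. This is not the logic of SPIDER's evaluation script or of RYANSQL's trainer; the two functions and the tuple representation of conditions are hypothetical and only show how an order-sensitive comparison can reject a prediction that an order-insensitive one accepts.

```python
def ordered_match(pred_conds, gold_conds):
    """Order-sensitive comparison: conditions must appear in the same order."""
    return list(pred_conds) == list(gold_conds)


def unordered_match(pred_conds, gold_conds):
    """Order-insensitive comparison: same conditions, in any order."""
    return sorted(pred_conds) == sorted(gold_conds)


# Hypothetical WHERE conditions as (column, operator, value) tuples.
gold = [("age", ">", "30"), ("city", "=", "'Seoul'")]
pred = [("city", "=", "'Seoul'"), ("age", ">", "30")]

print(ordered_match(pred, gold))    # False
print(unordered_match(pred, gold))  # True
```

Both condition lists describe the same SQL filter, so a metric that ignores ordering counts the prediction as correct while a strict slot-by-slot comparison does not.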

Contact

nlp.en@kakaoenterprise.com