For any questions, please feel free to reach out to !
Link: Google drive link
In what follows, we assume that the Spider dataset is located under the directory "Dataset".
(There is no need to run this code, since the processed data are already provided. If you are interested, feel free to reach out for more info.)
- Polly synthesizing: 'SpeakQL/spider_processing/'
- Amazon ASR transcribing: 'SpeakQL/Amazon_transcribe/AmazonTranscribe.ipynb'
- Polly-synthesized spoken questions: "Dataset/spider/my/[train|dev]/speech_[mp3|wav]"
- Polly-synthesized DB schemas: "Dataset/spider/my/db/speech_[mp3|wav]"
- Amazon ASR transcription outputs: "Dataset/spider/my/[train|dev]/spider_[train|dev]_batch0"
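As a quick sanity check of the data layout listed above, the following POSIX shell sketch expands the [train|dev] and [mp3|wav] patterns into concrete directories and reports which ones exist. The functions `check_speakql_layout` and `speakql_check_dir` are hypothetical helpers, not part of the SpeakQL repository, and the root defaults to the assumed "Dataset" directory.

```shell
#!/bin/sh
# Sketch only: hypothetical helpers, not part of the SpeakQL repository.
# Assumes the preprocessed data sit under a root directory (default: Dataset).

# Print "ok" or "missing" for one directory and bump the global counter.
speakql_check_dir() {
  if [ -d "$1" ]; then
    echo "ok      $1"
  else
    echo "missing $1"
    speakql_missing=$((speakql_missing + 1))
  fi
}

# Expand the [train|dev] and [mp3|wav] patterns into concrete paths and check each.
# Returns the number of missing directories (0 means the layout is complete).
check_speakql_layout() {
  root="${1:-Dataset}"
  speakql_missing=0
  for split in train dev; do
    for fmt in mp3 wav; do
      speakql_check_dir "$root/spider/my/$split/speech_$fmt"
    done
    speakql_check_dir "$root/spider/my/$split/spider_${split}_batch0"
  done
  for fmt in mp3 wav; do
    speakql_check_dir "$root/spider/my/db/speech_$fmt"
  done
  return "$speakql_missing"
}
```

For example, `check_speakql_layout Dataset || exit 1` prints one line per expected directory and stops a script early when anything is missing.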
- Getting phonemes (+ timestamps): 'spider_processing/phoneme_align/phoneme-align.ipynb'. This is based on the external tool Prosodylab-Aligner, which, given the tokens and the audio, aligns phonemes to timestamps.
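As we understand Prosodylab-Aligner's input convention, it reads a directory in which each utterance is a `<name>.wav` file paired with a `<name>.lab` file holding its transcript. Before aligning, it can help to list audio files that lack a transcript. The sketch below does that; `find_unpaired_wavs` is a hypothetical helper, and the pairing convention is an assumption to confirm against the Prosodylab-Aligner documentation.

```shell
#!/bin/sh
# Sketch only: hypothetical helper, not part of SpeakQL or Prosodylab-Aligner.
# Assumes one <name>.wav + <name>.lab pair per utterance in the same directory.
# Prints every .wav file in the given directory that has no matching .lab file.
find_unpaired_wavs() {
  dir="${1:-.}"
  for wav in "$dir"/*.wav; do
    [ -e "$wav" ] || continue   # the glob matched nothing
    lab="${wav%.wav}.lab"
    [ -f "$lab" ] || echo "$wav"
  done
}
```

An empty output means every audio file has a transcript next to it.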
- Training: 'SpeakQL/Allennlp_models/bash_scripts/' + version
- Predicting: 'SpeakQL/Allennlp_models/bash_scripts/' + version
- Evaluating: 'SpeakQL/Allennlp_models/bash_scripts/' + version
- Requires our fork of rat-sql:
- Our trained rat-sql checkpoints: Google drive link
For example (full DBATI):
```shell
cd SpeakQL/Allennlp_models
bash bash_scripts/ tagger.0t
bash bash_scripts/ ILM.0i
bash bash_scripts/ tagger.0t ILM.0i
bash bash_scripts/ tagger.0t-ILM.0i
```
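The four full-DBATI commands appear to be successive steps of one run (the script name after bash_scripts/ is not shown here). Assuming they are meant to run in order, the sketch below chains them so the run stops at the first failing step. `run_dbati_chain` and the `RUN_STEP` hook are hypothetical: by default the sketch only echoes each step's arguments (a dry run), and you would set `RUN_STEP` to a function that wraps the real bash_scripts/ command.

```shell
#!/bin/sh
# Sketch only: hypothetical driver, not part of the SpeakQL repository.
# RUN_STEP is an assumed hook that receives each step's arguments;
# it defaults to "echo", i.e. a dry run that just prints them.
run_dbati_chain() {
  for step in "tagger.0t" "ILM.0i" "tagger.0t ILM.0i" "tagger.0t-ILM.0i"; do
    echo "==> step: $step"
    # $step is left unquoted on purpose: "tagger.0t ILM.0i" becomes two arguments.
    "${RUN_STEP:-echo}" $step || { echo "step failed: $step" >&2; return 1; }
  done
}
```

Stopping early matters here because the later steps presumably consume the tagger and ILM outputs of the earlier ones.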
You can also refer to
for an example of how to run a batch of experiments.
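Independently of that example, a batch of experiments can be sketched as a loop that runs one command per version, keeps going after failures, and writes each run's output to its own log file. `run_batch` and the `RUN_STEP` hook are hypothetical names; `RUN_STEP` defaults to a dry-run `echo`, and each version is passed to it as a single argument.

```shell
#!/bin/sh
# Sketch only: hypothetical batch runner, not part of the SpeakQL repository.
# Usage: run_batch LOGDIR VERSION...
# Runs "$RUN_STEP VERSION" for each version (default: echo, a dry run),
# logs output to LOGDIR/VERSION.log, and prints a PASS/FAIL summary.
run_batch() {
  [ $# -ge 1 ] || { echo "usage: run_batch LOGDIR VERSION..." >&2; return 2; }
  logdir="$1"; shift
  mkdir -p "$logdir"
  failed=0
  for version in "$@"; do
    if "${RUN_STEP:-echo}" "$version" > "$logdir/$version.log" 2>&1; then
      echo "PASS $version"
    else
      echo "FAIL $version"
      failed=$((failed + 1))
    fi
  done
  echo "$failed of $# runs failed"
  [ "$failed" -eq 0 ]   # non-zero exit status if any run failed
}
```

Unlike a single chained run, a batch continues past a failing version so that one broken configuration does not hide the results of the others.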