- This project investigates various Natural Language Processing techniques for the task of disfluency detection.
- Our approach frames the task as a modified Named Entity Recognition problem and uses fine-tuned BERT as well as Bi-LSTM-based neural networks to solve it.
- The experiments were performed on modified versions of the Disfl-QA corpus and the Switchboard corpus, annotated as required.
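
To illustrate the NER-style framing described above, here is a minimal sketch of how a disfluent utterance could be turned into token-level labels. The `B-DISF`/`I-DISF`/`O` tag names, the `bio_tags` helper, and the example sentence are all assumptions made for illustration; they are not taken from the project's annotation scheme or from either corpus.

```python
from typing import List, Tuple


def bio_tags(tokens: List[str], spans: List[Tuple[int, int]]) -> List[str]:
    """Label each token as O, B-DISF, or I-DISF.

    `spans` holds half-open (start, end) token index ranges that are
    disfluent. The tag names are hypothetical, chosen for this sketch.
    """
    tags = ["O"] * len(tokens)
    for start, end in spans:
        if not 0 <= start < end <= len(tokens):
            raise ValueError(f"invalid span: ({start}, {end})")
        tags[start] = "B-DISF"  # first token of the disfluent span
        for i in range(start + 1, end):
            tags[i] = "I-DISF"  # remaining tokens inside the span
    return tags


# Made-up utterance: the speaker abandons "the capital" and restarts.
tokens = "what is the capital no wait the largest city in france".split()
spans = [(2, 6)]  # covers "the capital no wait"

tags = bio_tags(tokens, spans)

# Dropping every token tagged as disfluent recovers the fluent utterance.
fluent = " ".join(tok for tok, tag in zip(tokens, tags) if tag == "O")
```

Under this framing, a sequence tagger such as a fine-tuned BERT or a Bi-LSTM would be trained to predict one such tag per token, and the fluent text is obtained by removing the tokens it marks as disfluent.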
Disfl-QA dataset obtained from: Gupta, A., Xu, J., Upadhyay, S., Yang, D., & Faruqui, M. (2021). Disfl-QA: A Benchmark Dataset for Understanding Disfluencies in Question Answering. Findings of ACL. https://doi.org/10.18653/v1/2021.findings-acl.293 GitHub repository: https://github.com/google-research-datasets/Disfl-QA
Switchboard Corpus obtained from: Godfrey, John J., and Edward Holliman. Switchboard-1 Release 2 LDC97S62. Web Download. Philadelphia: Linguistic Data Consortium, 1993. The data section in the repository provides only sample data.