We work with the Ubuntu Dialogue Corpus v2.

We use the latest version of the dataset creator repository (last commit: 18 Oct 2017).
The only preprocessing we apply is tokenization.

Detailed Steps:

Clone the repository:

```
git clone git@github.com:rkadlec/ubuntu-ranking-dataset-creator.git
```

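Install dependencies. We prefer creating a conda environment, but you can use whichever method you like.

```
conda create -n ubuntu python=2.7
# Activate the environment so pip installs into it
# (on conda older than 4.4, use: source activate ubuntu)
conda activate ubuntu
# Run this from the directory that holds the requirements file
pip install -r requirements.txt
cd ubuntu-ranking-dataset-creator/src
```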

Finally, generate the data. We pass only -t, so the text is tokenized and nothing else.
```
./generate.sh -t
```
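
Once generation finishes, it can be worth confirming that the output files exist and are not empty before training on them. The following is a minimal sketch, not part of the dataset creator: `check_outputs` is a hypothetical helper, and the file names `train.csv`, `valid.csv` and `test.csv` in the usage comment are assumptions about what the script writes, so adjust them to what you actually find in the output directory.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of ubuntu-ranking-dataset-creator).
# Usage: check_outputs DIR FILE...
# Prints the line count of every file that exists and is non-empty,
# and returns 1 if any listed file is missing or empty.
check_outputs() {
    local dir="$1"
    shift
    local status=0
    local f
    for f in "$@"; do
        if [ -s "$dir/$f" ]; then
            # Reading from stdin makes wc print only the count;
            # tr strips the padding that some wc versions add.
            echo "$f: $(wc -l < "$dir/$f" | tr -d ' ') lines"
        else
            echo "$f: missing or empty" >&2
            status=1
        fi
    done
    return "$status"
}

# Example call, assuming these are the generated file names:
# check_outputs . train.csv valid.csv test.csv
```

The count is of physical lines, not CSV rows: a quoted field that contains a newline spans several lines, so treat the numbers as a rough check only.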