Diacritics dataset

Quantities

| Intent | S0 | S1 | S2 | S3.1 | S3.2 | S3.3 |
|---|---|---|---|---|---|---|
| aprindeLumina | 90/40 | 90/40 | 90/40 | 9/40 | 90/40 | 90/40 |
| stingeLumina | 90/40 | 90/40 | 90/40 | 9/40 | 90/40 | 90/40 |
| cresteIntensitateLumina | 90/40 | 90/40 | 90/40 | 9/40 | 90/40 | 90/40 |
| scadeIntensitateLumina | 90/40 | 90/40 | 90/40 | 9/40 | 90/40 | 90/40 |
| cresteTemperatura | 90/40 | 90/40 | 90/40 | 90/40 | 9/40 | 90/40 |
| scadeTemperatura | 90/40 | 90/40 | 90/40 | 90/40 | 9/40 | 90/40 |
| seteazaTemperatura | 90/40 | 90/40 | 90/40 | 90/40 | 9/40 | 90/40 |
| puneMuzica | 90/40 | 90/40 | 90/40 | 90/40 | 90/40 | 9/40 |
| opresteMuzica | 90/40 | 90/40 | - | 90/40 | 90/40 | 9/40 |
| cresteIntensitateMuzica | 90/40 | 90/40 | 90/40 | 90/40 | 90/40 | 9/40 |
| scadeIntensitateMuzica | 90/40 | 90/40 | 90/40 | 90/40 | 90/40 | 9/40 |
| opresteTV | 90/40 | 90/40 | - | 90/40 | 90/40 | 9/40 |
| pornesteTV | 90/40 | 90/40 | 90/40 | 90/40 | 90/40 | 9/40 |
| schimbaCanalTV | 90/40 | 90/40 | 90/40 | 90/40 | 90/40 | 9/40 |
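One way to check a generated file against these quantities is to count its examples per intent. The sketch below is not part of the original tooling: it assumes the generated JSON follows the Rasa NLU layout (`rasa_nlu_data.common_examples`, each entry carrying `text` and `intent`), and the helper name and sample sentences are hypothetical.

```python
from collections import Counter


def count_examples_per_intent(dataset):
    """Return a Counter mapping each intent name to its number of examples.

    Assumes the Rasa NLU JSON layout: dataset["rasa_nlu_data"]["common_examples"].
    """
    examples = dataset["rasa_nlu_data"]["common_examples"]
    return Counter(example["intent"] for example in examples)


# Hypothetical, hand-written sample standing in for a generated file.
sample_dataset = {
    "rasa_nlu_data": {
        "common_examples": [
            {"text": "aprinde lumina", "intent": "aprindeLumina", "entities": []},
            {"text": "aprinde becul", "intent": "aprindeLumina", "entities": []},
            {"text": "stinge lumina", "intent": "stingeLumina", "entities": []},
        ]
    }
}

counts = count_examples_per_intent(sample_dataset)
for intent, number_of_examples in sorted(counts.items()):
    print(intent, number_of_examples)
```

On a real file, load the JSON with `json.load` first and compare each count with the matching cell in the table.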

Generating datasets with Chatito

Chatito requires the number of training examples and the number of testing examples for a dataset to both be greater than 0. Because we keep separate scripts for training and testing, the data must be generated carefully so that examples produced by the train scripts do not mix with those produced by the test scripts.

Steps for generating data:

These steps apply to all scenarios except the baseline (S0), which has only one script because both its train and test sets are generated from the same vocabulary.

  1. Load all train scripts (one for each intent) into Chatito. Each script should request 1 example for the test set.
  2. Generate datasets in Rasa NLU format.
  3. Two JSON files will be generated: training_dataset and testing_dataset.
  4. Keep training_dataset as the training data for that scenario; testing_dataset can be discarded.
  5. Remove all scripts from Chatito. This step is essential: it ensures that examples from the train and test vocabularies do not mix.
  6. Load all test scripts (one for each intent). Each script should request 1 example for the train set.
  7. Generate datasets in Rasa NLU format.
  8. Just as before, two JSON files will be generated.
  9. This time, keep testing_dataset as the test data for the scenario and discard training_dataset.
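After following the steps above, a quick sanity check is to confirm that no sentence appears in both the kept training file and the kept testing file. The sketch below is only an illustration: it assumes both files use the Rasa NLU layout (`rasa_nlu_data.common_examples` with a `text` field), and the function names, file paths, and sample sentences are hypothetical.

```python
import json
from pathlib import Path


def load_dataset(path):
    """Load one generated JSON file (the path is supplied by the caller)."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def example_texts(dataset):
    """Collect the set of example sentences from a Rasa NLU style dataset."""
    return {ex["text"] for ex in dataset["rasa_nlu_data"]["common_examples"]}


def find_leaked_examples(train_dataset, test_dataset):
    """Return the sorted sentences that occur in both train and test data."""
    return sorted(example_texts(train_dataset) & example_texts(test_dataset))


# Hypothetical, hand-written samples standing in for the two kept files.
train_sample = {"rasa_nlu_data": {"common_examples": [
    {"text": "aprinde lumina", "intent": "aprindeLumina", "entities": []},
    {"text": "porneste televizorul", "intent": "pornesteTV", "entities": []},
]}}
test_sample = {"rasa_nlu_data": {"common_examples": [
    {"text": "porneste televizorul", "intent": "pornesteTV", "entities": []},
    {"text": "da drumul la muzica", "intent": "puneMuzica", "entities": []},
]}}

leaked = find_leaked_examples(train_sample, test_sample)
print(leaked)
```

With real data, pass the results of `load_dataset(...)` for the training file kept in step 4 and the testing file kept in step 9. An empty result means that no sentence is shared between the two files.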