Diacritics dataset

Quantities

Intent	S0	S1	S2	S3.1	S3.2	S3.3
aprindeLumina	90/40	90/40	90/40	9/40	90/40	90/40
stingeLumina	90/40	90/40	90/40	9/40	90/40	90/40
cresteIntensitateLumina	90/40	90/40	90/40	9/40	90/40	90/40
scadeIntensitateLumina	90/40	90/40	90/40	9/40	90/40	90/40
cresteTemperatura	90/40	90/40	90/40	90/40	9/40	90/40
scadeTemperatura	90/40	90/40	90/40	90/40	9/40	90/40
seteazaTemperatura	90/40	90/40	90/40	90/40	9/40	90/40
puneMuzica	90/40	90/40	90/40	90/40	90/40	9/40
opresteMuzica	90/40	90/40	-	90/40	90/40	9/40
cresteIntensitateMuzica	90/40	90/40	90/40	90/40	90/40	9/40
scadeIntensitateMuzica	90/40	90/40	90/40	90/40	90/40	9/40
opresteTV	90/40	90/40	-	90/40	90/40	9/40
pornesteTV	90/40	90/40	90/40	90/40	90/40	9/40
schimbaCanalTV	90/40	90/40	90/40	90/40	90/40	9/40

Generating datasets with Chatito

Chatito requires both the number of training examples and the number of testing examples for a dataset to be greater than 0. Because we use separate scripts for training and testing, we must be careful when generating the data so that the examples produced by the train scripts do not mix with those produced by the test scripts.
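As an illustration, the hypothetical Python helper below builds the header line of a Chatito intent definition for each kind of script. It assumes Chatito's `%[intent]('training': 'N', 'testing': 'M')` syntax for per-intent counts. It also assumes that the table cells read training/testing (e.g. 90/40). Neither assumption is taken from this repository's scripts. Because neither count may be 0, the split a script is not meant to produce gets the minimum count of 1, and its output file is discarded.

```python
# Hypothetical helper: builds the header line of a Chatito intent definition.
# Assumes Chatito's `%[intent]('training': 'N', 'testing': 'M')` syntax and
# that each table cell reads training/testing (e.g. 90/40).

def chatito_intent_header(intent: str, role: str, train: int = 90, test: int = 40) -> str:
    """Return the header line for one intent in a train, test or baseline script.

    role: "train"    -> real training count, minimum (1) testing count
          "test"     -> minimum (1) training count, real testing count
          "baseline" -> both real counts from a single script (S0)
    """
    if role == "train":
        training, testing = train, 1
    elif role == "test":
        training, testing = 1, test
    elif role == "baseline":
        training, testing = train, test
    else:
        raise ValueError(f"unknown role: {role!r}")
    # Chatito needs both counts above 0, so the unused split is set to 1;
    # the file it produces is thrown away afterwards.
    return f"%[{intent}]('training': '{training}', 'testing': '{testing}')"


if __name__ == "__main__":
    for role in ("train", "test", "baseline"):
        print(chatito_intent_header("aprindeLumina", role))
```

For the reduced cells in scenario 3 (9/40), the same helper would be called with `train=9`.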

Steps for generating data:

These steps apply to all scenarios except the baseline (S0). The baseline uses only one script, because both its train and test sets are generated from the same vocabulary.

1. Load all train scripts (one for each intent) into Chatito. In each script, the number of testing examples should be set to 1.
2. Generate the datasets in Rasa NLU format. Two JSON files will be generated: training_dataset and testing_dataset.
3. Keep training_dataset as the training data for that scenario. testing_dataset can be thrown away.
4. Remove all scripts from Chatito. This step is very important: it ensures that the examples from the train and test vocabularies don't mix.
5. Load all test scripts (one for each intent). In each script, the number of training examples should be set to 1.
6. Generate the datasets in Rasa NLU format. As before, two JSON files will be generated.
7. This time, keep testing_dataset as the test data for the scenario and throw away training_dataset.
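The keep/discard logic of the two generation runs can be sketched in Python as follows. This is a minimal sketch, not a script from this repository. It assumes that both runs export the Rasa NLU JSON layout `{"rasa_nlu_data": {"common_examples": [...]}}` and that the files are named `training_dataset.json` and `testing_dataset.json`. The optional overlap check is an added sanity test for the train/test mixing problem described above.

```python
import json
from pathlib import Path

# Minimal sketch, assuming both Chatito runs export the Rasa NLU JSON layout
# {"rasa_nlu_data": {"common_examples": [{"text": ..., "intent": ...}, ...]}}.


def example_texts(dataset: dict) -> set:
    """Collect the utterance texts of a Rasa NLU JSON dataset."""
    return {ex["text"] for ex in dataset["rasa_nlu_data"]["common_examples"]}


def assemble_scenario(train_run: dict, test_run: dict) -> tuple:
    """Keep training_dataset from the train-script run and testing_dataset
    from the test-script run; the other two files are thrown away."""
    train = train_run["training_dataset"]
    test = test_run["testing_dataset"]
    # Sanity check for mixing: no utterance should appear in both sets.
    overlap = example_texts(train) & example_texts(test)
    if overlap:
        raise ValueError(f"{len(overlap)} utterance(s) shared by train and test")
    return train, test


def load_run(directory: Path) -> dict:
    """Read both JSON files written by one Chatito run (file names assumed)."""
    return {
        name: json.loads((directory / f"{name}.json").read_text(encoding="utf-8"))
        for name in ("training_dataset", "testing_dataset")
    }


if __name__ == "__main__":
    # Tiny in-memory demo; the utterances are invented for illustration.
    def rasa(*texts):
        return {"rasa_nlu_data": {"common_examples": [
            {"text": t, "intent": "aprindeLumina", "entities": []} for t in texts]}}

    train_run = {"training_dataset": rasa("aprinde lumina", "pornește lumina"),
                 "testing_dataset": rasa("aprinde becul")}
    test_run = {"training_dataset": rasa("deschide lumina"),
                "testing_dataset": rasa("fă lumină")}
    train, test = assemble_scenario(train_run, test_run)
    print(len(example_texts(train)), len(example_texts(test)))  # prints 2 1
```

With real exports, `load_run` would be pointed at the output directory of each run. How those directories are organized is an assumption left to the user.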