Inconsistencies in intent classification #778

satnam2012 · 2019-04-04T20:48:16Z

Hi,

I have been working with the sample dataset and sample code from the quickstart at https://snips-nlu.readthedocs.io/en/latest/quickstart.html.

I have also added a new intent, "sampleTurnOffLight", to the same sample_dataset.json, which is attached below:

sample_dataset.json.zip

For the text "turn lights in basement", I get a different classification every time.
Note: I retrain (fit) the engine every time before calling parse.
I expect it to behave consistently across re-trainings.
Could you please confirm whether this is the expected behavior?
Run 1:
{
  "input": "turn lights in basement",
  "slots": [],
  "intent": {
    "intentName": "sampleTurnOffLight",
    "probability": 0.6660875805168223
  }
}
Run 2:
{
  "input": "turn lights in basement",
  "slots": [],
  "intent": {
    "intentName": "sampleTurnOnLight",
    "probability": 0.6430405901353275
  }
}

adrienball · 2019-04-05T08:56:35Z

Hi @satnam2012,
Training is indeed stochastic, which means that each time you run fit() the resulting engine will be slightly different. In most cases, though, the parsing results should be consistent.
In your example, there is no "on" or "off" keyword, so I suspect that this causes an ambiguity between the two intents.

We are working on improving the reproducibility of training through the use of a random seed. However, an issue in the scikit-learn library was causing non-deterministic behaviour (see scikit-learn/scikit-learn#13422), which has so far prevented us from shipping this feature.
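To make the point about stochastic training concrete, here is a minimal, self-contained sketch. It is a toy logistic regression trained with shuffled stochastic gradient descent; the function `sgd_logistic_regression` and its data are hypothetical and are not Snips NLU internals. Because the order in which samples are visited comes from a random generator, two unseeded fits can end with different weights, while passing the same seed reproduces exactly the same model.

```python
import math
import random


def sgd_logistic_regression(samples, epochs=10, lr=0.5, seed=None):
    """Toy binary logistic regression fitted with shuffled SGD.

    Hypothetical illustration only: `samples` is a list of
    (feature_vector, label) pairs with label in {0, 1}.
    """
    rng = random.Random(seed)  # seed=None -> seeded from OS entropy
    weights = [0.0] * len(samples[0][0])
    data = list(samples)
    for _ in range(epochs):
        rng.shuffle(data)  # the only source of randomness during training
        for features, label in data:
            z = sum(w * x for w, x in zip(weights, features))
            prob = 1.0 / (1.0 + math.exp(-z))
            error = label - prob
            # Every sample updates the weights, so visiting order matters
            weights = [w + lr * error * x for w, x in zip(weights, features)]
    return weights


if __name__ == "__main__":
    samples = [
        ([1.0, 0.0, 1.0], 1),
        ([0.0, 1.0, 1.0], 0),
        ([1.0, 1.0, 1.0], 1),
        ([0.0, 0.0, 1.0], 0),
    ]
    # Same seed, same data -> bit-for-bit identical weights
    assert sgd_logistic_regression(samples, seed=42) == sgd_logistic_regression(samples, seed=42)
```

The same principle is what a `random_state` parameter provides: it fixes every random draw made during training, so the fitted model becomes a deterministic function of the data and the seed.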

adrienball · 2019-06-20T16:52:34Z

@satnam2012 ,
The latest release, 0.19.7, now makes it possible to pass a seed to the NLU engine upon creation, so that two engines created with the same seed and trained on the same data will produce exactly the same results:

import io
import json

from snips_nlu import SnipsNLUEngine

with io.open("path/to/dataset.json", encoding="utf8") as f:
    dataset = json.load(f)

# Two engines created with the same seed and fitted on the same data...
engine_1 = SnipsNLUEngine(random_state=42).fit(dataset)
engine_2 = SnipsNLUEngine(random_state=42).fit(dataset)

res_1 = engine_1.parse("turn lights in basement")
res_2 = engine_2.parse("turn lights in basement")

# ...return identical parsing results
assert res_1 == res_2

drorvinkler · 2019-06-27T14:52:23Z

@adrienball
I tried to follow your example, but I still get different engines every time I run my script.
Note that training twice within the same script does produce identical engines, but running the script twice produces different engines.
As far as I can tell, this happens when I have three intents, but not when I only have two.

Here's a minimal working example:

from snips_nlu import SnipsNLUEngine

dataset = {
    'language': 'en',
    'entities': {
        'query': {'automatically_extensible': True, 'use_synonyms': True, 'data': []},
    },
    'intents': {
        'one': {'utterances': [
            {'data': [{'text': 'This is a '},
                      {'text': 'test', 'entity': 'query', 'slot_name': 'query'}]},
        ]},
        'two': {'utterances': [
            {'data': [{'text': 'I am another '},
                      {'text': 'example', 'entity': 'query', 'slot_name': 'query'}]},
        ]},
        'three': {'utterances': [
            {'data': [{'text': 'Here come the third '},
                      {'text': 'utterance', 'entity': 'query', 'slot_name': 'query'}]},
        ]},
    },
}

nlu_engine = SnipsNLUEngine(random_state=42)
nlu_engine = nlu_engine.fit(dataset)
nlu_engine.persist('/tmp/model1')

After running the above code twice and persisting the engine to a different directory each time, I compared the contents of probabilistic_intent_parser/intent_classifier/intent_classifier.json in both directories, and they were not identical.
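A comparison like this can be extended from one file to a whole persisted engine directory. The following is a standard-library sketch; `dir_digest` is a hypothetical helper, not part of snips-nlu. It hashes the relative path and contents of every file under a directory, walking subdirectories and files in sorted order so that the digest does not depend on the order in which the filesystem lists entries.

```python
import hashlib
import os


def dir_digest(path):
    """Return a SHA-256 hex digest over the relative paths and contents of
    all files under `path`, visited in a deterministic (sorted) order."""
    sha = hashlib.sha256()
    for root, dirs, files in os.walk(path):
        dirs.sort()  # in-place sort makes os.walk descend deterministically
        for name in sorted(files):
            full_path = os.path.join(root, name)
            rel_path = os.path.relpath(full_path, path).replace(os.sep, "/")
            sha.update(rel_path.encode("utf8"))
            sha.update(b"\0")  # separator between path and content
            with open(full_path, "rb") as f:
                sha.update(f.read())
            sha.update(b"\0")
    return sha.hexdigest()
```

For example, comparing `dir_digest("/tmp/model1")` with the digest of the directory written by the second run would flag any difference anywhere in the persisted engine, not only in intent_classifier.json.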

I have snips-nlu version 0.19.7 and scikit-learn version 0.21.2.

Any idea how to solve this?

adrienball · 2019-06-27T15:58:01Z

@drorvinkler I cannot reproduce using your example.
Could you provide:

python version
OS

Cheers

drorvinkler · 2019-06-27T16:07:31Z

@adrienball
Sure.
Python 3.5
Ubuntu 16.04

adrienball · 2019-06-27T16:48:44Z

I managed to reproduce the issue with Python 3.5. It seems to be specific to this version, as I can't reproduce it with Python 3.6 or Python 2.7.
I'll investigate. My guess is that it comes from an iteration over the items of a dict, as dict ordering is non-deterministic in Python 3.5.
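The dict-ordering hypothesis can be illustrated without Snips NLU. Since Python 3.3, str hashes are randomized per interpreter process (controlled by the `PYTHONHASHSEED` environment variable), so any container ordered by hash can iterate differently from one run to the next while staying stable within a single run. That is consistent with the "same script: identical engines, two runs: different engines" observation above. Python 2.7 does not randomize hashes by default, and CPython 3.6+ dicts keep insertion order, so on a modern interpreter the sketch below uses a set to show the same effect. Iterating over `sorted(...)` is one common way to remove the dependence; this is an illustration of the mechanism, not the actual patch applied to snips-nlu.

```python
import os
import subprocess
import sys

# Child program: iterate over the same three intent names twice.
# A set of str is iterated in hash order, which depends on PYTHONHASHSEED;
# sorted() yields an order that does not depend on the hash seed.
CHILD_PROGRAM = (
    "intents = {'one', 'two', 'three'}\n"
    "print(','.join(intents))\n"
    "print(','.join(sorted(intents)))\n"
)


def run_with_hash_seed(seed):
    """Run CHILD_PROGRAM in a fresh interpreter with a fixed hash seed and
    return (hash_order, sorted_order) as comma-separated strings."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    result = subprocess.run(
        [sys.executable, "-c", CHILD_PROGRAM],
        env=env,
        stdout=subprocess.PIPE,
        universal_newlines=True,
        check=True,
    )
    hash_order, sorted_order = result.stdout.splitlines()
    return hash_order, sorted_order


if __name__ == "__main__":
    for seed in range(5):
        hash_order, sorted_order = run_with_hash_seed(seed)
        # hash_order may change from seed to seed; sorted_order never does
        print(seed, hash_order, sorted_order)
```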

drorvinkler · 2019-06-27T16:49:56Z

Thanks


adrienball added the question label Apr 5, 2019

adrienball added a commit that referenced this issue Jun 28, 2019

Fix non-deterministic behavior

66125c9

fixes #778

adrienball closed this as completed in 45d0afb Jun 28, 2019

ClemDoum pushed a commit that referenced this issue Jul 10, 2019

Fix non-deterministic behavior (#817)

64bd793

* Fix non-deterministic behavior fixes #778
* Update changelog
* Fix issue with dirhash
* Fix issues
* Fix issue with Python<3.4
