Inconsistencies in intent classification #778

Closed
satnam2012 opened this issue Apr 4, 2019 · 7 comments

@satnam2012

Hi,

I have been working with the sample dataset and sample code from the quickstart at https://snips-nlu.readthedocs.io/en/latest/quickstart.html.

I have also added a new intent "sampleTurnOffLight" to the same sample_dataset.json which looks like below

sample_dataset.json.zip

For the text "turn lights in basement", I get a different classification every time.
Note: I retrain (fit) every time before I call parse.
I expect the engine to behave consistently after each retrain.
Could you please confirm whether this behavior is expected?
Run 1:
{
  "input": "turn lights in basement",
  "slots": [],
  "intent": {
    "intentName": "sampleTurnOffLight",
    "probability": 0.6660875805168223
  }
}
Run 2:
{
  "input": "turn lights in basement",
  "slots": [],
  "intent": {
    "intentName": "sampleTurnOnLight",
    "probability": 0.6430405901353275
  }
}

@adrienball
Contributor

Hi @satnam2012 ,
The training is indeed stochastic, which means that each time you run fit() the resulting engine will be slightly different. In most cases, though, the parsing results should be consistent.
In your example, there is no "on" or "off" keyword, so I suspect that this causes an ambiguity between the two intents.
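
To see how often the two intents swap across retrains, the parse results can be tallied by intent name. The following is a minimal sketch, not part of the library: `tally_intents` is a hypothetical helper, and it assumes each result is a dict shaped like the two outputs quoted in this thread. It does not retrain anything; the two runs are copied in as plain dicts.

```python
from collections import Counter


def tally_intents(parse_results):
    """Count how often each intent name appears in a list of parse results.

    Each result is assumed to be a dict shaped like the outputs quoted in
    this thread: {"input": ..., "slots": [...],
                  "intent": {"intentName": ..., "probability": ...}}.
    """
    return Counter(result["intent"]["intentName"] for result in parse_results)


# The two runs reported above, copied as plain dicts (no retraining here).
runs = [
    {
        "input": "turn lights in basement",
        "slots": [],
        "intent": {
            "intentName": "sampleTurnOffLight",
            "probability": 0.6660875805168223,
        },
    },
    {
        "input": "turn lights in basement",
        "slots": [],
        "intent": {
            "intentName": "sampleTurnOnLight",
            "probability": 0.6430405901353275,
        },
    },
]
counts = tally_intents(runs)
```

With more retrains appended to `runs`, a roughly even split between the two intent names would suggest that the utterance itself is ambiguous and that the variation is not a defect in training.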

We are working on improving the reproducibility of training through the use of a random seed. However, there was an issue in the scikit-learn lib which was causing non-deterministic behaviour (see scikit-learn/scikit-learn#13422), and thus preventing us from shipping this feature.

@adrienball
Contributor

@satnam2012 ,
The latest release, 0.19.7, now allows passing a seed to the NLU engine upon creation, so that two engines created with the same seed and trained on the same data produce exactly the same results:

import io
import json

from snips_nlu import SnipsNLUEngine

with io.open("path/to/dataset.json", encoding="utf8") as f:
    dataset = json.load(f)

engine_1 = SnipsNLUEngine(random_state=42).fit(dataset)
engine_2 = SnipsNLUEngine(random_state=42).fit(dataset)

res_1 = engine_1.parse("turn lights in basement")
res_2 = engine_2.parse("turn lights in basement")

assert res_1 == res_2

@drorvinkler

drorvinkler commented Jun 27, 2019

@adrienball
I tried to follow your example, but I still get different engines every time I run my script.
Note that training twice in the same script does produce identical engines, but running my Python script twice produces different engines.
As far as I can tell, this happens if I have three intents, but doesn't happen if I only have two intents.

Here's a minimal working example:

from snips_nlu import SnipsNLUEngine

dataset = {'language': 'en',
           'entities': {'query': {'automatically_extensible': True, 'use_synonyms': True, 'data': []},
                        },
           'intents': {'one': {'utterances': [{'data': [{'text': 'This is a '}, {'text': 'test', 'entity': 'query', 'slot_name': 'query'}]},
                                              ]},
                       'two': {'utterances': [{'data': [{'text': 'I am another '}, {'text': 'example', 'entity': 'query', 'slot_name': 'query'}]},
                                              ]},
                       'three': {'utterances': [{'data': [{'text': 'Here come the third '}, {'text': 'utterance', 'entity': 'query', 'slot_name': 'query'}]},
                                 ]},
                       }
           }
nlu_engine = SnipsNLUEngine(random_state=42)
nlu_engine = nlu_engine.fit(dataset)
nlu_engine.persist('/tmp/model1')

After running the above code twice and persisting the engine to a different directory each time, I compared the contents of probabilistic_intent_parser/intent_classifier/intent_classifier.json in both directories, and they were not identical.
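
The comparison described above can be scripted. The following is a minimal sketch using only the standard library: `file_sha256` and `same_classifier` are hypothetical helper names, and the only path taken from this thread is the relative location of intent_classifier.json inside a persisted engine directory. A byte-level hash is stricter than comparing the parsed JSON, so it also flags differences in key order.

```python
import hashlib
import os


def file_sha256(path):
    """Return the SHA-256 hex digest of a file's bytes."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read in chunks so large model files are not loaded at once.
        for chunk in iter(lambda: f.read(65536), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_classifier(model_dir_a, model_dir_b):
    """Check whether two persisted engines have byte-identical classifier files."""
    relative_path = os.path.join(
        "probabilistic_intent_parser",
        "intent_classifier",
        "intent_classifier.json",
    )
    hash_a = file_sha256(os.path.join(model_dir_a, relative_path))
    hash_b = file_sha256(os.path.join(model_dir_b, relative_path))
    return hash_a == hash_b
```

For example, `same_classifier("/tmp/model1", "/tmp/model2")` would compare the engine persisted above with one from a second run, where `/tmp/model2` is an assumed directory name for that second run.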

I have snips-nlu version 0.19.7 and scikit-learn version 0.21.2.

Any idea how to solve this?

@adrienball
Contributor

@drorvinkler I cannot reproduce using your example.
Could you provide:

  • python version
  • OS

Cheers

@drorvinkler

@adrienball
Sure.
python 3.5
ubuntu 16.04

@adrienball
Contributor

I managed to reproduce the issue with Python 3.5. It seems to be specific to this version, as I can't reproduce it with Python 3.6 or Python 2.7.
I'll investigate. My guess is that it comes from iterating over the items of a dict, as the ordering is non-deterministic in Python 3.5.
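
As an illustration of that guess: in Python 3.5, dict iteration order depends on key hashes, and string hashes are randomized per interpreter process by default. That would be consistent with engines matching within one script but differing between two runs of it. A common way to remove this dependence is to iterate in sorted key order. The sketch below shows that general technique only; it is not the fix that was committed, and `ordered_items` and the sample dict are hypothetical.

```python
def ordered_items(mapping):
    """Return the mapping's (key, value) pairs sorted by key.

    Sorting removes any dependence on the dict's internal ordering, which in
    Python 3.5 can change between interpreter runs because of string hash
    randomization.
    """
    return sorted(mapping.items(), key=lambda item: item[0])


# Hypothetical stand-in for per-intent data; not the library's internal structure.
utterances_per_intent = {"three": 1, "one": 1, "two": 1}
intent_names = [name for name, _ in ordered_items(utterances_per_intent)]
```

Here `intent_names` comes out in the same alphabetical order on every run and every Python version, whatever order the dict itself yields.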

@drorvinkler

Thanks

adrienball added a commit that referenced this issue Jun 28, 2019
ClemDoum pushed a commit that referenced this issue Jul 10, 2019
* Fix non-deterministic behavior

fixes #778

* Update changelog

* Fix issue with dirhash

* Fix issues

* Fix issue with Python<3.4