-
Notifications
You must be signed in to change notification settings - Fork 510
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Improve random state handling #801
Conversation
Squashed commits: [58d1612] Remove seed from config and put it in the "shared" for the ProcessingUnit
e29cc06
to
6ff0eea
Compare
Codecov Report
@@ Coverage Diff @@
## develop #801 +/- ##
===========================================
+ Coverage 88.42% 88.47% +0.04%
===========================================
Files 76 76
Lines 4571 4571
Branches 882 882
===========================================
+ Hits 4042 4044 +2
+ Misses 397 395 -2
Partials 132 132 |
f956a78
to
9e43dfd
Compare
while True: | ||
noise_length = int(random_state.normal(mean_length, std_length)) | ||
i += 1 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unused variable i
@@ -35,8 +35,9 @@ def test_should_get_slots(self): | |||
- make me [number_of_cups:snips/number](five) cups of tea | |||
- please I want [number_of_cups](two) cups of tea""") | |||
dataset = Dataset.from_yaml_files("en", [dataset_stream]).json | |||
config = CRFSlotFillerConfig(random_seed=42) | |||
config = CRFSlotFillerConfig() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The config is not needed anymore here.
@@ -101,10 +104,11 @@ def test_should_get_sub_builtin_slots(self): | |||
- find an activity from [start](6pm) to [end](8pm) | |||
- Book me a trip from [start](this friday) to [end](next tuesday)""") | |||
dataset = Dataset.from_yaml_files("en", [dataset_stream]).json | |||
config = CRFSlotFillerConfig(random_seed=42) | |||
config = CRFSlotFillerConfig() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment
@@ -65,9 +66,11 @@ def test_should_get_builtin_slots(self): | |||
- Can you tell me the weather [datetime] please ? | |||
- what is the weather forecast [datetime] in [location](paris)""") | |||
dataset = Dataset.from_yaml_files("en", [dataset_stream]).json | |||
config = CRFSlotFillerConfig(random_seed=42) | |||
config = CRFSlotFillerConfig() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The config is not needed anymore here.
@@ -356,9 +360,10 @@ def test_should_get_slots_after_deserialization(self): | |||
- i want [number_of_cups] cups of tea please | |||
- can you prepare [number_of_cups] cups of tea ?""") | |||
dataset = Dataset.from_yaml_files("en", [dataset_stream]).json | |||
config = CRFSlotFillerConfig(random_seed=42) | |||
config = CRFSlotFillerConfig() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment
classifier_config = LogRegIntentClassifierConfig(random_seed=42) | ||
slot_filler_config = CRFSlotFillerConfig(random_seed=42) | ||
classifier_config = LogRegIntentClassifierConfig() | ||
slot_filler_config = CRFSlotFillerConfig() | ||
parser_config = ProbabilisticIntentParserConfig( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment
classifier_config = LogRegIntentClassifierConfig(random_seed=42) | ||
slot_filler_config = CRFSlotFillerConfig(random_seed=42) | ||
classifier_config = LogRegIntentClassifierConfig() | ||
slot_filler_config = CRFSlotFillerConfig() | ||
parser_config = ProbabilisticIntentParserConfig( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment
@@ -162,9 +169,12 @@ def test_should_get_intents(self): | |||
utterances: | |||
- yili yulu yele""") | |||
dataset = Dataset.from_yaml_files("en", [dataset_stream]).json | |||
classifier_config = LogRegIntentClassifierConfig(random_seed=42) | |||
classifier_config = LogRegIntentClassifierConfig() | |||
parser_config = ProbabilisticIntentParserConfig(classifier_config) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment
random_seed=seed1), | ||
slot_filler_config=CRFSlotFillerConfig(random_seed=seed2) | ||
intent_classifier_config=LogRegIntentClassifierConfig(), | ||
slot_filler_config=CRFSlotFillerConfig() | ||
) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same comment
docs/source/tutorial.rst
Outdated
different outputs. | ||
|
||
If you want to run training in a reproducible way you can pass a random seed to | ||
your engine: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I prefer using a more impersonal form in the documentation, but that's just a suggestion. That would be something like:
Reproducible training and testing can be achieved by passing a
**random seed** to the engine:
docs/source/tutorial.rst
Outdated
@@ -174,6 +174,26 @@ the dataset we generated earlier: | |||
|
|||
engine.fit(dataset) | |||
|
|||
Note that by default, the training of the engine is non-deterministic: if you | |||
train your NLU twice on the same data and test it on the same input, you'll get | |||
different outputs. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I would be a bit more optimistic in the formulation:
Note that, by default, the training of the NLU engine is a non-deterministic process:
training and testing multiple times on the same data may produce different outputs.
ae0633e
to
45f4fd4
Compare
Description:
Currenlty
scikit-learn
bug the intent classification training was not deterministicDone:
sklearn==0.21
which contains a fix which makesSGDClassifier
training deterministicChecklist: