Add method to set random seed on copulas models #313

katxiao · 2022-02-04T19:39:06Z

As part of sdv-dev/SDV#690, we want to add a fixed seed when sampling. This requires setting the seed at sample time (after the model has already been created).

Currently, we can only pass in the random_seed at initialization of the model. In this PR, I add a setter method for random_seed, to enable the following flow:

1. Create model
2. Set seed
3. Sample
4. Repeat steps 2-3 as needed

Resolves #113

codecov-commenter · 2022-02-04T19:43:53Z

Codecov Report

Merging #313 (95ad1fe) into master (350888e) will increase coverage by 0.15%.
The diff coverage is 88.13%.

@@            Coverage Diff             @@
##           master     #313      +/-   ##
==========================================
+ Coverage   87.16%   87.31%   +0.15%     
==========================================
  Files          27       27              
  Lines        1706     1727      +21     
==========================================
+ Hits         1487     1508      +21     
  Misses        219      219

Impacted Files	Coverage Δ
copulas/datasets.py	`52.27% <41.66%> (+2.27%)`	⬆️
copulas/__init__.py	`97.95% <100.00%> (+0.25%)`	⬆️
copulas/bivariate/base.py	`87.16% <100.00%> (+0.17%)`	⬆️
copulas/multivariate/base.py	`61.53% <100.00%> (+2.07%)`	⬆️
copulas/multivariate/gaussian.py	`91.47% <100.00%> (ø)`
copulas/multivariate/vine.py	`99.32% <100.00%> (ø)`
copulas/univariate/base.py	`82.82% <100.00%> (+0.32%)`	⬆️
copulas/univariate/gaussian_kde.py	`96.25% <100.00%> (+0.04%)`	⬆️
copulas/univariate/truncated_gaussian.py	`100.00% <100.00%> (ø)`

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

csala

This looks good, @katxiao! I want to have another look before approving, but the changes seem to be spot on so far.

One note: maybe we should also consider addressing issue #113 in this PR. This would fix the current random_state wrapper, which currently sets the numpy seed globally instead of just within the current operation, and it would also allow setting the random seed once and then calling sample multiple times, obtaining different results on each call. (Note that this would be necessary in order to allow reject_sampling strategies with a fixed seed!)

amontanez24

This looks good to me, but I think @csala has a good point. Maybe we can add that change too.

katxiao · 2022-02-07T21:18:07Z

> This looks good, @katxiao! I want to have another look before approving, but the changes seem to be spot on so far.
>
> One note: maybe we should also consider addressing issue #113 in this PR. This would fix the current random_state wrapper, which currently sets the numpy seed globally instead of just within the current operation, and it would also allow setting the random seed once and then calling sample multiple times, obtaining different results on each call. (Note that this would be necessary in order to allow reject_sampling strategies with a fixed seed!)

@csala I updated the PR to address the issue you linked.

I'm a little confused, because even if we switch to setting the numpy state instead of the seed, aren't we still setting it globally?

amontanez24

I think this looks good!

csala · 2022-02-09T10:24:33Z

> I'm a little confused, because even if we switch to setting the numpy state instead of the seed, aren't we still setting it globally?

Sorry, my original statement was not precise enough. It is true that every time we set the random state we do it globally, but what I actually meant was that the change was permanent, meaning that operations that come after ours would also be affected by that change. But this was actually wrong, because the original random_seed function already had the try/finally block that restored the original state after the operation is finished.

In any case, the real advantage of using the random_state instead of the random_seed is the other part: now this is a state which changes over time as calls happen, rather than a fixed seed that is always the same on every call.
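To make the distinction concrete, here is a minimal sketch in plain numpy. The helper names `draw_with_seed` and `draw_with_state` are made up for illustration and are not part of copulas. Re-seeding on every call repeats the same numbers, whereas a persistent `RandomState` object advances between calls.

```python
import numpy as np


def draw_with_seed(seed, size=2):
    """Re-seed on every call: each call starts the sequence from scratch."""
    return np.random.RandomState(seed).random_sample(size)


def draw_with_state(random_state, size=2):
    """Draw from a persistent RandomState: each call advances the sequence."""
    return random_state.random_sample(size)


# Fixed seed: both calls return the same two numbers.
seed_first = draw_with_seed(0)
seed_second = draw_with_seed(0)

# Persistent state: the second call continues where the first one stopped.
state = np.random.RandomState(0)
state_first = draw_with_state(state)
state_second = draw_with_state(state)
```

With the fixed seed, `seed_first` and `seed_second` are identical. With the persistent state, `state_first` matches them but `state_second` holds the next two numbers of the same sequence.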

csala

I think that there are a few changes required to achieve the desired functionality.

Here is an ipython snippet that explains the expected behavior a bit more precisely:

In [1]: import numpy as np

In [2]: # Background: We simulate an external seed of 42, which we
   ...: # do not want to alter, and we set our model seed to 0.
   ...: # For reference, these are the sequences of random numbers
   ...: # that each seed produces:
   ...: 
   ...: np.random.seed(42)
   ...: np.random.random(size=4)
Out[2]: array([0.37454012, 0.95071431, 0.73199394, 0.59865848])

In [3]: np.random.seed(0)
   ...: np.random.random(size=4)
Out[3]: array([0.5488135 , 0.71518937, 0.60276338, 0.54488318])

In [4]: # EXTERNAL: We simulate an external seed that we do not want to alter
   ...: # and certify that the random numbers are the expected ones
   ...: np.random.seed(42)
   ...: np.random.random(size=2)
Out[4]: array([0.37454012, 0.95071431])

In [5]: # FIRST CALL: Inside our decorator we capture the original state
   ...: # and set a new one
   ...: original_state = np.random.get_state()
   ...: new_state = np.random.RandomState(seed=0).get_state()
   ...: np.random.set_state(new_state)
   ...: 
   ...: # Certify the random numbers are the expected ones
   ...: np.random.random(size=2)
Out[5]: array([0.5488135 , 0.71518937])

In [6]: # We capture the state AFTER the call and restore the original one
   ...: post_state = np.random.get_state()
   ...: np.random.set_state(original_state)
   ...: 
   ...: # Certify that the original state is restored and the random
   ...: # sequence can continue as expected (sequence continues)
   ...: np.random.random(size=2)
Out[6]: array([0.73199394, 0.59865848])

In [7]: # SECOND CALL: Inside the decorator again, we restore the previous state
   ...: np.random.set_state(post_state)
   ...: 
   ...: # We certify that the sequence of random numbers with seed = 0 continues
   ...: # as expected
   ...: np.random.random(size=2)
Out[7]: array([0.60276338, 0.54488318])

Additionally, we should create an integration test that reproduces a sequence similar to the one
shown above and that certifies that multiple calls after setting the seed produce different results, but
always following the expected sequence.
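As a rough starting point, the session above could be turned into a test along these lines. This is only a sketch in plain numpy: the function name is hypothetical, the state swapping is written out by hand rather than going through the copulas decorator, and the expected values are the ones printed in the snippet above.

```python
import numpy as np


def test_seeded_sequence_is_isolated():
    # External code seeds numpy with 42 and draws two numbers.
    np.random.seed(42)
    external_first = np.random.random(size=2)

    # First "model" call: swap in a state seeded with 0, draw, then swap back.
    original_state = np.random.get_state()
    np.random.set_state(np.random.RandomState(seed=0).get_state())
    model_first = np.random.random(size=2)
    post_state = np.random.get_state()
    np.random.set_state(original_state)

    # The external sequence continues as if nothing had happened.
    external_second = np.random.random(size=2)

    # Second "model" call: resume from the state saved after the first call.
    original_state = np.random.get_state()
    np.random.set_state(post_state)
    model_second = np.random.random(size=2)
    np.random.set_state(original_state)

    tolerance = {'rtol': 0, 'atol': 1e-7}
    np.testing.assert_allclose(external_first, [0.37454012, 0.95071431], **tolerance)
    np.testing.assert_allclose(external_second, [0.73199394, 0.59865848], **tolerance)
    np.testing.assert_allclose(model_first, [0.5488135, 0.71518937], **tolerance)
    np.testing.assert_allclose(model_second, [0.60276338, 0.54488318], **tolerance)
```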

csala · 2022-02-09T11:20:43Z

copulas/__init__.py

    try:
        yield
    finally:
-        np.random.set_state(state)
+        set_model_random_state(desired_state)


We should not be capturing the desired_state, but rather the current state of numpy as returned by np.random.get_state().

katxiao · Feb 11, 2022

Good catch! Made the fix.
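For readers following along, the behavior requested here can be sketched as a small context manager. This is an illustration under assumptions, not the code merged in this PR: `StateHolder` and `model_random_state` are hypothetical names, and the holder simply keeps a numpy state tuple.

```python
import contextlib

import numpy as np


class StateHolder:
    """Hypothetical stand-in for a model that stores its own random state."""

    def __init__(self, seed):
        self.random_state = np.random.RandomState(seed).get_state()


@contextlib.contextmanager
def model_random_state(holder):
    """Run a block under the holder's state, then restore numpy's own state."""
    original_state = np.random.get_state()
    np.random.set_state(holder.random_state)
    try:
        yield
    finally:
        # Save where the generator ended up (not the state we started from),
        # so that the next call continues the sequence.
        holder.random_state = np.random.get_state()
        np.random.set_state(original_state)


holder = StateHolder(seed=0)
with model_random_state(holder):
    first = np.random.random(size=2)
with model_random_state(holder):
    second = np.random.random(size=2)
```

Because the `finally` block stores the state returned by `np.random.get_state()`, `second` continues the seed-0 sequence instead of repeating `first`, and the caller's global state is untouched afterwards.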

katxiao · 2022-02-09T22:51:43Z

@csala I added the integration test and addressed the other two comments!

csala

I added a comment about an edge case bug. Other than that, this looks ready

csala · 2022-02-11T18:04:43Z

copulas/__init__.py

+        raise TypeError(f'RandomState {random_state} is an unexpected type. '
+                        'Expected to be int, np.random.RandomState, or tuple.')
+
+    np.random.set_state(desired_state)


Bug: If random_state is a tuple, desired_state is never assigned any value. I think that it would be simpler to just re-use the random_state variable name instead of desired_state
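One way the suggestion could look is sketched below. The function name `to_state_tuple` is hypothetical and the body is an assumption about the intent, not the code from this PR. Reusing a single variable means the tuple case falls through with its value already assigned.

```python
import numpy as np


def to_state_tuple(random_state):
    """Return a numpy state tuple for an int, RandomState, or state tuple."""
    if isinstance(random_state, int):
        random_state = np.random.RandomState(seed=random_state).get_state()
    elif isinstance(random_state, np.random.RandomState):
        random_state = random_state.get_state()
    elif not isinstance(random_state, tuple):
        raise TypeError(
            f'RandomState {random_state} is an unexpected type. '
            'Expected to be int, np.random.RandomState, or tuple.'
        )

    # A tuple is returned unchanged, so there is no unassigned variable.
    return random_state


# All three accepted input types describe the same starting state here.
states = [
    to_state_tuple(0),
    to_state_tuple(np.random.RandomState(0)),
    to_state_tuple(np.random.RandomState(0).get_state()),
]
```

Each entry in `states` can be passed directly to `np.random.set_state`.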

csala

I think this looks all correct so far. The only comment that I would add here is that it would be interesting to add one integration test per model, which certifies that the set_random_state is working as expected.

The tests I'm thinking about would be something like this (for each model!):

# Fit the model on some random data
fit_data = np.random.whatever...
model = Model()
model.fit(fit_data)

# Sample truly random data
random = model.sample(10)

# Set the seed to a fixed value and sample TWICE
model.set_random_seed(0)
seeded_0_0 = model.sample(10)
seeded_0_1 = model.sample(10)

# Set the seed again to the same value and sample TWICE again
model.set_random_seed(0)
seeded_1_0 = model.sample(10)
seeded_1_1 = model.sample(10)

# assert that the random data is not equal to the data with fixed seed
assert not np.array_equal(random, seeded_0_0)
# assert that the two sample calls after setting the seed generated different outputs
assert not np.array_equal(seeded_0_0, seeded_0_1)
# assert that sampling once after setting the seed always produces the same results
np.testing.assert_array_equal(seeded_0_0, seeded_1_0)
# assert that the second call after setting the seed also produces the same results
np.testing.assert_array_equal(seeded_0_1, seeded_1_1)
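A self-contained version of this check is sketched below. `ToyModel` is a hypothetical stand-in and not a copulas class: it fits a mean and standard deviation and samples from a normal distribution. Its `set_random_seed` method mirrors the name used in the outline above, and `check_seeding` is likewise an illustrative name.

```python
import numpy as np


class ToyModel:
    """Hypothetical stand-in for a copulas model; not the library's code."""

    def __init__(self):
        self._random_state = None
        self._mean = 0.0
        self._std = 1.0

    def fit(self, data):
        self._mean = float(np.mean(data))
        self._std = float(np.std(data))

    def set_random_seed(self, seed):
        self._random_state = np.random.RandomState(seed)

    def sample(self, num_rows):
        # Without a seed, fall back to numpy's global generator.
        rng = self._random_state if self._random_state is not None else np.random
        return rng.normal(self._mean, self._std, size=num_rows)


def check_seeding(model):
    model.fit(np.random.normal(size=100))
    unseeded = model.sample(10)

    model.set_random_seed(0)
    seeded_0_0 = model.sample(10)
    seeded_0_1 = model.sample(10)

    model.set_random_seed(0)
    seeded_1_0 = model.sample(10)
    seeded_1_1 = model.sample(10)

    # Unseeded data differs from seeded data.
    assert not np.array_equal(unseeded, seeded_0_0)
    # Two consecutive calls after seeding differ from each other.
    assert not np.array_equal(seeded_0_0, seeded_0_1)
    # Re-seeding reproduces the first and the second call exactly.
    np.testing.assert_array_equal(seeded_0_0, seeded_1_0)
    np.testing.assert_array_equal(seeded_0_1, seeded_1_1)


check_seeding(ToyModel())
```

The same `check_seeding` function could be reused for each real model, provided it exposes the same `fit`, seed setter, and `sample` methods.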

tests/end-to-end/univariate/test_base.py

tests/end-to-end/multivariate/test_base.py

copulas/univariate/base.py

tests/end-to-end/bivariate/test_base.py

csala

Looks good now!

katxiao force-pushed the set-random-seed branch from 0d0f6ed to b58edc2 Compare February 4, 2022 19:50

katxiao requested a review from csala February 4, 2022 19:58

katxiao marked this pull request as ready for review February 4, 2022 20:12

katxiao requested a review from a team as a code owner February 4, 2022 20:12

katxiao requested review from amontanez24 and removed request for a team February 4, 2022 20:13

csala reviewed Feb 4, 2022

amontanez24 approved these changes Feb 4, 2022

amontanez24 approved these changes Feb 7, 2022

katxiao force-pushed the set-random-seed branch 3 times, most recently from 4cb2e62 to 5152fd6 Compare February 8, 2022 19:29

csala suggested changes Feb 9, 2022

katxiao force-pushed the set-random-seed branch from b2eadca to 0aef6ba Compare February 9, 2022 23:37

csala suggested changes Feb 11, 2022

katxiao force-pushed the set-random-seed branch from 0aef6ba to 9292677 Compare February 11, 2022 18:17

csala reviewed Feb 16, 2022

katxiao added 9 commits February 17, 2022 10:23

Add method to set random seed on copulas models

801109e

add unit tests

39929c2

Use RandomState instead of seed

bd15627

update tests

2671e47

fix lint

1ce2485

some fixes

0e46c12

default random state to None

ecdf2fb

cr comments

8b88d85

add integration tests

bc87087

katxiao force-pushed the set-random-seed branch from 51df96a to bc87087 Compare February 17, 2022 15:23

csala suggested changes Feb 17, 2022

cr

464c950

csala approved these changes Feb 17, 2022

fix tests

c70a35d

katxiao force-pushed the set-random-seed branch from 0383db2 to f06210e Compare February 17, 2022 22:47

store random seed as an object

95ad1fe

katxiao force-pushed the set-random-seed branch from f06210e to 95ad1fe Compare February 17, 2022 23:54

katxiao merged commit f831e23 into master Feb 18, 2022

katxiao deleted the set-random-seed branch February 18, 2022 03:48
