Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Fix seed to generate deterministic samples #713

Merged
merged 3 commits into from
Feb 26, 2022
Merged

Conversation

katxiao
Copy link
Contributor

@katxiao katxiao commented Feb 18, 2022

Resolves #690

@katxiao katxiao requested a review from a team as a code owner February 18, 2022 03:57
@katxiao katxiao requested review from fealho and removed request for a team February 18, 2022 03:57
setup.py Outdated
@@ -18,8 +18,8 @@
"numpy>=1.20.0,<2;python_version>='3.7'",
'pandas>=1.1.3,<2',
'tqdm>=4.15,<5',
'copulas>=0.6.0,<0.7',
'ctgan>=0.5.0,<0.6',
'copulas>=0.6.1.dev0,<0.7',
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'll update these version numbers once copulas and ctgan are released.

@katxiao katxiao requested a review from csala February 18, 2022 03:57
@codecov-commenter
Copy link

codecov-commenter commented Feb 18, 2022

Codecov Report

Merging #713 (eea3d2b) into v0.14.0.dev (0dae889) will increase coverage by 0.14%.
The diff coverage is 92.85%.

Impacted file tree graph

@@               Coverage Diff               @@
##           v0.14.0.dev     #713      +/-   ##
===============================================
+ Coverage        65.86%   66.00%   +0.14%     
===============================================
  Files               36       36              
  Lines             2651     2665      +14     
===============================================
+ Hits              1746     1759      +13     
- Misses             905      906       +1     
Impacted Files Coverage Δ
sdv/tabular/base.py 79.56% <90.00%> (+0.47%) ⬆️
sdv/tabular/copulas.py 87.83% <100.00%> (+0.16%) ⬆️
sdv/tabular/ctgan.py 77.77% <100.00%> (+1.03%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 0dae889...eea3d2b. Read the comment docs.

Copy link
Member

@fealho fealho left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good to me, just some small details.

sdv/tabular/base.py Show resolved Hide resolved
sdv/tabular/copulas.py Show resolved Hide resolved
sdv/tabular/ctgan.py Outdated Show resolved Hide resolved
sdv/tabular/base.py Show resolved Hide resolved
@@ -390,6 +391,19 @@ def _conditionally_sample_rows(self, dataframe, max_retries, max_rows_multiplier

return sampled_rows

def _randomize_samples(self, randomize_samples):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't see this used anywhere apart from the unit tests.
On the other hand, what happens if randomize_samples is True?
Should we set the seed to None?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@csala Yes, if randomize_samples is true, we should set the seed to None. I updated the code to call the _randomize_samples method, and I also added an integration test to test that setting the seed to None works as expected.

@katxiao katxiao changed the base branch from v0.14.0.dev to issue-690-improve-sample February 23, 2022 00:20
Base automatically changed from issue-690-improve-sample to v0.14.0.dev February 23, 2022 05:19
@katxiao katxiao force-pushed the issue-690-fix-seed branch 2 times, most recently from 357c999 to e743810 Compare February 23, 2022 17:15
@katxiao katxiao requested a review from csala February 23, 2022 17:18
Copy link
Contributor

@csala csala left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good

@katxiao katxiao merged commit 6bf4ec9 into v0.14.0.dev Feb 26, 2022
@katxiao katxiao deleted the issue-690-fix-seed branch February 26, 2022 16:40
katxiao added a commit that referenced this pull request Mar 3, 2022
* Fix seed when randomize samples is false

* update tests

* update dep versions
katxiao added a commit that referenced this pull request Mar 4, 2022
* Fix seed when randomize samples is false

* update tests

* update dep versions
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants