Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Enable batch sampling #709

Merged
merged 6 commits into from
Feb 22, 2022

Conversation

katxiao
Copy link
Contributor

@katxiao katxiao commented Feb 10, 2022

Resolves #693

@codecov-commenter
Copy link

codecov-commenter commented Feb 10, 2022

Codecov Report

❗ No coverage uploaded for pull request base (issue-692-sample-remaining-columns@32dd6ab). Click here to learn what that means.
The diff coverage is n/a.

❗ Current head 9ecfeae differs from pull request most recent head 37f962e. Consider uploading reports for the commit 37f962e to get more accurate results

Impacted file tree graph

@@                          Coverage Diff                          @@
##             issue-692-sample-remaining-columns     #709   +/-   ##
=====================================================================
  Coverage                                      ?   65.93%           
=====================================================================
  Files                                         ?       36           
  Lines                                         ?     2657           
  Branches                                      ?        0           
=====================================================================
  Hits                                          ?     1752           
  Misses                                        ?      905           
  Partials                                      ?        0           

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 32dd6ab...37f962e. Read the comment docs.

@katxiao katxiao force-pushed the issue-693-batch-sampling branch from 83c4049 to c394863 Compare February 10, 2022 20:41
@katxiao katxiao force-pushed the issue-692-sample-remaining-columns branch 3 times, most recently from 129b19f to f79231a Compare February 11, 2022 02:00
@katxiao katxiao force-pushed the issue-693-batch-sampling branch from 46b70f7 to 1f8ebbd Compare February 11, 2022 02:19
@katxiao katxiao force-pushed the issue-692-sample-remaining-columns branch from bc6e6e2 to b797faf Compare February 11, 2022 22:23
@katxiao katxiao force-pushed the issue-693-batch-sampling branch from 1f8ebbd to 30bd3cd Compare February 11, 2022 23:08
@katxiao katxiao marked this pull request as ready for review February 11, 2022 23:10
@katxiao katxiao requested a review from a team as a code owner February 11, 2022 23:10
@katxiao katxiao requested review from fealho and csala and removed request for a team February 11, 2022 23:10
@katxiao katxiao force-pushed the issue-693-batch-sampling branch from 30bd3cd to 0fbd9fc Compare February 11, 2022 23:16
Copy link
Member

@fealho fealho left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks pretty good to me, nice job 👍

tests/unit/tabular/test_base.py Show resolved Hide resolved
sdv/tabular/base.py Outdated Show resolved Hide resolved
sdv/tabular/base.py Outdated Show resolved Hide resolved
@katxiao katxiao force-pushed the issue-692-sample-remaining-columns branch from b797faf to e3e9d67 Compare February 16, 2022 21:58
@katxiao katxiao force-pushed the issue-693-batch-sampling branch from 0fbd9fc to 1a7abb6 Compare February 16, 2022 22:09
@katxiao katxiao force-pushed the issue-692-sample-remaining-columns branch from e3e9d67 to 3c7135b Compare February 16, 2022 22:14
@katxiao katxiao force-pushed the issue-693-batch-sampling branch 2 times, most recently from 48c0eec to 786927d Compare February 16, 2022 22:16
@katxiao katxiao requested review from amontanez24 and removed request for csala February 16, 2022 22:16
@katxiao katxiao force-pushed the issue-692-sample-remaining-columns branch from 3c7135b to 754b9e1 Compare February 16, 2022 23:58
@katxiao katxiao force-pushed the issue-693-batch-sampling branch from 786927d to ea3d28a Compare February 16, 2022 23:58
@katxiao katxiao force-pushed the issue-692-sample-remaining-columns branch from 754b9e1 to 640419c Compare February 17, 2022 19:22
@katxiao katxiao force-pushed the issue-693-batch-sampling branch from ea3d28a to 50b1585 Compare February 17, 2022 19:22
Copy link
Contributor

@amontanez24 amontanez24 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One small question but otherwise LGTM!

sampled, num_valid = self._sample_rows(
batch_size_per_try, conditions, transformed_conditions, float_rtol, sampled,
)

num_increase = min(num_valid - prev_num_valid, remaining)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In what case would remaining be less than num_valid - prev_num_valid

Copy link
Contributor Author

@katxiao katxiao Feb 17, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since we are sampling up to batch_size_per_try rows, we could sample more than the number of remaining rows. For example, if we wish to sample 100 rows and we've already sampled 90 valid rows, but our batch size is 50, then we could sample something like 38 rows in the _sample_rows call above

@katxiao katxiao force-pushed the issue-692-sample-remaining-columns branch from 640419c to 38c1783 Compare February 18, 2022 04:28
@katxiao katxiao force-pushed the issue-693-batch-sampling branch from 39754ef to 1098a0d Compare February 18, 2022 04:29
@katxiao katxiao force-pushed the issue-692-sample-remaining-columns branch from 38c1783 to 4c38284 Compare February 18, 2022 04:40
@katxiao katxiao force-pushed the issue-693-batch-sampling branch from 1098a0d to 9ecfeae Compare February 18, 2022 04:40
@katxiao katxiao force-pushed the issue-692-sample-remaining-columns branch from 4c38284 to 780a293 Compare February 18, 2022 05:48
@katxiao katxiao force-pushed the issue-693-batch-sampling branch from 9ecfeae to f211698 Compare February 18, 2022 05:48
@katxiao katxiao force-pushed the issue-692-sample-remaining-columns branch from 780a293 to 32dd6ab Compare February 18, 2022 06:08
@katxiao katxiao force-pushed the issue-693-batch-sampling branch from f211698 to 37f962e Compare February 18, 2022 06:08
@katxiao katxiao merged commit 5da1e4b into issue-692-sample-remaining-columns Feb 22, 2022
@katxiao katxiao deleted the issue-693-batch-sampling branch February 22, 2022 17:27
katxiao added a commit that referenced this pull request Feb 22, 2022
* Add method to sample remaining columns

* update integration tests

* add unit tests

* update tutorials and docs

* Enable batch sampling (#709)

* Add batch sampling and progress bar

* Make sure to close progress bar

* Periodically write to file

* add unit tests

* cr comments

* fix test
katxiao added a commit that referenced this pull request Feb 22, 2022
* Add method to sample remaining columns

* update integration tests

* add unit tests

* update tutorials and docs

* Enable batch sampling (#709)

* Add batch sampling and progress bar

* Make sure to close progress bar

* Periodically write to file

* add unit tests

* cr comments

* fix test
katxiao added a commit that referenced this pull request Feb 22, 2022
* Add method to sample remaining columns

* update integration tests

* add unit tests

* update tutorials and docs

* Enable batch sampling (#709)

* Add batch sampling and progress bar

* Make sure to close progress bar

* Periodically write to file

* add unit tests

* cr comments

* fix test
katxiao added a commit that referenced this pull request Feb 23, 2022
* Add method to sample remaining columns

* update integration tests

* add unit tests

* update tutorials and docs

* Enable batch sampling (#709)

* Add batch sampling and progress bar

* Make sure to close progress bar

* Periodically write to file

* add unit tests

* cr comments

* fix test
katxiao added a commit that referenced this pull request Feb 23, 2022
* Update sample method args

* Add unit tests

* remove conditioning logic for now

* Add error handling

* Make integration tests pass

* code review comments

* Add sample conditions method

* add back unit tests with conditions

* Update logic for handling multiple conditions

* fix lint

* fix integration tests

* Add method to sample remaining columns (3/3) (#708)

* Add method to sample remaining columns

* update integration tests

* add unit tests

* update tutorials and docs

* Enable batch sampling (#709)

* Add batch sampling and progress bar

* Make sure to close progress bar

* Periodically write to file

* add unit tests

* cr comments

* fix test
katxiao added a commit that referenced this pull request Mar 3, 2022
* Update sample method args

* Add unit tests

* remove conditioning logic for now

* Add error handling

* Make integration tests pass

* code review comments

* Add sample conditions method

* add back unit tests with conditions

* Update logic for handling multiple conditions

* fix lint

* fix integration tests

* Add method to sample remaining columns (3/3) (#708)

* Add method to sample remaining columns

* update integration tests

* add unit tests

* update tutorials and docs

* Enable batch sampling (#709)

* Add batch sampling and progress bar

* Make sure to close progress bar

* Periodically write to file

* add unit tests

* cr comments

* fix test
katxiao added a commit that referenced this pull request Mar 4, 2022
* Update sample method args

* Add unit tests

* remove conditioning logic for now

* Add error handling

* Make integration tests pass

* code review comments

* Add sample conditions method

* add back unit tests with conditions

* Update logic for handling multiple conditions

* fix lint

* fix integration tests

* Add method to sample remaining columns (3/3) (#708)

* Add method to sample remaining columns

* update integration tests

* add unit tests

* update tutorials and docs

* Enable batch sampling (#709)

* Add batch sampling and progress bar

* Make sure to close progress bar

* Periodically write to file

* add unit tests

* cr comments

* fix test
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants