Fix field distribution arg in GaussianCopula #743

katxiao · 2022-03-23T19:09:01Z

Resolves #746

amontanez24 · 2022-03-23T19:44:02Z

sdv/tabular/copulas.py

-                self._field_distributions[column] = self._default_distribution
+            if column not in self._field_distributions:
+                # Check if the column is a derived column.
+                column_name = column.replace('.value', '').replace('.is_null', '')


The .is_null columns are just boolean columns that say whether or not their row should be null. I would apply a fixed distribution to them.

I think it only makes sense to apply the specified distribution to the column ending in .value.
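
To make the suggestion above concrete, here is a minimal sketch of the lookup it describes. It is not the code merged in this PR: `resolve_distribution`, its parameters, and the distribution strings (`'gamma'`, `'default_dist'`, `'fixed_dist'`) are hypothetical names used only for illustration, and the choice of which fixed distribution suits `.is_null` columns is left to the caller.

```python
# Hypothetical helper illustrating the review suggestion; not SDV's actual code.
VALUE_SUFFIX = '.value'
IS_NULL_SUFFIX = '.is_null'


def resolve_distribution(column, field_distributions, default_distribution,
                         is_null_distribution):
    """Pick a distribution for a transformed (possibly derived) column."""
    # An exact match always wins.
    if column in field_distributions:
        return field_distributions[column]

    # Boolean null-indicator columns get a fixed distribution,
    # regardless of what was requested for the original column.
    if column.endswith(IS_NULL_SUFFIX):
        return is_null_distribution

    # Only the '.value' column inherits the distribution requested
    # for the original (input) column name.
    if column.endswith(VALUE_SUFFIX):
        base_name = column[:-len(VALUE_SUFFIX)]
        return field_distributions.get(base_name, default_distribution)

    return default_distribution


# Placeholder distribution names, assumed for this example only.
user_distributions = {'a': 'gamma'}
resolved = {
    column: resolve_distribution(
        column, user_distributions, 'default_dist', 'fixed_dist')
    for column in ['a.value', 'a.is_null', 'b']
}
```

Stripping the suffix with `endswith` and slicing, rather than `str.replace`, avoids touching a column whose name merely contains `.value` somewhere in the middle.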

codecov-commenter · 2022-03-23T19:51:28Z

Codecov Report

Merging #743 (cd92c10) into master (1bca0a5) will increase coverage by 0.39%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master     #743      +/-   ##
==========================================
+ Coverage   66.76%   67.15%   +0.39%     
==========================================
  Files          36       38       +2     
  Lines        2738     3075     +337     
==========================================
+ Hits         1828     2065     +237     
- Misses        910     1010     +100

| Impacted Files | Coverage Δ | |
|---|---|---|
| sdv/tabular/copulas.py | `88.15% <100.00%> (ø)` | |
| sdv/sdv.py | `87.03% <0.00%> (-1.20%)` | ⬇️ |
| sdv/timeseries/base.py | `0.00% <0.00%> (ø)` | |
| sdv/lite/tabular.py | `100.00% <0.00%> (ø)` | |
| sdv/lite/__init__.py | `100.00% <0.00%> (ø)` | |
| sdv/relational/base.py | `33.33% <0.00%> (+2.08%)` | ⬆️ |
| sdv/tabular/base.py | `84.32% <0.00%> (+2.14%)` | ⬆️ |
| sdv/utils.py | `58.62% <0.00%> (+58.62%)` | ⬆️ |

Δ = absolute <relative> (impact), ø = not affected, ? = missing data

amontanez24

I think this looks good! I left a comment, but how to implement it is up to you.

amontanez24 · 2022-03-24T17:21:02Z

sdv/tabular/copulas.py

+            if column not in self._field_distributions:
+                # Check if the column is a derived column.
+                column_name = column.replace('.value', '')
+                self._field_distributions[column] = self._field_distributions.get(


One thing to note is that the _field_distributions dictionary will have some field names that match the RDT output and some that match the input. For example, if null columns are created and there is a column 'a', then the dictionary will contain:

{ 'a': dist, 'a.is_null': default_dist }

I don't know whether it makes sense to have them all match the HyperTransformer output names (i.e., keep the .value extension).

I created #744 to track this question. I'm not completely sure what we should do right now.
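
The naming mismatch described above can be checked with a small sketch. `split_keys_by_naming` is a hypothetical helper, not part of SDV or RDT, and the list of transformed column names (`'a.value'`, `'a.is_null'`) is an assumption based on the suffixes handled in the diff.

```python
# Hypothetical helper for inspecting the key mismatch; not part of SDV or RDT.
def split_keys_by_naming(field_distributions, transformed_columns):
    """Split dictionary keys by whether they match a transformed column name."""
    transformed = set(transformed_columns)
    output_names = sorted(
        key for key in field_distributions if key in transformed)
    other_names = sorted(
        key for key in field_distributions if key not in transformed)
    return output_names, other_names


# The dictionary from the comment above, with placeholder distribution values.
field_distributions = {'a': 'dist', 'a.is_null': 'default_dist'}

# Assumed transformer output for column 'a' when a null column is created.
transformed_columns = ['a.value', 'a.is_null']

output_names, input_names = split_keys_by_naming(
    field_distributions, transformed_columns)
print(output_names)  # ['a.is_null']
print(input_names)   # ['a']
```

Here 'a.is_null' follows the transformer-output naming while 'a' follows the input naming, which is the inconsistency #744 is meant to track.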

katxiao added 2 commits March 23, 2022 14:52

Fix field distribution setting

9643a6a

add unit test

30a47c8

katxiao requested a review from amontanez24 March 23, 2022 19:09

katxiao requested a review from a team as a code owner March 23, 2022 19:09

katxiao removed the request for review from a team March 23, 2022 19:09

amontanez24 requested changes Mar 23, 2022

cr

cd92c10

katxiao requested a review from amontanez24 March 23, 2022 23:16

amontanez24 approved these changes Mar 24, 2022

katxiao merged commit acdceb7 into master Mar 25, 2022

katxiao deleted the fix-field-distributions branch March 25, 2022 00:05
