Fix field distribution arg in GaussianCopula #743

Merged: 3 commits, Mar 25, 2022

Conversation

@katxiao katxiao commented Mar 23, 2022

Resolves #746

@katxiao katxiao requested a review from amontanez24 March 23, 2022 19:09
@katxiao katxiao requested a review from a team as a code owner March 23, 2022 19:09
@katxiao katxiao removed the request for review from a team March 23, 2022 19:09
self._field_distributions[column] = self._default_distribution
if column not in self._field_distributions:
    # Check if the column is a derived column.
    column_name = column.replace('.value', '').replace('.is_null', '')

The is_null columns are just boolean columns that say whether or not their row should be null. I would apply a fixed distribution to them.

I think it only makes sense to apply the specified distribution to the one ending in .value
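The suggestion above can be sketched roughly as follows. This is only an illustration, not SDV's actual code: the helper name `resolve_field_distributions` and the string placeholders standing in for distribution objects are assumptions made for the example.

```python
# Hypothetical sketch; the helper name and the string placeholders below are
# assumptions for illustration, not SDV's actual implementation.
DEFAULT_DISTRIBUTION = 'default'  # stand-in for the model's default distribution
IS_NULL_DISTRIBUTION = 'fixed'    # stand-in for a fixed distribution for null flags


def resolve_field_distributions(columns, field_distributions):
    """Return a distribution for every transformed column name."""
    resolved = dict(field_distributions)
    for column in columns:
        if column in resolved:
            continue
        if column.endswith('.is_null'):
            # Boolean null-indicator columns get a fixed distribution.
            resolved[column] = IS_NULL_DISTRIBUTION
        elif column.endswith('.value'):
            # Only the '.value' column inherits the user-specified distribution.
            base = column[:-len('.value')]
            resolved[column] = field_distributions.get(base, DEFAULT_DISTRIBUTION)
        else:
            resolved[column] = DEFAULT_DISTRIBUTION
    return resolved


resolved = resolve_field_distributions(
    ['a.value', 'a.is_null', 'b'],
    {'a': 'gamma'},
)
print(resolved)
```

The sketch strips the suffix with `endswith` and slicing rather than `str.replace`, so a column whose name merely contains `.value` somewhere in the middle is left alone. Whether that matters depends on the column names in practice.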

codecov-commenter commented Mar 23, 2022

Codecov Report

Merging #743 (cd92c10) into master (1bca0a5) will increase coverage by 0.39%.
The diff coverage is 100.00%.

@@            Coverage Diff             @@
##           master     #743      +/-   ##
==========================================
+ Coverage   66.76%   67.15%   +0.39%     
==========================================
  Files          36       38       +2     
  Lines        2738     3075     +337     
==========================================
+ Hits         1828     2065     +237     
- Misses        910     1010     +100     
| Impacted Files | Coverage Δ |
|---|---|
| sdv/tabular/copulas.py | 88.15% <100.00%> (ø) |
| sdv/sdv.py | 87.03% <0.00%> (-1.20%) ⬇️ |
| sdv/timeseries/base.py | 0.00% <0.00%> (ø) |
| sdv/lite/tabular.py | 100.00% <0.00%> (ø) |
| sdv/lite/__init__.py | 100.00% <0.00%> (ø) |
| sdv/relational/base.py | 33.33% <0.00%> (+2.08%) ⬆️ |
| sdv/tabular/base.py | 84.32% <0.00%> (+2.14%) ⬆️ |
| sdv/utils.py | 58.62% <0.00%> (+58.62%) ⬆️ |

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 1bca0a5...cd92c10.

@katxiao katxiao requested a review from amontanez24 March 23, 2022 23:16

@amontanez24 amontanez24 left a comment

I think this looks good! I left a comment, but it's up to you how to implement it.

if column not in self._field_distributions:
    # Check if the column is a derived column.
    column_name = column.replace('.value', '')
    self._field_distributions[column] = self._field_distributions.get(

One thing to note is that the _field_distributions dictionary will have some field names that match the RDT output and some that match the input. For example, if null columns are created and there is a column 'a', then the dictionary will have:

{
    'a': dist,
    'a.is_null': default_dist
}

I don't know if it makes sense to have them all match the HyperTransformer output names (i.e., keep the .value extension) or not.
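For illustration, one way the keys could be made to match the transformed column names is sketched below. The helper `rekey_to_output_names` is hypothetical and the distributions are string placeholders. It shows one possible option for the open question in this comment, not a decision or SDV's behavior.

```python
# Hypothetical sketch; rekey_to_output_names is an assumed helper, not part of SDV.
def rekey_to_output_names(field_distributions, output_columns):
    """Re-key input-named entries to the matching '.value' output column."""
    rekeyed = {}
    for name, dist in field_distributions.items():
        value_name = name + '.value'
        if name not in output_columns and value_name in output_columns:
            # 'a' is an input name; the transformed data only has 'a.value'.
            rekeyed[value_name] = dist
        else:
            # The key already matches an output column (or has no '.value' twin).
            rekeyed[name] = dist
    return rekeyed


mixed = {'a': 'gamma', 'a.is_null': 'default'}
uniform = rekey_to_output_names(mixed, ['a.value', 'a.is_null'])
print(uniform)
```

With this approach every key would refer to a column that exists in the transformed data, at the cost of no longer matching the names the user originally passed in.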

Contributor Author

I created #744 to track this question; I'm not completely sure what we should do right now.

@katxiao katxiao merged commit acdceb7 into master Mar 25, 2022
@katxiao katxiao deleted the fix-field-distributions branch March 25, 2022 00:05
Successfully merging this pull request may close these issues.

Field distributions bug in GaussianCopula
3 participants