-
Notifications
You must be signed in to change notification settings - Fork 322
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix field distribution arg in GaussianCopula #743
Conversation
sdv/tabular/copulas.py
Outdated
self._field_distributions[column] = self._default_distribution | ||
if column not in self._field_distributions: | ||
# Check if the column is a derived column. | ||
column_name = column.replace('.value', '').replace('.is_null', '') |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The is_null
columns are just boolean columns that say whether or not their row should be null. I would apply a fixed distribution to them.
I think it only makes sense to apply the specified distribution to the one ending in .value
Codecov Report
@@ Coverage Diff @@
## master #743 +/- ##
==========================================
+ Coverage 66.76% 67.15% +0.39%
==========================================
Files 36 38 +2
Lines 2738 3075 +337
==========================================
+ Hits 1828 2065 +237
- Misses 910 1010 +100
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think this looks good! I left a comment but it's up to you on how to implement
if column not in self._field_distributions: | ||
# Check if the column is a derived column. | ||
column_name = column.replace('.value', '') | ||
self._field_distributions[column] = self._field_distributions.get( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
One thing to note is that the _field_distributions
dictionary will have some field names that match the RDT
output, and some that match the input. For example, if null columns are created and there is a column 'a', then the dictionary will have
{
'a': dist,
'a.is_null': default_dist
}
Idk if it makes sense to have them all match the HyperTransformer
output names or not (ie. keep the .value
extension)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I created #744 to track this question, not completely sure what we should do right now.
Resolves #746