Raise constraint errors together + misc. #807
Codecov Report
@@ Coverage Diff @@
## master #807 +/- ##
==========================================
- Coverage 68.05% 67.98% -0.07%
==========================================
Files 38 38
Lines 2883 2899 +16
==========================================
+ Hits 1962 1971 +9
- Misses 921 928 +7
sdv/constraints/base.py
Outdated
@@ -236,6 +262,8 @@ def _validate_constraint_columns(self, table_data):
table_data (pandas.DataFrame):
Table data.
"""
self._validate_data_on_constraint(table_data)
These two methods, `_validate_data_on_constraint` and `_validate_constraint_columns`, should be merged in the future, when the `fit_columns_model` logic gets dropped.
Let's keep these separate for a couple of reasons:
- We may end up keeping the columns model
- These are actually two different types of validation. One checks that the data being transformed actually adheres to the constraint. The other checks whether any columns in the constraint are missing and need to be sampled. Let's keep the logic separate. Maybe renaming `_validate_constraint_columns` to `_generate_missing_columns`, or something like that, would help.
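A rough sketch of the separation described above. Everything here is hypothetical: `SketchConstraint`, `validate_data` and `missing_columns` are illustrative names, not SDV's actual API, and a plain dict of column lists stands in for the `pandas.DataFrame`.

```python
class SketchConstraint:
    """Hypothetical stand-in for a constraint; not SDV's actual base class."""

    def __init__(self, low, high):
        self.constraint_columns = (low, high)

    def is_valid(self, table):
        # Row-wise check: the low column must not exceed the high column.
        low, high = self.constraint_columns
        return [lo <= hi for lo, hi in zip(table[low], table[high])]

    def validate_data(self, table):
        # Validation 1: does the data actually adhere to the constraint?
        invalid = [i for i, ok in enumerate(self.is_valid(table)) if not ok]
        if invalid:
            raise ValueError(f'Rows {invalid} violate the constraint')

    def missing_columns(self, table):
        # Validation 2: which constraint columns are absent and would
        # need to be sampled?
        return [col for col in self.constraint_columns if col not in table]
```

Keeping the two checks in separate methods means the "missing columns" path can be used on partial tables without ever calling `is_valid`, which would fail on a column that is not there.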
@@ -19,7 +19,9 @@
on the other columns of the table.
* Between: Ensure that the value in one column is always between the values
of two other columns/scalars.
* Rounding: Round a column based on the specified number of digits.
Unrelated to my changes, we just forgot to add these here when creating the constraints.
@amontanez24 @npatki What is the expected behavior of a constraint when
The expected behavior is to handle this the same as conditional sampling with reject sampling. See #809. It would be nice to make these changes in tandem with the constraint changes.
gc = GaussianCopula(constraints=constraints)

err_msg = re.escape(
"\nunsupported operand type(s) for -: 'str' and 'str'"
This PR only implements the "pretty print" for the general case where the data doesn't conform with `constraint.is_valid()`. If some other error shows up, it will just be printed as usual.
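As a minimal sketch of that behavior, assuming validation failures surface as `ValueError` (an assumption made for illustration, not confirmed by this thread): expected failures are collected and raised together, while any other exception propagates untouched. The check functions, `validate_all`, and the one-line exception class are all hypothetical simplifications.

```python
class MultipleConstraintsErrors(Exception):
    """Simplified stand-in for the exception added in this PR."""


def check_low_le_high(table):
    # Hypothetical constraint check: raises ValueError on invalid data.
    if any(lo > hi for lo, hi in zip(table['low'], table['high'])):
        raise ValueError("Data is not valid for the 'low <= high' constraint")


def check_positive(table):
    # Second hypothetical check, so that two failures can be reported at once.
    if any(value <= 0 for value in table['low']):
        raise ValueError("Data is not valid for the 'low > 0' constraint")


def validate_all(table, checks):
    errors = []
    for check in checks:
        try:
            check(table)
        except ValueError as error:
            # Expected validation failure: remember it and keep going.
            errors.append(error)
        # Any other exception type (e.g. TypeError) propagates unchanged.

    if errors:
        raise MultipleConstraintsErrors('\n'.join(str(e) for e in errors))
```

With this shape, a table that violates both checks produces a single exception listing both messages, whereas comparing a string with a number still fails with the ordinary `TypeError`.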
31f5623 to 6820ece
@@ -582,93 +581,6 @@ def test__transform_constraints_drops_columns(self):
}, index=[0, 1, 2])
assert result.equals(expected_result)

def test__validate_data_on_constraints(self):
Can we keep the tests for this method but just move them to the appropriate place?
I think this is almost ready, but we should keep the tests for the `_validate_on_constraint` method.
@amontanez24 The old tests didn't really match the new code, so I rewrote them a bit.
LGTM. Just a minor question about `Errors` or `Error`.
@@ -3,3 +3,7 @@

class MissingConstraintColumnError(Exception):
"""Error to use when constraint is provided a table with missing columns."""


class MultipleConstraintsErrors(Exception):
Should this be `Error` instead of `Errors`?
I think `Errors` is clearer, since it is a list of multiple errors.
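To illustrate the "list of multiple errors" idea, here is a small sketch. The list-taking constructor and the `errors` attribute are assumptions made for this example; the class body in the PR is not shown in this thread and may differ.

```python
class MultipleConstraintsErrors(Exception):
    """Simplified stand-in: one exception wrapping several errors."""

    def __init__(self, errors):
        # Keep the individual errors so callers can inspect each one.
        self.errors = list(errors)
        super().__init__('\n'.join(str(error) for error in self.errors))


try:
    raise MultipleConstraintsErrors([
        ValueError('first constraint failed'),
        ValueError('second constraint failed'),
    ])
except MultipleConstraintsErrors as caught:
    # Each wrapped error is still available individually.
    messages = [str(error) for error in caught.errors]
```

A caller can then either print the combined message or walk the individual errors, which is what the plural name is meant to signal.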
LGTM!
Resolve #801.
Note: this PR also fixes several typos and makes other minor fixes.