Problem Description
As a user, it is sometimes hard to know which handling_strategy to use for a constraint. On top of that, if the transform strategy fails, reject_sampling could still work, so it should be an automatic fallback.
Expected behavior
Remove the handling_strategy parameter from all constraints
All constraints should attempt the transform strategy first. If that fails because of a MissingConstraintColumnError, the constraint should leave the data unchanged and fall back on reject sampling.
Raise the following warning if transforming fails:
Warning: <constraint name> cannot be transformed because columns [<names>] are not found. Using the reject sampling approach instead.
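A minimal sketch of the requested fallback, to make the expected behavior concrete. It is not the library's implementation: the `Constraint` base class, the `_check_columns` helper, and the `HighAboveLow` example constraint are hypothetical, and a plain dict of column lists stands in for the table so the example has no dependencies.

```python
import warnings


class MissingConstraintColumnError(Exception):
    """Stand-in for the error raised when a constraint's columns are absent."""


class Constraint:
    """Hypothetical base class with no handling_strategy parameter."""

    constraint_columns = ()

    def _transform(self, table):
        return table

    def is_valid(self, table):
        raise NotImplementedError

    def _check_columns(self, table):
        # Hypothetical helper: raise if any required column is missing.
        missing = [col for col in self.constraint_columns if col not in table]
        if missing:
            raise MissingConstraintColumnError(missing)

    def transform(self, table):
        # Always try the transform strategy first.
        try:
            self._check_columns(table)
            return self._transform(dict(table))
        except MissingConstraintColumnError as error:
            missing = error.args[0]
            warnings.warn(
                f'{self.__class__.__name__} cannot be transformed because '
                f'columns {missing} are not found. '
                'Using the reject sampling approach instead.'
            )
            # Leave the data untouched; is_valid is applied later,
            # during reject sampling.
            return table


class HighAboveLow(Constraint):
    """Illustrative constraint: 'high' must be >= 'low'."""

    constraint_columns = ('low', 'high')

    def _transform(self, table):
        # Replace 'high' with its distance from 'low'.
        table['high'] = [h - l for h, l in zip(table['high'], table['low'])]
        return table

    def is_valid(self, table):
        return [h >= l for h, l in zip(table['high'], table['low'])]
```

With this shape, callers never choose a strategy: `HighAboveLow().transform({'low': [1, 2]})` emits the warning and returns the table unchanged, while a table containing both columns is transformed as usual.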
Additional context
This change will need to be addressed in a couple of other places besides the constraints themselves.
In metadata.table.py, there is a method called _prepare_constraints that orders the constraints and raises an error if any constraints touch the same columns. Instead of raising, those constraints should be set to use reject_sampling by skipping their transformations. They should still run their is_valid check on the data being transformed. This is tricky to handle: if two constraints touch the same columns, the one that transforms needs to go last, so we will need a way to skip transformations for certain constraints or to set them to use the identity method.
In metadata.table.py, there is a method called _transform_constraints. This method should remove the on_missing_column parameter and always drop (i.e. default to reject sampling).
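One possible way to handle the overlapping-column case is sketched below. It rests on assumptions not settled in this issue: the first constraint to claim a column is the one that transforms, any later constraint sharing a column is skipped (reject sampling only), and skipped constraints are ordered first so the transforming one goes last. `plan_constraints` and `FakeConstraint` are hypothetical names used only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FakeConstraint:
    """Hypothetical stand-in exposing only what the planner needs."""

    name: str
    constraint_columns: tuple


def plan_constraints(constraints):
    """Return (constraint, should_transform) pairs, skipped constraints first.

    Assumption: the first constraint to claim a column transforms it; later
    constraints that share any claimed column only run is_valid.
    """
    claimed = set()
    skipped, transforming = [], []
    for constraint in constraints:
        columns = set(constraint.constraint_columns)
        if columns & claimed:
            # Overlaps an earlier constraint: skip its transformation.
            skipped.append((constraint, False))
        else:
            claimed |= columns
            transforming.append((constraint, True))
    # Constraints that transform go last, as described above.
    return skipped + transforming
```

A per-constraint flag like this is one alternative to swapping in the identity method: the caller would still run is_valid for every pair and call transform only when the flag is true.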