Sampling with a condition whose value is `0.` on a float column raises an exception.
Steps to reproduce
```python
from sdv.demo import load_tabular_demo
from sdv.tabular import GaussianCopula

data = load_tabular_demo('student_placements')
data['experience_years'] = data['experience_years'].astype(float)  # for the demonstration
model = GaussianCopula()
model.fit(data)
model.sample(1, conditions={'experience_years': 0.})
```
```
ValueError                                Traceback (most recent call last)
<ipython-input-147-1a9ae5e719f8> in <module>
      6 model = GaussianCopula()
      7 model.fit(data)
----> 8 model.sample(1, conditions={'experience_years': 0.})

/opt/conda/lib/python3.8/site-packages/sdv/tabular/base.py in sample(self, num_rows, max_retries, max_rows_multiplier, conditions, float_rtol, graceful_reject_sampling)

/opt/conda/lib/python3.8/site-packages/sdv/tabular/base.py in _conditionally_sample_rows(self, dataframe, max_retries, max_rows_multiplier, condition, transformed_condition, float_rtol, graceful_reject_sampling)

ValueError: No valid rows could be generated with the given conditions.
```
Ability to contribute
I think that I can fix the issue and I'll create a pull request.
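I suspect this happens because of how `rtol` behaves with zero: when the condition value is 0, `distance = value * float_rtol` is also 0, so the strict `<` comparison can never be true. The method `_filter_conditions` checks for `<` instead of `<=` (NumPy uses `<=` [source]):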
```python
if column_values.dtype.kind == 'f':
    distance = value * float_rtol
    sampled = sampled[np.abs(column_values - value) < distance]
    sampled[column] = value
```
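To make the suspected cause concrete, here is a minimal sketch that isolates the comparison on a plain NumPy array. The helper `matches_condition`, its `inclusive` flag, and the `float_rtol=0.01` default are hypothetical names and values chosen for illustration; this is not SDV's actual code, only the same arithmetic as the snippet above.

```python
import numpy as np


def matches_condition(column_values, value, float_rtol=0.01, inclusive=False):
    """Hypothetical helper: return a boolean mask of values close to `value`.

    Uses the same distance as the snippet above (value * float_rtol).
    `inclusive=True` switches the comparison from `<` to `<=`.
    """
    distance = value * float_rtol
    diff = np.abs(column_values - value)
    return diff <= distance if inclusive else diff < distance


sampled = np.array([0.0, 0.0, 1.0, 2.5])

# With value == 0 the distance is 0, so `diff < 0` rejects every row,
# including the rows that are exactly 0.
strict = matches_condition(sampled, 0.0)

# With `<=` the exact zeros pass the filter.
inclusive = matches_condition(sampled, 0.0, inclusive=True)

print(strict.sum())     # 0
print(inclusive.sum())  # 2
```

With the strict comparison no row survives the filter for a zero condition, which would be consistent with the `No valid rows could be generated` error above.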
csala changed the title from "Sampling with a float constrain doesn't work for the value zero" to "Sampling with conditions={column: 0.0} for float columns doesn't work" on Jul 29, 2021
Thanks for reporting this @shlomihod
I think you are right about the problem being related to the tolerance, which makes the distance become 0 when the value is 0.
I think that we will revisit this rtol at some point and rather convert it to an atol (absolute tolerance), so the distance does not depend on the actual value, but right now your proposal should work around the problem perfectly fine.
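As a rough illustration of the absolute-tolerance idea, the sketch below compares against a fixed `float_atol` so the allowed distance no longer depends on the condition value. The helper `matches_with_atol` and the `float_atol=1e-8` default are assumptions made for this example and are not part of SDV; `np.isclose` is shown only as a point of comparison, since it accepts rows where `|a - b| <= atol + rtol * |b|`.

```python
import numpy as np


def matches_with_atol(column_values, value, float_atol=1e-8):
    """Hypothetical helper: mask of values within an absolute tolerance of `value`."""
    return np.abs(column_values - value) <= float_atol


sampled = np.array([0.0, 1e-10, 0.5])

# The tolerance is the same whether the condition value is 0 or not.
mask = matches_with_atol(sampled, 0.0)
print(mask)  # [ True  True False]

# np.isclose combines both tolerances; for value == 0 only atol contributes.
print(np.isclose(sampled, 0.0, rtol=0.01, atol=1e-8))  # [ True  True False]
```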
Hi @shlomihod, we appreciate your interest in contributing and your change looks good! Would you have a chance to make the changes requested? Please let us know if you have any other questions.