Inequality constraint raises RuntimeWarning (invalid value encountered in log) #1275

Closed
npatki opened this issue Feb 24, 2023 · 0 comments · Fixed by #1377
Assignees: frances-h
Labels: feature:constraints (Related to inputting rules or business logic), internal (The issue doesn't change the API or functionality)
Milestone: 1.0.1


npatki commented Feb 24, 2023

Environment Details

  • SDV version: 1.0.0b0 (Beta version)
  • Python version: 3.8
  • Operating System: Linux (Colab Notebook)

Error Description

When I apply an Inequality constraint to the demo data, fitting the synthesizer emits a RuntimeWarning.

/usr/local/lib/python3.8/dist-packages/sdv/constraints/tabular.py:451: RuntimeWarning:

invalid value encountered in log

This doesn't seem to affect the data quality so I'm ignoring it. But we should check to see if it's indicative of any unintended behavior.

Steps to reproduce

from sdv.datasets.demo import download_demo
from sdv.single_table import GaussianCopulaSynthesizer

real_data, metadata = download_demo(
    modality='single_table',
    dataset_name='fake_hotel_guests'
)

checkin_lessthan_checkout = {
    'constraint_class': 'Inequality',
    'constraint_parameters': {
        'low_column_name': 'checkin_date',
        'high_column_name': 'checkout_date'
    }
}

synthesizer = GaussianCopulaSynthesizer(metadata)
synthesizer.add_constraints([
    checkin_lessthan_checkout
])
synthesizer.fit(real_data)
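
To find out which call triggers the warning, one option is to escalate NumPy's "invalid" floating-point condition to an exception, so the traceback points at the offending operation. This is only a sketch: `run_strict` and `fake_transform` are hypothetical names, and `fake_transform` is a stand-in for the real fit call that merely mirrors the line named in the warning.

```python
import numpy as np


def run_strict(func, *args, **kwargs):
    """Call func while NumPy 'invalid' conditions raise FloatingPointError."""
    with np.errstate(invalid='raise'):
        return func(*args, **kwargs)


def fake_transform(diff_column):
    # Hypothetical stand-in for the transform step, not SDV's actual code path.
    return np.log(diff_column + 1)


try:
    run_strict(fake_transform, np.array([3.0, 0.0, -5.0]))
    raised = False
except FloatingPointError as error:
    # The -5.0 entry becomes log(-4.0), which is an invalid operation.
    raised = True
    print(f'Caught: {error}')
```

With the reproduction above, the equivalent call would be `run_strict(synthesizer.fit, real_data)`. Note that this may also stop on unrelated invalid operations elsewhere in the fit.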

Context

The following line is causing the error in constraints/tabular.py:

table_data[self._diff_column_name] = np.log(diff_column + 1)

np.log emits this particular warning when an input is negative (an input of exactly 0 triggers a separate "divide by zero encountered in log" warning instead). But this is odd: all values should be 1 or more because:

  • We are ensuring that the constraint is True in the real data (so each value in diff_column must be >= 0)
  • We are adding 1 to each item, so each value must be >= 1
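
The reasoning above can be checked against how np.log reacts to different inputs, which narrows down what diff_column would have to contain. A minimal sketch, assuming plain float inputs; the helper `log_warnings` is hypothetical and is not part of SDV.

```python
import warnings

import numpy as np


def log_warnings(values):
    """Return the warning messages emitted by np.log(values + 1)."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter('always')
        # Make sure floating-point conditions are reported as warnings.
        with np.errstate(all='warn'):
            np.log(np.asarray(values, dtype=float) + 1)
    return [str(w.message) for w in caught]


print(log_warnings([0.0, 2.0]))  # diff >= 0, so the log input is >= 1
print(log_warnings([-1.0]))      # log input is exactly 0
print(log_warnings([-3.0]))      # log input is negative
```

Only the last case produces the "invalid value encountered in log" message, so a value below -1 in diff_column would be enough to explain the report. Whether that is what actually happens during fitting is not confirmed here.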

npatki added the bug (Something isn't working) and feature:constraints (Related to inputting rules or business logic) labels on Feb 24, 2023
npatki added the internal (The issue doesn't change the API or functionality) label and removed the bug (Something isn't working) label on Apr 12, 2023
frances-h self-assigned this on Apr 19, 2023
amontanez24 added this to the 1.0.1 milestone on Apr 19, 2023

3 participants