FixedIncrements constraint cannot be applied in conjunction with Inequality constraint #2360

Open
frances-h opened this issue Jan 29, 2025 · 0 comments
Labels
bug Something isn't working

Environment Details

Please indicate the following details about the environment in which you found the bug:

  • SDV version:
  • Python version:
  • Operating System:

Error Description

Combining the FixedIncrements constraint with other constraints on the same column may cause errors during sampling. This was discovered while testing the ChainedInequality constraint.

The root cause is that the FixedIncrements constraint directly modifies the constrained column, which can invalidate the input to downstream constraints. Data validation runs only on the original data, not on the intermediate data passed to each constraint, so the problem is not caught during fit. In the example below, the Inequality constraint creates null/-inf values during preprocessing. During sampling, trying to cast those null/-inf values back to ints is what raises the error.
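The failure mode can be sketched outside of SDV with plain pandas. This is only an illustration under two assumptions about the internals, neither confirmed by this report: that FixedIncrements scales the column down by `increment_value`, and that Inequality encodes the gap as `log(high - low + 1)`. The values are the ones from the reproduction script.

```python
import numpy as np
import pandas as pd

low = pd.Series([1, 3, 5, 4, 3], dtype=float)
high = pd.Series([5, 5, 10, 15, 5], dtype=float)

# Assumption: FixedIncrements rewrites the column as value / increment_value.
scaled_high = high / 5

# Assumption: Inequality then encodes the gap as log(high - low + 1).
# On the scaled column the gap can be negative, so the log is NaN or -inf.
with np.errstate(divide='ignore', invalid='ignore'):
    diff = np.log(scaled_high - low + 1)

n_nan = int(diff.isna().sum())
n_neg_inf = int((diff == -np.inf).sum())

# Casting non-finite floats to int fails in pandas; the error is a
# ValueError (IntCastingNaNError in recent pandas versions).
try:
    diff.astype(int)
    cast_failed = False
except ValueError:
    cast_failed = True
```

Under these assumptions, the scaled `high_value` no longer satisfies `high_value >= low_value` on most rows, which is the kind of intermediate invalidity that validation on the original data would not see.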

Applying Inequality before FixedIncrements works as expected.
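A rough way to see why this order is safe, assuming (this is an assumption about the internals, not something confirmed here) that Inequality encodes the gap as `log(high - low + 1)`: when Inequality runs first, it sees the raw columns, where `high_value >= low_value` holds on every row of the example data.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({
    'low_value': [1, 3, 5, 4, 3],
    'high_value': [5, 5, 10, 15, 5],
})

# Inequality first: the gap is computed on the raw, unscaled values.
gap = data['high_value'] - data['low_value']

# Assumed encoding of the gap; every input to the log is >= 1 here.
log_gap = np.log(gap + 1)

all_finite = bool(np.isfinite(log_gap).all())
```

With the raw values every gap is non-negative, so the encoded column stays finite and there is nothing non-finite to cast back to ints.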

Steps to reproduce

import pandas as pd
import numpy as np

from sdv.metadata import Metadata
from sdv.single_table import GaussianCopulaSynthesizer

data = pd.DataFrame(data={
    'id': [0, 1, 2, 3, 4],
    'low_value': [1, 3, 5, 4, 3],
    'high_value': [5, 5, 10, 15, 5],
})

metadata = Metadata.load_from_dict({
    'tables': {
        'table': {
            'primary_key': 'id',
            'columns': {
                'id': { 'sdtype': 'id' },
                'low_value': { 'sdtype': 'numerical' },
                'high_value': { 'sdtype': 'numerical' },
            }
        }
    }
})

fixed_increments = {
    'constraint_class': 'FixedIncrements',
    'constraint_parameters': {
        'column_name': 'high_value',
        'increment_value': 5
    }
}

inequality = {
    'constraint_class': 'Inequality',
    'constraint_parameters': {
        'high_column_name': 'high_value',
        'low_column_name': 'low_value'
    }
}

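synthesizer = GaussianCopulaSynthesizer(metadata)
# FixedIncrements is listed first, so it is applied before Inequality
synthesizer.add_constraints([fixed_increments, inequality])
# The synthesizer must be fit before it can sample
synthesizer.fit(data)
synthesizer.sample(num_rows=10)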

I would expect the sampling here to work. Although there is an overlapping column, the Inequality constraint can fall back to using reject sampling. Instead, I see that there is an error during sampling:

IntCastingNaNError: Error: Sampling terminated. No results were saved due to unspecified "output_file_path".

frances-h added the bug label Jan 29, 2025