Giving user an ability to write general constraints as functions #411

Closed
kveerama opened this issue Apr 21, 2021 · 6 comments
Labels
feature request, resolution:duplicate

Comments

@kveerama
Contributor

kveerama commented Apr 21, 2021

Problem Description

Currently, if we want to create a general-purpose constraint that can be used across multiple columns, like this:

from sdv.constraints import Positives

multiple_columns_constraint = Positives(
    columns=['col1', 'col2', 'col3'],
    handling_strategy='reject_sampling'
)

we would have to write a class. Is it possible to design a way so that the user only writes a function and we enable the same usage?

[Note: will update this issue with more detail]

@kveerama kveerama added feature request Request for a new feature pending review labels Apr 21, 2021
@Timbimjim

Hey, could you let me know what code your Positives constraint contains? If I understand correctly, I would have to add a custom constraint in tabular.py.

I am just looking for a constraint that will create synthetic data with only positive values.

Thanks

@npatki
Contributor

npatki commented Apr 29, 2021

I noticed that you can use a repeated CustomConstraint, but proper functional programming is not supported. Not all callables work.

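For example, this works (but needs to be repeated for every column):

from sdv.constraints import CustomConstraint

def age_is_pos(data):
    # Returns a boolean mask: True for rows where 'age' is positive
    return data['age'] > 0

age_is_pos_constraint = CustomConstraint(is_valid=age_is_pos)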

However, this throws an error when modeling:

age_is_pos = lambda data: data['age'] > 0
age_is_pos_constraint = CustomConstraint(is_valid=age_is_pos)

I also can't create shortcuts (same error):

def get_pos_fn(col_name):
    def fn(data):
        return data[col_name] > 0

    return fn 

age_is_pos = get_pos_fn('age')
CustomConstraint(is_valid=age_is_pos)

@TheEdoardo93

I've tried the suggestions from @npatki when modeling, and the GaussianCopula model fits correctly.
But when I try to save the model with model.save('fitted_gc_model.pkl'), I receive an error saying that pickle is not able to save e.g. "get_pos_fn()". I've tried Joblib too, but it doesn't work either.

Here is the error:

AttributeError: can't pickle local object '_define_constraints.<locals>."function_name".<locals>.fn'

Any suggestions?
Does anyone know when e.g. Positives will be released publicly?
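
For reference, the pickle error above can be reproduced without SDV. The sketch below only illustrates how the standard pickle module treats different kinds of callables: it serializes functions by name, so closures and lambdas fail, while a functools.partial over a module-level function succeeds. The names column_is_positive and can_pickle are hypothetical helpers introduced for illustration. Whether CustomConstraint accepts a functools.partial as is_valid is not confirmed anywhere in this thread, so treat that part as an untested assumption.

```python
import functools
import pickle


def column_is_positive(data, column):
    """Module-level check; pickle can locate this function by its name."""
    return data[column] > 0


def get_pos_fn(col_name):
    """Closure factory from the thread; the inner function is a local object."""
    def fn(data):
        return data[col_name] > 0
    return fn


def can_pickle(obj):
    """Return True if the standard pickle module can serialize obj."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


age_is_pos_closure = get_pos_fn('age')            # local function
age_is_pos_lambda = lambda data: data['age'] > 0  # anonymous function
# Hypothetical alternative: bind the column name with functools.partial
age_is_pos_partial = functools.partial(column_is_positive, column='age')

print('closure picklable:', can_pickle(age_is_pos_closure))
print('lambda picklable: ', can_pickle(age_is_pos_lambda))
print('partial picklable:', can_pickle(age_is_pos_partial))
```

The partial keeps the "write the check once, bind the column later" convenience of get_pos_fn while remaining serializable, because it only stores a reference to a named module-level function plus its keyword arguments.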

@npatki
Contributor

npatki commented May 20, 2021

Could you spin up another issue for the issue with pickle? Let's track that separately.

Does anyone know when e.g. Positives will be released publicly?

We're aware that there's an increasing demand for this feature. Stay tuned, and we'll provide an update when they're available in a future release!

For the positives issue specifically, there's a workaround you can try if you're using Copula-based approaches (GaussianCopula or CopulaGAN): Try enforcing specific distribution types such as truncated_gaussian or gamma so the model will automatically learn that 0 is a lower bound from your input data. More info in the user guide and #200
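
A minimal sketch of that workaround, assuming the SDV 0.x tabular API in which GaussianCopula takes a field_distributions dictionary mapping column names to distribution names. The helper names make_field_distributions and build_model are hypothetical, and the column names are taken from the example at the top of this issue. Please check the parameter name against the user guide for your installed version.

```python
def make_field_distributions(columns, distribution='gamma'):
    """Map each column name to the distribution the model should use."""
    return {column: distribution for column in columns}


def build_model(columns, distribution='gamma'):
    """Build a GaussianCopula with the given per-column distributions."""
    # Imported lazily so the helper above works without SDV installed.
    # Assumption: SDV 0.x, where sdv.tabular.GaussianCopula accepts
    # a field_distributions argument.
    from sdv.tabular import GaussianCopula
    return GaussianCopula(
        field_distributions=make_field_distributions(columns, distribution)
    )


# Columns that should only contain positive values
field_distributions = make_field_distributions(['col1', 'col2', 'col3'])
print(field_distributions)
```

Calling build_model(['col1', 'col2', 'col3']) and then fitting on data whose values are all positive would be the intended usage. Swapping 'gamma' for 'truncated_gaussian' follows the same pattern.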

@kvrameshreddy

Hi @npatki, can we use the general constraint functionality for relational data generation?

When can we expect this functionality to be available?

Thank you.

@npatki
Contributor

npatki commented Jun 10, 2022

Can we use the general constraint functionality for relational data generation?

Constraints are available for relational data. See User Guide for details.

Is it possible to design a way so that the user only writes a function and we enable the same usage?

Yes, our plan is to add a create_custom_constraint factory method so that it will be easier to write and use Custom Constraints. Let's defer to #836 for continuing the conversation.

@npatki npatki closed this as not planned Jun 10, 2022
@npatki npatki added resolution:duplicate This issue or pull request already exists and removed pending review labels Jun 10, 2022