UniqueCombination constraint with numerical values #434

Closed
jrmelog opened this issue May 19, 2021 · 3 comments
Labels
question General question about the software

Comments

jrmelog commented May 19, 2021

Environment details

  • SDV version: 0.9.1
  • Python version: 3.8.2
  • Operating System: Windows

Problem description

Is it possible to use the UniqueCombinations constraint with numerical columns on CTGAN? I tried to use it, but fitting the model fails with the error "Can only use .str accessor with string values!".
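The error text itself comes from pandas, not from CTGAN. The following is a minimal sketch that reproduces the same message with pandas alone. It assumes (the thread does not confirm this) that the constraint applies a string operation to the constrained columns; the `durations` series is made-up illustration data.

```python
import pandas as pd

# A float column, similar in type to the numerical fields in the report.
durations = pd.Series([12.0, 24.0, 36.0])

try:
    # The .str accessor is only defined for string data, so this raises.
    durations.str.cat(sep="#")
    message = None
except AttributeError as error:
    message = str(error)

print(message)

# Casting to strings first makes the same call valid.
joined = durations.astype(str).str.cat(sep="#")
print(joined)
```

If the constraint does rely on string operations internally, this would explain why float columns fail while string columns work.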

What I already tried

I specified the fields types:

field_types = {
    "Duracao do Acordo Informada": {"type": "numerical", "subtype": "float"},
    "Duracao do Acordo Efetiva": {"type": "numerical", "subtype": "float"},
    "Situacao do Acordo": {"type": "numerical", "subtype": "float"},
}

and then created the constraint:

from sdv.constraints import UniqueCombinations

acordo_constraints = UniqueCombinations(
    columns=[
        'Duracao do Acordo Informada',
        'Duracao do Acordo Efetiva',
        'Situacao do Acordo',
    ],
    handling_strategy='transform')

constraints = [acordo_constraints]

and created the model:

from sdv.tabular import CTGAN

model = CTGAN(epochs=100, field_types=field_types, constraints=constraints)

jrmelog added the "pending review" and "question" labels May 19, 2021
Contributor

npatki commented May 19, 2021

I'm able to reproduce using this test set. This limitation isn't mentioned in the docs, so I'm not sure if this is a bug in the code or docs. If it's the latter, we can turn this into a feature request.

My workaround was to recast the columns as strings, so they become categorical.

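from sdv.constraints import UniqueCombinations
from sdv.tabular import GaussianCopula

constraint = UniqueCombinations(
    columns=['name', 'age'],
    handling_strategy='transform')

model = GaussianCopula(constraints=[constraint])

# need to recast before fitting the model,
# so the numerical column is treated as categorical
table['age'] = table['age'].astype(str)

model.fit(table)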
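One consequence of this workaround is that the recast column comes back as strings. Below is a rough sketch of the round trip using pandas only. The small `table` and the `sampled` frame are hypothetical stand-ins (here `sampled` is just a copy of the training data in place of real model output), so this only illustrates the casting steps, not SDV behavior.

```python
import pandas as pd

# Hypothetical data standing in for the real table.
table = pd.DataFrame({
    "name": ["ana", "ben", "ana"],
    "age": [31.0, 45.0, 31.0],
})
original_dtype = table["age"].dtype

# Cast a copy to strings so the column is handled as categorical.
train = table.copy()
train["age"] = train["age"].astype(str)

# Stand-in for the synthetic rows a fitted model would return.
sampled = train.copy()

# Convert the string column back to its original numeric type.
sampled["age"] = pd.to_numeric(sampled["age"]).astype(original_dtype)

print(sampled.dtypes)
```

Working on a copy keeps the original numeric column available for comparison after sampling.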

Out of curiosity, could you provide more details about your use case?

I can think of many scenarios where categorical strings are UniqueCombinations, like country/city pairs. However, I can't think of any where float values need to appear uniquely. Floats can have many decimal places, so it seems rare to me that a value like 1.23456789 must appear precisely and uniquely with another value.

  1. Are your float values rounded and bounded? For example, do they only appear as whole numbers from 1.0 to 10.0?
  2. Do your float values actually encode categorical entities such as a status?
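Both questions can be checked quickly on the data itself. This is a small sketch in plain Python; `summarize_floats` is a hypothetical helper written for this illustration, and the two sample lists are made up.

```python
def summarize_floats(values):
    """Return (number of distinct values, whether all are whole numbers)."""
    distinct = set(values)
    all_whole = all(float(v).is_integer() for v in distinct)
    return len(distinct), all_whole


# Floats that really encode a status: few distinct values, all whole.
status_codes = [1.0, 2.0, 2.0, 3.0, 1.0]

# Genuinely continuous measurements: no repeats, fractional parts.
measurements = [1.23456789, 2.5, 3.75, 4.125]

status_summary = summarize_floats(status_codes)
measurement_summary = summarize_floats(measurements)

print(status_summary)
print(measurement_summary)
```

A column with few distinct, whole-number values is a good candidate for being treated as categorical.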

Contributor

csala commented May 20, 2021

so I'm not sure if this is a bug in the code or docs. If it's the latter, we can turn this into a feature request.

I would consider this a feature request. It was already requested in #196 too, by the way, so if we agree that we are talking about the same thing, I would flag this one as a duplicate and close it in favor of the other one.

However, I can't think of any where float values need to uniquely appear.

There could be a scenario in which there are unique combinations of products and prices. For example, think about a table that contains hotel reservations that include the columns room_type and price_per_night. One may want each room_type to always show up with the same price_per_night, which should be seen as a given "constant".
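The property described in the hotel example can be written as a simple check: every room_type should map to exactly one price_per_night. The sketch below uses plain Python; `inconsistent_keys` is a hypothetical helper and the reservation rows are invented for illustration.

```python
from collections import defaultdict


def inconsistent_keys(rows, key, value):
    """Return keys that appear with more than one distinct value."""
    seen = defaultdict(set)
    for row in rows:
        seen[row[key]].add(row[value])
    return {k: sorted(v) for k, v in seen.items() if len(v) > 1}


# Each room_type always has the same price_per_night.
consistent = [
    {"room_type": "single", "price_per_night": 80.0},
    {"room_type": "suite", "price_per_night": 210.5},
    {"room_type": "single", "price_per_night": 80.0},
]

# A synthetic row that pairs "single" with a price never seen for it.
broken = consistent + [{"room_type": "single", "price_per_night": 95.0}]

ok_report = inconsistent_keys(consistent, "room_type", "price_per_night")
bad_report = inconsistent_keys(broken, "room_type", "price_per_night")

print(ok_report)
print(bad_report)
```

Running the same check on training data and on sampled data would show whether the synthetic rows introduce combinations that the original table never contained.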

Contributor

npatki commented May 24, 2021

Agreed. Closing out this issue in favor of the older feature request.

@jrmelog, it would be great if you could provide more detail about your use case in #196. That way, we can make sure we factor in whatever nuances we find when we provide a fix.

npatki closed this as completed May 24, 2021

3 participants