AttributeError on UniqueCombinations constraint with non-strings #196
Comments
Thanks for reporting this, @LihuaXiong2020. I think the problem is not categorical data in general but only categorical data made of integer values, so the title might be a bit misleading. Would you mind editing the title to something like "AttributeError when using UniqueCombinations constraint with integer values"? It would also be helpful if you could post a short code snippet showing how to reproduce the error.
Hi @csala, I think it's not just integers: I converted the integers into categoricals, and it appears UniqueCombinations only works with strings. Would it be possible to extend it to cover other dtypes? Sure, I'll try to put together a reproducible example.
Oh, yes, I actually meant this: values that are not strings, independently of the type they have in the metadata. This will be a tricky one, because even if we convert the values into strings on the fly inside the constraint, with mixed types it will be hard to keep track of what the original type was. For example, if we have a column that contains two categories with different dtypes, like
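A minimal sketch of the mixed-type concern, using made-up values (the column contents are an assumption for illustration, not taken from this thread). Once the values are cast to strings, an integer and a string that print the same become indistinguishable, so the original type cannot be recovered from the text alone:

```python
import pandas as pd

# Hypothetical column: an int, a str that looks like that int, and a float.
mixed = pd.Series([1, "1", 2.0], dtype=object)

# Casting to text, as a constraint might do on the fly.
as_text = mixed.astype(str)

# The Python type of each original value, which the text form does not keep.
original_types = mixed.map(type)

print(as_text.tolist())
print(original_types.tolist())
```

Keeping something like `original_types` alongside the text is one possible way to restore values later, but it has to be tracked per value rather than per column.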
Hi @csala, I reproduced it on Python 3.7, and the behavior is the same on Python 3.6.8. I left out the Categorical type specification in the metadata, since the same error occurs with plain integers.
Description & What I did
Reproduce
`
import pandas as pd
from sdv.constraints import UniqueCombinations
from sdv.tabular import GaussianCopula

df = pd.DataFrame({"cat_a": [1, 2, 3], "cat_b": [4, 5, 6], "value": [0.5, 1.0, 1.5]})

unique_comb_segments = UniqueCombinations(
    columns=[
        "cat_a",
        "cat_b",
    ],
    handling_strategy="transform",
)

model = GaussianCopula(constraints=[unique_comb_segments])
model.fit(df)
`
Error:
`
AttributeError Traceback (most recent call last)
in
8 )
9 model = GaussianCopula(constraints=[unique_comb_segments])
---> 10 model.fit(df)
~/opt/anaconda3/envs/python3b/lib/python3.7/site-packages/sdv/tabular/base.py in fit(self, data)
100 """
101 if not self._metadata_fitted:
--> 102 self._metadata.fit(data)
103
104 self._num_rows = len(data)
~/opt/anaconda3/envs/python3b/lib/python3.7/site-packages/sdv/metadata/table.py in fit(self, data)
446 data = self._anonymize(data)
447
--> 448 data = self._fit_transform_constraints(data)
449 self._fit_hyper_transformer(data)
450 self.fitted = True
~/opt/anaconda3/envs/python3b/lib/python3.7/site-packages/sdv/metadata/table.py in _fit_transform_constraints(self, data)
330 self._constraints[idx] = constraint
331
--> 332 data = constraint.fit_transform(data)
333
334 return data
~/opt/anaconda3/envs/python3b/lib/python3.7/site-packages/sdv/constraints/base.py in fit_transform(self, table_data)
124 Transformed data.
125 """
--> 126 self.fit(table_data)
127 return self.transform(table_data)
128
~/opt/anaconda3/envs/python3b/lib/python3.7/site-packages/sdv/constraints/tabular.py in fit(self, table_data)
119 """
120 self._separator = '#'
--> 121 while not self._valid_separator(table_data):
122 self._separator += '#'
123
~/opt/anaconda3/envs/python3b/lib/python3.7/site-packages/sdv/constraints/tabular.py in _valid_separator(self, table_data)
96 """
97 for column in self._columns:
---> 98 if table_data[column].str.contains(self._separator).any():
99 return False
100
~/opt/anaconda3/envs/python3b/lib/python3.7/site-packages/pandas/core/generic.py in __getattr__(self, name)
5173 or name in self._accessors
5174 ):
-> 5175 return object.__getattribute__(self, name)
5176 else:
5177 if self._info_axis._can_hold_identifiers_and_holds_name(name):
~/opt/anaconda3/envs/python3b/lib/python3.7/site-packages/pandas/core/accessor.py in __get__(self, obj, cls)
173 # we're accessing the attribute of the class, i.e., Dataset.geo
174 return self._accessor
--> 175 accessor_obj = self._accessor(obj)
176 # Replace the property with the accessor object. Inspired by:
177 # http://www.pydanny.com/cached-property.html
~/opt/anaconda3/envs/python3b/lib/python3.7/site-packages/pandas/core/strings.py in __init__(self, data)
1915
1916 def __init__(self, data):
-> 1917 self._inferred_dtype = self._validate(data)
1918 self._is_categorical = is_categorical_dtype(data)
1919
~/opt/anaconda3/envs/python3b/lib/python3.7/site-packages/pandas/core/strings.py in _validate(data)
1965
1966 if inferred_dtype not in allowed_types:
-> 1967 raise AttributeError("Can only use .str accessor with string " "values!")
1968 return inferred_dtype
1969
AttributeError: Can only use .str accessor with string values!
`
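The traceback points at `table_data[column].str.contains(self._separator)`, which fails because the pandas `.str` accessor is only available on string-like data. The sketch below reproduces that failure without SDV and shows one possible way to make the check dtype-agnostic by rendering values as text first. The helper name `separator_is_valid` is hypothetical, and this is not necessarily how the library should or does fix it:

```python
import pandas as pd

df = pd.DataFrame({"cat_a": [1, 2, 3], "cat_b": [4, 5, 6]})
separator = "#"

# Same call as in the traceback: .str on an integer column raises AttributeError.
try:
    df["cat_a"].str.contains(separator)
    raised = False
except AttributeError:
    raised = True


def separator_is_valid(table_data, columns, separator):
    """Hypothetical helper: True if no value contains the separator as text."""
    for column in columns:
        # Cast to str so the check works for any dtype, not only strings.
        as_text = table_data[column].astype(str)
        # regex=False treats the separator as a literal substring.
        if as_text.str.contains(separator, regex=False).any():
            return False
    return True


print(raised)
print(separator_is_valid(df, ["cat_a", "cat_b"], separator))
```

Casting only for the check sidesteps the error here, but it does not address the mixed-type question raised in the comments about restoring the original types after transforming.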