Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[dtypes] FloatFormatter reverse transform does not support new pandas dtypes #855

Closed
pvk-developer opened this issue Jul 31, 2024 · 0 comments · Fixed by #857
Closed

[dtypes] FloatFormatter reverse transform does not support new pandas dtypes #855

pvk-developer opened this issue Jul 31, 2024 · 0 comments · Fixed by #857
Assignees
Labels
bug Something isn't working internal The issue doesn't change the API or functionality
Milestone

Comments

@pvk-developer
Copy link
Member

pvk-developer commented Jul 31, 2024

Error Description

The FloatFormatter crashes due to hard coded logic attempting to cast to a numpy data type (line 191: is_integer = np.dtype(self._dtype).kind == 'i'). If the dtype differs from np.dtype, this causes a breaking error during reverse transformation. Which is the root cause of FloatFormatter not handling the new pandas data types.

Steps to reproduce

from rdt.transformers import FloatFormatter
import pandas as pd
data = {
    'Int8': pd.Series([1, 2, -3], dtype='Int8'),
    'Int16': pd.Series([1, 2, -3], dtype='Int16'),
    'Int32': pd.Series([1, 2, -3], dtype='Int32'),
    'Int64': pd.Series([1, 2, -3], dtype='Int64'),
    'Float32': pd.Series([1.1, 2.2, 3.3], dtype='Float32'),
    'Float64': pd.Series([1.1, 2.2, 3.3], dtype='Float64'),
}
df = pd.DataFrame(data)

ff = FloatFormatter()
ff.fit(df, 'Int8')
transformed = ff.transform(df)
ff.reverse_transform(transformed)

Expected behavior

  • We should support this new dtypes. The best approach for this is to use is_integer_dtype from pandas. This also supports numpy.dtypes and uints.
  • Add an integration test to make sure that we support this new dtypes with Null values.

Additional Context

Once this is fixed, we should be able to fit and sample from SDV (it is important to confirm sampling since fitting is already confirmed).

@pvk-developer pvk-developer added bug Something isn't working internal The issue doesn't change the API or functionality labels Jul 31, 2024
@R-Palazzo R-Palazzo self-assigned this Aug 1, 2024
@R-Palazzo R-Palazzo added this to the 1.12.3 milestone Aug 1, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working internal The issue doesn't change the API or functionality
Projects
None yet
Development

Successfully merging a pull request may close this issue.

2 participants