Certain sdtypes cause Faker to raise error #1346

Closed
amontanez24 opened this issue Mar 30, 2023 · 0 comments · Fixed by #1359 or sdv-dev/RDT#630

Environment Details

Please indicate the following details about the environment in which you found the bug:

  • SDV version: 1.0.0
  • Python version: Any
  • Operating System: Any

Error Description

If certain sdtypes, such as state, are assigned to a column, SDV passes the incorrect provider name to RDT, causing it to crash with this error:

TransformerProcessingError: The 'en_US' module does not contain a function named 'state'.
Refer to the Faker docs to find the correct function: https://faker.readthedocs.io/en/master/providers.html
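
To illustrate the failure mode, here is a minimal toy sketch of a lookup keyed on a provider name and a function name, which is what the error message suggests is happening. It is not RDT's or Faker's actual code: `PROVIDERS`, `ProviderLookupError`, and `resolve` are hypothetical names invented for this example, and the two namespaces are stand-ins rather than real Faker modules.

```python
from types import SimpleNamespace


class ProviderLookupError(Exception):
    """Hypothetical error type standing in for TransformerProcessingError."""


# Toy stand-ins for provider modules (assumption: NOT real Faker objects).
PROVIDERS = {
    # A provider that does expose a 'state' function.
    "address": SimpleNamespace(state=lambda: "California"),
    # A locale-like namespace that exposes no generator functions.
    "en_US": SimpleNamespace(),
}


def resolve(provider_name, function_name):
    """Return the function called `function_name` from the named provider."""
    provider = PROVIDERS[provider_name]
    try:
        return getattr(provider, function_name)
    except AttributeError:
        raise ProviderLookupError(
            f"The '{provider_name}' module does not contain "
            f"a function named '{function_name}'."
        ) from None


# The correct provider name resolves the function.
state_function = resolve("address", "state")
print(state_function())

# The wrong provider name fails, even though the function name is valid.
try:
    resolve("en_US", "state")
except ProviderLookupError as error:
    print(error)
```

In this sketch the function name is the same in both calls; only the provider name differs, which matches the description above that the provider name, not the sdtype itself, is what is wrong.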

Steps to reproduce

import pandas as pd
from sdv.metadata import SingleTableMetadata
from sdv.single_table import GaussianCopulaSynthesizer

data = pd.DataFrame({
    'id': [1, 2, 3],
    'state': ['California', 'New York', 'Texas'],
    'salary': [10000, 50000, 120000]
})

metadata = SingleTableMetadata()
metadata.detect_from_dataframe(data)
metadata.update_column(column_name='state', sdtype='state')

synth = GaussianCopulaSynthesizer(metadata)
synth.fit(data)  # crashes with the TransformerProcessingError shown above