Don't modify my metadata object #754

Closed
npatki opened this issue Mar 31, 2022 · 0 comments · Fixed by #757

npatki commented Mar 31, 2022

Environment Details

  • SDV version: 0.14.0 (and previous)

Description

Right now, if I pass a metadata.Table object into a model and fit it, the process modifies my metadata object by adding extra fields to it.

This can lead to unexpected behavior due to the extra fields that were added, especially if I want to re-use the metadata object in a different model.

Can we make a deep copy of the metadata.Table object instead?
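As a rough illustration of the requested behavior, here is a minimal sketch that does not use SDV at all. `ToyTableMetadata` and `ToyModel` are hypothetical stand-ins invented for this example, and the idea that the model writes a `distribution` entry into each field during fit is an assumption based on the behavior described in this issue, not a description of SDV's internals. The point is only that a deep copy taken in the constructor keeps the caller's object untouched.

```python
import copy


class ToyTableMetadata:
    """Hypothetical stand-in for a table metadata object (not the SDV class)."""

    def __init__(self, fields):
        self.fields = fields  # e.g. {"salary": {"type": "numerical"}}


class ToyModel:
    """Hypothetical model that records per-column settings during fit."""

    def __init__(self, table_metadata, default_distribution):
        # Defensive deep copy: the model works on its own private metadata,
        # so nothing written during fit() leaks back to the caller's object.
        self._metadata = copy.deepcopy(table_metadata)
        self._default_distribution = default_distribution

    def fit(self):
        for field in self._metadata.fields.values():
            # Only fill in a distribution if the metadata does not specify one.
            field.setdefault("distribution", self._default_distribution)

    def get_distributions(self):
        return {
            name: field["distribution"]
            for name, field in self._metadata.fields.items()
        }


metadata = ToyTableMetadata({"salary": {"type": "numerical"}})

gamma_model = ToyModel(metadata, default_distribution="gamma")
gamma_model.fit()

beta_model = ToyModel(metadata, default_distribution="beta")
beta_model.fit()

print(gamma_model.get_distributions())  # {'salary': 'gamma'}
print(beta_model.get_distributions())   # {'salary': 'beta'}
print(metadata.fields)                  # {'salary': {'type': 'numerical'}}
```

With the copy in place, each toy model honors its own default and the original metadata keeps only the fields it started with.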

Steps to reproduce

The code below demonstrates the unexpected behavior:

from sdv.tabular import GaussianCopula
from sdv.demo import load_tabular_demo

metadata, data = load_tabular_demo('student_placements', metadata=True)

# create a model that estimates everything as a gamma distribution
model = GaussianCopula(table_metadata=metadata,
                       categorical_transformer='label_encoding',
                       default_distribution='gamma')
model.fit(data)
print(model.get_distributions()) # every distribution is gamma, as expected

# create a new model that estimates everything as a beta
model2 = GaussianCopula(table_metadata=metadata,
                        categorical_transformer='label_encoding',
                        default_distribution='beta')

model2.fit(data)
print(model2.get_distributions()) # Everything is still gamma. This is unexpected!

This happens because, unbeknownst to me, fitting the first model modifies my metadata to include a 'gamma' distribution field for every column. I then unknowingly pass the modified metadata into model2, which favors the metadata over the default_distribution parameter.
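The mechanism can be reproduced without SDV in a few lines. This is a toy sketch under stated assumptions: `ToySharedModel` is a hypothetical class made up for illustration, and the rule "a distribution already present in the metadata wins over the default" is taken from the description in this issue rather than from SDV's source.

```python
class ToySharedModel:
    """Hypothetical model that keeps a reference to the caller's metadata."""

    def __init__(self, fields, default_distribution):
        self._fields = fields  # same dict object as the caller's, no copy
        self._default_distribution = default_distribution

    def fit(self):
        for field in self._fields.values():
            # A distribution already present in the metadata takes priority
            # over the model's default; otherwise the default is written back
            # into the shared dict.
            field.setdefault("distribution", self._default_distribution)

    def get_distributions(self):
        return {
            name: field["distribution"]
            for name, field in self._fields.items()
        }


fields = {"salary": {"type": "numerical"}}

model = ToySharedModel(fields, default_distribution="gamma")
model.fit()

model2 = ToySharedModel(fields, default_distribution="beta")
model2.fit()

print(model2.get_distributions())  # {'salary': 'gamma'}
print(fields)  # {'salary': {'type': 'numerical', 'distribution': 'gamma'}}
```

Because both toy models share one dict, the first fit leaves a 'gamma' entry behind and the second model never gets a chance to apply its 'beta' default.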

@npatki npatki added the feature request Request for a new feature label Mar 31, 2022
@katxiao katxiao added this to the 0.14.1 milestone May 3, 2022