In version 0.4.0, the dictionaries that map original values to their anonymized replacements during tabular data anonymization are stored within the Table metadata. This can disclose the original values if the model is saved to a pickle file and shipped to the recipients of the synthetic data.
To fix this, the anonymization mappings should be stored in a dictionary that lives outside the Table instance. That way they are never serialized together with the model, and they are lost once the Python process in which training took place ends.
Steps to reproduce
In [1]: from sdv.demo import load_tabular_demo
In [2]: demo = load_tabular_demo()
In [3]: from sdv.tabular import GaussianCopula
In [4]: model = GaussianCopula(anonymize_fields={'name': 'name'})
In [5]: model.fit(demo)
In [6]: model.save('model.pkl')
In [7]: loaded_model = GaussianCopula.load('model.pkl')
In [8]: metadata = loaded_model.get_metadata()
In [9]: metadata._anonymization_mappings
Out[9]:
{'name': {'Dr. Tammy White': 'Philip Gould',
'Susan Brock DDS': 'Alexandra Long',
'Dr. Mary Warren': 'Ronald Cox',
'Kristine Garner': 'Wendy Sharp',
'Eric Clark': 'Sharon Smith',
'Ariel Peterson': 'Dr. Jerry Anderson',
'Terry Vargas': 'Mary Payne',
'Ethan Palmer': 'Ashley Carter',
'Steven Evans': 'Kelsey Jimenez',
'Jesse Freeman MD': 'Melanie Meyers',
'Judith Garcia': 'Jessica Olsen',
'Cindy Hendricks': 'William Martinez'}}
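A regression check along these lines could guard against the leak: pickle whatever would be shipped and search the bytes for the original values. The helper leaked_values is hypothetical, and the dictionaries below only imitate the metadata from the session above; in a real check one would pass the fitted model and the original column values instead. It only detects values stored verbatim, not values that were hashed or otherwise transformed.

```python
import pickle


def leaked_values(obj, original_values):
    """Return the original values that can be found in pickle.dumps(obj).

    Heuristic: with the default pickle protocol, str objects are written as
    UTF-8 bytes, so a byte-substring search finds any value stored verbatim.
    """
    payload = pickle.dumps(obj)
    return [value for value in original_values if value.encode('utf-8') in payload]


# Stand-ins for leaky and fixed metadata (plain dicts, not SDV objects).
leaky_metadata = {'_anonymization_mappings': {'name': {'Dr. Tammy White': 'Philip Gould'}}}
safe_metadata = {'anonymize_fields': {'name': 'name'}}

print(leaked_values(leaky_metadata, ['Dr. Tammy White', 'Eric Clark']))  # ['Dr. Tammy White']
print(leaked_values(safe_metadata, ['Dr. Tammy White', 'Eric Clark']))   # []
```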