Problem Description
Sometimes I have a dataset with many sensitive columns, such as address, phone_number, etc. Usually the entire dataset comes from users in a specific region, so I want to set the same locales for all of these columns. SDV 1.0 provides this functionality, but it is cumbersome.
Current Functionality: I can set the locales individually on each transformer object using the Anonymization Settings:
from sdv.single_table import GaussianCopulaSynthesizer
from rdt.transformers.pii import AnonymizedFaker

synth = GaussianCopulaSynthesizer(my_metadata)
synth.auto_assign_transformers(my_data)

# update all PII columns to use the desired locale, such as 'en_CA'
synth.update_transformers(column_name_to_transformer={
    'address': AnonymizedFaker(provider_name='address', function_name='address', locales=['en_CA']),
    'phone_number': AnonymizedFaker(provider_name='phone_number', function_name='phone_number', locales=['en_CA'])
})

synth.fit(my_data)
synthetic_data = synth.sample(num_rows=100)
Expected behavior
All synthesizers (single table, multi table and sequential) should accept a global locales parameter during initialization. This should set the locales for all the relevant column transformers at once.
Note: if needed, it should still be possible to update the locales on individual columns by using auto_assign_transformers and update_transformers, as shown above.
Additional context
Internally, we should make sure that any time we assign an AnonymizedFaker, we pass in the locales specified in the synthesizer parameters.
If no locales are specified, Faker defaults to en_US. Omitting the new parameter therefore preserves the current behavior, so this change raises no backwards-compatibility concerns.
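The propagation described above could look roughly like the sketch below. It does not use the SDV or RDT code: `StubAnonymizedFaker` and `SketchSynthesizer` are hypothetical stand-ins written only for illustration, and the simplified `auto_assign_transformers` signature (no data argument) is an assumption. It shows a synthesizer-level `locales` value being passed to every automatically assigned faker, while a per-column override through `update_transformers` still takes precedence.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class StubAnonymizedFaker:
    """Hypothetical stand-in for rdt's AnonymizedFaker; it only records its settings."""
    provider_name: str
    function_name: str
    locales: Optional[List[str]] = None  # None means "use Faker's default (en_US)"


class SketchSynthesizer:
    """Hypothetical synthesizer showing how a global `locales` could be propagated."""

    def __init__(self, pii_columns: Dict[str, str],
                 locales: Optional[List[str]] = None):
        # pii_columns maps a column name to a Faker provider/function name.
        # This is a simplification of what real metadata would provide.
        self._pii_columns = pii_columns
        self.locales = locales
        self._transformers: Dict[str, StubAnonymizedFaker] = {}

    def auto_assign_transformers(self) -> None:
        # Every automatically assigned faker receives the synthesizer-level locales.
        for column, name in self._pii_columns.items():
            self._transformers[column] = StubAnonymizedFaker(
                provider_name=name, function_name=name, locales=self.locales
            )

    def update_transformers(
        self, column_name_to_transformer: Dict[str, StubAnonymizedFaker]
    ) -> None:
        # An explicit per-column transformer replaces the automatically assigned one.
        self._transformers.update(column_name_to_transformer)

    def get_transformers(self) -> Dict[str, StubAnonymizedFaker]:
        return dict(self._transformers)


# Set the locale once for the whole synthesizer.
synth = SketchSynthesizer(
    pii_columns={'address': 'address', 'phone_number': 'phone_number'},
    locales=['en_CA'],
)
synth.auto_assign_transformers()

# Override a single column afterwards, if needed.
synth.update_transformers({
    'phone_number': StubAnonymizedFaker('phone_number', 'phone_number', locales=['fr_CA']),
})
transformers = synth.get_transformers()
```

In this sketch the address column keeps the global `['en_CA']` setting, while phone_number uses the explicitly supplied `['fr_CA']`.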