Add ChnPiiGenerator and Enhance Models #191
Merged
Description
This pull request updates several files across the project. Key changes:
base.py: Added exception handling for column removal in tabular data processing to prevent unintended consequences.
chn_pii.py: Introduced a new ChnPiiGenerator class for handling Chinese personally identifiable information (PII) data, including fitting, converting, and reverse converting.
manager.py: Updated the list of default processors to include the new ChnPiiGenerator.
base.py: Added a new boolean attribute fit_data_empty to the MLSynthesizerModel class.
ctgan.py: Improved the handling of discrete columns during model fitting and added checks for empty data frames.
test_chn_pii_generator.py: Added comprehensive tests for the new ChnPiiGenerator class to ensure its functionality and robustness.
Motivation and Context
These changes improve the robustness and functionality of the data processing and model fitting components of sdgx.
The introduction of ChnPiiGenerator is particularly important for handling region-specific PII data.
How has this been tested?
The changes have been thoroughly tested in a local development environment.
The new ChnPiiGenerator has been tested with various scenarios, including edge cases, to ensure it handles data correctly.
Types of changes
Checklist: