-
Notifications
You must be signed in to change notification settings - Fork 24
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Align test/id sdtypes to match SDV
#881
Align test/id sdtypes to match SDV
#881
Conversation
Codecov ReportAll modified and coverable lines are covered by tests ✅
❗ Your organization needs to install the Codecov GitHub app to enable full functionality. Additional details and impacted files@@ Coverage Diff @@
## main #881 +/- ##
==========================================
Coverage 100.00% 100.00%
==========================================
Files 18 18
Lines 1844 2215 +371
==========================================
+ Hits 1844 2215 +371 ☔ View full report in Codecov by Sentry. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good! Just one suggestion
@@ -1,4 +1,4 @@ | |||
"""Dataset Generators for Text transformers.""" | |||
"""Dataset Generators for 'text' transformers.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we just delete this file now since we have the id one testing the exact same thing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I left it so we can test the text
input still works as sdtype.
rdt/performance/datasets/id.py
Outdated
|
||
@staticmethod | ||
def get_performance_thresholds(): | ||
"""Return the expected threseholds.""" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"""Return the expected threseholds.""" | |
"""Return the expected thresholds.""" |
May want to Ctrl+F the library for this typo
"Importing 'IDGenerator' or 'RegexGenerator' for ID columns from 'rdt.transformers.text' " | ||
"is deprecated. Please use 'rdt.transformers.id' instead.", |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This warning mentions importing from transformers.id
instead of transformers.text
but should it also mention switching the sdtype from text to id?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Not sure, cc: @npatki
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Sorry I revisited the issue and we would like to raise a warning if the sdtype
is set as text
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Addressed here 2a8d014
rdt/transformers/utils.py
Outdated
"""Custom dictionary to raise a deprecation warning.""" | ||
|
||
def get(self, key): | ||
"""Retrun the value for key if key is in the dictionary, else default. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
"""Retrun the value for key if key is in the dictionary, else default. | |
"""Return the value for key if key is in the dictionary, else default. |
'text': RegexGenerator(), | ||
'pii': AnonymizedFaker(), | ||
} | ||
DEFAULT_TRANSFORMERS = WarnDict( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does this cause the warning to be raised more than once in one workflow? The cases I'm wondering are
- If the user has multiple columns set to an sdtype of text
- If the overall process for the HyperTransformer accesses this dict more than once
We probably want to make sure the user doesn't get spammed with this warning
Resolves #880
CU-86b2378fg