Align text/id sdtypes to match SDV #881

Merged: 9 commits merged on Sep 17, 2024

pvk-developer
Member

Resolves #880
CU-86b2378fg

@pvk-developer pvk-developer requested a review from a team as a code owner September 11, 2024 12:45
@pvk-developer pvk-developer requested review from rwedge and removed request for a team September 11, 2024 12:45
@sdv-team
Contributor

@codecov-commenter

Codecov Report

All modified and coverable lines are covered by tests ✅

Project coverage is 100.00%. Comparing base (3738f9e) to head (a00cbf0).
Report is 164 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##              main      #881    +/-   ##
==========================================
  Coverage   100.00%   100.00%            
==========================================
  Files           18        18            
  Lines         1844      2215   +371     
==========================================
+ Hits          1844      2215   +371     

Contributor

@amontanez24 amontanez24 left a comment

Looks good! Just one suggestion.

@@ -1,4 +1,4 @@
-"""Dataset Generators for Text transformers."""
+"""Dataset Generators for 'text' transformers."""
Contributor

Should we just delete this file now, since the 'id' version tests exactly the same thing?

Member Author

I left it so we can test that 'text' still works as an input sdtype.


@staticmethod
def get_performance_thresholds():
"""Return the expected threseholds."""
Contributor

Suggested change
-"""Return the expected threseholds."""
+"""Return the expected thresholds."""

You may want to search the library for other occurrences of this typo.

Comment on lines +8 to +9
"Importing 'IDGenerator' or 'RegexGenerator' for ID columns from 'rdt.transformers.text' "
"is deprecated. Please use 'rdt.transformers.id' instead.",
Contributor

This warning tells users to import from 'rdt.transformers.id' instead of 'rdt.transformers.text', but should it also mention switching the sdtype from 'text' to 'id'?

Member Author

Not sure, cc: @npatki

Member Author

Sorry, I revisited the issue: we would like to raise a warning if the sdtype is set to 'text'.

Member Author

Addressed in 2a8d014.
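For readers unfamiliar with how an import-path deprecation like the one discussed above can be wired up, here is a minimal sketch using a module-level `__getattr__` (PEP 562). It is not the code merged in this PR: the stand-in classes, the `legacy_text` module object, and the `FutureWarning` category are assumptions; only the warning wording is adapted from the diff.

```python
import types
import warnings


# Hypothetical stand-ins for the real classes that now live in rdt.transformers.id.
class IDGenerator:
    pass


class RegexGenerator:
    pass


# Names that moved out of the legacy 'text' module.
_MOVED = {'IDGenerator': IDGenerator, 'RegexGenerator': RegexGenerator}


def _text_module_getattr(name):
    """Module-level __getattr__ (PEP 562): runs only when normal lookup fails."""
    if name in _MOVED:
        warnings.warn(
            f"Importing '{name}' for ID columns from 'rdt.transformers.text' "
            "is deprecated. Please use 'rdt.transformers.id' instead.",
            FutureWarning,  # assumption: the real category may differ
            stacklevel=2,  # attribute the warning to the caller's line
        )
        return _MOVED[name]
    raise AttributeError(f"module 'rdt.transformers.text' has no attribute '{name}'")


# Build a stand-in module so the sketch runs on its own. In a real package the
# function would simply be defined as __getattr__ at the top of text.py.
legacy_text = types.ModuleType('legacy_text')
legacy_text.__getattr__ = _text_module_getattr
```

Accessing `legacy_text.RegexGenerator` returns the relocated class and emits the deprecation warning, while unknown names still raise `AttributeError` as a normal module would.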

"""Custom dictionary to raise a deprecation warning."""

def get(self, key):
"""Retrun the value for key if key is in the dictionary, else default.
Contributor

Suggested change
-"""Retrun the value for key if key is in the dictionary, else default.
+"""Return the value for key if key is in the dictionary, else default.

'text': RegexGenerator(),
'pii': AnonymizedFaker(),
}
DEFAULT_TRANSFORMERS = WarnDict(
Contributor

Does this cause the warning to be raised more than once in one workflow? The cases I'm wondering about are:

  1. If the user has multiple columns set to an sdtype of 'text'
  2. If the overall HyperTransformer process accesses this dict more than once

We probably want to make sure the user doesn't get spammed with this warning.
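One way to address the spam concern is to have the dictionary remember whether it has already warned. The sketch below is hypothetical, not the `WarnDict` merged in `rdt/transformers/utils.py`: the `_warned` flag, the warning wording, the `FutureWarning` category, and the string placeholders standing in for transformer instances are all assumptions. It overrides both `__getitem__` and `get`, because `dict.get` does not route through `__getitem__`.

```python
import warnings


class WarnDict(dict):
    """Dictionary that warns once when the deprecated 'text' key is read.

    Hypothetical sketch; the flag name and message wording are assumptions.
    """

    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        self._warned = False  # set after the first 'text' lookup

    def _maybe_warn(self, key):
        if key == 'text' and not self._warned:
            warnings.warn(
                "The 'text' sdtype is deprecated. Please use the 'id' sdtype instead.",
                FutureWarning,
                stacklevel=3,  # skip _maybe_warn and get/__getitem__ frames
            )
            self._warned = True

    def __getitem__(self, key):
        self._maybe_warn(key)
        return super().__getitem__(key)

    def get(self, key, default=None):
        """Return the value for key if key is in the dictionary, else default."""
        self._maybe_warn(key)
        return super().get(key, default)


# Placeholder strings stand in for RegexGenerator() / AnonymizedFaker() instances.
DEFAULT_TRANSFORMERS = WarnDict({
    'id': 'RegexGenerator',
    'text': 'RegexGenerator',
    'pii': 'AnonymizedFaker',
})
```

Because the flag lives on the single module-level instance, the warning fires once per process no matter how many columns use 'text' or how often the HyperTransformer reads the dict. The trade-off is that a later workflow in the same session will not see it again; relying on Python's default warning filter, which deduplicates per call site, is a looser alternative.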

rdt/transformers/utils.py (outdated, resolved)
rdt/transformers/utils.py (outdated, resolved)
@pvk-developer pvk-developer merged commit 7c88e3e into main Sep 17, 2024
47 checks passed
@pvk-developer pvk-developer deleted the issue-880-align-text-id-sdtypes-to-sdv-library branch September 17, 2024 16:36
Development

Successfully merging this pull request may close these issues.

Align text/id sdtypes to the SDV library
5 participants