Picking eligible users automatically #569
src/predictions/profiles_mlcorelib/connectors/RedshiftConnector.py
src/predictions/profiles_mlcorelib/ml_core/preprocess_and_train.py
# For each boolean feature, trying both True and False values
for bool_feature in booleantype_features:
    for bool_value in [True, False]:
Does this logic work when a column's values are 1/0, y/n, etc.? And does it work in all warehouses?
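One way to make the loop independent of how a warehouse encodes booleans is to normalise values before comparing them. The following is only a sketch: `to_bool`, `TRUE_TOKENS`, and `FALSE_TOKENS` are hypothetical names that do not exist in the PR, and the token sets are an assumption that would need to match what each warehouse actually returns.

```python
from typing import Any, Optional

# Hypothetical token sets (assumption): extend them to match the encodings
# that the supported warehouses actually return.
TRUE_TOKENS = {"true", "t", "1", "y", "yes"}
FALSE_TOKENS = {"false", "f", "0", "n", "no"}


def to_bool(value: Any) -> Optional[bool]:
    """Map a boolean-like value to True/False.

    Returns None for nulls and for values that are not recognised,
    so that callers have to handle those cases explicitly.
    """
    if value is None:
        return None
    if isinstance(value, bool):
        return value
    if isinstance(value, (int, float)):
        # NaN compares unequal to both 0 and 1, so it falls through to None.
        if value == 1:
            return True
        if value == 0:
            return False
        return None
    token = str(value).strip().lower()
    if token in TRUE_TOKENS:
        return True
    if token in FALSE_TOKENS:
        return False
    return None
```

Returning `None` instead of guessing keeps nulls and unexpected encodings visible to the caller rather than silently folding them into `False`.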
This looks a bit fragile. We are making several assumptions here:
- Features are recognised as boolean type.
- The values are True/False, not 1/0, y/n, etc.
- We are not testing on the label column in the feature table (the current default will never be picked up by this).
- Binary features can contain nulls. The null may or may not be part of the eligible users flag, but make sure the label column split is handled correctly. For example,
eligible_users: is_payer = 0 or is_payer is null
can be a potential eligible users condition. Also, when you run queries or convert results to dataframes and check them, the label distribution silently ignores nulls unless you check for them explicitly. For example, select is_churned, count(*) from c360 where is_payer != 1 drops the rows where is_payer is null, because a comparison with null never evaluates to true. Beware of that.
Can you rewrite this part to ensure all of these are addressed?
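The null behaviour described above can be reproduced with a small in-memory table. This is a sketch that uses SQLite as a stand-in for the warehouse (an assumption; the comparison semantics shown are standard SQL three-valued logic). The sample rows and the `label_counts` helper are made up for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for the warehouse (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE c360 (user_id INTEGER, is_payer INTEGER, is_churned INTEGER)"
)
# Made-up sample rows: users 4 and 5 have a null is_payer.
conn.executemany(
    "INSERT INTO c360 VALUES (?, ?, ?)",
    [(1, 1, 0), (2, 0, 1), (3, 0, 0), (4, None, 1), (5, None, 0)],
)


def label_counts(where_clause: str) -> dict:
    """Return {is_churned: row_count} for rows matching where_clause."""
    query = (
        "SELECT is_churned, COUNT(*) FROM c360 "
        f"WHERE {where_clause} GROUP BY is_churned"
    )
    return dict(conn.execute(query).fetchall())


# `is_payer != 1` evaluates to NULL for null rows, so they are filtered out.
naive = label_counts("is_payer != 1")

# Adding an explicit IS NULL check keeps the null rows in the eligible set.
null_aware = label_counts("is_payer != 1 OR is_payer IS NULL")

print("naive:", naive)
print("null-aware:", null_aware)
```

With these rows the naive filter counts only users 2 and 3, while the null-aware filter also counts users 4 and 5, so the two label distributions differ.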
This still seems to be in progress? The main changes, moving the core logic to the trainer and changing the logic itself, still seem to be in progress. I have added comments only on the changes made so far.
@@ -40,6 +41,17 @@ def validate_sql_table(self, table_name: str, entity_column: str) -> None:
                f"SQL model table {table_name} has duplicate values in entity column {entity_column}. Please make sure that the column {entity_column} in all SQL model has unique values only."
            )

    def get_filtered_table(self, feature_table_name, filter_condition):
        if filter_condition is None:
            raise Exception(
Why is the Exception raised twice? This seems to be an error.
Can you write unit tests for the PR? Let's make sure all the new changes have tests, especially as this is a slightly complex PR.
For this specific case, we should add a test where filter_condition is missing and we expect the exception to be raised properly. (This probably wouldn't cause an issue, though, because of the two raise statements.)
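A test along these lines could look like the sketch below. `FakeConnector` is a hypothetical stand-in and not the project's connector: its error message and return value are assumptions, since the body of the real `get_filtered_table` is not visible in the hunk above. In the real test the project's connector class would replace it.

```python
import unittest


class FakeConnector:
    """Hypothetical stand-in for the connector under test (assumption)."""

    def get_filtered_table(self, feature_table_name, filter_condition):
        if filter_condition is None:
            # Assumed message; the real one is not shown in the diff.
            raise Exception("filter_condition must be provided")
        # Assumed return value: the real method presumably returns a table.
        return f"SELECT * FROM {feature_table_name} WHERE {filter_condition}"


class TestGetFilteredTable(unittest.TestCase):
    def setUp(self):
        self.connector = FakeConnector()

    def test_missing_filter_condition_raises(self):
        # A missing filter_condition should raise instead of returning
        # an unfiltered table.
        with self.assertRaises(Exception):
            self.connector.get_filtered_table("c360", None)

    def test_filter_condition_is_applied(self):
        # Includes the null-aware condition discussed earlier in the review.
        condition = "is_payer = 0 or is_payer is null"
        result = self.connector.get_filtered_table("c360", condition)
        self.assertIn(condition, result)
```

The cases can be run with `python -m unittest` from the directory that contains the test module.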