Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Picking eligible users automatically #569

Open
wants to merge 11 commits into
base: main
Choose a base branch
from

Conversation

joker2411
Copy link
Collaborator

Description of the change

Ticket Link.

Type of change

  • Bug fix (non-breaking change that fixes an issue)
  • New feature (non-breaking change that adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to not work as expected)

Related issues

Fix #1

Checklists

Development

  • Lint rules pass locally
  • The code changed/added as part of this pull request has been covered with tests
  • All tests related to the changed code pass in development

Code review

  • This pull request has a descriptive title and information useful to a reviewer. There may be a screenshot or screencast attached
  • "Ready for review" label attached to the PR and reviewers mentioned in a comment
  • Changes have been reviewed by at least one other engineer
  • Issue from task tracker has a link to this pull request

src/predictions/profiles_mlcorelib/connectors/Connector.py Outdated Show resolved Hide resolved
src/predictions/profiles_mlcorelib/py_native/propensity.py Outdated Show resolved Hide resolved
src/predictions/profiles_mlcorelib/py_native/propensity.py Outdated Show resolved Hide resolved

# For each boolean feature, trying both True and False values
for bool_feature in booleantype_features:
for bool_value in [True, False]:
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this logic work when values of a column is 1/0, y/n etc? And in all warehouses?

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks a bit fragile. We are making multiple assumptions here -

  1. Features should be recognised as boolean type
  2. The values should be True/False, not 1/0, y/n etc
  3. We are not testing on the label column in the feature table (current default will never be picked up in this)
  4. There can be nulls in the binary features. The null can be in eligible users flag too, or not - but ensure the label column split is correctly handled. ex - eligible_users: is_payer = 0 or is_payer is null can be a potential eligible users condition. Also, when you are running queries/converting to dataframes and checking the result, if nulls aren't explicitly checked for, the label distribution ignores the nulls completely. Make sure you consider this (ex: select is_churned, count(*) from c360 where is_payer != 1 may ignore is_payer= null condition - beware of that)

Can you rewrite this part to ensrue all these are addressed

Copy link
Collaborator

@dpatchigolla dpatchigolla left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This still seems to be in progress? Main changes around moving the core logic to trainer and changing the logic itself seems to be still in progres. I added comments only till changes so far.

@@ -40,6 +41,17 @@ def validate_sql_table(self, table_name: str, entity_column: str) -> None:
f"SQL model table {table_name} has duplicate values in entity column {entity_column}. Please make sure that the column {entity_column} in all SQL model has unique values only."
)

def get_filtered_table(self, feature_table_name, filter_condition):
if filter_condition is None:
raise Exception(
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is the Exception twice? This seems to be an error.
Can you write unit-tests for the PR? For all the new changes, lets make sure we have the tests - especially as this is a slightly complex PR.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For this specific test, we should add tests for a case where it would not have the filter_condition and we expect the exception to raise properly (This wouldn't probably cause an issue though because of two raise exceptions).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants