Picking eligible users automatically #569

joker2411 · 2025-01-27T07:25:05Z

Description of the change

Ticket Link.

Type of change

Bug fix (non-breaking change that fixes an issue)
New feature (non-breaking change that adds functionality)
Breaking change (fix or feature that would cause existing functionality to not work as expected)

Related issues

Fix #1

Checklists

Development

Lint rules pass locally
The code changed/added as part of this pull request has been covered with tests
All tests related to the changed code pass in development

Code review

This pull request has a descriptive title and information useful to a reviewer. There may be a screenshot or screencast attached
"Ready for review" label attached to the PR and reviewers mentioned in a comment
Changes have been reviewed by at least one other engineer
Issue from task tracker has a link to this pull request

src/predictions/profiles_mlcorelib/connectors/Connector.py

src/predictions/profiles_mlcorelib/connectors/RedshiftConnector.py

src/predictions/profiles_mlcorelib/connectors/SnowflakeConnector.py

src/predictions/profiles_mlcorelib/py_native/propensity.py

src/predictions/profiles_mlcorelib/ml_core/preprocess_and_train.py

dpatchigolla · 2025-01-31T02:35:53Z

src/predictions/profiles_mlcorelib/connectors/Connector.py

+
+        # For each boolean feature, trying both True and False values
+        for bool_feature in booleantype_features:
+            for bool_value in [True, False]:


Does this logic work when the values of a column are 1/0, y/n, etc.? And in all warehouses?

This looks a bit fragile. We are making multiple assumptions here:

Features must be recognised as boolean type.

The values must be True/False, not 1/0, y/n, etc.

We are not testing on the label column in the feature table (the current default will never be picked up by this).

There can be nulls in the binary features. A null may or may not be part of the eligible users flag, but make sure the label column split is handled correctly. For example, eligible_users: is_payer = 0 or is_payer is null can be a potential eligible users condition. Also, when you run queries or convert results to dataframes and check them, nulls that aren't explicitly checked for are dropped from the label distribution completely. Make sure you account for this (ex: select is_churned, count(*) from c360 where is_payer != 1 may ignore the rows where is_payer is null, so beware of that).

Can you rewrite this part to ensure all of these are addressed?
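The null pitfall described above can be reproduced with a small sketch using Python's built-in sqlite3 module. The table contents are made up for illustration, and label_distribution is a hypothetical helper, not a function in profiles_mlcorelib. SQLite stands in for the warehouse here; Snowflake and Redshift should treat comparisons against NULL the same way under standard SQL semantics, but that has not been verified in this sketch.

```python
import sqlite3

# Hypothetical in-memory data; table and column names follow the review comment.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE c360 (user_id INTEGER, is_payer INTEGER, is_churned INTEGER)"
)
conn.executemany(
    "INSERT INTO c360 VALUES (?, ?, ?)",
    [(1, 1, 0), (2, 0, 1), (3, None, 1), (4, None, 0), (5, 0, 0)],
)


def label_distribution(conn, condition):
    """Return {label_value: row_count} for the rows matching `condition`.

    Hypothetical helper for illustration only. `condition` is interpolated
    directly, so it must come from trusted code, never from user input.
    """
    query = (
        "SELECT is_churned, COUNT(*) FROM c360 "
        f"WHERE {condition} GROUP BY is_churned"
    )
    return dict(conn.execute(query).fetchall())


# `!=` evaluates to NULL when is_payer is NULL, so those rows are filtered out.
naive = label_distribution(conn, "is_payer != 1")

# Naming the NULL case explicitly keeps those rows in the label split.
null_aware = label_distribution(conn, "is_payer != 1 OR is_payer IS NULL")

print("naive:     ", naive)
print("null-aware:", null_aware)
```

The naive condition counts two rows while the null-aware one counts four, so any proportion check on the label column would see a different distribution depending on how the condition is written.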

dpatchigolla

This still seems to be in progress? The main changes, moving the core logic to the trainer and changing the logic itself, still seem to be in progress. I have added comments only on the changes made so far.

dpatchigolla · 2025-02-03T12:06:04Z

src/predictions/profiles_mlcorelib/connectors/Connector.py

@@ -40,6 +41,17 @@ def validate_sql_table(self, table_name: str, entity_column: str) -> None:
                f"SQL model table {table_name} has duplicate values in entity column {entity_column}. Please make sure that the column {entity_column} in all SQL model has unique values only."
            )

+    def get_filtered_table(self, feature_table_name, filter_condition):
+        if filter_condition is None:
+            raise Exception(


Why is the exception raised twice? This seems to be an error.
Can you write unit tests for the PR? Let's make sure all the new changes have tests, especially as this is a slightly complex PR.

For this specific function, we should add a test for the case where filter_condition is missing and we expect the exception to be raised properly (the duplicated raise probably wouldn't cause an issue here, though).
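A minimal sketch of the requested test, assuming only what the diff shows: get_filtered_table(feature_table_name, filter_condition) raises when filter_condition is None. StubConnector is a hypothetical stand-in so the example runs on its own; its error message and return value are assumptions, and a real test would instantiate the project's connector instead.

```python
import unittest


class StubConnector:
    """Hypothetical stand-in for Connector; mirrors only the None check in the diff."""

    def get_filtered_table(self, feature_table_name, filter_condition):
        if filter_condition is None:
            # The message text is an assumption; the diff truncates before it.
            raise Exception("filter_condition must be provided")
        # Assumed behaviour for illustration: return the filtering query.
        return f"SELECT * FROM {feature_table_name} WHERE {filter_condition}"


class TestGetFilteredTable(unittest.TestCase):
    def test_missing_filter_condition_raises(self):
        # The case called out in the review: no filter_condition supplied.
        with self.assertRaises(Exception):
            StubConnector().get_filtered_table("c360", None)

    def test_filter_condition_is_applied(self):
        result = StubConnector().get_filtered_table("c360", "is_payer = 0")
        self.assertIn("is_payer = 0", result)
```

Saved in a test module, this can be run with python -m unittest. Raising a more specific exception type than Exception would let the first test assert on that type, which makes the check stricter.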

joker2411 added 3 commits January 27, 2025 12:52

selecting default eligible_users

fc97058

Merge branch 'main' into feature/prml-1061-pick-eligible-users-automa…

bb38d61

…tically

removing eligible_users from tests

1bc37e0

joker2411 requested a review from dpatchigolla January 27, 2025 12:00

joker2411 added 4 commits January 30, 2025 12:45

fetching info from sql instead of loading tables

938ceb0

correcting label_col matching issue

469617f

correction in case mismatch issue

a989ee2

Merge branch 'main' into feature/prml-1061-pick-eligible-users-automa…

a0782c3

…tically

dpatchigolla requested changes Jan 31, 2025

joker2411 added 4 commits January 31, 2025 12:45

fetching min/max proportion, buffer from constants

3177ca1

adding run_query debug logs

039ea6e

Merge branch 'main' into feature/prml-1061-pick-eligible-users-automa…

bdaa96c

…tically

adding additional logs

787bfd0

dpatchigolla reviewed Feb 3, 2025
