-
Notifications
You must be signed in to change notification settings - Fork 55
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
fix(core): simplified evaluate & report methods #29
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good 🚀
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
report in evaluate, pretty 🚀 ➡️ 🌞
fix(readme): updated quickstart to new flow docs(readme): added get_warnings example fix(docs): fixed docstring on engine evaluate
21ec488
to
49ee925
Compare
@@ -85,6 +86,8 @@ def evaluate(self, df: pd.DataFrame, dtypes: Optional[dict] = None, label: str=N | |||
if label: | |||
results['Feature Importance'] = self._feature_importance(corr_mat, p_corr_mat, label, corr_th) | |||
results['High Collinearity'] = self._high_collinearity_detection(df, self.dtypes, label, vif_th, p_th=p_th) | |||
if summary: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
clean_warnings here too, now I am thinking why dont we clean warnings just at the report methods?
I also realize it seems we dont yet have a good solution for the single executed tests that can stack duplicate warnings and never get cleaned if the warning list is consumed straight from the property
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
warnings can also be consumed with get_warnings() and those too should be clean
@@ -98,11 +98,11 @@ def flatlines(self, th: int=5, skip: list=[]): | |||
self.store_warning( | |||
QualityWarning( | |||
test='Flatlines', category='Erroneous Data', priority=2, data=flatlines, | |||
description=f"Found {total_flatlines} flatline events with a minimun length of {th} among the columns {set(flatlines.keys())}." | |||
description=f"Found {total_flatlines} flatline events with a minimun length of {th:.0f} among the columns {set(flatlines.keys())}." |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Float formatting seems wrong here, this threshold is an integer, shouldn't it be just {th} ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
just {th} was printing a float with a lot of decimals (e.g. 5.00000) . the expression :0.f
is equivalent to an int
)) | ||
return flatlines | ||
else: | ||
self._logger.info("No flatline events with a minimum length of %f were found.", th) | ||
self._logger.info(f"No flatline events with a minimum length of {th:.0f} were found.") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah this string formatting probably misled you, should be %d
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Small remarks, check before merging, otherwise here is my approval and 👍🏼
Closes #36 |
On top of #27