-
Notifications
You must be signed in to change notification settings - Fork 86
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Integrate multiseries into AutoMLSearch #4270
Conversation
Codecov Report
@@ Coverage Diff @@
## main #4270 +/- ##
=======================================
+ Coverage 99.7% 99.7% +0.1%
=======================================
Files 355 355
Lines 38959 39073 +114
=======================================
+ Hits 38838 38953 +115
+ Misses 121 120 -1
|
use_covariates: bool = True, | ||
use_covariates: bool = False, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Flipped this default for the sake of speed. My test example did not train within a 5 minute window when use_covariates
was True, but it ran in <10 seconds when use_covariates
was False.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be nice to see performance tests for use_covariates
turned on or off. Could possibly turn it off only for tests if performance is greatly improved with covariates!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM and agreed with @chukarsten on potentially refactoring is_multiseries
into a separate problem type!
evalml/pipelines/components/utils.py
Outdated
def get_estimators(problem_type, model_families=None, excluded_model_families=None): | ||
def _filter_multiseries_estimators(estimators, is_multiseries): | ||
if is_multiseries: | ||
return [estimator for estimator in estimators if estimator.is_multiseries] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit and maybe for a follow up: could estimator.is_multiseries
be something like estimator.supports_multiseries
? Think its more clear especially since we're passing is_multiseries
everywhere for now.
use_covariates: bool = True, | ||
use_covariates: bool = False, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think it would be nice to see performance tests for use_covariates
turned on or off. Could possibly turn it off only for tests if performance is greatly improved with covariates!
* Add multiseries time series regression as problem type * Completely revamp to multiseries based on problem type
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM again
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good to me
evalml/automl/automl_search.py
Outdated
@@ -651,6 +653,14 @@ def __init__( | |||
f"Dataset size is too small to create holdout set. Minimum dataset size is {self._HOLDOUT_SET_MIN_ROWS} rows, X_train has {len(X_train)} rows. Holdout set evaluation is disabled.", | |||
) | |||
|
|||
# For multiseries problems, we need to mke sure that the data is primarily ordered by the time_index rather than the series_id |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nit: we need to mke
sure -> we need to make
sure
Closes #4266