
Integrate multiseries into AutoMLSearch #4270

Merged
merged 15 commits into from
Aug 21, 2023
Conversation

@eccabay eccabay commented Aug 11, 2023

Closes #4266

codecov bot commented Aug 11, 2023

Codecov Report

Merging #4270 (193ef57) into main (24ba211) will increase coverage by 0.1%.
The diff coverage is 100.0%.

@@           Coverage Diff           @@
##            main   #4270     +/-   ##
=======================================
+ Coverage   99.7%   99.7%   +0.1%     
=======================================
  Files        355     355             
  Lines      38959   39073    +114     
=======================================
+ Hits       38838   38953    +115     
+ Misses       121     120      -1     
Files Changed Coverage Δ
evalml/pipelines/components/component_base.py 100.0% <ø> (ø)
...sors/multiseries_time_series_baseline_regressor.py 100.0% <ø> (ø)
evalml/problem_types/__init__.py 100.0% <ø> (ø)
evalml/tests/component_tests/test_utils.py 99.3% <ø> (ø)
evalml/tests/conftest.py 98.4% <ø> (ø)
...lml/tests/problem_type_tests/test_problem_types.py 100.0% <ø> (ø)
evalml/utils/gen_utils.py 99.3% <ø> (ø)
...valml/automl/automl_algorithm/default_algorithm.py 99.7% <100.0%> (+0.1%) ⬆️
evalml/automl/automl_search.py 99.8% <100.0%> (+0.1%) ⬆️
evalml/automl/utils.py 97.3% <100.0%> (+0.1%) ⬆️
... and 20 more

@eccabay eccabay marked this pull request as ready for review August 15, 2023 14:12
- use_covariates: bool = True,
+ use_covariates: bool = False,
Contributor Author (@eccabay):
Flipped this default for the sake of speed. My test example did not finish training within a 5-minute window when use_covariates was True, but it ran in under 10 seconds when use_covariates was False.
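To make the trade-off concrete, here is a minimal, hypothetical sketch of how a `use_covariates` flag might gate exogenous features before fitting. The class and its internals are illustrative assumptions, not EvalML's actual estimator; the point is that disabling covariates can shrink the fit to a cheap time-only feature.

```python
# Hypothetical sketch (NOT EvalML's implementation): a regressor whose
# `use_covariates` flag decides whether exogenous features reach the fit.
import numpy as np


class CovariateGatedRegressor:
    def __init__(self, use_covariates: bool = False):
        self.use_covariates = use_covariates
        self.coef_ = None

    def _features(self, X, n_rows):
        # With covariates disabled, fall back to a single time-index
        # feature, which keeps the underlying least-squares fit cheap.
        base = np.asarray(X) if self.use_covariates else np.arange(n_rows).reshape(-1, 1)
        return np.hstack([base, np.ones((n_rows, 1))])  # append intercept column

    def fit(self, X, y):
        features = self._features(X, len(y))
        self.coef_, *_ = np.linalg.lstsq(features, y, rcond=None)
        return self

    def predict(self, X):
        return self._features(X, len(X)) @ self.coef_
```

With `use_covariates=False`, the shape (and cost) of the fit no longer depends on how wide `X` is, which is consistent with the speedup described above.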

Collaborator:
I think it would be nice to see performance tests with use_covariates turned on and off. We could possibly turn it off only for tests if performance is greatly improved with covariates!

@jeremyliweishih (Collaborator) left a comment:

LGTM and agreed with @chukarsten on potentially refactoring is_multiseries into a separate problem type!
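The refactor suggested above could look something like this minimal sketch. The enum values and helper are assumptions loosely mirroring the PR's "Add multiseries time series regression as problem type" commit, not EvalML's actual `ProblemTypes` definition.

```python
# Illustrative sketch (assumed names, not EvalML's real enum): treating
# multiseries as its own problem type instead of an `is_multiseries` boolean.
from enum import Enum


class ProblemTypes(Enum):
    REGRESSION = "regression"
    TIME_SERIES_REGRESSION = "time series regression"
    MULTISERIES_TIME_SERIES_REGRESSION = "multiseries time series regression"


def is_multiseries(problem_type: ProblemTypes) -> bool:
    # A single membership check replaces threading an `is_multiseries`
    # boolean through every call site.
    return problem_type == ProblemTypes.MULTISERIES_TIME_SERIES_REGRESSION
```

The design upside is that problem-type-specific behavior stays derivable from one value already passed everywhere, rather than from an extra flag.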

def get_estimators(problem_type, model_families=None, excluded_model_families=None):

def _filter_multiseries_estimators(estimators, is_multiseries):
    if is_multiseries:
        return [estimator for estimator in estimators if estimator.is_multiseries]
Collaborator:
Nit, and maybe for a follow-up: could estimator.is_multiseries be something like estimator.supports_multiseries? I think it's clearer, especially since we're passing is_multiseries everywhere for now.


* Add multiseries time series regression as problem type

* Completely revamp to multiseries based on problem type
@jeremyliweishih (Collaborator) left a comment:
LGTM again

@MichaelFu512 (Contributor) left a comment:
Looks good to me

@@ -651,6 +653,14 @@ def __init__(
f"Dataset size is too small to create holdout set. Minimum dataset size is {self._HOLDOUT_SET_MIN_ROWS} rows, X_train has {len(X_train)} rows. Holdout set evaluation is disabled.",
)

# For multiseries problems, we need to mke sure that the data is primarily ordered by the time_index rather than the series_id
Contributor:
Nit: we need to mke sure -> we need to make sure
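The code comment being reviewed here concerns reordering long-format multiseries data so rows are primarily ordered by the time index rather than the series id. A hedged pandas sketch of that reordering; the column names "date" and "series_id" are assumptions for illustration, not the PR's actual parameter names:

```python
# Illustrative reordering of long-format multiseries data: primary sort key
# is the time index, secondary key is the series id (assumed column names).
import pandas as pd

X = pd.DataFrame(
    {
        "series_id": ["a", "a", "b", "b"],
        "date": pd.to_datetime(["2023-01-01", "2023-01-02"] * 2),
        "value": [1, 2, 3, 4],
    }
)

# A stable sort keeps the within-timestamp series order deterministic.
X_sorted = X.sort_values(["date", "series_id"], kind="stable").reset_index(drop=True)
```

After sorting, all series' rows for a given timestamp are adjacent, which is the ordering a time-based holdout split needs to avoid leaking part of a timestamp across the split boundary.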

@eccabay eccabay merged commit 7781c77 into main Aug 21, 2023
24 checks passed
@eccabay eccabay deleted the 4266_msts_search branch August 21, 2023 14:36
Successfully merging this pull request may close these issues: Integrate MSTS into AutoMLSearch (EvalML)