Factor out time series-related functionality into a time series Task object #989
Conversation
Moved some of the packages into an automl subpackage to tidy up before the task-based refactor. This is in response to discussions with the group and a comment on the first task-based PR. The only changes here are moving subpackages and modules into the new automl package, fixing imports to work with this structure, and fixing some dependencies in setup.py.
I'd moved this to automl as that's where it's used internally, but had missed that it is actually part of the public interface, so it makes sense for it to live where it was.
flaml.data, flaml.ml and flaml.model are re-added to the top level, being re-exported from flaml.automl for backwards compatibility. A deprecation warning is added so that we can plan a removal later.
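A minimal sketch of what such a compatibility shim could look like (the exact warning text and removal timeline are assumptions; only the re-export itself is described in this PR):

```python
# flaml/ml.py -- hypothetical backwards-compatibility shim: re-export the
# moved module from its new home and warn that the old path is deprecated.
import warnings

from flaml.automl.ml import *  # noqa: F401,F403

warnings.warn(
    "Importing from 'flaml.ml' is deprecated; use 'flaml.automl.ml' instead. "
    "The old import path will be removed in a future release.",
    DeprecationWarning,
    stacklevel=2,
)
```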
Got to the point where the methods from AutoML are pulled into GenericTask. Started removing private markers and removing the passing of automl to these methods. Done with decide_split_type; started on prepare_data. The remaining methods still need doing.
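The shape of that refactor, roughly: logic that used to live as private AutoML methods taking an `automl` argument becomes ordinary methods on a Task object that AutoML delegates to. A sketch under those assumptions (the method names come from the comment above; the signatures are guesses):

```python
# Hypothetical sketch of the AutoML -> Task delegation described above.
from abc import ABC, abstractmethod


class Task(ABC):
    @abstractmethod
    def decide_split_type(self, split_type, X, y, fit_kwargs):
        """Pick a data-split strategy appropriate for this task."""

    @abstractmethod
    def prepare_data(self, X, y, **kwargs):
        """Task-specific preprocessing before the search starts."""


class GenericTask(Task):
    def decide_split_type(self, split_type, X, y, fit_kwargs):
        # was a private AutoML method; no longer needs `automl` passed in
        return split_type or "auto"

    def prepare_data(self, X, y, **kwargs):
        return X, y
```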
…into multivariate_target
# Conflicts:
#	flaml/automl/model.py
#	flaml/default/suggest.py
Is the ts_forecast notebook tested with this PR?
return val_loss, metric, train_time, pred_time

def default_estimator_list(self, estimator_list: List[str], is_spark_dataframe: bool) -> List[str]:
    assert not is_spark_dataframe, "Spark is not yet supported for time series"
Spark -> Spark dataframe
test/test_model.py
print(lgbm.feature_names_in_)
print(lgbm.feature_importances_)
Are these two properties covered by any other test? If not, do not remove them.
Why? Why the special treatment for just these properties of just this estimator?
Most FLAML estimators have these properties. Here only one estimator is tested, which is better than no test at all.
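If coverage is the concern, a minimal regression test along these lines would keep the two properties exercised (the constructor arguments and task string are assumptions; the real test lives in test/test_model.py):

```python
# Hypothetical minimal test for the two estimator properties discussed above.
import numpy as np
import pandas as pd
from flaml.automl.model import LGBMEstimator


def test_lgbm_exposes_feature_metadata():
    X = pd.DataFrame(np.random.rand(20, 3), columns=["a", "b", "c"])
    y = np.random.randint(0, 2, 20)
    lgbm = LGBMEstimator(task="binary", n_estimators=4)
    lgbm.fit(X, y)
    assert lgbm.feature_names_in_ is not None
    assert len(lgbm.feature_importances_) == 3
```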
)
y = np.array([0, 1, 0, 1, 0, 0])
lgbm.predict(X[:2])
lgbm.fit(X, y, period=2)
Is this use case (period=2) covered by another test? If not, or if unsure, do not remove it.
Why would we want to support this use case, with repeating dates? As phrased above, this is not a time series problem: the same date repeats three times with different labels, and the dates are irregularly spaced.
The repeating dates are for the deduplication feature, I suppose. Not sure about the irregular spacing. @int-chaos Could you comment?
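For context, the two duplicate cases being debated can be sketched with pandas (the column names here are made up for illustration):

```python
# Hypothetical illustration of the two duplicate cases discussed above.
import pandas as pd

X = pd.DataFrame(
    {
        "ds": pd.to_datetime(["2020-01-01", "2020-01-01", "2020-01-02"]),
        "y": [1, 1, 0],
    }
)

# Case 1: rows identical in every column -- safely removable by dedup.
deduped = X.drop_duplicates()

# Case 2: same timestamp but different values -- dedup cannot resolve this,
# which is what the assertion in the snippet below is meant to catch.
print(len(deduped), deduped["ds"].duplicated().any())  # 2 False
```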
The test failures come from merging main and have nothing to do with this PR, as best I can tell.
logger.warning("Duplicate timestamp values found in timestamp column. " f"\n{X.loc[duplicates, time_col]}")
X = X.drop_duplicates()
logger.warning("Removed duplicate rows based on all columns")
assert (
    not X[[X.columns[0]]].duplicated().any()
), "Duplicate timestamp values with different values for other columns."
The coverage of this part of the code gets removed by the change in the test code.
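A test along these lines could restore that coverage (the column names and expected behaviour are assumptions based on the snippet above):

```python
# Hypothetical tests restoring coverage of the duplicate-timestamp handling.
import pandas as pd


def test_identical_duplicate_rows_are_dropped():
    X = pd.DataFrame({"ds": pd.to_datetime(["2020-01-01"] * 2), "y": [1, 1]})
    # identical rows: dedup should remove one, leaving no conflict
    assert len(X.drop_duplicates()) == 1


def test_conflicting_duplicates_are_detected():
    X = pd.DataFrame({"ds": pd.to_datetime(["2020-01-01"] * 2), "y": [1, 2]})
    deduped = X.drop_duplicates()
    # same timestamp, different label: the assertion in the PR should fire
    assert deduped["ds"].duplicated().any()
```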
cover dedup
Why are these changes needed?
This continues the process that began with the task-based-refactor PR, now merged. Time series-related models are moved into a subpackage, and the related bits in the Task object are moved into a separate Task. In the process, some existing bugs are fixed (for example, the hcrystalball wrapper with certain configs only supported forecasting as far ahead as the length of the validation set).
Also, input data is auto-enriched with features derived from the timestamps.
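The PR does not list the exact derived features, but the kind of enrichment meant here might look like this (the feature names are illustrative):

```python
# Hypothetical illustration of timestamp-derived feature enrichment.
import pandas as pd

df = pd.DataFrame({"ds": pd.date_range("2020-01-01", periods=4, freq="D")})
df["year"] = df["ds"].dt.year
df["month"] = df["ds"].dt.month
df["dayofweek"] = df["ds"].dt.dayofweek
print(df)
```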
Related issue number
Checks