
[ADD] fit pipeline honoring API constraints with tests #348

Merged: 32 commits into automl:development on Dec 20, 2021

Conversation

@ravinkohli (Contributor) commented on Nov 30, 2021

Types of changes

  • Bug fix (non-breaking change which fixes an issue)

Note that a Pull Request should only contain one of refactoring, new features or documentation changes.
Please separate these changes and send us individual PRs for each.
For more information on how to create a good pull request, please refer to The anatomy of a perfect pull request.

Checklist:

  • My code follows the code style of this project.
  • My change requires a change to the documentation.
  • I have updated the documentation accordingly.
  • Have you checked to ensure there aren't other open Pull Requests for the same update/change?
  • Have you added an explanation of what your changes do and why you'd like us to include them?
  • Have you written new tests for your core changes, as applicable?
  • Have you successfully run tests with your changes locally?

Description

This PR fixes the functionality for fitting a single configuration through the API. In its current implementation, api.fit() does not correctly honour the constraints passed through pipeline_options and as parameters to the function. To enable this, I have created a get_dataset method which encapsulates the code needed to create a new Dataset object. In addition, I have renamed fit to fit_pipeline to avoid confusion with fit, which other AutoML libraries generally use for fitting the whole AutoML procedure (search in our case).

Also, build_pipeline() did not apply the include and exclude component constraints.
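The resulting workflow looks roughly like this (a minimal sketch based on the description above; the exact fit_pipeline signature and defaults may differ from the merged code):

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split

from autoPyTorch.api.tabular_classification import TabularClassificationTask

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

api = TabularClassificationTask()

# get_dataset encapsulates the code needed to create a new Dataset object.
dataset = api.get_dataset(X_train=X_train, y_train=y_train,
                          X_test=X_test, y_test=y_test)

# fit_pipeline fits a single configuration while honouring pipeline_options
# and the include/exclude constraints, instead of running a full search.
configuration = api.get_search_space(dataset).get_default_configuration()
pipeline, run_info, run_value, dataset = api.fit_pipeline(
    configuration=configuration,
    dataset=dataset,
    run_time_limit_secs=100,
)
```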

Motivation and Context

This is needed in case someone wants to fit a single configuration while honouring the constraints correctly.

Fixes #149.

How has this been tested?

I have added a test that checks whether the default configuration of a pipeline can be fitted properly. It also checks the other objects returned by fit_pipeline, and it verifies that the accuracy is greater than 70%. To ensure that disable_file_output works as expected, the test also verifies that the pipeline is stored in the correct folder, and can be retrieved from there, when disable_file_output is None; when it is ['all'], the test ensures that no files are stored.
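In outline, the disable_file_output part of that test has this shape (a sketch only; the fixtures and the expected_model_path helper are illustrative, not the PR's actual test code):

```python
import os


def test_fit_pipeline_honours_disable_file_output(api, dataset, configuration):
    # With file output enabled (None), the fitted pipeline must be on disk.
    pipeline, run_info, run_value, _ = api.fit_pipeline(
        configuration=configuration, dataset=dataset, disable_file_output=None
    )
    assert pipeline is not None
    assert os.path.exists(expected_model_path(api, run_info))  # hypothetical helper

    # With ['all'], no files may be written at all.
    pipeline, run_info, run_value, _ = api.fit_pipeline(
        configuration=configuration, dataset=dataset, disable_file_output=['all']
    )
    assert not os.path.exists(expected_model_path(api, run_info))
```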

@ravinkohli added the bug (Something isn't working) label on Nov 30, 2021
@ravinkohli linked an issue on Nov 30, 2021 that may be closed by this pull request
codecov bot commented on Nov 30, 2021

Codecov Report

Merging #348 (24aac05) into development (40a3987) will increase coverage by 0.23%.
The diff coverage is 90.75%.

Impacted file tree graph

@@               Coverage Diff               @@
##           development     #348      +/-   ##
===============================================
+ Coverage        82.59%   82.82%   +0.23%     
===============================================
  Files              154      154              
  Lines             9061     9136      +75     
  Branches          1594     1602       +8     
===============================================
+ Hits              7484     7567      +83     
+ Misses            1108     1101       -7     
+ Partials           469      468       -1     
Impacted Files Coverage Δ
autoPyTorch/datasets/tabular_dataset.py 85.29% <ø> (ø)
autoPyTorch/api/base_task.py 83.42% <80.70%> (-0.18%) ⬇️
autoPyTorch/api/tabular_classification.py 90.47% <100.00%> (+2.97%) ⬆️
autoPyTorch/api/tabular_regression.py 100.00% <100.00%> (+3.12%) ⬆️
autoPyTorch/evaluation/abstract_evaluator.py 77.03% <100.00%> (+0.88%) ⬆️
autoPyTorch/evaluation/tae.py 67.42% <100.00%> (+0.45%) ⬆️
autoPyTorch/evaluation/train_evaluator.py 87.50% <100.00%> (+0.09%) ⬆️
autoPyTorch/evaluation/utils.py 73.61% <100.00%> (+6.36%) ⬆️
autoPyTorch/utils/common.py 89.61% <100.00%> (+1.73%) ⬆️
...components/setup/network_initializer/SparseInit.py 81.81% <0.00%> (-18.19%) ⬇️
... and 14 more

Continue to review full report at Codecov.

Legend:
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 40a3987...24aac05. Read the comment docs.

@nabenabe0928 (Collaborator) left a comment:

Hi, thanks for the PR.
I haven't checked base_task.py or tabular_xxx.py yet, so I will review them later.

Review threads on:
  • examples/40_advanced/example_single_configuration.py
  • autoPyTorch/evaluation/train_evaluator.py (2)
  • autoPyTorch/evaluation/abstract_evaluator.py (4)
  • autoPyTorch/api/base_task.py (6)
  • autoPyTorch/api/tabular_classification.py
@nabenabe0928 added the first priority (PRs to be checked as a priority) label on Dec 6, 2021
@eddiebergman (Contributor) left a comment:

It seems there's no mutual exclusivity between include and exclude; can a user provide both?

I've also started adding some documentation to tests after having some frustration with some older tests in autosklearn.

Have a look at automl_common/tests/test_backend/test_contexts.py. It makes it easier to see what exactly is being tested by a function. While the function name helps, I keep having issues with it not really being descriptive enough.

Review threads on:
  • autoPyTorch/api/base_task.py (4)
  • autoPyTorch/evaluation/utils.py (3)
  • test/test_api/test_base_api.py
  • test/test_evaluation/test_utils.py
@ravinkohli (Contributor, Author) commented on Dec 7, 2021

> It seems there's no mutual exclusivity between include and exclude; can a user provide both?

There is mutual exclusivity; check this.
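For reference, mutual exclusivity between include and exclude is typically enforced with a check of this shape (a sketch, not the project's exact code; include/exclude map pipeline step names to component lists):

```python
from typing import Dict, List, Optional


def check_include_exclude(include: Optional[Dict[str, List[str]]],
                          exclude: Optional[Dict[str, List[str]]]) -> None:
    # A step may appear in include or in exclude, but never in both.
    if include is not None and exclude is not None:
        overlap = set(include) & set(exclude)
        if overlap:
            raise ValueError(
                f"Cannot specify include and exclude for the same steps: {overlap}"
            )
```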

> I've also started adding some documentation to tests after having some frustration with some older tests in autosklearn.
>
> Have a look at automl_common/tests/test_backend/test_contexts.py. It makes it easier to see what exactly is being tested by a function. While the function name helps, I keep having issues with it not really being descriptive enough.

Sure, we'll start adding descriptions to the tests.

@nabenabe0928 (Collaborator) left a comment:

Sorry for the late review, but I will make steady progress whenever I have time.

Review threads on:
  • autoPyTorch/evaluation/utils.py (5)
@nabenabe0928 (Collaborator) left a comment:

Hi, really sorry for the late response.
I will check everything as soon as possible.

Files to check:

  1. test_api.py

Review threads on:
  • autoPyTorch/utils/common.py (2)
  • test/test_utils/test_common.py (2)
"""
def __eq__(self, other: Any) -> bool:
if isinstance(other, autoPyTorchEnum):
return type(self) == type(other) and self.value == other.value
A collaborator commented:

Somehow this line is not reached when you pass a list that contains an identical enum member.
For example, if you replace L87 with raise NotImplementedError, APTEnum.x in [APTEnum.x] should raise an error, but it does not.

Note that it does raise an error when you do APTEnum.x in [APTEnum.y] or APTEnum.x == APTEnum.x.
Probably the __contains__ method of enum, str, or list is causing the issue, but I could not figure it out easily.
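The cause is CPython's membership test: list.__contains__ compares by identity first (PyObject_RichCompareBool short-circuits when both operands are the same object), so __eq__ is never invoked for an identical member. A toy reproduction (APTEnum is illustrative, not the project's actual enum):

```python
from enum import Enum
from typing import Any


class APTEnum(str, Enum):
    x = "x"
    y = "y"

    def __eq__(self, other: Any) -> bool:
        # Stand-in for the autoPyTorchEnum.__eq__ body under discussion.
        raise NotImplementedError


print(APTEnum.x in [APTEnum.x])  # True: identity short-circuits, __eq__ never runs

try:
    APTEnum.x in [APTEnum.y]
except NotImplementedError:
    print("__eq__ is called only for non-identical members")
```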

@nabenabe0928 (Collaborator) left a comment:

minor comments

Review threads on:
  • autoPyTorch/api/base_task.py (12)
  • autoPyTorch/evaluation/abstract_evaluator.py (2)
@nabenabe0928 (Collaborator):

I will check test_api.py after the comments are resolved.

ravinkohli and others added 3 commits December 19, 2021 15:46
Co-authored-by: nabenabe0928 <47781922+nabenabe0928@users.noreply.github.com>
@nabenabe0928 (Collaborator) left a comment:

Thanks for the commits.
Considering your comment, I added an additional suggestion.
It is better to have it for stable behaviour on the user side.

Comment on lines +1354 to +1356:

```python
def fit_pipeline(
    self,
    configuration: Configuration,
```
@nabenabe0928 (Collaborator) commented:

Since we can take both xxx_train and dataset, it is better to force users to pass these variables by keyword.
To achieve this, we need a bare * before the keyword-only arguments.

Suggested change:

```diff
 def fit_pipeline(
     self,
     configuration: Configuration,
+    *,
```

@ravinkohli (Contributor, Author) replied:

Okay, I have implemented **dataset_kwargs

@nabenabe0928 (Collaborator) commented on Dec 20, 2021:

No no, you can just add * as in the suggestion; that works better than **kwargs, which hides information.

You can check the behavior using toy functions
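For example, a toy function shows the effect of the bare * (illustrative, not the real signature):

```python
# Everything after the bare * must be passed by keyword.
def fit_pipeline(configuration, *, dataset=None, X_train=None, y_train=None):
    return configuration, dataset


fit_pipeline("config", dataset="data")  # OK: dataset passed by keyword
# fit_pipeline("config", "data")        # TypeError: takes 1 positional argument
```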

@ravinkohli (Contributor, Author) replied:

Oh okay, I got it. Thanks, I actually did not know about this functionality.

@ravinkohli ravinkohli merged commit 57a490a into automl:development Dec 20, 2021
ravinkohli added a commit that referenced this pull request Dec 21, 2021
* Add fit pipeline with tests

* Add documentation for get dataset

* update documentation

* fix tests

* remove permutation importance from visualisation example

* change disable_file_output

* add

* fix flake

* fix test and examples

* change type of disable_file_output

* Address comments from eddie

* fix docstring in api

* fix tests for base api

* fix tests for base api

* fix tests after rebase

* reduce dataset size in example

* remove optional from docstring

* Handle unsuccessful fitting of pipeline better

* fix flake in tests

* change to default configuration for documentation

* add warning for no ensemble created when y_optimization in disable_file_output

* reduce budget for single configuration

* address comments from eddie

* address comments from shuhei

* Add autoPyTorchEnum

* fix flake in tests

* address comments from shuhei

* Apply suggestions from code review

Co-authored-by: nabenabe0928 <47781922+nabenabe0928@users.noreply.github.com>

* fix flake

* use **dataset_kwargs

* fix flake

* change to enforce keyword args

Co-authored-by: nabenabe0928 <47781922+nabenabe0928@users.noreply.github.com>
ravinkohli added a commit to ravinkohli/Auto-PyTorch that referenced this pull request Apr 12, 2022
(Same commit message as the Dec 21, 2021 commit above.)
ravinkohli added a commit that referenced this pull request Jul 18, 2022
* [feat] Support statistics print by adding results manager object (#334)

* [feat] Support statistics print by adding results manager object

* [refactor] Make SearchResults extract run_history at __init__

Since the search results should not be kept around eternally,
I made this class take run_history in __init__ so that
extraction is implicitly called inside.
From this change, calling the extraction from outside is not recommended.
However, you can still call it from outside, and to prevent mixing up
the environment, self.clear() will be called first.
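Schematically (an illustrative sketch; the real class has more state and works on a SMAC RunHistory rather than a plain dict):

```python
class SearchResults:
    def __init__(self, run_history: dict):
        self._data: dict = {}
        # Extraction is implicitly called at construction time.
        self.extract(run_history)

    def clear(self) -> None:
        self._data = {}

    def extract(self, run_history: dict) -> None:
        # Still callable from outside, but the state is reset first
        # to prevent mixing results from different run histories.
        self.clear()
        self._data.update(run_history)
```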

* [fix] Separate those changes into PR#336

* [fix] Fix so that test_loss includes all the metrics

* [enhance] Strengthen the test for sprint and SearchResults

* [fix] Fix an issue in documentation

* [enhance] Increase the coverage

* [refactor] Separate the test for results_manager to organize the structure

* [test] Add the test for get_incumbent_Result

* [test] Remove the previous test_get_incumbent and see the coverage

* [fix] [test] Fix reversion of metric and strengthen the test cases

* [fix] Fix flake8 issues and increase coverage

* [fix] Address Ravin's comments

* [enhance] Increase the coverage

* [fix] Fix a flake8 issue

* [doc] Add the workflow of Auto-PyTorch (#285)

* [doc] Add the workflow of Auto-PyTorch

* [doc] Address Ravin's comment

* Update README.md with link for master branch

* [feat] Add an object that realizes the perf over time viz (#331)

* [feat] Add an object that realizes the perf over time viz

* [fix] Modify TODOs and add comments to avoid complications

* [refactor] [feat] Format visualizer API and integrate this feature into BaseTask

* [refactor] Separate a shared raise error process as a function

* [refactor] Gather params in Dataclass to look smarter

* [refactor] Merge extraction from history into the results manager

Since this feature was added in a previous PR, we now rely on it
to extract the history.
To handle the order-by-start-time issue, I added a sort-by-end-time
feature.

* [feat] Merge the viz in the latest version

* [fix] Fix nan --> worst val so that we can always handle by number

* [fix] Fix mypy issues

* [test] Add test for get_start_time

* [test] Add test for order by end time

* [test] Add tests for ensemble results

* [test] Add tests for merging ensemble results and run history

* [test] Add the tests in the case of ensemble_results is None

* [fix] Alternate datetime to timestamp in tests to pass universally

Since the mapping of timestamp to datetime varies across machines,
the tests failed in the previous version.
In this version, we changed the datetimes in the tests to fixed
timestamps so that the tests pass universally.

* [fix] Fix status_msg --> status_type because it does not need to be str

* [fix] Change the name for homogeneity

* [fix] Fix based on the file name change

* [test] Add tests for set_plot_args

* [test] Add tests for plot_perf_over_time in BaseTask

* [refactor] Replace redundant lines by pytest parametrization

* [test] Add tests for _get_perf_and_time

* [fix] Remove viz attribute based on Ravin's comment

* [fix] Fix doc-string based on Ravin's comments

* [refactor] Hide color label settings extraction in dataclass

Since this process made the method in BaseTask redundant, as pointed
out by Ravin, I made it a method of the dataclass so that
we can easily fetch this information.
Note that since the color and label information always depends on the
optimization results, we always need to pass the metric results to ensure
we only get related keys.

* [test] Add tests for color label dicts extraction

* [test] Add tests for checking if plt.show is called or not

* [refactor] Address Ravin's comments and add TODO for the refactoring

* [refactor] Change KeyError in EnsembleResults to empty

Since it is inconvenient not to be able to instantiate EnsembleResults
when we do not have any histories,
I changed the functionality so that it can still be instantiated even
when the results are empty.
In this case, we get empty arrays, which also matches developers'
intuition.

* [refactor] Prohibit external updates to make objects more robust

* [fix] Remove the member variable _opt_scores since it is confusing

Since opt_scores were taken from cost in run_history while metric_dict
was taken from additional_info, it was confusing where I should
refer to what. By removing this, we always refer to additional_info
when fetching information, and metrics are always available as raw
values. Although I changed a lot, the functionality did not change and
it is now easier to add other functionalities.

* [example] Add an example how to plot performance over time

* [fix] Fix unexpected train loss when using cross validation

* [fix] Remove __main__ from example based on the Ravin's comment

* [fix] Move results_xxx to utils from API

* [enhance] Change the example for the plot over time to save the figure

Since plt.show() does not work in some environments,
I changed the example so that everyone can run at least this example.

* Cleanup of simple_imputer (#346)

* cleanup of simple_imputer

* Fixed doc and typo

* Fixed docs

* Made changes, added test

* Fixed init statement

* Fixed docs

* Flake'd

* [feat] Add the option to save a figure in plot setting params (#351)

* [feat] Add the option to save a figure in plot setting params

Since non-GUI environments would like to avoid the usage of
matplotlib's show method, I added the option to savefig so that
users can complete the operation inside AutoPyTorch.

* [doc] Add a comment for non-GUI based computer in plot_perf_over_time method

* [test] Add a test to check the priority of show and savefig

Since plt.savefig and plt.show do not work at the same time due to
matplotlib's design, we need to check that show is not called
when a figname is specified. We could actually raise an error, but plot
is usually called at the end of an optimization, so I wanted
to avoid raising an error and stuck to a check in the tests.
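Such a priority check can be sketched with a mock (the api fixture and the exact parameter names are assumptions, not the repository's actual test):

```python
from unittest import mock


def test_show_is_skipped_when_figname_is_given(api):
    # When a figure name is specified, the plot is saved and show() is skipped.
    with mock.patch("matplotlib.pyplot.show") as show:
        api.plot_perf_over_time(metric_name="accuracy", figname="perf.png")
        show.assert_not_called()
```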

* Update workflow files (#363)

* update workflow files

* Remove double quotes

* Exclude python 3.10

* Fix mypy compliance check

* Added PEP 561 compliance

* Add py.typed to MANIFEST for dist

* Update .github/workflows/dist.yml

Co-authored-by: Ravin Kohli <13005107+ravinkohli@users.noreply.github.com>

Co-authored-by: Ravin Kohli <13005107+ravinkohli@users.noreply.github.com>

* [ADD] fit pipeline honoring API constraints with tests (#348)

(Squashed commit messages identical to the commit list above.)

* [ADD] Docker publish workflow (#357)

* Add workflow for publishing docker image to github packages and dockerhub

* add docker installation to docs

* add workflow dispatch

* fix error after merge

* Fix 361 (#367)

* check if N==0, and handle this case

* change position of comment

* Address comments from shuhei

* [ADD] Test evaluator (#368)

* add test evaluator

* add no resampling and other changes for test evaluator

* finalise changes for test_evaluator, TODO: tests

* add tests for new functionality

* fix flake and mypy

* add documentation for the evaluator

* add NoResampling to fit_pipeline

* raise error when trying to construct ensemble with noresampling

* fix tests

* reduce fit_pipeline accuracy check

* Apply suggestions from code review

Co-authored-by: nabenabe0928 <47781922+nabenabe0928@users.noreply.github.com>

* address comments from shuhei

* fix bug in base data loader

* fix bug in data loader for val set

* fix bugs introduced in suggestions

* fix flake

* fix bug in test preprocessing

* fix bug in test data loader

* merge tests for evaluators and change listcomp in get_best_epoch

* rename resampling strategies

* add test for get dataset

Co-authored-by: nabenabe0928 <47781922+nabenabe0928@users.noreply.github.com>

* [fix] Hotfix debug no training in simple intensifier (#370)

* [fix] Fix the no-training-issue when using simple intensifier

* [test] Add a test for the modification

* [fix] Modify the default budget so that the budget is compatible

Since the previous version did not consider the provided budget_type
when determining the default budget, I modified this part so that
the default budget does not mix up the defaults for epochs
and runtime.
Note that since the default pipeline config defines epochs as the
default budget, I also followed this rule when taking the default value.
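Schematically, the default is now looked up per budget_type instead of always falling back to the epochs default (a sketch; the real pipeline config keys and types may differ):

```python
def get_default_budget(budget_type: str, pipeline_config: dict) -> float:
    # Each budget type has its own default; mixing them up would, e.g.,
    # interpret the default epoch count as a runtime in seconds.
    if budget_type not in ("epochs", "runtime"):
        raise ValueError(f"Unknown budget_type: {budget_type}")
    return pipeline_config[budget_type]
```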

* [fix] Fix a mypy error

* [fix] Change the total runtime for a single config in the example

Since the training sometimes does not finish in time,
I increased the total runtime so that we can accommodate
the training in the given amount of time.

* [fix] [refactor] Fix the SMAC requirement and refactor some conditions

* [fix] Change int to np.int32 for the ndarray dtype specification (#371)

* [ADD] variance thresholding (#373)

* add variance thresholding

* fix flake and mypy

* Apply suggestions from code review

Co-authored-by: nabenabe0928 <47781922+nabenabe0928@users.noreply.github.com>

Co-authored-by: nabenabe0928 <47781922+nabenabe0928@users.noreply.github.com>

* [ADD] scalers from autosklearn (#372)

* Add new scalers

* fix flake and mypy

* Apply suggestions from code review

Co-authored-by: nabenabe0928 <47781922+nabenabe0928@users.noreply.github.com>

* add robust scaler

* fix documentation

* remove power transformer from feature preprocessing

* fix tests

* check for default in include and exclude

* Apply suggestions from code review

Co-authored-by: nabenabe0928 <47781922+nabenabe0928@users.noreply.github.com>

Co-authored-by: nabenabe0928 <47781922+nabenabe0928@users.noreply.github.com>

* [FIX] Remove redundant categorical imputation (#375)

* remove categorical strategy from simple imputer

* fix tests

* address comments from eddie

* fix flake and mypy error

* fix test cases for imputation

* [feat] Add coalescer (#376)

* [fix] Add check dataset in transform as well for test dataset, which does not require fit
* [test] Migrate tests from the francisco's PR without modifications
* [fix] Modify so that tests pass
* [test] Increase the coverage

* Fix: keyword arguments to submit (#384)

* Fix: keyword arguments to submit

* Fix: Missing param for implementing AbstractTA

* Fix: Typing of multi_objectives

* Add: multi_objectives to each ExecuteTaFuncWithQueue

* [FIX] Datamanager in memory (#382)

* remove datamanager instances from evaluation and smbo

* fix flake

* Apply suggestions from code review

Co-authored-by: nabenabe0928 <47781922+nabenabe0928@users.noreply.github.com>

* fix flake

Co-authored-by: nabenabe0928 <47781922+nabenabe0928@users.noreply.github.com>

* [feat] Add new task inference for APT (#386)

* [fix] Fix the task inference issue mentioned in #352

Since sklearn task inference regards targets with integers as
a classification task, I modified target_validator so that we always
cast targets for regression to float.
This workaround is mentioned in the reference below:
scikit-learn/scikit-learn#8952

* [fix] [test] Add a small number to labels for regression and add tests

Since target labels are required to be float and sklearn requires
numbers after the decimal point, I added a workaround that adds the
smallest possible fraction to the array so that we avoid a mis-inference
of the task type by sklearn.
Plus, I added tests to check that we get the expected results for
extreme cases.
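The behaviour that motivates this workaround can be reproduced directly with sklearn (the nudge shown is illustrative; the exact epsilon used in the commit may differ):

```python
import numpy as np
from sklearn.utils.multiclass import type_of_target

y = np.array([1.0, 2.0, 3.0])        # regression targets with integral values
print(type_of_target(y))              # 'multiclass' -- mis-inferred

y_nudged = np.nextafter(y, y + 1.0)   # smallest representable fraction added
print(type_of_target(y_nudged))       # 'continuous'
```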

* [fix] [test] Adapt the modification of targets to scipy.sparse.xxx_matrix

* [fix] Address Ravin's comments and loosen the small number choice

* [fix] Update the SMAC version (#388)

* [ADD] dataset compression (#387)

* Initial implementation without tests

* add tests and make necessary changes

* improve documentation

* fix tests

* Apply suggestions from code review

Co-authored-by: nabenabe0928 <47781922+nabenabe0928@users.noreply.github.com>

* undo change in  as it causes tests to fail

* change name from InputValidator to input_validator

* extract statements to methods

* refactor code

* check if mapping is the same as expected

* update precision reduction for dataframes and tests

* fix flake

Co-authored-by: nabenabe0928 <47781922+nabenabe0928@users.noreply.github.com>

* [refactor] Fix SparseMatrixType --> spmatrix and add ispandas (#397)

* [ADD] feature preprocessors from autosklearn (#378)

* in progress

* add remaining preprocessors

* fix flake and mypy after rebase

* Fix tests and add documentation

* fix tests bug

* fix bug in tests

* fix bug where search space updates were not honoured

* handle check for score func in feature preprocessors

* address comments from shuhei

* apply suggestions from code review

* add documentation for feature preprocessors with percent to int value range

* fix tests

* fix tests

* address comments from shuhei

* fix tests which fail due to scaler

* [feat] Add __str__ to autoPyTorchEnum (#405)

* [ADD] Subsampling Dataset (#398)

* initial implementation

* fix issue with missing classes

* finalise implementation, add documentation

* fix tests

* add tests from ask

* fix issues from feature preprocessing PR

* address comments from shuhei

* address comments from code review

* address comments from shuhei

* fix dist twine check for github (#439)

* Time series forecasting (#434)

* new target scaler, allow NoNorm for MLP Encoder

* allow sampling full sequences

* integrate SeqBuilder to SequenceCollector

* restore SequenceBuilder to reduce memory usage

* move scaler to network

* lag sequence

* merge encoder and decoder as a single pipeline

* faster lag_seq builder

* maint

* new init, faster DeepAR inference in trainer

* more losses types

* maint

* new Transformer models, allow RNN to do DeepAR inference

* maint

* maint

* maint

* maint

* reduced search space for Transformer

* reduced init design

* maint

* maint

* maint

* maint

* faster forecasting

* maint

* allow single fidelity

* maint

* fix budget num_seq

* faster sampler and lagger

* maint

* maint

* maint deepAR

* maint

* maint

* cross validation

* allow holdout for smaller datasets

* smac4ac to smac4hpo

* maint

* maint

* allow to change decoder search space

* more resampling strategy, more options for MLP

* reduced NBEATS

* subsampler for val loader

* rng for dataloader sampler

* maint

* remove generator as it cannot be pickled

* allow lower fidelity to evaluate less test instances

* fix dummy forecaster issues

* maint

* add gluonts as requirement

* more data for val set for larger dataset

* maint

* maint

* fix nbeats decoder

* new dataset interface

* resolve conflict

* maint

* allow encoder to receive input from different sources

* multi blocks hp design

* maint

* correct hp updates

* first trial on nested conjunction

* maint

* fit for deep AR model (needs to be reverted when the issue in ConfigSpace is fixed!!!)

* adjust backbones to fit new structure

* further API changes

* tft temporal fusion decoder

* construct network

* cells for networks

* forecasting backbones

* maint

* maint

* move tft layer to backbone

* maint

* quantile loss

* maint

* maint

* maint

* maint

* maint

* maint

* forecasting init configs

* add forbidden

* maint

* maint

* maint

* remove shift data

* maint

* maint

* copy dataset_properties for each refit iteration

* maint and new init

* TFT forecasting with features (#6)

* time feature transform

* tft with time-varying features

* transform features allowed for all architectures

* repair mask for temporal fusion layer

* maint

* fix loss computation in QuantileLoss

* fixed scaler computation

* maint

* fix dataset

* adjust window_size to seasonality

* maint scaling

* fix incorrect Seq2Seq scaling

* fix sampling for seq2seq

* maint

* fix scaling in NBEATS

* move time feature computation to dataset

* maint

* fix feature computation

* maint

* multivariate feature validator

* maint

* validator for multivariate series

* feature validator

* multivariate datasets

* observed targets

* structure adjustment

* refactor ts tasks and preprocessing

* allow nan in targets

* preprocessing for time series

* maint

* forecasting pipeline

* maint

* embedding and maint

* move targets to the tail of the features

* maint

* static features

* adjsut scaler to static features

* remove static features from forward dict

* test transform

* maint

* test sets

* adjust dataset to allow future known features

* maint

* maint

* flake8

* synchronise with development

* recover timeseries

* maint

* maint

* limit memory usage tae

* revert test api

* test for targets

* do not allow sparse forecasting targets

* test for data validator

* test for validations

* test on TimeSeriesSequence

* maint

* test for resampling

* test for dataset 1

* test for datasets

* test on tae

* maint

* allow evaluator to evaluate test sets

* tests on losses

* test for metrics

* forecasting preprocessing

* maint

* finish test for preprocessing

* test for data loader

* tests for dataloader

* maint

* test for target scaling 1

* test for target scaler

* test for training loss

* maint

* test for network backbone

* test for backbone base

* test for flat encoder

* test for seq encoder

* test for seqencoder

* maint

* test for recurrent decoders

* test for network

* maint

* test for architecture

* test for pipelines

* fixed sampler

* maint sampler

* resolve conflict between embedding and net encoder

* fix scaling

* allow transform for test dataloader

* maint dataloader

* fix updates

* fix dataset

* tests on api, initial design on multivariate

* maint

* fix dataloader

* move test with for loop to unittest.subtest

* flake 8 and update requirement

* mypy

* validator for pd dataframe

* allow series idx for api

* maint

* examples for forecasting

* fix mypy

* properly memory limitation for forecasting example

* fix pre-commit

* maint dataloader

* remove unused auto-regressive arguments

* fix pre-commit

* maint

* maint mypy

* mypy!!!

* pre-commit

* mypyyyyyyyyyyyyyyyyyyyyyyyy

* maint

* move forecasting requirements to extras_require

* bring eval_test to tae

* make rh2epm consistent with SMAC4HPO

* remove smac4ac from smbo

* revert changes in network

* revert changes in trainer

* revert format changes

* move constant_forecasting to constant

* additional annotate for base pipeline

* move forecasting check to tae

* maint time series refit dataset

* fix test

* workflow for extra requirements

* docs for time series dataset

* fix pre-commit

* docs for dataset

* maint docstring

* merge target scaler to one file

* fix forecasting init cfgs

* remove redundant pipeline configs

* maint

* SMAC4HPO instead of SMAC4AC in smbo (will be reverted further if study shows that SMAC4HPO is superior to SMAC4AC)

* fixed docstring for RNN and Transformer Decoder

* unified docstrings for smbo and base task

* correct encoder to decoder in decoder.init

* fix doc strings

* add license and docstrings for NBEATS heads

* allow memory limit to be None

* relax test load for forecasting

* fix docs

* fix pre-commit

* make test compatible with py37

* maint docstring

* split forecasting_eval_train_function from eval_train_function

* fix namespace for test_api from train_evaluator to tae

* maint test api for forecasting

* decrease number of ensemble size of test_time_series_forecasting to reduce test time

* flatten all the predictions for forecasting pipelines

* pre-commit fix

* fix docstrings and typing

* maint time series dataset docstrings

* maint warning message in time_series_forecasting_train_evaluator

* fix lines that are overlength

Co-authored-by: NHML23117 <nhmldeng@login03.css.lan>
Co-authored-by: Deng Difan <deng@p200300cd070f1f50dabbc1fffe9c6aa9.dip0.t-ipconnect.de>

* fit updates in gluonts (#445)

* fit updates in gluonts

* fit gluonts version

* docs for forecasting task (#443)

* docs for forecasting task

* avoid directly importing extra dependencies

* Update docs/dev.rst

Co-authored-by: Ravin Kohli <13005107+ravinkohli@users.noreply.github.com>

* make ForecastingDependenciesNotInstalledError a str message

* make ForecastingDependenciesNotInstalledError a str message

* update readme and examples

* add explanation for univariate models in example

Co-authored-by: Ravin Kohli <13005107+ravinkohli@users.noreply.github.com>

* [ADD] Allow users to pass feat types to tabular validator (#441)

* add tests and make get_columns_to_encode in tabular validator

* fix flake and mypy and silly bug

* pass feat types to search function of the api

* add example

* add openml to requirements

* add task ids to populate cache

* add check for feat types

* fix mypy and flake

* [RELEASE] Changes for release v0.2 (#446)

* change to version 0.2

* add flaky for failing test

* [FIX] Documentation and docker workflow file (#449)

* fixes to documentation and docker

* fix to docker

* Apply suggestions from code review

* add change log for release (#450)

Co-authored-by: nabenabe0928 <47781922+nabenabe0928@users.noreply.github.com>
Co-authored-by: Eddie Bergman <eddiebergmanhs@gmail.com>
Co-authored-by: dengdifan <33290713+dengdifan@users.noreply.github.com>
Co-authored-by: NHML23117 <nhmldeng@login03.css.lan>
Co-authored-by: Deng Difan <deng@p200300cd070f1f50dabbc1fffe9c6aa9.dip0.t-ipconnect.de>
Labels: bug (Something isn't working), first priority (PRs to be checked as a priority)

Successfully merging this pull request may close: Improve fitting a pipeline to the dataset (#149)

3 participants