Type estimators #1542

eddiebergman · 2022-07-16T20:58:38Z

This is a fairly large PR with a simple goal: remove estimators.py and automl.py from the mypy ignore list. It is still in progress, and notes on the changes are TODO. I'll resolve conflicts once v0.15 is out. I can also split this into multiple smaller PRs to make review easier if needed.

Tests still need to be updated to accommodate the changes.

There were 168 typing errors :) Some of them were actual possible bugs, caused by the order in which things were called and parameters were set.

Major points:

Made AutoML an ABC. This removes some failure cases, such as when the task type is not defined or is_classification is misspecified; see "Tests for the AutoML class relying on is_classification=false even when it is a classification task, crash when corrected" (#1212). AutoML now relies on its subclasses specifying these things, which was also simplified greatly by using plain class variables.
Also removed the methods that just forward all their args with is_classification=True/False, since those parameters were removed as well.

class AutoMLRegressor(AutoML, RegressorMixin):
    _task_mappings = {...}
    is_classification = False
    
class AutoMLClassifier(AutoML, ClassifierMixin):
    def predict(...): ...
    def predict_proba(...): ...
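
To make the pattern concrete, here is a minimal, self-contained sketch of an abstract base that relies on subclass class variables. All names (`BaseAutoML`, `RegressorAutoML`, `ClassifierAutoML`, `task_for`) and the task codes are hypothetical stand-ins for illustration, not the actual auto-sklearn classes.

```python
from abc import ABC, abstractmethod
from typing import ClassVar, Dict, List


class BaseAutoML(ABC):
    """Hypothetical stand-in for the abstract AutoML base."""

    # Declared here, assigned by each concrete subclass.
    _task_mapping: ClassVar[Dict[str, int]]
    is_classification: ClassVar[bool]

    def task_for(self, target_type: str) -> int:
        # Look up the task code from the subclass-provided mapping.
        try:
            return self._task_mapping[target_type]
        except KeyError:
            raise ValueError(f"Unsupported target type: {target_type!r}")

    @abstractmethod
    def predict(self, X: List[float]) -> list:
        """Every concrete subclass must implement predict."""


class RegressorAutoML(BaseAutoML):
    _task_mapping = {"continuous": 10}  # arbitrary illustrative code
    is_classification = False

    def predict(self, X: List[float]) -> List[float]:
        return [0.0 for _ in X]


class ClassifierAutoML(BaseAutoML):
    _task_mapping = {"binary": 1, "multiclass": 2}  # arbitrary illustrative codes
    is_classification = True

    def predict(self, X: List[float]) -> List[int]:
        return [0 for _ in X]

    def predict_proba(self, X: List[float]) -> List[List[float]]:
        # Only the classifier exposes probabilities.
        return [[1.0, 0.0] for _ in X]
```

Because the base has an abstract method, `BaseAutoML()` raises a `TypeError`, so the "task type not defined" state cannot be constructed in the first place.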

Made AutoSklearnEstimator smarter with respect to types in a similar fashion. Notably, it is smarter about what it returns, through the use of a Generic in the main class with the concrete types provided in the subclasses. This mainly means that code editors will know whether predict_proba is available, and that fit returns the right estimator and not just the abstract AutoSklearnEstimator.

Self = TypeVar("Self", bound="AutoSklearnEstimator")

TParetoModel = TypeVar("TParetoModel", VotingClassifier, VotingRegressor)
TAutoML = TypeVar("TAutoML", bound=AutoML)

class AutoSklearnEstimator(ABC, BaseEstimator, Generic[TAutoML, TParetoModel]):

    # Knows it returns the same type as self, AutoSklearnClassifier or AutoSklearnRegressor
    def fit(self: Self, ...) -> Self: ...

    # Knows if it's an AutoMLClassifier or AutoMLRegressor
    def automl(self) -> TAutoML: ...

    # Knows that the pareto models are a VotingClassifier/VotingRegressor
    def get_pareto_set(self) -> Sequence[TParetoModel]: ...

These are then specified in the subclass as

class AutoSklearnClassifier(AutoSklearnEstimator[AutoMLClassifier, VotingClassifier]): ...
class AutoSklearnRegressor(AutoSklearnEstimator[AutoMLRegressor, VotingRegressor]): ...
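
As a runnable illustration of the same idea, the sketch below uses a bound `TypeVar` for `self` plus a `Generic` parameter that each subclass fills in. The class names (`Estimator`, `Backend`, `ClassifierBackend`, `RegressorBackend`) and the `_backend_cls` attribute are assumptions made for this example and do not exist in auto-sklearn.

```python
from abc import ABC
from typing import Generic, Optional, Sequence, Type, TypeVar


class Backend:
    """Hypothetical stand-in for the AutoML object held by an estimator."""


class ClassifierBackend(Backend):
    def predict_proba(self, X: Sequence[float]) -> list:
        return [[1.0, 0.0] for _ in X]


class RegressorBackend(Backend):
    pass


Self = TypeVar("Self", bound="Estimator")
TBackend = TypeVar("TBackend", bound=Backend)


class Estimator(ABC, Generic[TBackend]):
    # Each concrete subclass assigns the backend class it wants built in fit.
    _backend_cls: Type[TBackend]

    def __init__(self) -> None:
        self._backend: Optional[TBackend] = None

    def fit(self: Self, X: Sequence[float]) -> Self:
        # Annotating `self` with a TypeVar lets checkers infer the subclass type.
        self._backend = self._backend_cls()
        return self

    @property
    def backend(self) -> TBackend:
        if self._backend is None:
            raise RuntimeError("`backend` has not been set, please call `fit` first")
        return self._backend


class Classifier(Estimator[ClassifierBackend]):
    _backend_cls = ClassifierBackend


class Regressor(Estimator[RegressorBackend]):
    _backend_cls = RegressorBackend
```

With this shape, a type checker should infer `Classifier().fit(X)` as `Classifier` and its `backend` as `ClassifierBackend`, so `predict_proba` is offered only where it exists.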

Many things that are set in fit, e.g. self._logger or self._task, are now wrapped in a property that raises a NotFittedError, as sklearn would. This is because using them directly in other methods would, correctly, produce a warning like "self._task could be None" when calling methods that rely on fit having been called first.

class AutoML:

    @property
    def task(self) -> int:
        if self._task is None:
            raise NotFittedError("`task` has not been set, please call `fit` first")

        return self._task
        
    @property
    def input_validator(self) -> InputValidator:
        if self._input_validator is None:
            raise NotFittedError(
                "`input_validator` has not been set, please call `fit` first"
            )

        return self._input_validator
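
A minimal runnable version of this pattern is sketched below. To keep it dependency-free it defines a local `NotFittedError` stand-in with the same base classes as `sklearn.exceptions.NotFittedError` (`ValueError` and `AttributeError`); the `Model` class is hypothetical.

```python
from typing import Optional


class NotFittedError(ValueError, AttributeError):
    """Local stand-in for sklearn.exceptions.NotFittedError."""


class Model:
    def __init__(self) -> None:
        # Unknown until fit is called, hence Optional.
        self._task: Optional[int] = None

    @property
    def task(self) -> int:
        # Narrow Optional[int] to int once, so callers never deal with None.
        if self._task is None:
            raise NotFittedError("`task` has not been set, please call `fit` first")
        return self._task

    def fit(self, task: int) -> "Model":
        self._task = task
        return self
```

One side effect worth knowing: because the error subclasses `AttributeError`, `hasattr(Model(), "task")` is `False` before `fit` and `True` afterwards.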

Made transform and get_cost_of_crash smarter with @overload; each now knows its return type based on the type of its input.

@overload  # Note the type for y is None, indicating None is returned
def transform(self, X: XType, y: None = None) -> tuple[XType, None]: ...

@overload  # And here, it's different
def transform(self, X: XType, y: YType) -> tuple[XType, YType]: ...

def transform(self, X: XType, y: YType | None = None) -> tuple[XType, YType | None]:
    ...
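
For reference, here is a standalone sketch of the same `@overload` pattern as a plain function. The concrete types (`List[float]`, `List[int]`) and the pass-through body are assumptions for illustration only; the real validators do more work.

```python
from typing import List, Optional, Tuple, overload


@overload
def transform(X: List[float], y: None = None) -> Tuple[List[float], None]: ...


@overload
def transform(X: List[float], y: List[int]) -> Tuple[List[float], List[int]]: ...


def transform(
    X: List[float], y: Optional[List[int]] = None
) -> Tuple[List[float], Optional[List[int]]]:
    # The overloads above are what type checkers see; this body is what runs.
    X_out = list(X)
    y_out = None if y is None else list(y)
    return X_out, y_out
```

A checker should then treat the second element of `transform(X)` as `None` and the second element of `transform(X, y)` as `List[int]`, without any extra `is None` checks at the call site.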

Ran pyupgrade on a few files I touched.
Simplified the data compression logic into a class; the typing caught some weirdness when data compression was on but memory_limit wasn't set.

* re-structure manual and use 'collapse' * ADD link to auto-sklearn-talks * unifying titles * Clarify default memory and cpu usage * FIX sphinx_gallery to <=0.10.0 0.10.1 would raise an error for '-D plot_gallery=0' * Re-structure faq * FIX comments by mfeurer * boldface items * merge manual into FAQ * FIX minor * FIX typo * Update doc/faq.rst Co-authored-by: Eddie Bergman <eddiebergmanhs@gmail.com> * Update doc/faq.rst Co-authored-by: Eddie Bergman <eddiebergmanhs@gmail.com> * Update doc/faq.rst Co-authored-by: Eddie Bergman <eddiebergmanhs@gmail.com> * Update doc/faq.rst Co-authored-by: Eddie Bergman <eddiebergmanhs@gmail.com> * Update doc/manual.rst Co-authored-by: Eddie Bergman <eddiebergmanhs@gmail.com> * Update doc/manual.rst Co-authored-by: Eddie Bergman <eddiebergmanhs@gmail.com> * Update doc/faq.rst Co-authored-by: Eddie Bergman <eddiebergmanhs@gmail.com> * FIX link Co-authored-by: Eddie Bergman <eddiebergmanhs@gmail.com>

* np.bool deprecation * Invalid escape sequence \_ * Series specify dtype * drop na requires keyword args deprecation * unspecified np.int size deprecated, use int instead * deprecated unspeicifed np.int precision * Element wise comparison failed, will raise error in the future * Specify explicit dtype for empty series * metric warnings for mismatch between y_pred and y_true label count * Quantile transformer n_quantiles larger than n_samples warning ignored * Silenced convergence warnings * pass sklearn args as keywords * np.bool deprecation * Invalid escape sequence \_ * Series specify dtype * drop na requires keyword args deprecation * unspecified np.int size deprecated, use int instead * deprecated unspeicifed np.int precision * Element wise comparison failed, will raise error in the future * Specify explicit dtype for empty series * metric warnings for mismatch between y_pred and y_true label count * Quantile transformer n_quantiles larger than n_samples warning ignored * Silenced convergence warnings * pass sklearn args as keywords * flake8'd * flake8'd * Fixed CategoricalImputation not accounting for sparse matrices * Updated to use distro for linux distribution * Ignore convergence warnings for gaussian process regressor * Averaging metrics now use zero_division parameter * Readded scorers to module scope * flake8'd * Fix * Fixed dtype for metalearner no run * Catch gaussian process iterative fit warning * Moved ignored warnings to tests * Correctly type pd.Series * Revert back to usual iterative fit * Readded missing iteration increment * Removed odd backslash * Fixed imputer for sparse matrices * Ignore warnings we are aware about in tests * Flake'd: * Revert "Fixed imputer for sparse matrices" This reverts commit 05675ad. * Revert "Revert "Fixed imputer for sparse matrices"" This reverts commit d031b0d. 
* Back to default values * Reverted to default behaviour with comment * Added xfail test to document * flaked * Fixed test, moved to np.testing for assertion * Update autosklearn/pipeline/components/data_preprocessing/categorical_encoding/encoding.py Co-authored-by: Matthias Feurer <feurerm@informatik.uni-freiburg.de> Co-authored-by: Matthias Feurer <feurerm@informatik.uni-freiburg.de>

* Add submodule * Port to abstract_ensemble, backend from automl_common * Updated workflow files * Update imports * Trigger actions * Another import fix * update import * m * Backend fixes * Backend parameter update * fixture fix for backend * Fix tests * readd old abstract ensemble for now * flake8'd * Added install from source to readme * Moved installation w.r.t submodules to the docs * Temporarily remove submodule * Readded submodule * Updated to use automl_common under autosklearn * Updated MANIFEST * Removed uneeded statements from MANIFEST * Fixed import * Fixed comment line in MANIFEST.in * Added automl_common/setup.py to MANIFEST * Added prefix to script * Re-added removed title # * Added note for submodule for CONTRIBUTING * Made the submodule step a bit more clear for contributing.md * CONTRIBUTING fixes

* Added ignored_warnings file * Use ignored_warnings file * Test regressors with 1d, 1d as 2d and 2d targets * Flake'd * Fix broken relative imports to ignore_warnings * Removed print and updated parameter type for tests * Type import fix

* Added ignored_warnings file * Use ignored_warnings file * Test regressors with 1d, 1d as 2d and 2d targets * Flake'd * Fix broken relative imports to ignore_warnings * Removed print and updated parameter type for tests * Added warning catches to fit methods in tests * Added more warning catches * Flake'd * Created top-level module to allow relativei imports * Deleted blank line in __init__ * Remove uneeded ignore warnings from tests * Fix bad indent * Fix github merge conflict editor whitespaces and indents

* update workflow files * typo fix * Update pytest * remove bad semi-colon * Fix test runner command * Remove explicit steps required from older version * Explicitly add Conda python to path for subprocess command in test * Fix the mypy compliance check * Added PEP 561 compliance * Add py.typed to MANIFEST for dist * Remove py.typed from setup.py

* rename OSX -> macOS as it is the new name rename OSX -> macOS as it is the new name for the operating system. e.g. see https://www.apple.com/macos * Update doc/installation.rst Co-authored-by: Matthias Feurer <lists@matthiasfeurer.de> * Update doc/installation.rst Co-authored-by: Matthias Feurer <lists@matthiasfeurer.de> Co-authored-by: Matthias Feurer <feurerm@informatik.uni-freiburg.de> Co-authored-by: Matthias Feurer <lists@matthiasfeurer.de>

* Fix: Raises errors with the config * Add: Skip error for kernal_pca Seems kernel_pca emits the error: * `"zero-size array to reduction operation maximum which has no identity"` This is gotten on the line `max_eig = lambdas.max()` which makes me assume it emits a matrix with no real eigen values, not something we can really control for

…ures (#1250) * Moved to new splitter, moved to util file * flake8'd * Fixed errors, added test specifically for CustomStratifiedShuffleSplit * flake8'd * Updated docstring * Updated types in docstring * reduce_dataset_size_if_too_large supports more types * flake8'd * flake8'd * Updated docstring * Seperated out the data subsampling into individual functions * Improved typing from Automl.fit to reduce_dataset_size_if_too_large * flak8'd * subsample tested * Finished testing and flake8'd * Cleaned up transform function that was touched * ^ * Removed double typing * Cleaned up typing of convert_if_sparse * Cleaned up splitters and added size test * Cleanup doc in data * rogue line added was removed * Test fix * flake8'd * Typo fix * Fixed ordering of things * Fixed typing and tests of target_validator fit, transform, inv_transform * Updated doc * Updated Type return * Removed elif gaurd * removed extraneuous overload * Updated return type of feature validator * Type fixes for target validator fit * flake8'd * Moved to new splitter, moved to util file * flake8'd * Fixed errors, added test specifically for CustomStratifiedShuffleSplit * flake8'd * Updated docstring * Updated types in docstring * reduce_dataset_size_if_too_large supports more types * flake8'd * flake8'd * Updated docstring * Seperated out the data subsampling into individual functions * Improved typing from Automl.fit to reduce_dataset_size_if_too_large * flak8'd * subsample tested * Finished testing and flake8'd * Cleaned up transform function that was touched * ^ * Removed double typing * Cleaned up typing of convert_if_sparse * Cleaned up splitters and added size test * Cleanup doc in data * rogue line added was removed * Test fix * flake8'd * Typo fix * Fixed ordering of things * Fixed typing and tests of target_validator fit, transform, inv_transform * Updated doc * Updated Type return * Removed elif gaurd * removed extraneuous overload * Updated return type of feature validator * Type fixes for 
target validator fit * flake8'd * Fixed err message str and automl sparse y tests * Flak8'd * Fix sort indices * list type to List * Remove uneeded comment * Updated comment to make it more clear * Comment update * Fixed warning message for reduce_dataset_if_too_large * Fix test * Added check for error message in tests * Test Updates * Fix error msg * reinclude csr y to test * Reintroduced explicit subsample values test * flaked * Missed an uncomment * Update the comment for test of splitters * Updated warning message in CustomSplitter * Update comment in test * Update tests * Removed overloads * Narrowed type of subsample * Removed overload import * Fix `todense` giving np.matrix, using `toarray` * Made subsampling a little less aggresive * Changed multiplier back to 10 * Allow argument to specfiy how auto-sklearn handles compressing dataset size (#1341) * Added dataset_compression parameter and validation * Fix docstring * Updated docstring for `resampling_strategy` * Updated param def and memory_allocation can now be absolute * insert newline * Fix params into one line * fix indentation in docs * fix import breaks * Allow absolute memory_allocation * Tests * Update test on for precision omitted from methods * Update test for akslearn2 with same args * Update to use TypedDict for better Mypy parsing * Added arg to asklearn2 * Updated tests to remove some warnings * flaked * Fix broken link? * Remove TypedDict as it's not supported in Python3.7 * Missing import * Review changes * Fix magic mock for python < 3.9 * Fixed bad merge

* commit meta learning data bases * commit changed files * commit new files * fixed experimental settings * implemented last comments on old PR * adapted metalearning to last commit * add a text preprocessing example * intigrated feedback * new changes on *.csv files * reset changes * add changes for merging * add changes for merging * add changes for merging * try to merge * fixed string representation for metalearning (some sort of hot fix, maybe this needs to be fixed in a bigger scale) * fixed string representation for metalearning (some sort of hot fix, maybe this needs to be fixed in a bigger scale) * fixed string representation for metalearning (some sort of hot fix, maybe this needs to be fixed in a bigger scale) * init * init * commit changes for text preprocessing * text prepreprocessing commit * fix metalearning * fix metalearning * adapted test to new text feature * fix style guide issues * integrate PR comments * integrate PR comments * implemented the comments to the last PR * fitted operation is not in place therefore we have to assgin the fitted self.preprocessor again to it self * add first text processing tests * add first text processing tests * including comments from 01.25. * including comments from 01.28. * including comments from 01.28. * including comments from 01.28. * including comments from 01.31.

* Init commit * Fix logging server cleanup (#1503) * Fix logging server cleanup * Add comment relating to the `try: finally:` * Remove nested try: except: from `fit` * Bump peter-evans/find-comment from 1 to 2 (#1520) Bumps [peter-evans/find-comment](https://github.com/peter-evans/find-comment) from 1 to 2. - [Release notes](https://github.com/peter-evans/find-comment/releases) - [Commits](peter-evans/find-comment@v1...v2) --- updated-dependencies: - dependency-name: peter-evans/find-comment dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Bump actions/stale from 4 to 5 (#1521) Bumps [actions/stale](https://github.com/actions/stale) from 4 to 5. - [Release notes](https://github.com/actions/stale/releases) - [Changelog](https://github.com/actions/stale/blob/main/CHANGELOG.md) - [Commits](actions/stale@v4...v5) --- updated-dependencies: - dependency-name: actions/stale dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> * Init commit * Update evaluation module * Clean up other occurences of the word validation * Re-add test for test predictions Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>

* Add debug statements and 30s timeouts * Fix formatting * Update internal timeout param * +timeout, use allocated tmpdir * +timeout, use allocated tmpdir * Remove another occurence of explicit `tmp` * Increase timelimits once again * Remove incomplete comment

* Init commit * Fix DummyClassifiers in _load_pareto_set * Add test for dummy only in classifiers * Update no ensemble docstring * Add automl case where automl only has dummy * Remove tmp file * Fix `include` statement to be regressor

* Bump docker/build-push-action from 1 to 3 Bumps [docker/build-push-action](https://github.com/docker/build-push-action) from 1 to 3. - [Release notes](https://github.com/docker/build-push-action/releases) - [Commits](docker/build-push-action@v1...v3) --- updated-dependencies: - dependency-name: docker/build-push-action dependency-type: direct:production update-type: version-update:semver-major ... Signed-off-by: dependabot[bot] <support@github.com> * Update docker-publish.yml Replace password by token Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com> Co-authored-by: Matthias Feurer <feurerm@informatik.uni-freiburg.de>

mfeurer and others added 30 commits November 17, 2021 14:39

Fix SVR degree hyperparameter (#1308)

6f13379

* only active if kernel == 'poly' * adapt the metadata to reflect this

Black format checker (#1311)

4703495

* black checker * Simplified * add examples to black format check Co-authored-by: Matthias Feurer <feurerm@informatik.uni-freiburg.de>

Save runhistory in every iteration (#1306)

f22c986

Fix typo in contribution guide (#1322)

3761f9b

If you're only exposure to using... -> If your only exposure to using...

Added isort checker (#1326)

1901b1c

Enable tests to be manually triggered (#1325)

0e455bc

* Added manual dispatch to tests * Removed parameters to manual dispatch

Update docstrings of include and exclude parameters of the estima…

0575b75

…tors (#1332) * Update docstrings and types * doc typo fix * flake'd

added python 3.10 to versions (#1260)

a88eeae

* added python 3.10 to versions * Added quotes around versions * Trigger tests

Fixed dependancies warnings introduced by sphinx_toolbox (#1339)

a2f7ca2

* Added versioning for sphinx, docutils - introduced by sphinxtoolbox * Fixed bug with config value for `plot_gallery` in doc makefile * Update linkcheck command as well

Update example to use predefined_split properly (#1340)

772f268

Update isort-check.yaml to remove occurences of black (#1342)

9b39a71

Fix random state not being used for sampling configurations (#1329)

88ad023

* Added random state to classifiers * Added some doc strings * Removed random_state again * flake'd * Fix some test issues * Re-added seed to test * Updated test doc for unknown test * flake'd

Changes show_models() function to return a dictionary of models in en…

84cabf0

…semble (#1321) * Changed show_models() function to return a dictionary of models in the ensemble instead of a string

Merge HOTFIX master 0.14.3 into dev

7252be6

Remove flaky dep (#1361)

b01c732

* Remove flaky dep * Remove unused pytest import

Fix: Make SimpleClassificationPipeline tests deterministic (#1366)

f5964ca

Fix: MLPRegressor tests (#1367)

b58be50

* Fix: MLPRegressor tests * Fix: Ordering of statements in test * Fix: MLP n_calls

Fix: imports from relative to absolute (#1370)

a9fbd5c

Fix: add error to be ignored during test (#1382)

b010058

Test changing the default output distribution for the quantile scaler (…

01fb3b5

…#1316)

eddiebergman and others added 8 commits June 17, 2022 14:26

fix-1527-Fix-mlp-regressor-test-fixture-values (#1528)

f0c8ecd

* Create PR * Update MLP regressor values

fix docker workflow (#1526)

4f691a1

* Make docker file install from `setup.py` * Add pytest cache to gitignore * Up timeouts on test_metadata_generation

fix-1535-Exception-in-the-fit()-call-of-AutoSklearn (#1539)

2764037

* Create PR * Fix test fixture

fix-1532-_ERROR_-asyncio.exceptions.CancelledError (#1540)

af9d469

* Create PR * Abstract out dask client types * Fix _ issue * Extend scope of dask_client in automl.py * Add docstring to dask module * Indent result addition * Add basic tests for Dask wrappers

eddiebergman added the maintenance Internal maintenance label Jul 16, 2022

eddiebergman added this to the v0.16 milestone Jul 16, 2022

eddiebergman self-assigned this Jul 16, 2022

eddiebergman added 16 commits July 16, 2022 18:03

Create PR

301d96f

Abstract out dask client types

bac6975

Fix _ issue

e56d262

Extend scope of dask_client in automl.py

a7fcffd

Add docstring to dask module

6eb30f7

Indent result addition

8a055b3

Use pyupgrade and update __init__

5b77dfc

progress

f971a33

update

aedb877

Rebase merge

356141e

Fix issues preventing mypy running

a7128e9

Rebase merge

21c4197

Fix 101 mypy issues

d1d3615

Last few mypy errors knocked out

b581bb8

Minor comment cleanups

80fc8f2

Fix reference to InputValidator

857a8c6

eddiebergman force-pushed the type_estimators branch from a921117 to 857a8c6 Compare July 16, 2022 22:08

Documentation

25f203d

eddiebergman force-pushed the development branch from d813838 to 259ed3d Compare August 18, 2022 18:14
