First draft for user code introspection into components #1505

Closed
wants to merge 21 commits into from

@eddiebergman eddiebergman commented Jun 9, 2022

This is a first draft in an attempt to address #1429. Sample output:

from autosklearn.info import classifiers

name, info = next(iter(classifiers().items()))

print(name)
"adaboost"

print(info)
ClassifierInfo(type=<class 'autosklearn.pipeline.components.classification.adaboost.AdaboostClassifier'>, name='AdaBoost Classifier', shortname='AB', output_kind='predictions', supported_inputs=['dense', 'sparse', 'unsigned data'], deterministic=True, handles_binary=True, handles_multiclass=True, handles_multilabel=False)

classifiers() returns a dict[str, ClassifierInfo]:

classifiers()

{'adaboost': ClassifierInfo(type=<class 'autosklearn.pipeline.components.classification.adaboost.AdaboostClassifier'>, name='AdaBoost Classifier', shortname='AB', output_kind='predictions', supported_inputs=['dense', 'sparse', 'unsigned data'], deterministic=True, handles_binary=True, handles_multiclass=True, handles_multilabel=False),
 'bernoulli_nb': ClassifierInfo(type=<class 'autosklearn.pipeline.components.classification.bernoulli_nb.BernoulliNB'>, name='Bernoulli Naive Bayes classifier', shortname='BernoulliNB', output_kind='predictions', supported_inputs=['dense', 'sparse', 'unsigned data'], deterministic=True, handles_binary=True, handles_multiclass=True, handles_multilabel=True),
 'decision_tree': ClassifierInfo(type=<class 'autosklearn.pipeline.components.classification.decision_tree.DecisionTree'>, name='Decision Tree Classifier', shortname='DT', output_kind='predictions', supported_inputs=['dense', 'sparse', 'unsigned data'], deterministic=True, handles_binary=True, handles_multiclass=True, handles_multilabel=True),
 'extra_trees': ClassifierInfo(type=<class 'autosklearn.pipeline.components.classification.extra_trees.ExtraTreesClassifier'>, name='Extra Trees Classifier', shortname='ET', output_kind='predictions', supported_inputs=['dense', 'sparse', 'unsigned data'], deterministic=True, handles_binary=True, handles_multiclass=True, handles_multilabel=True),
 'gaussian_nb': ClassifierInfo(type=<class 'autosklearn.pipeline.components.classification.gaussian_nb.GaussianNB'>, name='Gaussian Naive Bayes classifier', shortname='GaussianNB', output_kind='predictions', supported_inputs=['dense', 'unsigned data'], deterministic=True, handles_binary=True, handles_multiclass=True, handles_multilabel=True),
 'gradient_boosting': ClassifierInfo(type=<class 'autosklearn.pipeline.components.classification.gradient_boosting.GradientBoostingClassifier'>, name='Gradient Boosting Classifier', shortname='GB', output_kind='predictions', supported_inputs=['dense', 'unsigned data'], deterministic=True, handles_binary=True, handles_multiclass=True, handles_multilabel=False),
 'k_nearest_neighbors': ClassifierInfo(type=<class 'autosklearn.pipeline.components.classification.k_nearest_neighbors.KNearestNeighborsClassifier'>, name='K-Nearest Neighbor Classification', shortname='KNN', output_kind='predictions', supported_inputs=['dense', 'sparse', 'unsigned data'], deterministic=True, handles_binary=True, handles_multiclass=True, handles_multilabel=True),
 'lda': ClassifierInfo(type=<class 'autosklearn.pipeline.components.classification.lda.LDA'>, name='Linear Discriminant Analysis', shortname='LDA', output_kind='predictions', supported_inputs=['dense', 'unsigned data'], deterministic=True, handles_binary=True, handles_multiclass=True, handles_multilabel=True),
 'liblinear_svc': ClassifierInfo(type=<class 'autosklearn.pipeline.components.classification.liblinear_svc.LibLinear_SVC'>, name='Liblinear Support Vector Classification', shortname='Liblinear-SVC', output_kind='predictions', supported_inputs=['sparse', 'dense', 'unsigned data'], deterministic=False, handles_binary=True, handles_multiclass=True, handles_multilabel=True),
 'libsvm_svc': ClassifierInfo(type=<class 'autosklearn.pipeline.components.classification.libsvm_svc.LibSVM_SVC'>, name='LibSVM Support Vector Classification', shortname='LibSVM-SVC', output_kind='predictions', supported_inputs=['dense', 'sparse', 'unsigned data'], deterministic=True, handles_binary=True, handles_multiclass=True, handles_multilabel=False),
 'mlp': ClassifierInfo(type=<class 'autosklearn.pipeline.components.classification.mlp.MLPClassifier'>, name='Multilayer Percepton', shortname='MLP', output_kind='predictions', supported_inputs=['dense', 'sparse', 'unsigned data'], deterministic=True, handles_binary=True, handles_multiclass=True, handles_multilabel=True),
 'multinomial_nb': ClassifierInfo(type=<class 'autosklearn.pipeline.components.classification.multinomial_nb.MultinomialNB'>, name='Multinomial Naive Bayes classifier', shortname='MultinomialNB', output_kind='predictions', supported_inputs=['dense', 'sparse', 'signed data'], deterministic=True, handles_binary=True, handles_multiclass=True, handles_multilabel=True),
 'passive_aggressive': ClassifierInfo(type=<class 'autosklearn.pipeline.components.classification.passive_aggressive.PassiveAggressive'>, name='Passive Aggressive Classifier', shortname='PassiveAggressive Classifier', output_kind='predictions', supported_inputs=['dense', 'sparse', 'unsigned data'], deterministic=True, handles_binary=True, handles_multiclass=True, handles_multilabel=True),
 'qda': ClassifierInfo(type=<class 'autosklearn.pipeline.components.classification.qda.QDA'>, name='Quadratic Discriminant Analysis', shortname='QDA', output_kind='predictions', supported_inputs=['dense', 'unsigned data'], deterministic=True, handles_binary=True, handles_multiclass=True, handles_multilabel=True),
 'random_forest': ClassifierInfo(type=<class 'autosklearn.pipeline.components.classification.random_forest.RandomForest'>, name='Random Forest Classifier', shortname='RF', output_kind='predictions', supported_inputs=['dense', 'sparse', 'unsigned data'], deterministic=True, handles_binary=True, handles_multiclass=True, handles_multilabel=True),
 'sgd': ClassifierInfo(type=<class 'autosklearn.pipeline.components.classification.sgd.SGD'>, name='Stochastic Gradient Descent Classifier', shortname='SGD Classifier', output_kind='predictions', supported_inputs=['dense', 'sparse', 'unsigned data'], deterministic=True, handles_binary=True, handles_multiclass=True, handles_multilabel=False)}
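
A mapping like this can be filtered directly, which is the kind of question raised in #1429. The following is only a minimal sketch: ClassifierInfo here is a stand-in dataclass with a subset of the fields visible in the repr above, and classifiers() is a hypothetical stub holding three entries copied from the sample output, so that the snippet runs without auto-sklearn installed. The real objects in this PR may differ.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ClassifierInfo:
    # Stand-in with a subset of the fields shown in the sample repr;
    # the class defined in the PR may look different.
    name: str
    shortname: str
    supported_inputs: List[str]
    deterministic: bool
    handles_binary: bool
    handles_multiclass: bool
    handles_multilabel: bool


def classifiers() -> Dict[str, ClassifierInfo]:
    # Hypothetical stub for autosklearn.info.classifiers(), filled with
    # three entries copied from the sample output.
    return {
        "adaboost": ClassifierInfo(
            "AdaBoost Classifier", "AB",
            ["dense", "sparse", "unsigned data"],
            True, True, True, False,
        ),
        "gaussian_nb": ClassifierInfo(
            "Gaussian Naive Bayes classifier", "GaussianNB",
            ["dense", "unsigned data"],
            True, True, True, True,
        ),
        "random_forest": ClassifierInfo(
            "Random Forest Classifier", "RF",
            ["dense", "sparse", "unsigned data"],
            True, True, True, True,
        ),
    }


# Which classifiers declare multilabel support?
multilabel = sorted(
    key for key, info in classifiers().items() if info.handles_multilabel
)

# Which classifiers accept sparse input?
sparse_ok = sorted(
    key for key, info in classifiers().items() if "sparse" in info.supported_inputs
)

print(multilabel)
print(sparse_ok)
```

With the three stubbed entries, the first list contains gaussian_nb and random_forest, and the second contains adaboost and random_forest.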

The same exists for classifiers, regressors, data_preprocessors and feature_preprocessors, all of which are available through components():

from autosklearn.info import components

# Same as the classifiers above
all_components = components()
classifiers = all_components.classifiers
feature_preprocessors = all_components.feature_preprocessors

Issues

  • Data preprocessors are their own beast: there's technically only one data preprocessor component, FeatTypeSplit.

    • We've never come up with a nice solution to this, as the pipeline is not flexible enough to really represent what we want.
  • There's already a lot of duplication, e.g. handles_sparse = True and input = (SPARSE, ...) in get_properties().

    • The dataclasses defined here could just be used directly in each component. That makes the properties type safe and lets us use . notation instead of ["this"].
    • It has the added benefit of really simplifying the code in autosklearn.info too.
    • It also means one source of truth for what's allowed in those properties.
  • I would prefer to do what's below long term, as it's a bit nicer to look at. The problem now is that if a user adds a custom component, I would like it to show up when they access components. This is fixable, just noting that it could be changed.

from autosklearn.info import components


# Currently, components has to be called
for name, info in components().classifiers.items():
    ...  # do stuff

# Note how components doesn't need to be called
for name, info in components.classifiers.items():
    ...  # do stuff
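
Two of the points in the Issues list, a single dataclass as the source of truth for properties and a components object that does not need to be called yet still picks up custom components, could be combined roughly as follows. This is a hedged sketch rather than the code in this PR: Properties, Component, _Components and add_classifier are all hypothetical names chosen for illustration, and the real auto-sklearn registration mechanism is not modelled.

```python
from dataclasses import dataclass
from typing import Dict, Tuple, Type


@dataclass(frozen=True)
class Properties:
    # Hypothetical single source of truth for what a component declares.
    name: str
    shortname: str
    supported_inputs: Tuple[str, ...]
    handles_multilabel: bool

    @property
    def handles_sparse(self) -> bool:
        # Derived from supported_inputs, so the two cannot drift apart.
        return "sparse" in self.supported_inputs


class Component:
    # Each subclass assigns its own Properties instance.
    properties: Properties


class _Components:
    """Hypothetical registry whose views are rebuilt on every access."""

    def __init__(self) -> None:
        self._classifiers: Dict[str, Type[Component]] = {}

    def add_classifier(self, key: str, cls: Type[Component]) -> None:
        self._classifiers[key] = cls

    @property
    def classifiers(self) -> Dict[str, Properties]:
        # Built at access time, so later registrations are visible.
        return {key: cls.properties for key, cls in self._classifiers.items()}


# Module-level instance: used as `components.classifiers`, no call needed.
components = _Components()


class AdaBoost(Component):
    properties = Properties(
        "AdaBoost Classifier", "AB", ("dense", "sparse", "unsigned data"), False
    )


class MyClassifier(Component):
    properties = Properties("My Classifier", "MC", ("dense",), True)


components.add_classifier("adaboost", AdaBoost)
before = list(components.classifiers)

# A user registers a custom component after import time.
components.add_classifier("my_classifier", MyClassifier)
after = list(components.classifiers)

print(before, after)
print(components.classifiers["adaboost"].handles_sparse)
```

Because classifiers is a property rather than a value computed once at import, a component registered later still appears the next time the attribute is read, and typed attribute access replaces dictionary lookups such as ["this"].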

@eddiebergman eddiebergman changed the title First draft First draft for user code introspection into components Jun 9, 2022
codecov bot commented Jun 9, 2022

Codecov Report

Merging #1505 (ca46861) into development (0f1f38a) will increase coverage by 0.01%.
The diff coverage is 14.28%.

❗ Current head ca46861 differs from pull request most recent head 59e60e1. Consider uploading reports for the commit 59e60e1 to get more accurate results

@@               Coverage Diff               @@
##           development    #1505      +/-   ##
===============================================
+ Coverage        83.79%   83.81%   +0.01%     
===============================================
  Files              152      154       +2     
  Lines            11667    11730      +63     
  Branches          2037     2049      +12     
===============================================
+ Hits              9776     9831      +55     
- Misses            1343     1355      +12     
+ Partials           548      544       -4     


* Push

* `fit_ensemble` now has priority for kwargs to take

* Change ordering of preference for ensemble params

* Add TODO note for metrics

* Add `metrics` arg to `fit_ensemble`

* Add test for pareto front sizes

* Remove unneeded file

* Re-added tests to `test_pareto_front`

* Add descriptions to test files

* Add test to ensure argument priority

* Add test to make sure X_data only loaded when required

* Remove part of test required for performance history

* Default to `self._metrics` if `metrics` not available
* Create simple example and doc for naive early stopping

* Fix doc, pass through SMAC callbacks directly

* Fix `isinstance` check

* Add test for early stopping

* Fix signature of early stopping example/test

* Fix doc build
dependabot bot and others added 17 commits June 14, 2022 17:05
Bumps [actions/setup-python](https://github.com/actions/setup-python) from 3 to 4.
- [Release notes](https://github.com/actions/setup-python/releases)
- [Commits](actions/setup-python@v3...v4)

---
updated-dependencies:
- dependency-name: actions/setup-python
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [actions/download-artifact](https://github.com/actions/download-artifact) from 2 to 3.
- [Release notes](https://github.com/actions/download-artifact/releases)
- [Commits](actions/download-artifact@v2...v3)

---
updated-dependencies:
- dependency-name: actions/download-artifact
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [codecov/codecov-action](https://github.com/codecov/codecov-action) from 2 to 3.
- [Release notes](https://github.com/codecov/codecov-action/releases)
- [Changelog](https://github.com/codecov/codecov-action/blob/master/CHANGELOG.md)
- [Commits](codecov/codecov-action@v2...v3)

---
updated-dependencies:
- dependency-name: codecov/codecov-action
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [actions/upload-artifact](https://github.com/actions/upload-artifact) from 2 to 3.
- [Release notes](https://github.com/actions/upload-artifact/releases)
- [Commits](actions/upload-artifact@v2...v3)

---
updated-dependencies:
- dependency-name: actions/upload-artifact
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Fix logging server cleanup

* Add comment relating to the `try: finally:`

* Remove nested try: except: from `fit`
Bumps [peter-evans/find-comment](https://github.com/peter-evans/find-comment) from 1 to 2.
- [Release notes](https://github.com/peter-evans/find-comment/releases)
- [Commits](peter-evans/find-comment@v1...v2)

---
updated-dependencies:
- dependency-name: peter-evans/find-comment
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
Bumps [actions/stale](https://github.com/actions/stale) from 4 to 5.
- [Release notes](https://github.com/actions/stale/releases)
- [Changelog](https://github.com/actions/stale/blob/main/CHANGELOG.md)
- [Commits](actions/stale@v4...v5)

---
updated-dependencies:
- dependency-name: actions/stale
  dependency-type: direct:production
  update-type: version-update:semver-major
...

Signed-off-by: dependabot[bot] <support@github.com>

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Init commit

* Fix logging server cleanup (#1503)

* Bump peter-evans/find-comment from 1 to 2 (#1520)

* Bump actions/stale from 4 to 5 (#1521)

* Init commit

* Update evaluation module

* Clean up other occurrences of the word validation

* Re-add test for test predictions

Co-authored-by: dependabot[bot] <49699333+dependabot[bot]@users.noreply.github.com>
* Add debug statements and 30s timeouts

* Fix formatting

* Update internal timeout param

* +timeout, use allocated tmpdir

* +timeout, use allocated tmpdir

* Remove another occurrence of explicit `tmp`

* Increase timelimits once again

* Remove incomplete comment
* Init commit

* Fix DummyClassifiers in _load_pareto_set

* Add test for dummy only in classifiers

* Update no ensemble docstring

* Add automl case where automl only has dummy

* Remove tmp file

* Fix `include` statement to be regressor
* Create PR

* Update MLP regressor values
* Make docker file install from `setup.py`

* Add pytest cache to gitignore

* Up timeouts on test_metadata_generation
@eddiebergman eddiebergman deleted the document_model_capabilities branch June 24, 2022 14:12
@eddiebergman eddiebergman restored the document_model_capabilities branch June 24, 2022 14:12
Development

Successfully merging this pull request may close these issues.

Can Autosklearn handle Multi-Class/Multi-Label Classification and which classifiers will it use?
2 participants