Tests for the AutoML class rely on is_classification=False even for classification tasks, and crash when corrected #1212

Open
eddiebergman opened this issue Aug 8, 2021 · 0 comments
Labels
bug maintenance Internal maintenance

eddiebergman commented Aug 8, 2021

The is_classification parameter of AutoML.fit is left at its default of False in this test, even though the test is a classification problem.

# The test as it currently stands
def test_fit(dask_client):
    ...  # loads the iris dataset
    automl.fit(
        X_train, Y_train, task=MULTICLASS_CLASSIFICATION,
    )
    ...

Changing the fit call to the following makes the test fail, even though it is a classification task:

# Adding is_classification=True causes the test to fail
automl.fit(
    X_train, Y_train, task=MULTICLASS_CLASSIFICATION, is_classification=True
)

Extra context:
AutoML.fit creates an InputValidator, which is the only thing that uses the is_classification parameter. The parameter defaults to False unless it is passed explicitly.

        self.InputValidator = InputValidator(
            is_classification=is_classification,
            feat_type=feat_type,
            logger_port=self._logger_port,
        )
        self.InputValidator.fit(X_train=X, y_train=y, X_test=X_test, y_test=y_test)
        X, y = self.InputValidator.transform(X, y)

Error:

___________________________________ test_fit ___________________________________

dask_client = <Client: 'inproc://192.168.178.28/175589/1' processes=2 threads=2, memory=7.69 GiB>

    def test_fit(dask_client):

        X_train, Y_train, X_test, Y_test = putil.get_dataset('iris')
        automl = autosklearn.automl.AutoML(
            time_left_for_this_task=30,
            per_run_time_limit=5,
            metric=accuracy,
            dask_client=dask_client,
        )
        automl.fit(
            X_train, Y_train, task=MULTICLASS_CLASSIFICATION, is_classification=True
        )
>       score = automl.score(X_test, Y_test)

test/test_automl/test_automl.py:62:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
autosklearn/automl.py:1430: in score
    prediction = self.InputValidator.target_validator.transform(prediction)
autosklearn/data/target_validator.py:235: in transform
    y = self.encoder.transform(y)
.venv/lib/python3.9/site-packages/sklearn/preprocessing/_encoders.py:805: in transform
    X_int, X_mask = self._transform(X, handle_unknown=self.handle_unknown)
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

self = OrdinalEncoder(handle_unknown='use_encoded_value', unknown_value=-1)
X = array([[0.99609375, 0.00390625, 0.        ],
       [1.        , 0.        , 0.        ],
       [0.99609375, 0.003906...12, 0.15429688, 0.        ],
       [0.        , 0.9609375 , 0.0390625 ],
       [0.        , 0.        , 1.        ]])
handle_unknown = 'use_encoded_value', force_all_finite = True

    def _transform(self, X, handle_unknown='error', force_all_finite=True):
        X_list, n_samples, n_features = self._check_X(
            X, force_all_finite=force_all_finite)

        X_int = np.zeros((n_samples, n_features), dtype=int)
        X_mask = np.ones((n_samples, n_features), dtype=bool)

        if n_features != len(self.categories_):
>           raise ValueError(
                "The number of features in X is different to the number of "
                "features of the fitted data. The fitted data had {} features "
                "and the X has {} features."
                .format(len(self.categories_,), n_features)
            )
E           ValueError: The number of features in X is different to the number of features of the fitted data. The fitted data had 1 features and the X has 3 features.

.venv/lib/python3.9/site-packages/sklearn/preprocessing/_encoders.py:120: ValueError
---------------------------- Captured stdout setup -----------------------------
Started Dask client=<Client: 'inproc://192.168.178.28/175589/1' processes=2 threads=2, memory=7.69 GiB>
--------------------------- Captured stdout teardown ---------------------------
Closed Dask client=<Client: 'inproc://192.168.178.28/175589/1' processes=2 threads=2, memory=7.69 GiB>
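
The ValueError above can be reproduced outside of auto-sklearn. Below is a minimal sketch, assuming scikit-learn 0.24 or later and NumPy are installed. The encoder settings and the two three-column rows are copied from the traceback; the training labels are made up for illustration. Reading the three-column array as one column per class is an assumption based on its values, not something the traceback states.

```python
import numpy as np
from sklearn.preprocessing import OrdinalEncoder

# Encoder configured as shown in the traceback, fitted on a single target column.
encoder = OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1)
y_train = np.array([[0], [1], [2], [1]])  # illustrative labels, shape (4, 1)
encoder.fit(y_train)

# An array with one column per sample matches the fitted data and transforms fine.
labels = np.array([[2], [0]])
encoded = encoder.transform(labels)

# An array with three columns per sample (rows taken from the traceback)
# does not match the single fitted column, so transform raises ValueError.
three_columns = np.array([[1.0, 0.0, 0.0], [0.0, 0.9609375, 0.0390625]])
try:
    encoder.transform(three_columns)
    raised = False
except ValueError as err:
    raised = True
    print(f"ValueError: {err}")
```

The exact wording of the error message depends on the installed scikit-learn version, but the cause is the same: the encoder was fitted on one feature and is handed three.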

@eddiebergman eddiebergman added the maintenance Internal maintenance label Aug 8, 2021