
When adding NoPreprocessing component to auto-sklearn, the lasso regression can run successfully, while the abess regression crashes #1661

Open
belzheng opened this issue Apr 15, 2023 · 3 comments

Comments

@belzheng

Describe the bug

When adding the NoPreprocessing component to auto-sklearn, the lasso regression runs successfully, while the abess regression crashes. Both the lasso and abess regressors were written by me, and both run successfully when NoPreprocessing is not added. I wonder why the abess regression crashes when NoPreprocessing is added. The following are my snippets for reference:

Environment and installation:

Please give details about your installation:

  • OS linux
  • Is your installation in a virtual environment or conda environment? conda
  • Python version 3.8
  • Auto-sklearn version 0.15.0
@aron-bram
Collaborator

Hi,
could you please also post your code for the custom preprocessor that you passed in?
Thanks in advance.

@belzheng
Author

Ok, here is my code for debugging:

from typing import Optional

from ConfigSpace.configuration_space import ConfigurationSpace
from ConfigSpace.hyperparameters import UniformIntegerHyperparameter

import autosklearn.pipeline.components.data_preprocessing
import autosklearn.pipeline.components.feature_preprocessing
import autosklearn.pipeline.components.regression
import autosklearn.regression
from autosklearn.askl_typing import FEAT_TYPE_TYPE
from autosklearn.pipeline.components.base import (
    AutoSklearnPreprocessingAlgorithm,
    AutoSklearnRegressionAlgorithm,
)
from autosklearn.pipeline.constants import (
    DENSE, INPUT, PREDICTIONS, SIGNED_DATA, SPARSE, UNSIGNED_DATA,
)


class NoPreprocessing(AutoSklearnPreprocessingAlgorithm):
    def __init__(self, **kwargs):
        """This preprocessor does not change the data."""
        # Some internal checks make sure parameters are set.
        for key, val in kwargs.items():
            setattr(self, key, val)

    def fit(self, X, Y=None):
        return self

    def transform(self, X):
        return X

    @staticmethod
    def get_properties(dataset_properties=None):
        return {
            "shortname": "NoPreprocessing",
            "name": "NoPreprocessing",
            "handles_regression": True,
            "handles_classification": True,
            "handles_multiclass": True,
            "handles_multilabel": True,
            "handles_multioutput": True,
            "is_deterministic": True,
            "input": (SPARSE, DENSE, UNSIGNED_DATA),
            "output": (INPUT,),
        }

    @staticmethod
    def get_hyperparameter_search_space(
        feat_type: Optional[FEAT_TYPE_TYPE] = None, dataset_properties=None
    ):
        return ConfigurationSpace()  # Empty: this component has no hyperparameters.


# Add NoPreprocessing component to auto-sklearn.
autosklearn.pipeline.components.data_preprocessing.add_preprocessor(NoPreprocessing)
cs = NoPreprocessing.get_hyperparameter_search_space()
print(cs)
class kBinsDiscretizer(AutoSklearnPreprocessingAlgorithm):
    def __init__(self, n_bins, random_state=None):
        self.n_bins = n_bins
        self.random_state = random_state
        self.preprocessor = None

    def fit(self, X, y=None):
        self.n_bins = int(self.n_bins)

        from sklearn.preprocessing import KBinsDiscretizer

        self.preprocessor = KBinsDiscretizer(n_bins=self.n_bins)
        self.preprocessor.fit(X, y)
        return self

    def transform(self, X):
        if self.preprocessor is None:
            raise NotImplementedError()
        return self.preprocessor.transform(X)

    @staticmethod
    def get_properties(dataset_properties=None):
        return {
            "shortname": "kBinsDiscretizer",
            "name": "kBinsDiscretizer",
            "handles_regression": True,
            "handles_classification": True,
            "handles_multiclass": True,
            "handles_multilabel": True,
            "handles_multioutput": True,
            "is_deterministic": True,
            "input": (DENSE, UNSIGNED_DATA, SIGNED_DATA),
            "output": (DENSE, UNSIGNED_DATA, SIGNED_DATA),
        }

    @staticmethod
    def get_hyperparameter_search_space(
        feat_type: Optional[FEAT_TYPE_TYPE] = None, dataset_properties=None
    ):
        cs = ConfigurationSpace()
        n_bins = UniformIntegerHyperparameter(
            name="n_bins", lower=2, upper=10, default_value=5
        )
        cs.add_hyperparameters([n_bins])
        return cs


# Add kBinsDiscretizer component to auto-sklearn.
autosklearn.pipeline.components.feature_preprocessing.add_preprocessor(kBinsDiscretizer)
cs = kBinsDiscretizer.get_hyperparameter_search_space()
print(cs)
class AbessRegression(AutoSklearnRegressionAlgorithm):

    def __init__(self, random_state=None):
        #self.exchange_num = exchange_num
        self.random_state = random_state
        self.estimator = None

    def fit(self, X, y):
        from abess import LinearRegression
        self.estimator = LinearRegression()
        self.estimator.fit(X, y)
        return self

    def predict(self, X):
        if self.estimator is None:
            raise NotImplementedError
        return self.estimator.predict(X)

    @staticmethod
    def get_properties(dataset_properties=None):
        return {
            'shortname': 'abess',
            'name': 'abess linear regression',
            'handles_regression': True,
            'handles_classification': False,
            'handles_multiclass': False,
            'handles_multilabel': False,
            'handles_multioutput': True,
            'is_deterministic': True,
            'input': (SPARSE, DENSE, UNSIGNED_DATA, SIGNED_DATA),
            'output': (PREDICTIONS,)
        }
    
    @staticmethod
    def get_hyperparameter_search_space(
        feat_type: Optional[FEAT_TYPE_TYPE] = None, dataset_properties=None
    ):
        cs = ConfigurationSpace() 
        # exchange_num = UniformIntegerHyperparameter(
        #     name="exchange_num", lower=4, upper=5, default_value=5
        # )
        # cs.add_hyperparameters([exchange_num])
        return cs
    
# Add abess component to auto-sklearn.
autosklearn.pipeline.components.regression.add_regressor(AbessRegression)
cs = AbessRegression.get_hyperparameter_search_space()
print(cs)
regaallp = autosklearn.regression.AutoSklearnRegressor(
    time_left_for_this_task=60,
    per_run_time_limit=10,
    include={
        "data_preprocessor": ["NoPreprocessing"],
        'regressor': ['AbessRegression'],
        'feature_preprocessor':[
            'no_preprocessing',
            'polynomial',
            'kBinsDiscretizer',
        ],
    },
    memory_limit=6144,
)
# X and y are assumed to be defined earlier; the original snippet does not show them.
regaallp.fit(X, y)
# yaallp_pred = regaallp.predict(X_test.values)

The error:
TypeError Traceback (most recent call last)
Cell In [10], line 15
1 regaallp = autosklearn.regression.AutoSklearnRegressor(
2 time_left_for_this_task=60,
3 per_run_time_limit=10,
(...)
13 memory_limit=6144,
14 )
---> 15 regaallp.fit(X, y)

File ~/miniconda3/envs/p38/lib/python3.8/site-packages/autosklearn/estimators.py:1587, in AutoSklearnRegressor.fit(self, X, y, X_test, y_test, feat_type, dataset_name)
1576 raise ValueError(
1577 "Regression with data of type {} is "
1578 "not supported. Supported types are {}. "
(...)
1582 "".format(target_type, supported_types)
1583 )
1585 # Fit is supposed to be idempotent!
1586 # But not if we use share_mode.
-> 1587 super().fit(
1588 X=X,
1589 y=y,
1590 X_test=X_test,
1591 y_test=y_test,
1592 feat_type=feat_type,
1593 dataset_name=dataset_name,
1594 )
1596 return self

File ~/miniconda3/envs/p38/lib/python3.8/site-packages/autosklearn/estimators.py:540, in AutoSklearnEstimator.fit(self, **kwargs)
538 if self.automl_ is None:
539 self.automl_ = self.build_automl()
--> 540 self.automl_.fit(load_models=self.load_models, **kwargs)
542 return self

File ~/miniconda3/envs/p38/lib/python3.8/site-packages/autosklearn/automl.py:2394, in AutoMLRegressor.fit(self, X, y, X_test, y_test, feat_type, dataset_name, only_return_configuration_space, load_models)
2383 def fit(
2384 self,
2385 X: SUPPORTED_FEAT_TYPES,
(...)
2392 load_models: bool = True,
2393 ) -> AutoMLRegressor:
-> 2394 return super().fit(
2395 X,
2396 y,
2397 X_test=X_test,
2398 y_test=y_test,
2399 feat_type=feat_type,
2400 dataset_name=dataset_name,
2401 only_return_configuration_space=only_return_configuration_space,
2402 load_models=load_models,
2403 is_classification=False,
2404 )

File ~/miniconda3/envs/p38/lib/python3.8/site-packages/autosklearn/automl.py:962, in AutoML.fit(self, X, y, task, X_test, y_test, feat_type, dataset_name, only_return_configuration_space, load_models, is_classification)
959 except Exception as e:
960 # This will be called before the _fit_cleanup
961 self._logger.exception(e)
--> 962 raise e
963 finally:
964 self._fit_cleanup()

File ~/miniconda3/envs/p38/lib/python3.8/site-packages/autosklearn/automl.py:899, in AutoML.fit(self, X, y, task, X_test, y_test, feat_type, dataset_name, only_return_configuration_space, load_models, is_classification)
863 resamp_args = self._resampling_strategy_arguments
864 _proc_smac = AutoMLSMBO(
865 config_space=self.configuration_space,
866 dataset_name=self.dataset_name,
(...)
892 trials_callback=self.get_trials_callback,
893 )
895 (
896     self.runhistory,
897     self.trajectory,
898     self._budget_type,
--> 899 ) = _proc_smac.run_smbo()
901 trajectory_filename = os.path.join(
902     self._backend.get_smac_output_directory_for_run(self.seed),
903     "trajectory.json",
904 )
905 saveable_trajectory = [
906     list(entry[:2])
907     + [entry[2].get_dictionary()]
908     + list(entry[3:])
909     for entry in self.trajectory
910 ]

File ~/miniconda3/envs/p38/lib/python3.8/site-packages/autosklearn/smbo.py:552, in AutoMLSMBO.run_smbo(self)
549 if self.trials_callback is not None:
550 smac.register_callback(self.trials_callback)
--> 552 smac.optimize()
554 self.runhistory = smac.solver.runhistory
555 self.trajectory = smac.solver.intensifier.traj_logger.trajectory

File ~/miniconda3/envs/p38/lib/python3.8/site-packages/smac/facade/smac_ac_facade.py:720, in SMAC4AC.optimize(self)
718 incumbent = None
719 try:
--> 720 incumbent = self.solver.run()
721 finally:
722 self.solver.save()

File ~/miniconda3/envs/p38/lib/python3.8/site-packages/smac/optimizer/smbo.py:273, in SMBO.run(self)
266 # Skip the run if there was a request to do so.
267 # For example, during intensifier intensification, we
268 # don't want to rerun a config that was previously ran
269 if intent == RunInfoIntent.RUN:
270 # Track the fact that a run was launched in the run
271 # history. It's status is tagged as RUNNING, and once
272 # completed and processed, it will be updated accordingly
--> 273 self.runhistory.add(
274 config=run_info.config,
275 cost=float(MAXINT)
276 if num_obj == 1
277 else np.full(num_obj, float(MAXINT)),
278 time=0.0,
279 status=StatusType.RUNNING,
280 instance_id=run_info.instance,
281 seed=run_info.seed,
282 budget=run_info.budget,
283 )
285 run_info.config.config_id = self.runhistory.config_ids[run_info.config]
287 self.tae_runner.submit_run(run_info=run_info)

File ~/miniconda3/envs/p38/lib/python3.8/site-packages/smac/runhistory/runhistory.py:257, in RunHistory.add(self, config, cost, time, status, instance_id, seed, budget, starttime, endtime, additional_info, origin, force_update)
223 """Adds a data of a new target algorithm (TA) run;
224 it will update data if the same key values are used
225 (config, instance_id, seed)
(...)
253 Forces the addition of a config to the history
254 """
256 if config is None:
--> 257 raise TypeError("Configuration to add to the runhistory must not be None")
258 elif not isinstance(config, Configuration):
259 raise TypeError(
260 "Configuration to add to the runhistory is not of type Configuration, but %s"
261 % type(config)
262 )

TypeError: Configuration to add to the runhistory must not be None

I would be very grateful for any help, as I have been troubled by this problem for a long time.

@aron-bram
Collaborator

aron-bram commented Apr 17, 2023

Thanks for the extra info.

I ran your code on the diabetes dataset from sklearn (not sure what you used), and your custom feature/data preprocessors worked as expected.
When I ran with abess' linear regressor included, I managed to reproduce your error.
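
For reference, this is the data setup I used to reproduce it (the loading code is mine, since your snippet does not show it):

from sklearn.datasets import load_diabetes

# Stand-in data: any regression dataset should do; I used sklearn's diabetes set.
X, y = load_diabetes(return_X_y=True)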

You said that the abess regressor runs without problems if you only include your regressor in the search.
However, when I tried including just abess' linear regressor, the optimization still failed. In my case, the issue is with abess itself. Namely, I suspect that this open issue caused the sampled pipelines to crash.
Unfortunately, this stops me from helping you debug further, unless abess resolves it. One quick check is to run abess on the same data outside auto-sklearn, as in the sketch after this paragraph.
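
A minimal isolation test (a sketch; load_diabetes is my stand-in for your data):

from abess import LinearRegression
from sklearn.datasets import load_diabetes

# Fit abess directly, bypassing auto-sklearn entirely. If this call also
# crashes or hangs, the problem lies in abess rather than in auto-sklearn.
X, y = load_diabetes(return_X_y=True)
est = LinearRegression()
est.fit(X, y)
print(est.predict(X[:5]))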

What you could try is taking a look at my answer to your other issue, #1660, where I also explain how to find the runhistory file. Look at the sampled configurations there and check whether any errors are attached to the runs; I would expect you to find errors raised by abess. If you do, then this issue should be closed, since it is not related to auto-sklearn. A sketch of how to scan the runhistory for crashed runs follows.
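
For example (a minimal sketch, assuming you passed tmp_folder="/tmp/autosklearn_tmp" to the estimator and that SMAC's default output layout is used; adjust the path to your setup):

import json
from pathlib import Path

# Locate SMAC's runhistory.json inside auto-sklearn's tmp folder.
run_dir = next(Path("/tmp/autosklearn_tmp/smac3-output").glob("run_*"))
with open(run_dir / "runhistory.json") as f:
    runhistory = json.load(f)

# Each entry pairs a run key with [cost, time, status, start, end, additional_info].
for _, run_value in runhistory["data"]:
    status, additional_info = run_value[2], run_value[-1]
    if "CRASHED" in str(status):
        print(additional_info)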

Hope this will help you solve it.
