Only dummy predictions in custom metric #1639

Open
konstantin-doncov opened this issue Jan 6, 2023 · 1 comment

Comments

@konstantin-doncov

konstantin-doncov commented Jan 6, 2023

I want to use my own metric, but I have run into a lot of trouble implementing it. Many of the problems are related to each other, so I hope I can solve all of them.
For example, if I use this code with a 5-minute max runtime (time_left_for_this_task=5*60):

import autosklearn as askl
import autosklearn.classification  # makes askl.classification available
import autosklearn.metrics         # makes askl.metrics available

# Dummy metric that just prints what it receives, to inspect the predictions.
def metric_which_needs_x(solution, prediction, X_data):
    print(prediction)
    print(len(X_data))
    return 1

accuracy_scorer = askl.metrics.make_scorer(
    name="accu_X",
    score_func=metric_which_needs_x,
    optimum=1,
    greater_is_better=True,
    needs_proba=True,
    needs_X=True,
    needs_threshold=False,
)

# logo, groups, x and y are defined earlier (omitted here).
automl = askl.classification.AutoSklearnClassifier(
    ensemble_size=1,
    time_left_for_this_task=5*60,
    per_run_time_limit=5*60,
    metric=accuracy_scorer,
    resampling_strategy=logo,
    resampling_strategy_arguments={"groups": groups},
)
automl.fit(x, y)

Then everything is fine and my metric function eventually gets real predictions (not 0.5 0.5):

:3: DeprecationWarning: ensemble_size has been deprecated, please use ensemble_kwargs = {'ensemble_size': 1}. Inserting ensemble_size into ensemble_kwargs for now. ensemble_size will be removed in auto-sklearn 0.16.
automl = askl.classification.AutoSklearnClassifier(
[WARNING] [2023-01-06 15:18:13,967:Client-AutoML(1):52f808ae-8dd5-11ed-840e-0242ac1c000c] Time limit for a single run is higher than total time limit. Capping the limit for a single run to the total time given to SMAC (294.777947)
[WARNING] [2023-01-06 15:18:13,967:Client-AutoML(1):52f808ae-8dd5-11ed-840e-0242ac1c000c] Capping the per_run_time_limit to 147.0 to have time for a least 2 models in each process.
[WARNING] [2023-01-06 15:18:14,003:Client-AutoMLSMBO(1)::52f808ae-8dd5-11ed-840e-0242ac1c000c] Could not find meta-data directory /usr/local/lib/python3.8/dist-packages/autosklearn/metalearning/files/accu_X_binary.classification_dense
[[0.5 0.5]
[0.5 0.5]
[0.5 0.5]
...
[0.5 0.5]
[0.5 0.5]
[0.5 0.5]]
227226
[WARNING] [2023-01-06 15:20:20,378:Client-EnsembleBuilder] No runs were available to build an ensemble from
[[0.5 0.5]
[0.5 0.5]
[0.5 0.5]
...
[0.5 0.5]
[0.5 0.5]
[0.5 0.5]]
227226
[[0.5 0.5]
[0.5 0.5]
[0.5 0.5]
...
[0.5 0.5]
[0.5 0.5]
[0.5 0.5]]
227226
[[0.2602794 0.7397206 ]
[0.2947102 0.7052898 ]
[0.26641977 0.73358023]
...
[0.8857727 0.11422727]
[0.83059156 0.16940844]
[0.8350615 0.16493851]]
227226
[WARNING] [2023-01-06 15:22:11,886:Client-EnsembleBuilder] No models better than random - using Dummy losses!
Models besides current dummy model: 0
Dummy models: 1
[WARNING] [2023-01-06 15:22:11,930:smac.runhistory.runhistory2epm.RunHistory2EPM4LogCost] Got cost of smaller/equal to 0. Replace by 0.000010 since we use log cost.
[[0.2602794 0.7397206 ]
[0.2947102 0.7052898 ]
[0.26641977 0.73358023]
...
[0.8857727 0.11422727]
[0.83059156 0.16940844]
[0.8350615 0.16493851]]
227226
[WARNING] [2023-01-06 15:22:53,545:Client-EnsembleBuilder] No models better than random - using Dummy losses!
Models besides current dummy model: 0
Dummy models: 1
[WARNING] [2023-01-06 15:22:53,608:smac.runhistory.runhistory2epm.RunHistory2EPM4LogCost] Got cost of smaller/equal to 0. Replace by 0.000010 since we use log cost.

But if I use a 4-minute max runtime, then I get only dummy predictions (only 0.5 0.5).

You may say "Well, then just use more time", but this is not a cure: when I use more complicated and time-consuming metrics (1-2 minutes for a single metric evaluation), even one hour is not enough (and I don't know how much time it would take).
So, how can I fix this?

@eddiebergman
Contributor

Sorry for the delay. I think the solution here is actually to remove it altogether: the logs say it only gets to try two models, and both are worse than the dummy model, so it seems like it needs to try more of them.
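For reference, you can check how many runs actually finished and how they scored with something like this (a sketch, assuming automl is the fitted AutoSklearnClassifier from your snippet):

print(automl.sprint_statistics())  # counts of successful / crashed / timed-out runs
print(automl.leaderboard())        # evaluated models with their validation cost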

It could be that the logo resampling strategy (which I guess is Leave One Group Out) is creating many subsets of the data, which means there is simply too much data to fit if the number of groups is high. Say, for example, you have 1_000_000 samples with 10 groups. My impression of logo is that you would need to fit 10 models, each on 900_000 samples, i.e. 9_000_000 data points in total, just to get one model evaluation. This gets amplified further as the number of groups increases. Have you tried simple holdout just to test this hypothesis?
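A minimal sketch of that check, keeping your scorer but swapping the resampling strategy for a plain holdout split (the train_size and the shorter per_run_time_limit here are just illustrative values):

# Same setup as above, but with a single holdout split instead of logo,
# so each configuration is evaluated with one fit instead of one fit per group.
automl_holdout = askl.classification.AutoSklearnClassifier(
    time_left_for_this_task=5*60,
    per_run_time_limit=60,
    metric=accuracy_scorer,
    resampling_strategy="holdout",
    resampling_strategy_arguments={"train_size": 0.67},
)
automl_holdout.fit(x, y)
print(automl_holdout.sprint_statistics())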
