How to decide value of time_budget? #155

dsbyprateekg · 2021-08-08T10:07:20Z

Hi,

Is there any way to find the best value of time_budget for our dataset?
Please share some tips.

qingyun-wu · 2021-08-08T18:24:53Z

Thanks for your interest and your question. In general there is no absolute answer to this question, but here are some tips that might be useful:

Use the AutoML benchmark as a reference and decide your time budget accordingly.

This AutoML benchmark has a large and diverse collection of datasets (from OpenML). It categorizes the datasets into ‘small’, ‘medium’, and ‘large’ according to their size (find the lists for the three categories here).

The time budgets used in the AutoML benchmark are as follows:

‘small’: 1h-4h
‘medium’: 4h-8h
‘large’: 8h

According to the results in the FLAML paper, the time needed to reach or surpass the best performance reported in the AutoML benchmark can be greatly reduced if FLAML is used. The time needed for each dataset category is as follows:

‘small’: 1m-10m
‘medium’: 10m-1h
‘large’: 1h-4h

You can use the AutoML benchmark as a reference to decide which category your dataset belongs to, and then choose a time budget based on the suggestions above (especially their order of magnitude).
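As a small illustration of this lookup, the sketch below converts the FLAML ranges quoted above into seconds, which is the unit FLAML's `time_budget` setting uses. The helper name `suggest_time_budget` and its optional `max_allowed` cap (for a hard limit imposed by your own use case) are hypothetical, not part of FLAML, and deciding which category your dataset falls into is left to you.

```python
# Suggested FLAML time budgets in seconds, transcribed from the ranges above:
#   'small': 1m-10m, 'medium': 10m-1h, 'large': 1h-4h
SUGGESTED_BUDGET_SECONDS = {
    "small": (60, 600),
    "medium": (600, 3600),
    "large": (3600, 4 * 3600),
}


def suggest_time_budget(category, max_allowed=None):
    """Return a (low, high) range of time_budget values in seconds.

    `category` is 'small', 'medium', or 'large'. `max_allowed` is an
    optional hard cap in seconds (hypothetical parameter); both ends of
    the range are clipped to it.
    """
    try:
        low, high = SUGGESTED_BUDGET_SECONDS[category]
    except KeyError:
        raise ValueError(f"unknown category: {category!r}") from None
    if max_allowed is not None:
        low, high = min(low, max_allowed), min(high, max_allowed)
    return low, high
```

For example, `suggest_time_budget("medium", max_allowed=1800)` returns `(600, 1800)`; a value from that range would then be passed as `time_budget` when calling `fit`.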

In the next version of FLAML, v0.5.12 (#150), FLAML will output warnings that keep users informed about how complete the search is. These can serve as a reference for adjusting the time budget: (a) a warning if we believe that increasing the time budget will very likely further improve the results; (b) a warning if the performance has not improved for a long time, which indicates that the current time budget is long enough (and you may even want to reduce it). v0.5.12 will be released soon. Stay tuned :)
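To make the two signals concrete, here is a rough, hypothetical heuristic of the same flavor. It is not FLAML's actual implementation: the history format (one `(elapsed_seconds, best_loss)` entry each time the best loss improves) and the `recent_frac`/`stale_frac` thresholds are assumptions chosen only for illustration.

```python
def budget_hint(history, time_budget, recent_frac=0.1, stale_frac=0.5):
    """Classify a finished search from its improvement history.

    history: list of (elapsed_seconds, best_loss_so_far) pairs, one per
    improvement, in time order (hypothetical log format).
    Returns 'increase', 'could_reduce', or 'ok'.
    """
    if not history:
        # Nothing was found at all: the budget is clearly too short.
        return "increase"
    last_improvement = history[-1][0]
    if last_improvement >= (1 - recent_frac) * time_budget:
        # Signal (a): still improving near the end, more time likely helps.
        return "increase"
    if last_improvement <= (1 - stale_frac) * time_budget:
        # Signal (b): no improvement for a long stretch, budget is long enough.
        return "could_reduce"
    return "ok"
```

With a 120-second budget, a last improvement at 115 s yields `'increase'`, while one at 30 s yields `'could_reduce'`.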
Finally, you will definitely want to reconcile the suggested time budget with the maximum budget allowed in your use case (if there is one).

In addition, it would be great if you could share more information about your use case. We might then be able to provide more accurate answers and suggestions.

Thanks!
Qingyun

dsbyprateekg · 2021-08-09T03:46:35Z

Thanks a lot, @qingyun-wu and your entire team for this amazing work.
The next release of FLAML will definitely help us to improve the model performance.

I am testing FLAML for the first time for a challenge and the datasets have the following sizes:
train.csv: 22083 x 45
test.csv: 9465 x 43

I have two target columns to predict, and the scoring I am using is as follows:

```python
from sklearn import metrics

# `actual` and `predicted` hold the true and predicted labels per target column

# Target1
score1 = max(0, 100*metrics.f1_score(actual["Target1"], predicted["Target1"], average="macro"))

# Target2
score2 = max(0, 100*metrics.f1_score(actual["Target2"], predicted["Target2"], average="macro"))

# Final score
score = (score1/2)+(score2/2)
```
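For checking the metric offline without pandas or scikit-learn, here is a dependency-free sketch of the same final score. It assumes `actual` and `predicted` map each target name to a list of labels (the snippet above indexes DataFrame-like objects), and it takes macro F1 over every label appearing in either the true or the predicted values, which I believe matches the default behavior of `f1_score(average="macro")`. The function names are hypothetical.

```python
def macro_f1(y_true, y_pred):
    """Unweighted mean of per-label F1 over labels seen in y_true or y_pred."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    labels = set(y_true) | set(y_pred)
    if not labels:
        raise ValueError("empty input")
    total = 0.0
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        # F1 = 2TP / (2TP + FP + FN); the denominator is positive because
        # the label occurs in y_true or y_pred.
        total += 2 * tp / (2 * tp + fp + fn)
    return total / len(labels)


def challenge_score(actual, predicted, targets=("Target1", "Target2")):
    """Average of the per-target scores max(0, 100 * macro F1)."""
    scores = [max(0, 100 * macro_f1(actual[t], predicted[t])) for t in targets]
    return sum(scores) / len(scores)
```

With two targets, averaging is the same as `(score1/2)+(score2/2)`. Since F1 is never negative, the `max(0, ...)` clamp does not change the result; it is kept only to mirror the challenge formula.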

It seems my dataset is small. I have tried with "time_budget":120 and it improved my score from 33.71824 to 34.71006.
I am attaching the datasets and my notebook here, and I would appreciate any suggestions for improving the score further. Maybe with your help I can reach 38+.

test.csv
train.csv
predict_genetic_disorder_ensemble.txt

dsbyprateekg · 2021-08-12T10:08:05Z

Thanks @sonichi! With the new version 0.5.12, I can see a message in the console suggesting that I increase the time budget.

sonichi linked a pull request on Aug 12, 2021 that will close this issue: v0.5.12 #150 (Merged)

dsbyprateekg closed this as completed Aug 12, 2021
