How to decide value of time_budget? #155

Closed
dsbyprateekg opened this issue Aug 8, 2021 · 3 comments · Fixed by #150

Comments

@dsbyprateekg

Hi,

Is there any way we can find the best value for the time_budget as per our dataset?
Please share some tips.

@qingyun-wu
Contributor

Hi @dsbyprateekg,

Thanks for your interest and your question. There is in general no absolute answer to this question, but here are some tips that might be useful:

  1. Use the AutoML benchmark as a reference and decide your time budget accordingly.

     The AutoML benchmark has a large and diverse collection of datasets (from OpenML). It categorizes the datasets as ‘small’, ‘medium’, or ‘large’ according to their size (find the lists for the three categories here).

The time budgets used in the AutoML benchmark are as follows:

 ‘small’: 1h-4h
 ‘medium’: 4h-8h
 ‘large’: 8h

According to the results in the FLAML paper, FLAML can greatly reduce the time needed to reach or surpass the best performance reported in the AutoML benchmark. The time needed for each dataset category is as follows:

 ‘small’: 1m-10m
 ‘medium’: 10m-1h
 ‘large’: 1h-4h

You can use the AutoML benchmark as a reference to decide which category your dataset belongs to, and then use the suggested time budget above accordingly (especially in terms of order of magnitude).

  2. In the next version of FLAML, v0.5.12 (#150), FLAML will output warnings to keep users informed about how complete the search is. These can be used as a reference for adjusting the time budget: (a) a warning when we believe increasing the time budget will very likely improve the results further; (b) a warning when the performance has not improved for a long time, which indicates that the current time budget is long enough (and you may even want to reduce it). v0.5.12 will be released soon. Stay tuned :)

  3. Finally, you will likely want to combine the suggested time budget with the maximum budget allowed (if any) in your use case.
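The tips above can be sketched as a small lookup. This is only an illustration: `suggest_time_budget` and `FLAML_BUDGET_SECONDS` are hypothetical names, not part of FLAML, and the category still has to be picked by hand by comparing your dataset with the AutoML benchmark lists.

```python
# Suggested FLAML time budgets per dataset category, in seconds,
# transcribed from the ranges quoted in this comment.
FLAML_BUDGET_SECONDS = {
    "small": (60, 600),       # 1m-10m
    "medium": (600, 3600),    # 10m-1h
    "large": (3600, 14400),   # 1h-4h
}


def suggest_time_budget(category, max_allowed=None):
    """Return a (low, high) time budget range in seconds.

    Hypothetical helper, not a FLAML API. `max_allowed` caps the range
    at the maximum budget your use case permits, if there is one.
    """
    low, high = FLAML_BUDGET_SECONDS[category]
    if max_allowed is not None:
        low, high = min(low, max_allowed), min(high, max_allowed)
    return low, high


low, high = suggest_time_budget("small", max_allowed=300)
print(low, high)  # 60 300
```

A value from the returned range could then be passed as the `time_budget` argument of `AutoML.fit`, for example `automl.fit(X_train=X, y_train=y, task="classification", time_budget=high)`.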

In addition, it would be great if you could share more information about your use case. We might then be able to provide more accurate answers and suggestions.

Thanks!
Qingyun

@dsbyprateekg
Author

dsbyprateekg commented Aug 9, 2021

Thanks a lot, @qingyun-wu and your entire team for this amazing work.
The next release of FLAML will definitely help us to improve the model performance.

I am testing FLAML for the first time for a challenge and the datasets have the following sizes:
train.csv: 22083 x 45
test.csv: 9465 x 43

I have two target columns to predict and the scoring I am using is as below:
```python
from sklearn import metrics

# Target1
score1 = max(0, 100 * metrics.f1_score(actual["Target1"], predicted["Target1"], average="macro"))

# Target2
score2 = max(0, 100 * metrics.f1_score(actual["Target2"], predicted["Target2"], average="macro"))

# Final score: the mean of the two per-target scores
score = (score1 / 2) + (score2 / 2)
```
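As a worked example of the scoring rule above, here is a dependency-free sketch on made-up toy labels. The `macro_f1` function is a plain-Python stand-in for `metrics.f1_score(..., average="macro")`, written only for illustration, and the data is invented, not taken from the attached files.

```python
def macro_f1(actual, predicted):
    """Unweighted mean of per-class F1 over all labels seen in either list."""
    labels = sorted(set(actual) | set(predicted))
    f1_scores = []
    for label in labels:
        pairs = list(zip(actual, predicted))
        tp = sum(a == label and p == label for a, p in pairs)
        fp = sum(a != label and p == label for a, p in pairs)
        fn = sum(a == label and p != label for a, p in pairs)
        denom = 2 * tp + fp + fn
        f1_scores.append(2 * tp / denom if denom else 0.0)
    return sum(f1_scores) / len(f1_scores)


def final_score(actual, predicted):
    """Mean of the two per-target scores, each scaled to 0-100."""
    scores = [
        max(0, 100 * macro_f1(actual[target], predicted[target]))
        for target in ("Target1", "Target2")
    ]
    return sum(scores) / 2


# Invented toy labels: one mistake on Target1, none on Target2.
actual = {"Target1": ["a", "a", "b", "b"], "Target2": ["x", "y", "x", "y"]}
predicted = {"Target1": ["a", "b", "b", "b"], "Target2": ["x", "y", "x", "y"]}
print(round(final_score(actual, predicted), 2))  # 86.67
```

Because the two targets are weighted equally, a gain on either target moves the final score by half as much.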

It seems my dataset is small. I have tried with "time_budget":120 and it improved my score from 33.71824 to 34.71006.
I am attaching the datasets and my notebook here. Any suggestions for improving the score further would be welcome; maybe with your help I can reach 38+.

test.csv
train.csv
predict_genetic_disorder_ensemble.txt

@sonichi linked a pull request on Aug 12, 2021 that will close this issue

@dsbyprateekg
Author

Thanks @sonichi. With the new version 0.5.12, I can see a message in the console suggesting that I increase the time budget.
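One manual way to act on such a message is to rerun with a larger budget until the score stops improving. The sketch below is a generic heuristic, not how FLAML produces its warnings; `grow_time_budget` and `fit_and_score` are hypothetical names, and every step is assumed to retrain from scratch with the given budget.

```python
def grow_time_budget(fit_and_score, start=60, max_budget=600, min_gain=0.0):
    """Double the time budget until the score stops improving.

    `fit_and_score(budget)` is a user-supplied callable that trains with
    the given budget in seconds and returns a validation score (higher is
    better). Returns the (budget, score) pair of the best run.
    """
    best_budget, best_score = start, fit_and_score(start)
    budget = start * 2
    while budget <= max_budget:
        score = fit_and_score(budget)
        if score - best_score <= min_gain:
            break  # no meaningful gain: the previous budget was long enough
        best_budget, best_score = budget, score
        budget *= 2
    return best_budget, best_score


# Stand-in for a real training run: the score plateaus at a 240 s budget.
def fake_fit_and_score(budget):
    return min(budget, 240) / 240


print(grow_time_budget(fake_fit_and_score))  # (240, 1.0)
```

In practice `fit_and_score` would wrap a call such as `automl.fit(..., time_budget=budget)` followed by scoring on held-out data, and `max_budget` would be the maximum budget the use case allows.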
