AutoML Regression Experiment Ends Unsuccessfully #6654
Comments
It's probably because your dataset is large and needs more time to run (maybe your dataset has a lot of columns?), or all trials' results are NaN. You can set
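The specific setting that comment refers to did not survive in the thread. A minimal sketch, assuming the new AutoMLExperiment API from the Microsoft.ML.AutoML 0.21.x previews, where the per-experiment budget is controlled with SetTrainingTimeInSeconds; the file path, label column name, and the 60-second value are placeholder assumptions, not values from this issue:

```csharp
using Microsoft.ML;
using Microsoft.ML.AutoML;

var ctx = new MLContext();

// Infer column types from the training file (path and label name are placeholders).
var columnInference = ctx.Auto().InferColumns("train.csv", labelColumnName: "Label", groupColumns: false);
var loader = ctx.Data.CreateTextLoader(columnInference.TextLoaderOptions);
IDataView data = loader.Load("train.csv");

// Hold out part of the data for validation.
var trainValidationData = ctx.Data.TrainTestSplit(data, testFraction: 0.2);

// Sweepable pipeline: automatic featurization followed by regression trainers.
var pipeline = ctx.Auto()
    .Featurizer(data, columnInformation: columnInference.ColumnInformation)
    .Append(ctx.Auto().Regression(labelColumnName: columnInference.ColumnInformation.LabelColumnName));

var experiment = ctx.Auto().CreateExperiment();
experiment
    .SetPipeline(pipeline)
    .SetRegressionMetric(RegressionMetric.MeanAbsoluteError,
                         labelColumn: columnInference.ColumnInformation.LabelColumnName)
    .SetTrainingTimeInSeconds(60)   // the budget knob: raise this if trials keep running out of time
    .SetDataset(trainValidationData);

var result = await experiment.RunAsync();   // result.Model is the best model found within the budget
```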
I have changed to a single thread with a 12-second budget and it worked.. the problem is that I have 2000 time series, so the total training time will be 12*2000 seconds :\ It also produced a worse score than the old AutoML (non-nightly) with many threads, which finished running after 20 minutes.
@superichmann Let me see if I understand your question correctly: the exception goes away after you use the latest nightly build, and it takes about 12 seconds for the first thread to finish. The problem is that the training result is really bad for the first trial, and you don't want to increase the budget because you have too many datasets to run?
The 12-second trial is single-threaded and it finishes training every time. The problem is that some of them explored only 1 model, so I might need to raise the time. In general I would like to finish all of the training as fast as possible without hurting the accuracy, so I am trying to run it across multiple threads, but it might be bad practice :/ What do you think?
It's questionable whether multi-threaded training would help boost training speed. AFAIK some trainers already use multiple threads to train a model, so you might want to configure the max-parallel thread number carefully to make sure the total number of running threads won't exceed the thread count of the CPU. Meanwhile, in the multi-thread case, it might be helpful to set the budget using

Also, may I ask for more detailed information about the 2000 time-series dataset? Are they just different datasets, and you want to use them to train 2000 different models for your project?
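A minimal sketch of the "don't exceed the CPU's thread count" advice, assuming one experiment is launched per time series from Parallel.ForEach (as the reporter describes later in the thread); the divisor of 4 and the runExperimentForSeries delegate are placeholder assumptions:

```csharp
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

static void RunAllSeries(IEnumerable<string> seriesIds, Action<string> runExperimentForSeries)
{
    var options = new ParallelOptions
    {
        // Individual trainers (e.g. LightGBM) already use several threads internally,
        // so keep the number of concurrent experiments well below the logical core count.
        MaxDegreeOfParallelism = Math.Max(1, Environment.ProcessorCount / 4)
    };

    Parallel.ForEach(seriesIds, options, seriesId => runExperimentForSeries(seriesId));
}
```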
Thanks @LittleLittleCloud for the valuable info! I am running on the max CPU count (16) but, as I wrote here, it doesn't work well.. maybe I will reduce it to 4 or 8. I am trying it out on this data from a Kaggle competition. It's store sales data which involves ~40 stores that each have ~40 departments.. I guess I should play with
@superichmann I just expose the
ForecastBySsa is used for univariate forecasting, while your dataset seems to be a multivariate one. So ForecastBySsa might not be the trainer that you're looking for, considering that categorical features like

Data pre-processing

You can transfer a time-series forecasting problem into a regression problem by using previous sales data as features. For example, if you want to predict the sale amount on day t, you can put the sale amounts on days t-1, t-2, ..., t-n as features. Plus, adding meta-info like dayOfWeek, isPaymentDay, and the sale amount of relevant goods should also be helpful in predicting the sale amount.

Also, I like your idea of creating a different model for different
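A minimal sketch of the lag-feature idea above (predict day t from the sales on days t-1..t-n plus calendar meta-info); SalesPoint and RegressionRow are hypothetical types used only for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public record SalesPoint(DateTime Date, float Sales);

public class RegressionRow
{
    public float[] LaggedSales { get; set; } = Array.Empty<float>(); // sales on days t-1 .. t-n
    public float DayOfWeek { get; set; }                             // calendar meta-info
    public float Label { get; set; }                                 // sales on day t (the target)
}

public static class LagFeatures
{
    public static IEnumerable<RegressionRow> Build(IReadOnlyList<SalesPoint> series, int lags)
    {
        // Start at index `lags` so every row has a full window of history behind it.
        for (int t = lags; t < series.Count; t++)
        {
            yield return new RegressionRow
            {
                LaggedSales = Enumerable.Range(1, lags)
                                        .Select(k => series[t - k].Sales)
                                        .ToArray(),
                DayOfWeek = (float)series[t].Date.DayOfWeek,
                Label = series[t].Sales
            };
        }
    }
}
```

If rows like these are later fed to ML.NET via LoadFromEnumerable, the LaggedSales array would need a fixed length (for example via a [VectorType(n)] attribute) so it maps to a vector column.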
Thanks, I am now experimenting with

On some experiments, I see in my database (I use

After a check, this particular hang was caused by

Still, maybe it would be good to add a cache for the data from the DB..
@superichmann Maybe you can try caching the dataset before sending it to AutoML, so it won't retrieve data from the database for each trial?
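A minimal sketch of that suggestion, assuming the rows are pulled from the database once up front; SalesRow and the dbRows argument are hypothetical stand-ins for the reporter's actual schema and query:

```csharp
using System.Collections.Generic;
using Microsoft.ML;

public class SalesRow
{
    public float Sales { get; set; }
    public float DayOfWeek { get; set; }
}

public static class DataCaching
{
    // dbRows: the result of a single database query executed before AutoML starts.
    public static IDataView BuildCachedDataView(MLContext mlContext, IEnumerable<SalesRow> dbRows)
    {
        IDataView data = mlContext.Data.LoadFromEnumerable(dbRows);

        // Cache materializes the rows in memory on the first pass, so the repeated
        // passes AutoML makes (one or more per trial) don't go back to the database.
        return mlContext.Data.Cache(data);
    }
}
```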
From a quick experiment it works, and my runtime is also 3 times faster :] The problem is that the scores got worse :[ (I am talking about a further test I run myself on later data, by fitting the final estimator to unseen data.) Any explanation for that? Thanks again for all your help 💯
@superichmann Can you tell me how much worse it is? And in the meantime, can you share with us the best result and trainer when using the old AutoML?
I can't reproduce it for some reason :\ I will update when I can. Is there any other cache mechanism that I am not aware of? Is there anything I need to do when initializing MLContext? I just create new(). Another thing that happened: on
@superichmann You can pass a seed in MLContext.
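For example (the seed value itself is arbitrary):

```csharp
using Microsoft.ML;

// A fixed seed makes the pseudo-random parts of training (data shuffling,
// sampling, initialization) reproducible across runs.
var mlContext = new MLContext(seed: 42);
```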
I notice that in your test dataset, all
Closing since it seems the issue is resolved.
System Information:
microsoft.ml\3.0.0-preview.23229.2
microsoft.ml.automl\0.21.0-preview.23229.2
microsoft.ml.onedal\0.21.0-preview.23229.2
updated from https://pkgs.dev.azure.com/dnceng/public/_packaging/dotnet-libraries/nuget/v3/index.json
Describe the bug
After updating to the latest build through the dotnet-libraries NuGet feed in order to apply the fix for #6565, the same AutoML regression experiments that previously completed in 10 seconds are now incomplete. I have increased the budget to 800 seconds and still receive the error:
RegressionMetric.MeanAbsoluteError
train: 1667 rows
test: 16 rows
Message:
If there are best practices for the new version, please tell me what they are. Maybe I am doing something wrong.
Expected behavior
At least one model should be found?
Additional context
The long exception message is due to the fact that the trials are sent from Parallel.ForEach