Use `use_dask=True` or manual import method not working #764
Comments
Hmm, I tested those codes under a fresh test conda environment and the error was not reproduced. But I used an easier way to install:

```
conda create -n test_env python=3.6
activate test_env
pip install missingno
conda install -y -c anaconda ecos
conda install -y -c conda-forge lapack
conda install -y -c cvxgrp cvxpy
conda install -y -c cimcb fancyimpute
pip install rfpimp
conda install -y py-xgboost
pip install tpot msgpack dask[delayed] dask-ml
```

Another suggestion about the customized scorer in your codes: it may be more stable if the function does not raise:

```python
import math

def rmsle_loss(y_true, y_pred):
    assert len(y_true) == len(y_pred)
    try:
        terms_to_sum = [(math.log(y_pred[i] + 1) - math.log(y_true[i] + 1)) ** 2.0 for i, pred in enumerate(y_pred)]
    except Exception:
        return float('inf')
    if not (y_true >= 0).all() or not (y_pred >= 0).all():
        return float('inf')
    return (sum(terms_to_sum) * (1.0 / len(y_true))) ** 0.5
```
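To plug a loss like this into scikit-learn's scoring machinery, it can be wrapped with `make_scorer` (which the demo later in this thread imports). A minimal sketch, assuming a NumPy-based rewrite of the same RMSLE loss that returns `inf` instead of raising:

```python
import math
import numpy as np
from sklearn.metrics import make_scorer

def rmsle_loss(y_true, y_pred):
    """Root mean squared log error; returns inf instead of raising."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    if len(y_true) != len(y_pred):
        return float('inf')
    # Negative values make log(x + 1) undefined, so bail out early.
    if (y_true < 0).any() or (y_pred < 0).any():
        return float('inf')
    terms = (np.log1p(y_pred) - np.log1p(y_true)) ** 2
    return float(np.sqrt(terms.mean()))

# Lower is better, so tell scikit-learn not to maximize the raw value.
rmsle_scorer = make_scorer(rmsle_loss, greater_is_better=False)
```

The `greater_is_better=False` flag makes scikit-learn negate the loss internally so that optimizers can still maximize the score.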
Thanks Weixuan. Quick question: how do I run the Python script outside of the conda environment? I am just used to opening the script on my desktop and running it there.
Never mind on the Python script question; I was able to set it up on my laptop. Any idea why this install process breaks the verbosity argument? Everything else seems to be working fine. Thanks a ton for your help. Sincerely,
You're welcome. Do you mean no confirmation during the installation of packages via conda? If so, the
The progress bar doesn't show up.
Hmm, I think the progress bar may not be easy to catch among the tons of warning messages in that case. We need to refine how the warning messages are handled.
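As a small illustration of how warning noise buries console output, silencing warnings (as the demo later in this thread does with `warnings.filterwarnings('ignore')`) keeps that noise off the console entirely. A minimal stdlib sketch:

```python
import warnings

# With the 'ignore' filter active, emitted warnings are discarded,
# so they no longer drown out progress output on the console.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter('ignore')
    warnings.warn('some noisy deprecation message')

# 'caught' stays empty because the filter suppressed the warning.

# For contrast, the 'always' filter lets every warning through.
with warnings.catch_warnings(record=True) as shown:
    warnings.simplefilter('always')
    warnings.warn('some noisy deprecation message')
```

Note this hides all warnings, including potentially useful ones, so it is a trade-off rather than a fix for the underlying verbosity issue.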
Thanks, no problem. I can live without it for now, as long as the periodic checkpoints are being saved. You can close this. Thanks again!
Hmm, did you also update `rmsle_loss` in your codes? Can you provide a `random_state` to reproduce the issue?
Thanks, I did. Sorry for the bother; it looks like it was a user error on my side with my virtual environment. I really hate to inconvenience you. I am going to do an overview of TPOT soon for some individuals in my area at a meetup, so this will help greatly! I'll make sure to give a shout-out to you and your team.
Sincerely,
Justin
…On Fri, Sep 14, 2018 at 8:11 AM Weixuan Fu ***@***.***> wrote:
Hmm, did you also update `rmsle_loss` in your codes? Can you provide a `random_state` to reproduce the issue?

```python
def rmsle_loss(y_true, y_pred):
    assert len(y_true) == len(y_pred)
    try:
        terms_to_sum = [(math.log(y_pred[i] + 1) - math.log(y_true[i] + 1)) ** 2.0 for i, pred in enumerate(y_pred)]
    except Exception:
        return float('inf')
    if not (y_true >= 0).all() or not (y_pred >= 0).all():
        return float('inf')
    return (sum(terms_to_sum) * (1.0 / len(y_true))) ** 0.5
```
Hey Weixuan, with the same exact setup, I am now getting the error below. Any idea? I am unable to get TPOT to finish a single run.

```
conda create -n test_env python=3.6
```
Hmm, it seems like an xgboost API issue. I tried to reproduce this issue via the demo below, but the error didn't show up. I think I recently updated xgboost to 0.80.

```python
from sklearn.metrics import make_scorer
from tpot import TPOTRegressor
import warnings
import pandas as pd
from sklearn.datasets import load_boston
from sklearn.model_selection import train_test_split
import math

warnings.filterwarnings('ignore')

housing = load_boston()
X_train, X_test, y_train, y_test = train_test_split(housing.data, housing.target,
                                                    train_size=0.75, test_size=0.25)

def rmsle_loss(y_true, y_pred):
    assert len(y_true) == len(y_pred)
    try:
        terms_to_sum = [(math.log(y_pred[i] + 1) - math.log(y_true[i] + 1)) ** 2.0 for i, pred in enumerate(y_pred)]
    except Exception:
        return float('inf')
    if not (y_true >= 0).all() or not (y_pred >= 0).all():
        return float('inf')
    return (sum(terms_to_sum) * (1.0 / len(y_true))) ** 0.5

tpot = TPOTRegressor(verbosity=3, scoring=rmsle_loss, generations=50, population_size=50,
                     offspring_size=50, max_eval_time_mins=10, warm_start=True, use_dask=True)
tpot.fit(X_train, y_train)
```
I got the same issue. I can't use a conda environment. Whenever I use `use_dask=True`, I get the following error:

```
RuntimeError: A pipeline has not yet been optimized. Please call fit() first.
```

```python
tpot = TPOTRegressor(verbosity=3, scoring=rmsle_loss, generations=50, population_size=50,
                     offspring_size=50, max_eval_time_mins=10, warm_start=True, use_dask=True)
```

I have tried on an Azure Databricks cluster as well as on my local machine.
@GuillaumeLab which version of `dask` are you using?
dask 2.24.0. Thanks for your answer. I checked this thread: dask/distributed#2297, and it does not really help solve the issue. TPOT is working fine on a single device, with no memory issue. Why would distributing it across several devices cause a memory issue?
I cannot use multiple cores and therefore my jobs are running extremely slow.
Context of the issue
In 0.9.4, a fix was put in to use `use_dask=True` or to import dask manually. Both methods return the error:

```
File "C:\Users\jstnjc\Anaconda3\lib\site-packages\tpot\base.py", line 684, in fit
  self._update_top_pipeline()
File "C:\Users\jstnjc\Anaconda3\lib\site-packages\tpot\base.py", line 758, in _update_top_pipeline
  raise RuntimeError('A pipeline has not yet been optimized. Please call fit() first.')
RuntimeError: A pipeline has not yet been optimized. Please call fit() first.
```
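For context, this `RuntimeError` comes from `_update_top_pipeline`, which appears to fire when no successfully evaluated pipeline is available, typically because every candidate pipeline failed during `fit()`. A simplified, hypothetical sketch of that guard (not TPOT's actual implementation; `update_top_pipeline` and the score list are illustrative):

```python
def update_top_pipeline(evaluated_scores):
    # Hypothetical simplification: keep only pipelines that produced a score;
    # a None stands in for a pipeline whose evaluation failed.
    valid = [s for s in evaluated_scores if s is not None]
    if not valid:
        # Mirrors the error seen above: nothing was successfully optimized.
        raise RuntimeError('A pipeline has not yet been optimized. '
                           'Please call fit() first.')
    return max(valid)
```

Seen this way, the error message is misleading: `fit()` was called, but every evaluation inside it failed, so checking the per-pipeline warnings at `verbosity=3` is the more useful diagnostic.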
Process to reproduce the issue
(I've tested this on 3 different computers, including a cloud service.)
1. Install Anaconda 3.6 for Windows 64-bit
2. `pip install missingno`
3. pip install these .whl files manually (need to do so for fancyimpute):
   - ecos-2.0.5-cp36-cp36m-win_amd64.whl
   - cvxpy-1.0.8-cp36-cp36m-win_amd64.whl
4. `pip install fancyimpute`
5. `pip install rfpimp` (used for my custom functions import file)
6. `conda install py-xgboost`
7. `pip install tpot`
8. `pip install msgpack`
9. `pip install dask[delayed] dask-ml`
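When comparing environments that do and don't reproduce the failure, it can help to dump the versions of the packages installed above. A hypothetical stdlib-only helper (`report_versions` is not part of TPOT or dask, just a debugging convenience):

```python
import importlib

def report_versions(names):
    # Map each package name to its __version__, or note that it is missing.
    versions = {}
    for name in names:
        try:
            mod = importlib.import_module(name)
            versions[name] = getattr(mod, '__version__', 'unknown')
        except ImportError:
            versions[name] = 'not installed'
    return versions

# Example: report_versions(['tpot', 'dask', 'xgboost', 'sklearn', 'msgpack'])
```

Pasting the resulting dict into a report makes version mismatches (like the xgboost 0.80 update discussed above) easy to spot.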
With the above setup, execute the code below:
Expected result
Expect the process to run and to use all cores.
Current result