pip installed xgboost causing cpu to spike and crash #759
The following commands were written to file `test_code.py`:

```python
from sklearn.metrics import make_scorer
#from xgboost import XGBRegressor
from tpot import TPOTRegressor
#import xgboost as xgb
import warnings
import pandas as pd
import math

warnings.filterwarnings('ignore')

# load in data
url = 'https://github.com/GinoWoz1/AdvancedHousePrices/raw/master/'
X_train = pd.read_csv(url + 'train_tpot_issue.csv')
y_train = pd.read_csv(url + 'y_train_tpot_issue.csv', header=None)

# loss function
def rmsle_loss(y_true, y_pred):
    assert len(y_true) == len(y_pred)
    terms_to_sum = [(math.log(y_pred[i] + 1) - math.log(y_true[i] + 1)) ** 2.0 for i, pred in enumerate(y_pred)]
    if not (y_true >= 0).all() and not (y_pred >= 0).all():
        print('error')
        raise ValueError("Mean Squared Logarithmic Error cannot be used when "
                         "targets contain negative values.")
    return (sum(terms_to_sum) * (1.0 / len(y_true))) ** 0.5

rmsle_loss = make_scorer(rmsle_loss, greater_is_better=False)

# run tpot
tpot = TPOTRegressor(verbosity=3, scoring=rmsle_loss, generations=50, population_size=50,
                     offspring_size=50, max_eval_time_mins=10, warm_start=True, n_jobs=-1)
tpot.fit(X_train, y_train[0])
```

The scripts in the issue have bugs. Since TPOT 0.9.4, there is a work-around for solving this issue with dask. Below is the example code for this issue:

```python
# coding: utf-8
from sklearn.metrics import make_scorer
#from xgboost import XGBRegressor
from tpot import TPOTRegressor
#import xgboost as xgb
import warnings
import pandas as pd
import math

warnings.filterwarnings('ignore')

# load in data
url = 'https://github.com/GinoWoz1/AdvancedHousePrices/raw/master/'
X_train = pd.read_csv(url + 'train_tpot_issue.csv')
y_train = pd.read_csv(url + 'y_train_tpot_issue.csv', header=None)

# loss function
def rmsle_loss(y_true, y_pred):
    assert len(y_true) == len(y_pred)
    try:
        terms_to_sum = [(math.log(y_pred[i] + 1) - math.log(y_true[i] + 1)) ** 2.0 for i, pred in enumerate(y_pred)]
    except ValueError:
        # math.log raises ValueError when a prediction is <= -1
        return float('inf')
    # raise if either side contains negative values
    if not (y_true >= 0).all() or not (y_pred >= 0).all():
        raise ValueError("Mean Squared Logarithmic Error cannot be used when "
                         "targets contain negative values.")
    return (sum(terms_to_sum) * (1.0 / len(y_true))) ** 0.5

rmsle_loss = make_scorer(rmsle_loss, greater_is_better=False)

# run tpot
tpot = TPOTRegressor(verbosity=3, scoring=rmsle_loss, generations=50, population_size=50,
                     offspring_size=50, max_eval_time_mins=10, warm_start=True, use_dask=True)
tpot.fit(X_train, y_train[0])
```
|
Thanks, so the scoring is the main cause of this issue? Sorry for the error in the code; I just put that together really quickly for something reproducible. The genetic algorithm finished on my laptop: the first one that completed in 2 days after restarting numerous times. |
Yep, the customized scorer is not working with `n_jobs=-1`. |
I just installed 0.9.4 and running your code I get the error below.

```
File "", line 36, in <module>
File "PATH\Anaconda3\lib\site-packages\tpot\base.py", line 577, in fit
AttributeError: 'TPOTRegressor' object has no attribute '_fit_init'
```
|
Hmm, odd. Could you please try `pip install --upgrade tpot`? |
Thanks. I just did what you said and it uninstalled 0.9.4 and installed 0.9.5. Still the same error. |
Hmm, I just tested the example in my Windows environment and it works. I am not sure what happened. Could you reproduce the error in another terminal or on another machine? Was the example running under a folder containing a file named `tpot.py`? |
Thanks. I tested on my laptop on 0.9.3 and the fit function worked. I upgraded to 0.9.4 and the same error popped up, with or without the dask argument. I also uninstalled and installed 0.9.5 directly and had the same error. Also, what did you mean by the example running under a folder with `tpot.py`? I created a .py file in my GitHub repository and then ran the script you provided. I am 1-2 years into Python, so excuse my ignorance. I never had this error show up until I upgraded. I installed Python via Anaconda and have the latest version. The file location of tpot is under the Anaconda lib folder. |
Hmm, can you please try to create a test environment via conda and then test the example? The commands in PowerShell on Windows for creating a conda environment are below:
|
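The exact commands did not survive the page scrape; a plausible reconstruction (the environment name `tpot-test` and the dependency list are assumptions, not the maintainer's original commands) is:

```shell
# Create and activate a clean test environment (name is arbitrary)
conda create -y -n tpot-test python=3.6 numpy scipy scikit-learn pandas
conda activate tpot-test
# Install TPOT and its remaining dependencies
pip install deap update_checker tqdm stopit tpot
```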
Hmm, it seems that the Anaconda environment is messed up. There are two possible solutions:
1st solution: update conda.
2nd solution: reinstall anaconda/miniconda in your environment. |
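For the first solution, updating conda typically amounts to (a reconstruction, since the commands were not preserved in the thread):

```shell
# Update the conda tool itself, then all installed packages
conda update -y conda
conda update -y --all
```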
Thanks. I re-installed Anaconda. When I DO NOT pass use_dask=True, the fit function works. However, when I set use_dask=True I get the error below. I also tested removing max_eval_time and then removing my scoring function, with the same error. Also, the correct code would have y_train as y_train[1]. How do I post a code block as you did? I am having trouble. |
Please check this cheatsheet for posting code in markdown format. Or you can simply post a link to the code in your GitHub repo as you did before. |
Thanks, any comment on my error? I only installed fancyimpute, rfpimp, and missingno, set my path, and got this error. |
I am not sure what happened based on those error messages, so I need to reproduce it with your updated code. |
I think the custom scorer is one issue, but I am running n_jobs = 1 on only a 1000-row data set and TPOT cannot complete a full run: 50 generations, 50 population size. Is there a suggested editor you use that you don't have problems with? |
What editor are you using to run that script? Have you tried running it on the command line? Are you sure you don't have multiple Python/Anaconda installations? You can use `where python` (on Windows) to check. |
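One quick way to confirm which interpreter and which TPOT install a script is actually picking up (a diagnostic sketch, not from the thread):

```python
import importlib.util
import sys

# The interpreter actually running this script; multiple Python/Anaconda
# installs show up as unexpected paths here.
print(sys.executable)

# Locate an installed package without importing it; None means the
# package is not visible to this interpreter.
spec = importlib.util.find_spec("tpot")
print(spec.origin if spec else "tpot is not installed in this environment")
```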
Thanks @rhiever and @weixuanfu for the help so far. I am using Spyder and the Python installation is through the Anaconda package. I have Python installed in a virtual environment (for a website project), but besides that it is installed with GIMP and Microsoft Office, as it seems to have come with those packages. Regarding the command line, how do I call my own scoring function? Do I create a .py file named 'Test' with a function 'score' and then pass 'Test.Score' to the command line? |
This seems to be a performance issue from xgboost. I am failing on Jupyter and the command line too. I am attempting to feed in the config dict manually (from the source code) and setting n_threads = the number of cores in my PC. Hoping this fixes it. |
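Feeding in a modified config dict works by copying TPOT's default dictionary and narrowing the xgboost entry. Sketched here with a stand-in dictionary; in practice you would start from `tpot.config.regressor_config_dict` and pass the result as `config_dict=` to `TPOTRegressor` (those names are taken from TPOT's docs, and the parameter values below are illustrative, not the thread author's actual settings):

```python
import copy

# Stand-in for tpot.config.regressor_config_dict: operator name -> parameter grid
default_config = {
    'xgboost.XGBRegressor': {
        'n_estimators': [100],
        'max_depth': range(1, 11),
        'nthread': [1],
    },
    'sklearn.linear_model.ElasticNetCV': {
        'l1_ratio': [0.25, 0.5, 0.75],
    },
}

# Deep-copy so the library's default dict is left untouched, then pin
# xgboost to one thread per pipeline evaluation so overall parallelism
# is governed by n_jobs alone.
config = copy.deepcopy(default_config)
config['xgboost.XGBRegressor']['nthread'] = [1]
```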
Yep, same issue. XGBoost is blowing up my CPU even on multiple threads. I am perplexed. Searching the internet, I have seen this happen on other PCs as well. What is your CPU performance with xgboost, @weixuanfu? Does it spike or go above 50%? |
I am testing this on Google Cloud now as it relates to the XGBoost performance problem. Aside from the XGBoost issue, when I pass a custom function with use_dask = True I get the error that we talked about before: "A Pipeline has not been optimized. Please call fit() first". This is outside of my PC and laptop, so I am confused about why this is showing up. Here are the exact steps I took for my installation:
1. Install Anaconda 3.6 for Windows 64-bit
2. pip install missingno
3. pip install these .whl files manually (needed for fancyimpute):
   - ecos-2.0.5-cp36-cp36m-win_amd64.whl
   - cvxpy-1.0.8-cp36-cp36m-win_amd64.whl
4. pip install fancyimpute
5. pip install rfpimp (used for my custom functions import file)
6. pip install xgboost
7. pip install tpot

This is all I am doing, and I get the error above when use_dask = True. |
Have you tried installing xgboost via conda? This sounds like an xgboost issue. |
Hey Randal, I did try but ran into some other side issues, so I'll test that again now in Google Cloud. On a side note, Google Cloud also crashed on the xgboost process via the pip install: 4 cores, 2.3 GHz Intel Xeon(R) (hyper-threaded) CPU. I just uninstalled the pip xgboost and installed py-xgboost via conda. I will update on results. |
Thanks for the help. pip-installed xgboost is definitely the issue. Can you please close this, @weixuanfu? Solution: install xgboost via conda using 'conda install py-xgboost'. DO NOT install the pip xgboost. I will open another issue regarding use_dask = True, as it is still giving me the error "A pipeline has not yet been optimized". |
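The fix described in that comment, expressed as commands (the swap from the pip wheel to the conda build):

```shell
# Remove the pip wheel and replace it with the conda build of xgboost
pip uninstall -y xgboost
conda install -y py-xgboost
```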
OK, please provide code to reproduce the issue when `use_dask=True`. |
Thanks. I will provide the code to reproduce. For dask, do I have to import it into the python script before running tpot? |
The 2nd way in the tpot docs, as below, needs to import dask first, but it works without setting `use_dask=True`:

```python
from sklearn.externals import joblib
import distributed.joblib
from dask.distributed import Client

# connect to the cluster
client = Client('scheduler-address')

# create the estimator normally
estimator = TPOTClassifier(n_jobs=-1)

# perform the fit in this context manager
with joblib.parallel_backend("dask"):
    estimator.fit(X, y)
```
|
Hello, first of all, I am very happy with your tool, but I am having issues getting it to finish. I would like to avoid using TPOT light if possible, since I have the requisite processing speed. Also, I am new to posting bugs on this medium, so apologies if I am missing anything.
Context of the issue
I am using Spyder 3.2.6 on Python 3.6 on a Windows machine with the specs below. I also have a slower laptop on Spyder 3.2.8, also on Python 3.6, experiencing the same issue.
Currently, when I fit a pipeline in Spyder, the evolutionary algorithm will start. However, it constantly fails. What I have noticed quite a few times is that the CPU spikes to 100% usage and then the kernel dies.
Process to reproduce the issue
Please use my script here to recreate (data and packages preloaded):
https://github.com/GinoWoz1/AdvancedHousePrices/blob/master/TPOT_issue_fix.py
Call the fit() function with training data where n_jobs = -1; I have a 6-core PC. I use 50 generations, 50 population size, 50 offspring size.
Expected result
I would expect, with my specs, a full completion.
Current result
Crashes between 4% and 50% progress.
Possible fix
Are there certain models in the pipeline that cause CPU to spike?