Error Generating Pipeline Code during Fit #233

Closed
KeithBrodie opened this issue Aug 22, 2016 · 12 comments

@KeithBrodie

While fitting an X, y pair passed as pandas DataFrames, TPOT crashed while generating the pipeline code.

Context of the issue

Ubuntu 16.04 LTS
Python 2.7.12
deap 1.0.2
TPOT 0.5.0

Code is a copy of the example code from the documentation replacing the dataset with one generated locally.

#import pandas as pd
from BuildXY import BuildXY
from tpot import TPOT

Xdf,ydf = BuildXY('aapl')

Dates = Xdf.Date.values

Xdf.drop(u'Date',axis=1,inplace = True)

ydf.drop(u'Date',axis=1,inplace = True)

X_train = Xdf[:-100]
y_train = ydf[:-100]
X_test  = Xdf[-100:]
y_test  = ydf[-100:]

pipeopt = TPOT(generations=5, 
               population_size=20, 
               num_cv_folds=5, 
               random_state=42, 
               verbosity=2)

pipeopt.fit(X_train, y_train)
print(pipeopt.score(X_test, y_test))
pipeopt.export('tpot_exported_pipeline.py')

Process to reproduce the issue

  1. User creates TPOT instance
  2. User calls TPOT fit() function with training data
  3. TPOT crashes with a TypeError

Expected result

Fit method to complete without crashing

Current result

GP Progress: 0%| | 0/120 [00:00<?, ?pipeline/s]
GP Progress: 10%|# | 12/120 [00:00<00:00, 118.37pipeline/s]
GP Progress: 13%|#3 | 16/120 [1:19:04<10:16:46, 355.83s/pipeline]
GP Progress: 14%|#4 | 17/120 [1:19:22<7:16:51, 254.48s/pipeline]
GP Progress: 15%|#5 | 18/120 [1:19:23<5:03:10, 178.34s/pipeline]
GP Progress: 16%|#5 | 19/120 [1:22:02<4:50:39, 172.67s/pipeline]
GP Progress: 17%|#6 | 20/120 [1:22:02<3:21:31, 120.91s/pipeline]

Traceback (most recent call last):
  File "/home/northwood/Dropbox/AutoDex/Extractor/tp1.py", line 31, in <module>
    pipeopt.fit(X_train, y_train)
  File "/usr/local/lib/python2.7/dist-packages/tpot/tpot.py", line 307, in fit
    self._fitted_pipeline = self._toolbox.compile(expr=self._optimized_pipeline)
  File "/usr/local/lib/python2.7/dist-packages/tpot/tpot.py", line 431, in _compile_to_sklearn
    sklearn_pipeline = generate_pipeline_code(expr_to_tree(expr))
  File "/usr/local/lib/python2.7/dist-packages/tpot/export_utils.py", line 80, in expr_to_tree
    for node in ind:
TypeError: 'NoneType' object is not iterable

Possible fix

I don't know.

@danthedaniel
Contributor

I'm trying to replicate your issue, but I'm not sure where you're getting the BuildXY package from. Is that something that's bundled with the Spyder IDE?

@KeithBrodie
Author

No, it's something I wrote. I will post the output dataframes. I'm re-running it, passing X and y as numpy arrays. The behavior is different. I don't know yet if it runs to completion.

Thanks for looking at it

Keith

@KeithBrodie
Author

OK, I replicated the error with a smaller dataset, this time on Windows. I'm including a reduced dataset that also triggers the error.

Code:

import pandas as pd
#from BuildXY import BuildXY
from tpot import TPOT

Xdf = pd.read_csv('X2.csv',index_col=None)
ydf = pd.read_csv('y2.csv',index_col=None)

Dates = Xdf.Date.values

Xdf.drop(u'Date',axis=1,inplace = True)

ydf.drop(u'Date',axis=1,inplace = True)

print (Xdf.columns)
print (ydf.columns)

X_train = Xdf[:-100].values
y_train = ydf[:-100].values
X_test  = Xdf[-100:].values
y_test  = ydf[-100:].values

pipeopt = TPOT(generations=5, 
               population_size=20, 
               num_cv_folds=5, 
               random_state=42, 
               verbosity=2)

pipeopt.fit(X_train, y_train)
print(pipeopt.score(X_test, y_test))
pipeopt.export('tpot_exported_pipeline.py')

Result:

Python 2.7.11 |Anaconda 4.0.0 (64-bit)| (default, Feb 16 2016, 09:58:36) [MSC v.1500 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
Anaconda is brought to you by Continuum Analytics.
Please check out: http://continuum.io/thanks and https://anaconda.org

runfile('C:/Users/Keith/Dropbox/AutoDex/Extractor/tp4.py', wdir='C:/Users/Keith/Dropbox/AutoDex/Extractor')
Index([u'DLR_P0', u'DLR_P1', u'DLR_P2', u'DLOC_P0', u'DLOC_P1', u'DLOC_P2',
u'LVOL_P0', u'LVOL_P1', u'LVOL_P2', u'LMA5_P0', u'LMA5_P1', u'LMA5_P2',
u'LSD5_P0', u'LSD5_P1', u'LSD5_P2', u'ARO10_P0', u'ARO10_P1',
u'ARO10_P2'],
dtype='object')
Index([u'Target'], dtype='object')

GP Progress: 0%| | 0/120 [00:00<?, ?pipeline/s]
GP Progress: 4%|? | 5/120 [00:01<00:34, 3.36pipeline/s]
GP Progress: 5%|¦ | 6/120 [03:01<1:42:50, 54.13s/pipeline]
GP Progress: 11%|¦ | 13/120 [03:01<1:07:34, 37.89s/pipeline]
GP Progress: 12%|¦? | 15/120 [06:09<1:35:46, 54.73s/pipeline]
GP Progress: 13%|¦? | 16/120 [06:09<1:06:28, 38.35s/pipeline]
GP Progress: 16%|¦¦ | 19/120 [09:16<1:16:42, 45.57s/pipeline]
GP Progress: 17%|¦? | 20/120 [09:16<53:14, 31.94s/pipeline]

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\Keith\Anaconda2\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 699, in runfile
    execfile(filename, namespace)
  File "C:\Users\Keith\Anaconda2\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 74, in execfile
    exec(compile(scripttext, filename, 'exec'), glob, loc)
  File "C:/Users/Keith/Dropbox/AutoDex/Extractor/tp4.py", line 35, in <module>
    pipeopt.fit(X_train, y_train)
  File "C:\Users\Keith\Anaconda2\lib\site-packages\tpot\tpot.py", line 307, in fit
    self._fitted_pipeline = self._toolbox.compile(expr=self._optimized_pipeline)
  File "C:\Users\Keith\Anaconda2\lib\site-packages\tpot\tpot.py", line 431, in _compile_to_sklearn
    sklearn_pipeline = generate_pipeline_code(expr_to_tree(expr))
  File "C:\Users\Keith\Anaconda2\lib\site-packages\tpot\export_utils.py", line 80, in expr_to_tree
    for node in ind:
TypeError: 'NoneType' object is not iterable
>>> 

Attaching screenshot and datafiles.

X2.zip
y2.zip

@danthedaniel
Contributor

Your problem seems identical to #234. I'm guessing the shape of your labels is (N_rows, 1) instead of (N_rows, ).
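
For anyone checking the shape point above, here is a minimal sketch of the difference between a column-shaped label array and a flat one. The array `y_column` is a made-up stand-in, not data from this issue. Note that `ravel()` returns the flattened array rather than changing the original, so the result has to be assigned.

```python
import numpy as np

# Hypothetical stand-in for a single-column label frame converted with .values
y_column = np.array([[0], [1], [1], [0]])
print(y_column.shape)  # shape is (n_rows, 1)

# ravel() returns a flattened array (a view when possible); it does not
# modify y_column in place, so the result must be assigned to a name.
y_flat = y_column.ravel()
print(y_flat.shape)    # shape is (n_rows,)
```

Calling `y_column.ravel()` on its own line and discarding the result leaves `y_column` unchanged.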

@KeithBrodie
Author

I read that and have tried explicitly reshaping to (n,). The problem does not go away.

@KeithBrodie
Author

Here's another version which explicitly reshapes the target array. The problem occurs with numpy array input with the label array of shape (n,). This example uses X2.csv and y2.csv posted with my earlier comment.

Keith

Code:


import pandas as pd
from tpot import TPOT

Xdf = pd.read_csv('X2.csv',index_col=None)
ydf = pd.read_csv('y2.csv',index_col=None)

Dates = Xdf.Date.values

Xdf.drop(u'Date',axis=1,inplace = True)

ydf.drop(u'Date',axis=1,inplace = True)

print (Xdf.columns)
print (ydf.columns)

X_train = Xdf[:-100].values
y_train = ydf.Target[:-100].values
X_test  = Xdf[-100:].values
y_test  = ydf.Target[-100:].values

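# ravel() returns a new array, so assign the result
# (a no-op here, since ydf.Target is already 1-D)
y_train = y_train.ravel()
y_test = y_test.ravel()

print(X_train.shape)
print(y_train.shape)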

pipeopt = TPOT(generations=5, 
               population_size=20, 
               num_cv_folds=5, 
               random_state=42, 
               verbosity=2)

pipeopt.fit(X_train, y_train)
print(pipeopt.score(X_test, y_test))
pipeopt.export('tpot_exported_pipeline.py')

Results (Note shape of X and y):


Index([u'DLR_P0', u'DLR_P1', u'DLR_P2', u'DLOC_P0', u'DLOC_P1', u'DLOC_P2',
       u'LVOL_P0', u'LVOL_P1', u'LVOL_P2', u'LMA5_P0', u'LMA5_P1', u'LMA5_P2',
       u'LSD5_P0', u'LSD5_P1', u'LSD5_P2', u'ARO10_P0', u'ARO10_P1',
       u'ARO10_P2'],
      dtype='object')
Index([u'Target'], dtype='object')
(1554, 18)
(1554,)

GP Progress:   0%|          | 0/120 [00:00<?, ?pipeline/s]
GP Progress:   3%|3         | 4/120 [00:00<00:03, 36.54pipeline/s]
GP Progress:   6%|5         | 7/120 [00:06<01:09,  1.63pipeline/s]
GP Progress:  12%|#1        | 14/120 [00:06<00:46,  2.29pipeline/s]
GP Progress:  13%|#3        | 16/120 [00:06<00:38,  2.70pipeline/s]
GP Progress:  15%|#5        | 18/120 [00:06<00:31,  3.27pipeline/s]
GP Progress:  17%|#6        | 20/120 [00:12<01:44,  1.05s/pipeline]

Traceback (most recent call last):
  File "/home/northwood/Dropbox/AutoDex/Extractor/tp6.py", line 40, in <module>
    pipeopt.fit(X_train, y_train)
  File "/usr/local/lib/python2.7/dist-packages/tpot/tpot.py", line 307, in fit
    self._fitted_pipeline = self._toolbox.compile(expr=self._optimized_pipeline)
  File "/usr/local/lib/python2.7/dist-packages/tpot/tpot.py", line 431, in _compile_to_sklearn
    sklearn_pipeline = generate_pipeline_code(expr_to_tree(expr))
  File "/usr/local/lib/python2.7/dist-packages/tpot/export_utils.py", line 80, in expr_to_tree
    for node in ind:
TypeError: 'NoneType' object is not iterable
>>> 

@rhiever
Contributor

rhiever commented Aug 23, 2016

Thank you for the bug report, @KeithBrodie! There seems to be an issue with our "compile to sklearn Pipeline" functionality for Python 2.7. We need to dig into it soon and see what we can find out.

In the meantime, we thoroughly tested on Python 3.5 and TPOT should run without a hitch there.

@rhiever
Contributor

rhiever commented Aug 23, 2016

Hi @KeithBrodie,

Thank you for sharing your data and a reproducible example so we could figure out what's going on. From looking at your data, it looks like your predicted target is continuous, which is a regression problem. At the moment, TPOT only supports classification problems.

We plan to add support for regression problems in the next release (0.6), hopefully within a couple weeks.
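
As a rough illustration of the diagnosis above, the sketch below flags a target that looks continuous before it reaches a classification-only tool. Both function names are hypothetical and are not part of TPOT or scikit-learn. The fractional-part test is only a heuristic, and it assumes finite numeric labels.

```python
def looks_continuous(y):
    """Heuristic: True if any label has a fractional part.

    Assumes every label is a finite number (no NaN or infinity).
    """
    return any(float(v) != int(float(v)) for v in y)


def check_classification_target(y):
    """Raise a readable error instead of failing deep inside the fit."""
    if looks_continuous(y):
        raise ValueError(
            "Target contains non-integer values and looks continuous; "
            "this is a regression problem, not a classification problem."
        )


# Integer class labels pass silently.
check_classification_target([0, 1, 1, 0])

# Fractional targets are rejected with a clear message.
try:
    check_classification_target([0.013, -0.42, 1.7])
except ValueError as err:
    print(err)
```

An integer-valued regression target would slip past this check, so it is a first line of defense rather than a complete test.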

@KeithBrodie
Author

Thanks, sorry about wasting your time, and thanks for TPOT, totally cool.

@rhiever
Contributor

rhiever commented Aug 23, 2016

Not a waste at all! You helped us realize that we could output a more useful failure message when users pass data in a format that scikit-learn can't handle.

@rhiever
Contributor

rhiever commented Sep 2, 2016

Hi @KeithBrodie, we just released TPOT v0.6 today. Try upgrading TPOT via pip and using it on your regression data set. Usage docs: link

@KeithBrodie
Author

Worked - very cool. Thanks
