Error Generating Pipeline Code during Fit #233

KeithBrodie · 2016-08-22T15:24:39Z

While fitting an X, y pair passed as pandas DataFrames, TPOT crashed while generating the pipeline code.

Context of the issue

Ubuntu 16.04 LTS
Python 2.7.12
deap 1.0.2
TPOT 0.5.0

The code is a copy of the example from the documentation, with the dataset replaced by one generated locally.

#import pandas as pd
from BuildXY import BuildXY
from tpot import TPOT

Xdf,ydf = BuildXY('aapl')

Dates = Xdf.Date.values

Xdf.drop(u'Date',axis=1,inplace = True)

ydf.drop(u'Date',axis=1,inplace = True)

X_train = Xdf[:-100]
y_train = ydf[:-100]
X_test  = Xdf[-100:]
y_test  = ydf[-100:]

pipeopt = TPOT(generations=5, 
               population_size=20, 
               num_cv_folds=5, 
               random_state=42, 
               verbosity=2)

pipeopt.fit(X_train, y_train)
print(pipeopt.score(X_test, y_test))
pipeopt.export('tpot_exported_pipeline.py')

Process to reproduce the issue

User creates TPOT instance
User calls TPOT fit() function with training data
TPOT crashes with a TypeError

Expected result

Fit method to complete without crashing

Current result

GP Progress: 0%| | 0/120 [00:00<?, ?pipeline/s]
GP Progress: 10%|# | 12/120 [00:00<00:00, 118.37pipeline/s]
GP Progress: 13%|#3 | 16/120 [1:19:04<10:16:46, 355.83s/pipeline]
GP Progress: 14%|#4 | 17/120 [1:19:22<7:16:51, 254.48s/pipeline]
GP Progress: 15%|#5 | 18/120 [1:19:23<5:03:10, 178.34s/pipeline]
GP Progress: 16%|#5 | 19/120 [1:22:02<4:50:39, 172.67s/pipeline]
GP Progress: 17%|#6 | 20/120 [1:22:02<3:21:31, 120.91s/pipeline]

Traceback (most recent call last):
  File "/home/northwood/Dropbox/AutoDex/Extractor/tp1.py", line 31, in <module>
    pipeopt.fit(X_train, y_train)
  File "/usr/local/lib/python2.7/dist-packages/tpot/tpot.py", line 307, in fit
    self._fitted_pipeline = self._toolbox.compile(expr=self._optimized_pipeline)
  File "/usr/local/lib/python2.7/dist-packages/tpot/tpot.py", line 431, in _compile_to_sklearn
    sklearn_pipeline = generate_pipeline_code(expr_to_tree(expr))
  File "/usr/local/lib/python2.7/dist-packages/tpot/export_utils.py", line 80, in expr_to_tree
    for node in ind:
TypeError: 'NoneType' object is not iterable

Possible fix

I don't know.

Screenshot


danthedaniel · 2016-08-22T16:11:17Z

I'm trying to replicate your issue, but I'm not sure where you're getting the BuildXY package from. Is that something that's bundled with the Spyder IDE?

KeithBrodie · 2016-08-22T16:28:00Z

No, it's something I wrote. I will post the output dataframes. I'm re-running it, passing X and y as numpy arrays. The behavior is different, but I don't know yet whether it runs to completion.

Thanks for looking at it.

Keith

KeithBrodie · 2016-08-23T04:04:48Z

OK, I replicated the error with a smaller dataset, this time on Windows. I'm including a reduced dataset that also generates the error.

Code:

import pandas as pd
#from BuildXY import BuildXY
from tpot import TPOT

Xdf = pd.read_csv('X2.csv',index_col=None)
ydf = pd.read_csv('y2.csv',index_col=None)

Dates = Xdf.Date.values

Xdf.drop(u'Date',axis=1,inplace = True)

ydf.drop(u'Date',axis=1,inplace = True)

print (Xdf.columns)
print (ydf.columns)

X_train = Xdf[:-100].values
y_train = ydf[:-100].values
X_test  = Xdf[-100:].values
y_test  = ydf[-100:].values

pipeopt = TPOT(generations=5, 
               population_size=20, 
               num_cv_folds=5, 
               random_state=42, 
               verbosity=2)

pipeopt.fit(X_train, y_train)
print(pipeopt.score(X_test, y_test))
pipeopt.export('tpot_exported_pipeline.py')

Result:

Python 2.7.11 |Anaconda 4.0.0 (64-bit)| (default, Feb 16 2016, 09:58:36) [MSC v.1500 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
Anaconda is brought to you by Continuum Analytics.
Please check out: http://continuum.io/thanks and https://anaconda.org

runfile('C:/Users/Keith/Dropbox/AutoDex/Extractor/tp4.py', wdir='C:/Users/Keith/Dropbox/AutoDex/Extractor')
Index([u'DLR_P0', u'DLR_P1', u'DLR_P2', u'DLOC_P0', u'DLOC_P1', u'DLOC_P2',
u'LVOL_P0', u'LVOL_P1', u'LVOL_P2', u'LMA5_P0', u'LMA5_P1', u'LMA5_P2',
u'LSD5_P0', u'LSD5_P1', u'LSD5_P2', u'ARO10_P0', u'ARO10_P1',
u'ARO10_P2'],
dtype='object')
Index([u'Target'], dtype='object')

GP Progress: 0%| | 0/120 [00:00<?, ?pipeline/s]
GP Progress: 4%|? | 5/120 [00:01<00:34, 3.36pipeline/s]
GP Progress: 5%|¦ | 6/120 [03:01<1:42:50, 54.13s/pipeline]
GP Progress: 11%|¦ | 13/120 [03:01<1:07:34, 37.89s/pipeline]
GP Progress: 12%|¦? | 15/120 [06:09<1:35:46, 54.73s/pipeline]
GP Progress: 13%|¦? | 16/120 [06:09<1:06:28, 38.35s/pipeline]
GP Progress: 16%|¦¦ | 19/120 [09:16<1:16:42, 45.57s/pipeline]
GP Progress: 17%|¦? | 20/120 [09:16<53:14, 31.94s/pipeline]

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Users\Keith\Anaconda2\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 699, in runfile
    execfile(filename, namespace)
  File "C:\Users\Keith\Anaconda2\lib\site-packages\spyderlib\widgets\externalshell\sitecustomize.py", line 74, in execfile
    exec(compile(scripttext, filename, 'exec'), glob, loc)
  File "C:/Users/Keith/Dropbox/AutoDex/Extractor/tp4.py", line 35, in <module>
    pipeopt.fit(X_train, y_train)
  File "C:\Users\Keith\Anaconda2\lib\site-packages\tpot\tpot.py", line 307, in fit
    self._fitted_pipeline = self._toolbox.compile(expr=self._optimized_pipeline)
  File "C:\Users\Keith\Anaconda2\lib\site-packages\tpot\tpot.py", line 431, in _compile_to_sklearn
    sklearn_pipeline = generate_pipeline_code(expr_to_tree(expr))
  File "C:\Users\Keith\Anaconda2\lib\site-packages\tpot\export_utils.py", line 80, in expr_to_tree
    for node in ind:
TypeError: 'NoneType' object is not iterable
>>>

Attaching screenshot and datafiles.

X2.zip
y2.zip

danthedaniel · 2016-08-23T04:06:37Z

Your problem seems identical to #234. I'm guessing the shape of your labels is (N_rows, 1) instead of (N_rows, ).
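
For anyone checking the same thing, here is a minimal sketch of how the label shape could be inspected and flattened with NumPy. The array contents are made up for illustration and are not taken from the attached data; the point is only that `ravel()` returns a new array, so its result has to be assigned.

```python
import numpy as np

# Hypothetical labels stored as a single-column 2-D array, shape (N_rows, 1).
y_column = np.array([[0], [1], [1], [0]])
print(y_column.shape)

# ravel() returns a flattened array and leaves y_column itself unchanged,
# so the flattened result must be assigned to a name.
y_flat = y_column.ravel()
print(y_flat.shape)
```

If the labels come from a single-column DataFrame, selecting the column first (for example `ydf.Target.values`) also yields a 1-D array.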

KeithBrodie · 2016-08-23T12:13:04Z

I read that and have tried explicitly reshaping to (n,). The problem does not go away.

KeithBrodie · 2016-08-23T12:37:08Z

Here's another version which explicitly reshapes the target array. The problem occurs with numpy array input with the label array of shape (n,). This example uses X2.csv and y2.csv posted with my earlier comment.

Keith

Code:


import pandas as pd
from tpot import TPOT

Xdf = pd.read_csv('X2.csv',index_col=None)
ydf = pd.read_csv('y2.csv',index_col=None)

Dates = Xdf.Date.values

Xdf.drop(u'Date',axis=1,inplace = True)

ydf.drop(u'Date',axis=1,inplace = True)

print (Xdf.columns)
print (ydf.columns)

X_train = Xdf[:-100].values
y_train = ydf.Target[:-100].values
X_test  = Xdf[-100:].values
y_test  = ydf.Target[-100:].values

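# ravel() returns a new array, so the result must be assigned
# (both arrays are already 1-D here, so this changes nothing)
y_train = y_train.ravel()
y_test = y_test.ravel()

print(X_train.shape)
print(y_train.shape)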

pipeopt = TPOT(generations=5, 
               population_size=20, 
               num_cv_folds=5, 
               random_state=42, 
               verbosity=2)

pipeopt.fit(X_train, y_train)
print(pipeopt.score(X_test, y_test))
pipeopt.export('tpot_exported_pipeline.py')

Results (Note shape of X and y):


Index([u'DLR_P0', u'DLR_P1', u'DLR_P2', u'DLOC_P0', u'DLOC_P1', u'DLOC_P2',
       u'LVOL_P0', u'LVOL_P1', u'LVOL_P2', u'LMA5_P0', u'LMA5_P1', u'LMA5_P2',
       u'LSD5_P0', u'LSD5_P1', u'LSD5_P2', u'ARO10_P0', u'ARO10_P1',
       u'ARO10_P2'],
      dtype='object')
Index([u'Target'], dtype='object')
(1554, 18)
(1554,)

GP Progress:   0%|          | 0/120 [00:00<?, ?pipeline/s]
GP Progress:   3%|3         | 4/120 [00:00<00:03, 36.54pipeline/s]
GP Progress:   6%|5         | 7/120 [00:06<01:09,  1.63pipeline/s]
GP Progress:  12%|#1        | 14/120 [00:06<00:46,  2.29pipeline/s]
GP Progress:  13%|#3        | 16/120 [00:06<00:38,  2.70pipeline/s]
GP Progress:  15%|#5        | 18/120 [00:06<00:31,  3.27pipeline/s]
GP Progress:  17%|#6        | 20/120 [00:12<01:44,  1.05s/pipeline]

Traceback (most recent call last):
  File "/home/northwood/Dropbox/AutoDex/Extractor/tp6.py", line 40, in <module>
    pipeopt.fit(X_train, y_train)
  File "/usr/local/lib/python2.7/dist-packages/tpot/tpot.py", line 307, in fit
    self._fitted_pipeline = self._toolbox.compile(expr=self._optimized_pipeline)
  File "/usr/local/lib/python2.7/dist-packages/tpot/tpot.py", line 431, in _compile_to_sklearn
    sklearn_pipeline = generate_pipeline_code(expr_to_tree(expr))
  File "/usr/local/lib/python2.7/dist-packages/tpot/export_utils.py", line 80, in expr_to_tree
    for node in ind:
TypeError: 'NoneType' object is not iterable
>>>

rhiever · 2016-08-23T17:35:06Z

Thank you for the bug report, @KeithBrodie! There seems to be an issue with our "compile to sklearn Pipeline" functionality for Python 2.7. We need to dig into it soon and see what we can find out.

In the meantime, we thoroughly tested on Python 3.5 and TPOT should run without a hitch there.

rhiever · 2016-08-23T17:55:13Z

Hi @KeithBrodie,

Thank you for sharing your data and a reproducible example so we could figure out what's going on. From looking at your data, it looks like your predicted target is continuous, which is a regression problem. At the moment, TPOT only supports classification problems.

We plan to add support for regression problems in the next release (0.6), hopefully within a couple weeks.
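
As a rough illustration of the distinction above, the sketch below flags a target as continuous when it contains non-integer floats. This is only a simple heuristic written for this thread, not how TPOT or scikit-learn decides; the function name `looks_continuous` and the sample values are hypothetical. scikit-learn's own `sklearn.utils.multiclass.type_of_target` is the more thorough check.

```python
def looks_continuous(targets):
    """Heuristic: treat targets as continuous if any value is a non-integer float."""
    return any(isinstance(t, float) and not t.is_integer() for t in targets)


# Made-up examples: discrete class labels versus real-valued targets.
class_labels = [0, 1, 1, 0, 2]
real_valued = [0.0123, -0.0045, 0.0310, 0.0002]

print(looks_continuous(class_labels))
print(looks_continuous(real_valued))
```

Running a check like this before `fit()` would indicate whether the data is a classification target or needs a regression-capable release.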

KeithBrodie · 2016-08-23T18:08:08Z

Thanks, sorry about wasting your time, and thanks for TPOT, totally cool.

rhiever · 2016-08-23T18:25:29Z

Not a waste at all! You helped us realize that we could output a more useful failure message when users pass data in a format that scikit-learn can't handle.
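
To make the idea concrete, here is a hedged sketch of what a clearer failure could look like. It is not TPOT's actual code: `select_best_pipeline` and the `(pipeline, score)` pair structure are hypothetical, with `None` standing in for a pipeline that failed to evaluate. The point is to raise a descriptive error instead of letting `None` reach a later step that tries to iterate over it.

```python
def select_best_pipeline(scored_pipelines):
    """Return the highest-scoring pipeline, or raise a clear error if none evaluated.

    scored_pipelines is a list of (pipeline, score) pairs; a score of None
    marks a pipeline that failed to evaluate (hypothetical structure).
    """
    evaluated = [(p, s) for p, s in scored_pipelines if s is not None]
    if not evaluated:
        raise ValueError(
            "No pipeline could be evaluated. Check that the features and "
            "target are in a format scikit-learn can handle."
        )
    # Keep the pair with the largest score and return its pipeline.
    return max(evaluated, key=lambda pair: pair[1])[0]


try:
    select_best_pipeline([("pipeline_a", None), ("pipeline_b", None)])
except ValueError as err:
    print(err)
```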

rhiever · 2016-09-02T20:29:12Z

Hi @KeithBrodie, we just released TPOT v0.6 today. Try upgrading TPOT via pip and using it on your regression data set. Usage docs: link

KeithBrodie · 2016-09-04T04:50:14Z

Worked - very cool. Thanks

KeithBrodie mentioned this issue Aug 23, 2016

TypeError: 'NoneType' object is not iterable #234

Closed

rhiever added the bug label Aug 23, 2016

rhiever mentioned this issue Aug 23, 2016

TPOT should fail gracefully when all pipelines fail to evaluate #236

Closed

rhiever removed the bug label Aug 23, 2016

rhiever closed this as completed Aug 23, 2016

weixuanfu mentioned this issue Aug 26, 2016

For issue #236 #243

Merged

AIAdventures mentioned this issue Jun 6, 2017

Titanic example -problem with 2nd last cell. #492

Closed
