TypeError: 'NoneType' object is not iterable #234

Closed
earino opened this issue Aug 22, 2016 · 9 comments

Comments

@earino

earino commented Aug 22, 2016

When using the newly released tpot 0.5.0, I get an error when attempting to fit a pipeline.

Context of the issue

This was a working example before the release of tpot 0.5.0, and after the upgrade, I am getting errors. The reproducible example is available here:

https://app.dominodatalab.com/u/earino/tpot_reprex/runs/57bb7b8180f4fe61775a16c2

Process to reproduce the issue

The script which crashes is here:

https://app.dominodatalab.com/u/earino/tpot_reprex/view/example.py

The code can be executed by simply clicking "run" on the example.py screen above ^

Expected result

I would have expected a generated pipeline

Current result

I am getting the error:

TypeError: 'NoneType' object is not iterable

Possible fix

Unknown

Environment Information

You can look at the log of the run in the above link; however, to make life easier, I am copy/pasting the part of the build process that contains package versions:

Collecting deap (from -r /mnt/requirements.txt (line 4))
  Downloading deap-1.0.2.post2.tar.gz (852kB)
Collecting update_checker (from -r /mnt/requirements.txt (line 5))
  Downloading update_checker-0.12-py2.py3-none-any.whl
Collecting tqdm (from -r /mnt/requirements.txt (line 6))
  Downloading tqdm-4.8.4-py2.py3-none-any.whl
Collecting tpot (from -r /mnt/requirements.txt (line 7))
  Downloading TPOT-0.5.0.tar.gz (1.1MB)
Requirement already satisfied (use --upgrade to upgrade): requests>=2.3.0 in /usr/local/lib/python2.7/dist-packages (from update_checker->-r /mnt/requirements.txt (line 5))
Installing collected packages: deap, update-checker, tqdm, tpot
  Running setup.py install for deap: started
    Running setup.py install for deap: finished with status 'done'
  Running setup.py install for tpot: started
    Running setup.py install for tpot: finished with status 'done'
Successfully installed deap-1.0.2 tpot-0.5.0 tqdm-4.8.4 update-checker-0.12
@danthedaniel
Contributor

danthedaniel commented Aug 22, 2016

TPOT no longer uses Pandas dataframes, so that may be part of the issue. However, converting each of X_train, y_train, etc. to numpy matrices with the method .as_matrix() still yields the error.

Comparing against known working datasets, the difference appears to be that the shape of your y is (N_rows, 1), rather than the expected (N_rows, ). So for the labels you're passing a list of lists of length 1, rather than just a list.
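
As an illustration of the shape difference described above, here is a minimal sketch using made-up labels (the array contents are an assumption, not the reporter's data). It shows a column-shaped target and one way to flatten it with NumPy's `ravel()`:

```python
import numpy as np

# Hypothetical labels stored as a column vector: shape (N_rows, 1)
y_column = np.array([[0], [1], [1], [0]])
print(y_column.shape)  # (4, 1)

# ravel() flattens the array to one dimension: shape (N_rows,)
y_flat = y_column.ravel()
print(y_flat.shape)  # (4,)
```

Whether flattening alone resolves the error depends on the rest of the input, so treat this as a first check rather than a guaranteed fix.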

@danthedaniel
Contributor

Granted, the error message should be more explanatory. I'll definitely fix that for the next release.
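
One way a more explanatory message could look is sketched below. The helper name `check_target_shape` and its wording are hypothetical and are not part of TPOT; the sketch only assumes that a one-dimensional target is what the fit step expects.

```python
import numpy as np


def check_target_shape(y):
    """Hypothetical helper (not a TPOT function): reject targets that are not 1-D."""
    y = np.asarray(y)
    if y.ndim == 2 and y.shape[1] == 1:
        # A single-column 2-D target is the case discussed in this thread
        raise ValueError(
            "Expected y with shape (n_rows,), got {}. "
            "Flatten it first, e.g. with y.ravel().".format(y.shape)
        )
    if y.ndim != 1:
        raise ValueError("Expected a 1-D y, got {} dimensions.".format(y.ndim))
    return y


# A flat list of labels passes through unchanged
print(check_target_shape([0, 1, 1, 0]).shape)

# A list of single-element lists triggers the descriptive error
try:
    check_target_shape([[0], [1], [1], [0]])
except ValueError as err:
    print(err)
```

Failing early with the offending shape in the message points the user at the input format instead of an unrelated `NoneType` error deeper in the call stack.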

@earino
Author

earino commented Aug 22, 2016

@teaearlgraycold oh, how interesting. Is there a link to some discussion on why Pandas dataframes were dropped from TPOT?

@danthedaniel
Contributor

danthedaniel commented Aug 22, 2016

#113

Essentially, pandas has a much larger memory footprint than numpy matrices.

Also, and this was a very recent decision that didn't get much discussion on GitHub, TPOT now does almost no data management, with all data just passed to sklearn pipelines directly. This works by actually exporting each tested pipeline to Python code (using the same code as the .export() function) and running eval on that.
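
The general idea of rendering a pipeline as source code and then evaluating it can be sketched as follows. This is a toy illustration only: `scale`, `shift`, and `export_pipeline` are made-up names, and nothing here reflects TPOT's actual export code or sklearn operators.

```python
# Toy stand-ins for pipeline steps; these are NOT TPOT or sklearn operators.
def scale(values, factor):
    return [v * factor for v in values]


def shift(values, offset):
    return [v + offset for v in values]


def export_pipeline(steps):
    """Render a list of (function_name, argument) steps as a nested call string."""
    code = "data"
    for name, arg in steps:
        code = "{}({}, {!r})".format(name, code, arg)
    return code


steps = [("scale", 2), ("shift", 1)]
source = export_pipeline(steps)
print(source)  # shift(scale(data, 2), 1)

# Evaluate the generated source against an explicit namespace
namespace = {"scale": scale, "shift": shift, "data": [1, 2, 3]}
result = eval(source, namespace)
print(result)  # [3, 5, 7]
```

Because the evaluated code receives the data as-is, any mismatch in input format surfaces inside the generated code rather than in a dedicated validation step.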

@KeithBrodie

KeithBrodie commented Aug 23, 2016

The error still occurs when the X and Y datasets presented to the fit method are numpy arrays and the shape of the target array is (n,), i.e. one-dimensional. I have code and data to replicate it in the fourth comment of #233.

@rhiever
Contributor

rhiever commented Aug 23, 2016

@earino, I think your issue could be fixed if you replaced the lines:

X_train = train.ix[:, df.columns != "class"].as_matrix()
y_train = train["class"].as_matrix()
X_test = test.ix[:, df.columns != "class"].as_matrix()
y_test = test["class"].as_matrix()

with

X_train = train.ix[:, df.columns != "class"].values
y_train = train["class"].values
X_test = test.ix[:, df.columns != "class"].values
y_test = test["class"].values

Please let me know how that works for you.
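
For readers on newer pandas versions, where both `.ix` and `as_matrix()` have been removed, a rough equivalent is sketched below. The DataFrame contents are invented for illustration, and the use of `.loc` in place of `.ix` is a substitution, not what the original script used. The sketch also shows how selecting the label column with double brackets produces the two-dimensional target discussed earlier in this thread.

```python
import pandas as pd

# Hypothetical stand-in for the training data
train = pd.DataFrame(
    {"a": [1.0, 2.0, 3.0], "b": [4.0, 5.0, 6.0], "class": [0, 1, 0]}
)

# Features: every column except "class", as a plain numpy array
X_train = train.loc[:, train.columns != "class"].values

# Selecting a Series gives a 1-D target
y_series = train["class"].values

# Selecting a one-column DataFrame gives a 2-D target
y_frame = train[["class"]].values

print(X_train.shape)   # (3, 2)
print(y_series.shape)  # (3,)
print(y_frame.shape)   # (3, 1)
```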

rhiever removed the bug label on Aug 23, 2016
@earino
Author

earino commented Aug 23, 2016

@rhiever as_matrix seemed to work; that's what got me to the error in #235. I will still use .values as per your suggestion to verify it's the same error.

@rhiever
Contributor

rhiever commented Aug 24, 2016

Does this issue seem to be solved now, per your comments in #235?

@earino
Author

earino commented Aug 24, 2016

The issue is solved, in that if I send data to it in the proper format, it behaves correctly :)

4 participants