pandas error kills IPython kernel #9205
Comments
Well, this probably segfaults. The file has weird line breaks and such. It doesn't even look like a valid file to me. |
Well. read_html(url) works just fine on it? |
And I just tried pandas 0.14.1 and no such crash happens there. |
Well, things segfault if you feed them garbage input. Not sure that can be prevented in all cases. If you'd like to debug, feel free. |
What changed since 0.14.1, though? No segfault there. |
blank line and comment parsing |
the question is, can the file be that garbage, when |
html and CSV are completely different |
Strongly disagree. Code shouldn't segfault, and if it does, then it's a bug which should get fixed. |
going from working to segfault and leaving it that way seems like a bad idea, @michaelaye in 0.14.1 were you getting a reasonable
|
@michaelaye as you suggested, |
|
Can one of you guys take a look? It's possible that these PRs changed this. cc @mdmueller #7470 |
I'm very well aware that CSV table is not equal HTML table and I never asked for read_table to work on this file, as pointed out in my issue. All I'm worried about is the fact that a pandas segfault due to user error using a wrong function on the wrong data is extremely disruptive and did not happen as such in 0.14.1. Can a segfault even be caught with a try/except? (Can't try it out, am on holidays) |
Segfaults can't be caught in Python 2, but some signals that segfaults may generate can be caught in Python 3. |
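To illustrate the point about try/except: a crash at the C level takes down the whole interpreter, so one way to protect a notebook kernel is to run the risky call in a child process and inspect how it ended. The following is only a minimal sketch, assuming a POSIX system and Python 3.7+; `run_isolated` is a hypothetical helper name, and the crash is simulated by the child sending itself SIGSEGV rather than by the pandas parser.

```python
import signal
import subprocess
import sys


def run_isolated(code):
    """Run a Python snippet in a child process and report how it ended.

    Hypothetical helper for illustration; not part of pandas or IPython.
    """
    proc = subprocess.run(
        [sys.executable, "-c", code], capture_output=True, text=True
    )
    if proc.returncode < 0:
        # On POSIX, a negative return code means the child was killed
        # by the signal with that number.
        return "killed by " + signal.Signals(-proc.returncode).name
    return "exited with code %d" % proc.returncode


# Simulated crash (assumption): the child sends SIGSEGV to itself.
crash = "import os, signal; os.kill(os.getpid(), signal.SIGSEGV)"

print(run_isolated(crash))          # the parent process survives the crash
print(run_isolated("print('ok')"))  # a normal run for comparison
```

The standard library's `faulthandler.enable()` can additionally print a Python traceback when such a signal arrives, though it does not prevent the process from dying.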
The first commit that has the segfault is 31c2558. It looks like the problem is in the call to tokenize_nrows in _tokenize_rows:
File "/Library/Frameworks/Python.framework/Versions/3.4/lib/python3.4/site-packages/pandas-0.14.1_473_g31c2558-py3.4-macosx-10.6-intel.egg/pandas/io/parsers.py", line 1150, in read
538976288 is 20202020 hex, which makes me think that spaces are being misinterpreted somewhere, but perhaps only in the error message. The parser seems to think there are 524287 rows in the file, maybe because of the CR line endings. I'll dig some more this weekend. |
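The arithmetic in the comment above can be checked quickly. This is just a sanity check of the two numbers quoted there, not a reproduction of the parser bug; the variable names are made up for illustration.

```python
import struct

value = 538976288  # number quoted in the comment above
rows = 524287      # row count the parser reportedly assumed

# 538976288 is 0x20202020, i.e. four bytes that are each 0x20.
assert hex(value) == "0x20202020"

# Interpreted as raw bytes, that is four ASCII space characters
# (byte order does not matter because all four bytes are equal).
raw = struct.pack("<I", value)
assert raw == b"    "

# 524287 is 2**19 - 1, i.e. 0x7ffff.
assert rows == 2**19 - 1
assert hex(rows) == "0x7ffff"
```

Both values look more like raw memory contents or a size limit than like real data from the file, which fits the buffer overflow explanation given later in the thread.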
@selasley, nice. I wish it wasn't squashed down to a single commit, though. It would have been better to pin down which change broke things.
Wouldn't it be better to just fix the Cython/C code that's segfaulting? |
Oh, absolutely! @michaelaye wondered if you can "catch" signals generated by segfaults, and I just wanted to point out that you can in Python 3 but not in Python 2 (that I know of). I wasn't suggesting that we should catch them. |
Pull request #9360 fixes the buffer overflows that caused the interpreter to crash with this input file and a few others. |
closed by #9360 |
Doing this simple thing can kill an IPython 2 notebook kernel (version: GH master):
Note that I know this will fail; I just don't expect it to kill a notebook kernel.
Version:
pandas: 128ce85
IPython: 13facaf0206240a7301e045666143d68305d0119