Pandas read_csv out of memory even after adding chunksize #16537
Comments
Please show your code. If the above is all you are doing, then it should work. Exactly where does it run out of memory?
I'm running it in a Jupyter notebook, and it crashes (the kernel dies) after processing 124 chunks of this data.
There is no error in the output; the notebook crashes before that.
Not to be pedantic, but are you sure your file is tab-separated? I've had an issue where I passed the wrong separator, and pandas tried to construct a single giant string, which blew up memory.
Yes, I verified that too; it's tab-separated :)
The same function worked on an 8 GB version of the file.
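For what it's worth, a quick way to sanity-check the separator without loading the whole file is to preview a handful of rows (the path below is a placeholder, not from the original report):

```python
import pandas as pd

# If the separator were wrong, this preview would typically come back as a
# single column holding whole lines instead of the expected columns.
preview = pd.read_csv("data.tsv", sep="\t", nrows=5)
print(preview.shape)             # expect (5, <expected column count>)
print(preview.columns.tolist())
```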
@gk13 you would have to show more code. It is certainly possible that the reading part is fine, but your chunk processing blows up memory.
I've updated the code above. It blows up after processing foo 124 times.
This keeps the reference around; you can try gc.collect().
I tried gc.collect() before returning from foo, didn't help. |
Any other suggestions?
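For context on the reference-keeping point, here is a minimal sketch (not the original code; the path, chunk size, and processing are placeholders) of the difference between holding on to every chunk and reducing each one to a small result:

```python
import pandas as pd

# Placeholder path and chunk size, mirroring the pattern discussed above.
reader = pd.read_csv("data.tsv", sep="\t", chunksize=1_000_000)

# Accumulating every chunk (e.g. results = [chunk for chunk in reader]) keeps
# all of them referenced, so gc.collect() has nothing it is allowed to free.
# Reducing each chunk to a small value lets the chunk itself be reclaimed:
total_rows = 0
for chunk in reader:
    total_rows += len(chunk)  # stand-in for the real per-chunk processing (foo)
print(total_rows)
```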
@gk13: I'm in agreement with @TomAugspurger that your file could be malformed, as you have not been able to prove that you can read it otherwise (then again, what better way is there to do it than with pandas?). Why don't you do this: instead of reading the entire file into memory, pass in a chunksize and iterate through the file to see exactly which chunk it breaks on.
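A rough sketch of that suggestion (the file name and chunk size are placeholders, not from the original report):

```python
import pandas as pd

# Walk the file chunk by chunk without keeping anything; the last index
# printed before the crash pinpoints the offending chunk, and hence the
# rough row range of the file to inspect.
reader = pd.read_csv("data.tsv", sep="\t", chunksize=1_000_000)
for i, chunk in enumerate(reader):
    print(i, len(chunk))
```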
I've solved the memory error problem using chunks AND low_memory=False.

I've solved the memory error problem using smaller chunks (a chunksize of 1). It was about 3x slower, but it didn't error out; low_memory=False didn't work for me.
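For anyone landing here, a sketch of the two workarounds mentioned above (the path is a placeholder; the reports above differ on which option actually helps):

```python
import pandas as pd

# Workaround 1: chunked reading combined with low_memory=False.
reader = pd.read_csv("data.tsv", sep="\t", chunksize=1_000_000, low_memory=False)

# Workaround 2: a much smaller chunksize; slower, but each chunk's peak
# memory footprint is far smaller.
small_reader = pd.read_csv("data.tsv", sep="\t", chunksize=1)
```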
What does axis=0 do?
axis=0 appends new rows (it concatenates along the row index).
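In concrete pandas terms (a toy example, not tied to the original data):

```python
import pandas as pd

a = pd.DataFrame({"x": [1, 2]})
b = pd.DataFrame({"x": [3, 4]})

# axis=0 stacks the frames vertically, i.e. appends b's rows under a's;
# axis=1 would instead place them side by side as additional columns.
combined = pd.concat([a, b], axis=0, ignore_index=True)
print(combined)  # four rows in a single column "x"
```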
Seems like the debugging efforts in the original question stalled, while others have had success with using chunksize.
I have the same problem with a big CSV file (~10 GB).
I found that it's not the file size that matters, but the row number at which a given amount of memory overflows: different files produce the error at the same row numbers. This also rules out the incorrect-parsing case.
I have the same scenario and am getting "Python Error: <>, exitCode: <139>". Does anybody have a resolution? Kindly help me.
Code Sample, a copy-pastable example if possible
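The copy-pastable sample itself is missing from this copy of the issue; the following is only a rough reconstruction of the pattern described in the problem description below, with placeholder names:

```python
import pandas as pd

def foo(chunk):
    # Placeholder for the per-chunk processing mentioned in the comments.
    return len(chunk)

# Placeholder file name; the description below mentions a 34 GB TSV read
# with chunksize=1000000.
reader = pd.read_csv("large_file.tsv", sep="\t", chunksize=1000000)
results = [foo(chunk) for chunk in reader]
```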
Problem description
I have a 34 GB TSV file and I've been reading it using pandas' read_csv function with chunksize specified as 1000000. The command above works fine with an 8 GB file, but pandas crashes for my 34 GB file, subsequently crashing my IPython notebook.