
Can't read in 1GB data #2839

Closed
OneWind opened this issue Feb 11, 2013 · 11 comments
Labels: Bug, IO Data

Comments

OneWind commented Feb 11, 2013

I am trying to reproduce the code from http://wesmckinney.com/blog/?p=635, but I can't read in the data. I have more than 4 GB of memory available (8 GB in total), yet I still get the following error:

In [2]: data = pd.read_csv("P00000001-ALL.csv", index_col=False)
Python(9081,0xacb01a28) malloc: *** mmap(size=24051712) failed (error code=12)
*** error: can't allocate region
*** set a breakpoint in malloc_error_break to debug

wesm (Member) commented Feb 11, 2013

Can you give me a backtrace with gdb?

gdb --args python -c "import pandas as pd; data = pd.read_csv('P00000001-ALL.csv', index_col=False)"

Then enter r, let it segfault, and then run bt.

OneWind (Author) commented Feb 11, 2013

Thanks for helping me. This is what I got:

This GDB was configured as "x86_64-apple-darwin"...Reading symbols for shared libraries .
warning: Could not find object file "/Users/builder/work/Python-2.7.3/libpython2.7.a(getbuildinfo.o)" - no debug information available for "./Modules/getbuildinfo.c".

warning: Could not find object file "/Users/builder/work/Python-2.7.3/libpython2.7.a(acceler.o)" - no debug information available for "Parser/acceler.c".

(I got a lot of warnings like these)
...

.. done

(gdb) r
Starting program: /Library/Frameworks/Python.framework/Versions/7.3/bin/python -c import\ pandas\ as\ pd;\ data\ =\ pd.read_csv(P00000001-ALL.csv,\ index_col=False)
Reading symbols for shared libraries ++......................... done

Program received signal SIGTRAP, Trace/breakpoint trap.
0x8fe01030 in __dyld__dyld_start ()
(gdb) bt
#0 0x8fe01030 in __dyld__dyld_start ()
#1 0x00001000 in ?? ()

wesm (Member) commented Feb 11, 2013

Can you please try a (free) packaged Python distro like EPDFree or Anaconda CE to see if it's a problem with your build of Python? You can pass engine='python' as a workaround for now so you can still load the file.
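
For anyone landing here, a minimal sketch of that workaround (the filename comes from the original report; engine='python' is the documented read_csv parameter):

import pandas as pd

# The pure-Python parser is slower than the default C parser, but it
# avoids the C extension code path that is crashing here.
data = pd.read_csv("P00000001-ALL.csv", index_col=False, engine="python")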

wesm (Member) commented Mar 12, 2013

Did you have a chance to look any further into this?

ghost commented Jul 29, 2013

carrier lost.

ghost closed this as completed Jul 29, 2013
martinbel commented

I have a similar issue with a 1.67 GB file. I've tried engine='python', but it eats all the RAM (16 GB); I left it running for 10 minutes and it didn't finish.
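
If neither engine fits the file in memory at once, a sketch of chunked reading, which bounds peak memory to one block at a time (the filename, the 100000-row chunk size, and the per-chunk work are all illustrative):

import pandas as pd

row_count = 0
for chunk in pd.read_csv("big.csv", chunksize=100000):  # hypothetical filename
    # Each iteration yields one DataFrame of up to 100000 rows; process
    # it and let it go out of scope instead of concatenating everything.
    row_count += len(chunk)
print(row_count)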

Yevgnen commented Jul 17, 2017

Any update on this?

jreback (Contributor) commented Jul 17, 2017

This is a long-closed issue.
If you have a reproducible issue with current pandas, open a new issue.

Yevgnen commented Jul 17, 2017

I load a 1.4 GB JSON file, parse it into a list of dicts, and create a large DataFrame. I can save it to CSV with to_csv, but I cannot load it back with pd.read_csv. I hit exactly the same issue as the OP.
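
One memory-reducing option when reading such a dump back, sketched with illustrative column names and dtypes (none of these come from the report):

import pandas as pd

# Declaring dtypes up front skips the parser's type inference and can
# lower peak memory; low-cardinality string columns shrink a lot as
# "category".
data = pd.read_csv(
    "dump.csv",  # hypothetical filename
    dtype={"id": "int64", "label": "category", "value": "float64"},
)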

jreback (Contributor) commented Jul 17, 2017

@Yevgnen

It needs to be reproducible by others, with full code and versions.
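
For the versions half of such a report, pandas ships a helper; a minimal sketch:

import pandas as pd

# Prints the pandas version plus Python and dependency versions,
# which is the standard footer for a pandas bug report.
pd.show_versions()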

seralouk commented Jul 3, 2019

Hello. I have a 1.2 GB CSV (created with MATLAB) that I want to load with pandas. If I do not specify engine='python', it just freezes; if I set it, it works. Why is that?
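
One way to narrow down a freeze like this is to check whether the default C engine can parse a prefix of the file; a sketch (the filename and row count are arbitrary):

import pandas as pd

# If the first rows parse fine with the C engine, the hang is more
# likely memory-related than a parsing bug in this particular file.
sample = pd.read_csv("matlab_export.csv", nrows=100000)  # hypothetical filename
print(sample.shape)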

This issue was closed.