ValueError: EXT data is too large #12905

Hi guys,

I am happy to help you improve msgpack. I tried to export my massive dataframe this morning using msgpack, and I got this error: ValueError: EXT data is too large.

What does that mean? Is there a size limit?
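For context, a minimal sketch (hypothetical sizes; to_msgpack was pandas' msgpack writer at the time) of the kind of call that triggers this:

```python
import numpy as np
import pandas as pd

# illustrative only: ~200M rows x 4 float64 columns is ~6.4GB of raw
# data, enough for a single packed payload to blow past 4GB
df = pd.DataFrame(np.random.randn(200_000_000, 4))

df.to_msgpack("frame.msg")
# -> ValueError: EXT data is too large
```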
Comments
@randomgambit it would help if you can post something more. cc @kawochen
Hi Jeff, I don't have my computer in front of me, but it's the exact same dataframe as in my post about the slowness of to_csv.

Tell me what information you need.
Ah, OK. Can you reference that issue here as well then, and post the df.info()? It may break if you have an object column that actually has object types in it (and not strings).
Other problems related to this df are discussed in #12885; hope that helps.
Does this work with a smaller slice of your frame? Do you have any non-string object data? IOW, run something like the following.
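For instance, a sketch of such a check (assuming your frame is named df) could be:

```python
# flag object-dtype columns that contain non-string values
for col in df.select_dtypes(include=["object"]).columns:
    n_bad = (~df[col].map(lambda x: isinstance(x, str))).sum()
    if n_bad:
        print(col, n_bad, "non-string values")

# and try a smaller slice, to see whether sheer size is the trigger
df.iloc[:1000].to_msgpack("slice.msg")
```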
If you get anything back from that, it's likely the problem. Storing giant opaque frames like this is generally not very useful, as it forces you to load them entirely into memory to work with them.
Hi Jeff aka @jreback, thanks! A couple of points:
@jreback is it possible to write to
It's possible, but not efficient. Please see if you can narrow down exactly when this occurs (e.g. remove columns until it works).
This line. It's not clear to me why it's testing for
I wonder if msgpack uses a single byte for the size, which would be limiting.

In the C code
So the bigger issue here is whether we should actually allow more than 2**32 - 1 total bytes (4GB) in a single write. The short answer is no. The long answer is to do multiple writes. Most chunked systems don't even let you do this kind of size in a single write; even the ones that *do* let you will chunk-write it (and reassemble before handing it back to you). All that said,
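For reference, the 2**32 - 1 figure matches msgpack's largest ext format ("ext 32"), whose payload length is an unsigned 32-bit integer; a minimal sketch against the msgpack-python package directly:

```python
import msgpack

# an EXT object is a type code plus a byte payload; the largest ext
# format ("ext 32") stores the payload length in 4 bytes, capping a
# single EXT payload at 2**32 - 1 bytes
ok = msgpack.packb(msgpack.ExtType(42, b"x" * 10))  # fits easily

# a payload of 2**32 bytes cannot be represented (commented out, as
# it would allocate 4GB just to demonstrate the failure):
# msgpack.packb(msgpack.ExtType(42, b"x" * 2**32))
# -> ValueError: EXT data is too large
```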
cc @llllllllll do you do anything like this?
So my understanding is that
No, it won't work. Maybe we could change it, but there are good reasons not to. You are much better off chunking when storing; large opaque stores are not good for lots of reasons. As I said, maybe we could support a chunking layer on top, but I am a bit reluctant to create yet another thing that is somewhat endemic to pandas (and not a 'standard').
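For illustration, a rough sketch of chunked storing on the (since removed) msgpack API, assuming the append flag of to_msgpack and the iterator flag of read_msgpack in pandas <= 0.25 behave as documented:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(1_000_000, 4), columns=list("abcd"))

# write the frame as several smaller msgpack objects instead of one
# giant blob, so no single packed payload approaches the 4GB limit
chunk_size = 100_000
for start in range(0, len(df), chunk_size):
    chunk = df.iloc[start:start + chunk_size]
    chunk.to_msgpack("frame.msg", append=(start > 0))

# read the chunks back and reassemble
pieces = pd.read_msgpack("frame.msg", iterator=True)
result = pd.concat(pieces, ignore_index=True)
```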
You say large opaque store, but I don't have mixed types anymore! I cleaned everything and
@randomgambit opaque as in a binary blob. It is not indexable; you can retrieve the entire blob or nothing. As your data gets bigger, indexability becomes a more desirable property.
@jreback We haven't needed to send anything larger than an int32 could hold, so we are not chunking it up. I plan to do chunking on the blaze server for other reasons, though.
msgpack is deprecated; see #30112.