Issue with marshal dump #144
Comments
This error comes from Ruby's own implementation: Ruby fails to marshal a string of 2 GiB or more.
NArray is affected by this limitation because it handles binary data as a Ruby string. Hmm...
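One way to sidestep the limit is to skip Marshal for the bulk data and write the raw bytes directly. The following is only a minimal sketch of that idea: `write_matrix` and `read_matrix` are hypothetical helpers, the file layout (two 64-bit sizes followed by little-endian float64 values) is an assumption made for illustration, and plain Ruby arrays stand in for the NArray object.

```ruby
require "tmpdir"

# Hypothetical helper: store the shape as two unsigned 64-bit integers,
# then each row as raw little-endian float64 bytes. Assumes a non-empty,
# rectangular array of floats.
def write_matrix(path, rows)
  File.open(path, "wb") do |file|
    file.write([rows.size, rows.first.size].pack("Q<Q<"))
    # Writing row by row avoids building one huge string in memory.
    rows.each { |row| file.write(row.pack("E*")) }
  end
end

# Hypothetical helper: read the shape back, then one row at a time.
def read_matrix(path)
  File.open(path, "rb") do |file|
    n_rows, n_cols = file.read(16).unpack("Q<Q<")
    Array.new(n_rows) { file.read(n_cols * 8).unpack("E*") }
  end
end

matrix = [[1.0, 2.5], [-3.0, 4.0]]
path = File.join(Dir.mktmpdir, "matrix.bin")
write_matrix(path, matrix)
restored = read_matrix(path)
puts restored == matrix
```

Because no length ever passes through Marshal's integer encoding, the size check that raises the error is never reached. The trade-off is that the file format becomes your own responsibility.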
I was digging into the Marshal class code, and the problem is described from this line to line 321. The constant SIZEOF_LONG is checked against a hardcoded limit of 4, while it has the value 8 in my current Ruby installation. I don't know why the Ruby authors perform this check.
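To see the two numbers involved from Ruby itself, here is a small sketch. It uses the `l!` pack directive, which encodes a native C long, to report the platform's long size. The 4-byte ceiling is my reading of the Marshal format, which as far as I understand stores integers in at most 4 bytes so that dumps stay portable between 32-bit and 64-bit builds.

```ruby
# Size in bytes of a native C long on this platform.
sizeof_long = [0].pack("l!").bytesize

# Assumption: Marshal encodes lengths as signed 32-bit values,
# so this is the largest length it can write.
max_marshal_length = 2**31 - 1

puts "native long: #{sizeof_long} bytes"
puts "largest length Marshal can encode: #{max_marshal_length}"
```

On a 64-bit Linux installation the first line should report 8, which matches the value mentioned above.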
Hi @seoanezonjic
Hi @kojix2
Here is how Python does this. They have a new pickle protocol version that uses their buffer protocol (memoryview): https://peps.python.org/pep-0574/ I tried to explain it in this Ruby bug tracker proposal: https://bugs.ruby-lang.org/issues/17685 They also have a new shared memory module that uses memoryviews.
Hi authors of Numo-narray
I'm working with biological networks, so I need to use matrix operations frequently. In my project I was using the old NMatrix library, but I recently found your project. Your NumPy style and your support for the community convinced me to replace NMatrix with Numo-narray in my project. I observed a great improvement in memory, speed and code clarity using your library, but I have problems serializing large matrices. With a matrix of 17078x17078 elements I get the following error:
in dump: long too big to dump (TypeError)
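A quick size check suggests why a matrix of this shape triggers the error. This is a sketch that assumes 8-byte floating point elements (for example Numo::DFloat), which the report does not state, and a Marshal length limit of a signed 32-bit integer:

```ruby
# Assumption: each element is an 8-byte float; the element type is not
# given in the report.
rows = 17_078
cols = 17_078
bytes_per_element = 8

payload_bytes = rows * cols * bytes_per_element

# Assumption: Marshal can only encode lengths that fit in 31 bits.
marshal_limit = 2**31 - 1

puts "payload: #{payload_bytes} bytes"        # 2333264672
puts "limit:   #{marshal_limit} bytes"        # 2147483647
puts "exceeds limit: #{payload_bytes > marshal_limit}"  # true
```

Under the same assumptions, a 4-byte element type would need about 1.17 GB and stay under the limit.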
Here is some context for the error. In my pipeline, the first step takes a network defined by pairs, transforms it into a Numo::NArray object and then writes it to disk with:
File.binwrite(options[:output_matrix_file], Marshal.dump(matrix))
This way I can run several different executions without wasting time on the plain-text-to-matrix transformation: I read my serialized matrix and perform the operation I'm interested in. The plain-text-to-matrix transformation is done with the following code:
Source is an IO object for the plain text file, and my strategy is to build a hash of the pair relations and use it in the matrix-filling process. I attach the data to create this matrix at the following link:
https://www.dropbox.com/s/vqmzkazag3m3fgz/pairs.tar.gz?dl=0
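The original code is not reproduced above, so the following is only an illustrative sketch of the strategy described, not the code from the report. It assumes tab-separated `nodeA<TAB>nodeB` lines and an undirected, unweighted network, and uses plain nested arrays in place of a Numo object. `load_pairs` and `build_matrix` are hypothetical names.

```ruby
require "stringio"

# Hypothetical helper: read pairs from an IO into a hash of neighbour lists.
def load_pairs(source)
  relations = Hash.new { |hash, key| hash[key] = [] }
  source.each_line do |line|
    node_a, node_b = line.chomp.split("\t")
    next if node_a.nil? || node_b.nil? # skip malformed lines
    relations[node_a] << node_b
    relations[node_b] # register node_b so it also gets a row and column
  end
  relations
end

# Hypothetical helper: fill a square 0/1 matrix from the hash.
def build_matrix(relations)
  names = relations.keys
  index = names.each_with_index.to_h
  matrix = Array.new(names.size) { Array.new(names.size, 0) }
  relations.each do |node_a, neighbours|
    neighbours.each do |node_b|
      matrix[index[node_a]][index[node_b]] = 1
      matrix[index[node_b]][index[node_a]] = 1 # symmetric: undirected network
    end
  end
  [matrix, names]
end

source = StringIO.new("a\tb\nb\tc\n")
matrix, names = build_matrix(load_pairs(source))
p names
matrix.each { |row| p row }
```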
This was run with numo-narray (0.9.1.5) on openSUSE:
openSUSE 12.3 (x86_64)
VERSION = 12
PATCHLEVEL = 3
CODENAME = Malachite
Thank you in advance
Pedro Seoane