running with k>2 raises "memory allocation failed" error #30

Open
NealT87 opened this issue Feb 28, 2019 · 7 comments


NealT87 commented Feb 28, 2019

Any value of k > 2 for the transfer_entropy method raises the following error (for k = 1 or 2 it works):

```python
test = pyinform.transfer_entropy(x,y,k=3)
```

```
Traceback (most recent call last):
  File "C:/Users/user/PycharmProjects/JerusalemProject/JerusalemProject/ActionActorAnalysis.py", line 279, in <module>
    temp = pyinform.transfer_entropy(x,y,k=3)
  File "C:\Users\user\Anaconda2\envs\Python35\lib\site-packages\pyinform\transferentropy.py", line 179, in transfer_entropy
    error_guard(e)
  File "C:\Users\user\Anaconda2\envs\Python35\lib\site-packages\pyinform\error.py", line 57, in error_guard
    raise InformError(e,func)
pyinform.error.InformError: an inform error occurred - "memory allocation failed"
```

dglmoore commented Mar 21, 2019

Hi @NealT87. Thanks for the new issue! This error, admittedly vague, usually means that the C library couldn't allocate enough memory. The amount of memory necessary depends on:

  1. the base of the time series provided (i.e. the number of distinct states)
  2. the history length k

Could you share the range of values in the x and y time series?
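If it's handy, a quick check along these lines would tell us the base PyInform has to work with (a sketch; it assumes x and y are numpy arrays or lists):

```python
import numpy as np

# PyInform infers the base from the values themselves, so the
# maximum value in each series drives the histogram size.
print(np.min(x), np.max(x))
print(np.min(y), np.max(y))
```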


NealT87 commented Mar 26, 2019 via email

@silviaruiz44

Were you able to solve this? I am running into the same problem.

@dglmoore

@silviaruiz44 Thanks for reviving this issue. I suspect the problem is the range of values in your time series. If that's the case, then there are some workarounds.

If you wouldn't mind providing a sample of the source and target time series, that would be helpful for confirming the issue.

@silviaruiz44

Does the data have to be normalized or within close ranges? Why?

I also have a question regarding the mutual information function. Does it depend on the scaling? I calculated the mutual information of a time series against itself and got a value. When I divide the whole time series by a scalar and calculate the mutual information again, I get a different value, which is strange because it is the same time series, just scaled. I am wondering what the interpretation or explanation of that is.

Thanks in advance for your time.

@dglmoore

@silviaruiz44
To the point of why the "memory allocation failed" error is happening: we use the data you provide to construct histograms. Each bin of the histogram represents a different value that could possibly be observed in your data, and the histogram is stored in a dense form.

Say we're dealing with transfer entropy from X to Y with a history length of k = 3, and that X and Y can take integer (more on that below) values between 0 and 99. Then we'd need arrays that can store 100 future states of Y, 100 past states of X, 100^4 values representing the past and future states of Y, and 100^5 values of the combined past of X, past of Y, and future of Y, for a grand total of about 1.01e10 integers counting the number of times each combination is actually observed. That requires something like 40GB of RAM, hence the allocation failure.

In principle, this information could be stored more efficiently using a sparse representation, e.g. only storing what you actually observe. However, there are performance trade-offs and questions of statistical significance when you get into situations like the one above. Sometimes there are workarounds, so let me know if you are dead-set on applying these methods to data like this.
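To make that arithmetic concrete, here's a back-of-the-envelope sketch of the accounting above (my own illustration, assuming 4-byte counters, not the library's actual internals):

```python
def te_histogram_bytes(base, k, bytes_per_bin=4):
    """Rough memory estimate for dense transfer entropy histograms:
    future of Y, past of X, (past + future) of Y, and the combined state."""
    bins = base + base + base**(k + 1) + base**(k + 2)
    return bins * bytes_per_bin

# base 100, history length k = 3 -> roughly 40 GB, hence the failure
print(te_histogram_bytes(100, 3) / 1e9, "GB")
```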

Now to a bigger issue: PyInform doesn't really support continuously-valued data. The data that you pass into the time series measures, e.g. transferentropy, has to be integer-valued. We're essentially estimating the probabilities of events using frequencies taken from the time series, and that doesn't make much sense with continuously-valued data. There are methods for handling continuous data, but they aren't currently implemented in (Py)Inform. The documentation mentions this, but not emphatically enough (you're not the first person to run into this issue).

I'd wager that the reason the mutual information changes when you scale the values has to do with how C casts values. We use numpy internally to convert the data you provide into arrays with integer values, and numpy doesn't complain when you do something like numpy.asarray([3.0, 4.0, 5.0, 6.0], dtype=np.int32). It just happily passes the input along to C, which then casts the values to integers, so you end up with [3, 4, 5, 6]. However, if you first divide the values by 2 before giving them to pyinform, the resulting array will be [1, 2, 2, 3]. You go from having 4 distinct values to only having 3.
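You can see the truncation directly (a minimal reproduction using the same values as above):

```python
import numpy as np

xs = [3.0, 4.0, 5.0, 6.0]

# Casting to int32 silently truncates toward zero.
print(np.asarray(xs, dtype=np.int32))                # [3 4 5 6] -> 4 distinct states
print(np.asarray(np.array(xs) / 2, dtype=np.int32))  # [1 2 2 3] -> 3 distinct states
```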

Ideally, the time series functions would raise an exception if you provide continuously-valued data; however, we haven't decided exactly how we want to handle that since it requires an additional pass over the data to check the types.

All of that said, you have a couple of options for dealing with continuous data.

Binning

PyInform provides some (primitive) methods for binning continuously-valued data: you can bin using a fixed number of bins, a fixed bin size, or explicit boundaries between bins (see the sketch below). There are lots of different ways of choosing, for example, the width of the bins, e.g. the Freedman-Diaconis rule or Sturges's rule. If you are dealing with data that can be easily thought of as binary, e.g. a neuron is spiking or it isn't, then you can pick a threshold and call any value above it 1 and anything below it 0.
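Here's a rough sketch of those options using pyinform.utils.bin_series (the random series is just for illustration; check the binning docs for the exact return values):

```python
import numpy as np
from pyinform import utils

xs = np.random.rand(100)  # a continuously-valued series in [0, 1)

binned, nbins, width = utils.bin_series(xs, b=5)             # fixed number of bins
binned, nbins, width = utils.bin_series(xs, step=0.2)        # fixed bin size
binned, nbins, bounds = utils.bin_series(xs, bounds=[0.5])   # explicit boundaries

# Binary thresholding, e.g. "spiking or not":
spikes = (xs > 0.5).astype(np.int32)
```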

Most of the data that I deal with personally can be reasonably binned, but that's not always the case and doing so can introduce artifacts and bias. An alternative is to use the continuous data directly.

JIDT

A really good method for estimating mutual information (and transfer entropy, which is just a special case of conditional mutual information) is the Kraskov-Stögbauer-Grassberger (KSG) estimator. Unfortunately, (Py)Inform doesn't implement it at the moment because I just haven't had the time or the energy to implement a KD-tree in C 😄. If this is something that you desperately need, we can see about bumping this issue up the priority list.

In the meantime, I'd recommend considering JIDT if binning your data just won't work for what you want to do. It has just about all of the features of (Py)Inform and then some, including implementations of the KSG estimator (which JIDT calls Kraskov). It's written in Java, but it has tutorials on how to use it from Python.
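For example, something along the lines of JIDT's bundled Python demos, via jpype (the jar path and the toy data here are assumptions; adjust the path to wherever infodynamics.jar lives on your machine):

```python
import numpy as np
from jpype import startJVM, getDefaultJVMPath, JPackage, JArray, JDouble

# Start the JVM with the JIDT jar on the classpath (path is illustrative).
startJVM(getDefaultJVMPath(), "-Djava.class.path=infodynamics.jar")

# KSG ("Kraskov") transfer entropy estimator for continuous data.
TeCalc = JPackage("infodynamics.measures.continuous.kraskov").TransferEntropyCalculatorKraskov
teCalc = TeCalc()
teCalc.setProperty("k", "4")  # 4 nearest neighbors for the KSG estimator
teCalc.initialise(1)          # destination history length of 1

source = np.random.randn(1000)
target = np.roll(source, 1) + 0.1 * np.random.randn(1000)

teCalc.setObservations(JArray(JDouble, 1)(source.tolist()),
                       JArray(JDouble, 1)(target.tolist()))
print(teCalc.computeAverageLocalOfObservations())
```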

@silviaruiz44

Thank you so much for your answer! It helps a lot. One last question: how can we test the significance or accuracy of the mutual information estimates?
