Hanging forever when low memory #103

Open
gygabyte017 opened this issue Aug 12, 2021 · 12 comments
Labels
bug something broken P3 not needed for current cycle

Comments

@gygabyte017

Hi, I am experiencing kaleido randomly freezing in our production environment (Unix with Kubernetes).

I noticed that when the container is low on memory, perhaps because the main Python program has consumed a lot of resources (for instance holding the dataframe data to be plotted), the call to write_image hangs forever.

The kaleido process never terminates, there are no errors about a low-memory condition; it just sits there with zero CPU consumption forever.

How can this be improved?

This behavior is very frustrating: sometimes I just find containers stuck running forever, and if I manually relaunch them under the very same conditions they may run correctly, so I have no way to monitor whether they got stuck.

It would be OK if kaleido returned a memory error or a process-failed exception, which could then be handled. But freezing forever... is just bad.

Any advice? Thank you

@jonmmease
Collaborator

Hi @gygabyte017, thanks for the report. I'm not sure if this is possible for you, but it would be helpful to see if any logging is collected (but not displayed) before it hangs.

Are you able to reproduce the issue from a Python REPL? If so, the instructions in this issue might yield some extra info that would be helpful (#36 (comment)).

If possible, what would be most helpful would be a reproducible example consisting of:

  • A Dockerfile
  • A memory limit
  • A python script

Thanks!

@gygabyte017
Author

Hi, unfortunately it is hard for me to give you what you asked, sorry about that :( It doesn't happen on my local PC while testing; it only happens on serverless containers spawned on EKS, and the plotting happens after a lot of complex calculations involving other resources.
However, here's what I found out, hoping it might be useful somehow:

  • The container has a memory limit of 2GB. If I increase it to 3GB, it never happens.
  • At the end of the calculations, write_image is called dozens of times to produce every needed plot, and the freeze never happens at the first image, always after a few.
  • Using the trick proposed in Kaleido hangs on repeated write_image calls #42 with scope._shutdown_kaleido() (see the sketch below), it almost never freezes anymore, even though I'm using v0.2.1 (it still randomly happens on 1-2% of executions, while before it happened half of the time, so that's a good result).
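
For reference, a minimal sketch of that workaround, assuming plotly's default kaleido scope at plotly.io.kaleido.scope; the figures and file names here are only illustrative:

import plotly.graph_objects as go
import plotly.io as pio

figures = [go.Figure(data=go.Scatter(y=[1, 3, 2])) for _ in range(3)]

for i, fig in enumerate(figures):
    fig.write_image(f"plot_{i}.png")
    # Shut down the kaleido subprocess so the next call starts from a fresh
    # process instead of reusing one whose memory keeps growing.
    pio.kaleido.scope._shutdown_kaleido()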

(Not sure how I could access the frozen container, send an interrupt, and interact with the REPL to provide more info.)

Thanks

@jonmmease jonmmease added the bug something broken label Aug 13, 2021
@jonmmease
Collaborator

Thanks for this info @gygabyte017, that's helpful. Marking as a bug.

@bhachauk

bhachauk commented Sep 13, 2021

The same is happening for the to_image call.
Is there something that can be done to avoid this?

@jonmmease
Collaborator

@Bhanuchander210, are you seeing this behavior related to low memory as well?

@bhachauk

@jonmmease
I am not sure, but I suspect so.
It happens only in the production environment, randomly (the production environment has other servers too, so I didn't track it down).
After making the change to call scope._shutdown_kaleido() (as you suggested), it has not hung so far.

@jonmmease
Collaborator

Ok, thanks @Bhanuchander210.

@jonmmease
Collaborator

Notes:
Cross reference #43, which added some internal tracking of JavaScript heap usage, periodically clearing memory by refreshing the active page. If manually running scope._shutdown_kaleido() works around the issue, then I assume this internal page refresh to clear memory would do the same.

We're already refreshing the page when the heap reaches 50% of the maximum allowed. But I don't know whether this maximum limit (as returned by window.performance.memory.jsHeapSizeLimit) takes into account the available system memory. If not, then this might explain the trouble we're running into in memory-constrained environments. Two ideas (not mutually exclusive):

  1. See if the chromium API provides a way to access the system memory currently available, and incorporate that when deciding whether to refresh the page (see the sketch after this list).
  2. Make this memory limit configurable through the API.
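
As an illustration only (not kaleido's actual internals), a Python sketch of idea 1; the function name and thresholds are hypothetical, and psutil is assumed to be available for reading system memory:

import psutil

def should_refresh_page(js_used_heap, js_heap_limit, fraction=0.5):
    # Current rule: refresh once the JS heap crosses a fraction of its limit.
    if js_used_heap > fraction * js_heap_limit:
        return True
    # Extra rule: if jsHeapSizeLimit ignores the container's memory limit, also
    # refresh once the heap is large relative to what the system can still provide.
    available = psutil.virtual_memory().available
    return js_used_heap > fraction * (js_used_heap + available)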

@LukasRauth

LukasRauth commented Nov 3, 2021

Hi,
I get the same issue on my local PC with the first plotly figure I want to statically export (to a PDF), when I manually limit the Python process's virtual memory ("RLIMIT_AS") to, for example, 6 GB.

Is there any progress on that topic?

Here is the code that I use to limit the virtual memory. (That's something we need to do for that specific program to make sure it won't conflict with the production processes...)

import resource

def limit_memory():
    # Cap this process's virtual address space (RLIMIT_AS) at roughly 6 GB.
    max_memory_mb = 6000
    soft_limit = max_memory_mb * 1024 * 1024  # convert MB to bytes
    # Only the soft limit is set; the hard limit stays unlimited.
    resource.setrlimit(resource.RLIMIT_AS, (soft_limit, resource.RLIM_INFINITY))

Since it is the first export anyway, I cannot use the proposed workaround with scope._shutdown_kaleido()...

I'm running on:

kaleido==0.2.1
plotly==5.3.1

@MaartenBW

Hi @gygabyte017

Did you ever resolve this issue? Did downgrading to v0.1.0 work to solve this issue?

Thanks.

@gygabyte017
Author

Hi @MaartenBW, unfortunately I didn't; every version seems equally random. I don't believe there is a reason to prefer 0.1.0 over 0.2.1 or any other version; it's just luck depending on the machine's resources.

I managed to develop an ugly workaround, that is: 1) increase the maximum RAM on the containers, even though it shouldn't be necessary, and 2) execute write_image in a separate process with a timeout; if after e.g. 30 seconds it is still running, I kill the separate process and try again, up to 5 tries (sketched below).

In this way it's very rare that all 5 tries fail, but it may still happen.
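
A rough sketch of that retry workaround; the timeout, retry count, and output path are illustrative, and fig is assumed to be a plotly figure that can be handed to a worker process:

from multiprocessing import Process

def export_with_retries(fig, path, timeout=30, max_tries=5):
    for _ in range(max_tries):
        # Run the export in a separate process so a hung kaleido can be killed.
        p = Process(target=fig.write_image, args=(path,))
        p.start()
        p.join(timeout)  # wait at most `timeout` seconds
        if not p.is_alive():
            return  # the export finished within the timeout
        p.terminate()  # kaleido hung: kill the worker and retry
        p.join()
    raise RuntimeError(f"write_image still hanging after {max_tries} tries")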

Now I want to try the solution described here, maybe it can work in a stable way? #110 (comment)

@MaartenBW

@gygabyte017 Wow, thanks for your fast reply.

@gvwilson gvwilson self-assigned this Jul 26, 2024
@gvwilson gvwilson removed their assignment Aug 3, 2024
@gvwilson gvwilson added the P3 not needed for current cycle label Aug 14, 2024