Kaleido Zombie Processes and standard error handling #49

rgrzeszi · 2020-09-29T10:04:41Z

Hi guys,

We are running plotly with kaleido to generate a large number of plots (usually rendered as SVG or PNG), potentially on large images. I observed a large number of processes that are not stopped properly (see below), up to the point where no more processes can be forked and the whole program crashes.

rgrzeszi 12083  0.0  0.0  12888  3168 pts/0    S+   11:47   0:00 /bin/bash /home/rgrzeszi/venv-py3.7/lib/python3.7/site-packages/kaleido/executable/kaleido plotly --disable-gpu
rgrzeszi 12088  0.9  0.0 340052 58716 pts/0    Sl+  11:47   0:00 /home/rgrzeszi/venv-py3.7/lib/python3.7/site-packages/kaleido/executable/bin/kaleido --no-sandbox --allow-file-access-from-files --disable-breakpad --disable-gpu plotly
rgrzeszi 12090  0.0  0.0 167868 26992 pts/0    S+   11:47   0:00 /home/rgrzeszi/venv-py3.7/lib/python3.7/site-packages/kaleido/executable/bin/kaleido --type=zygote --no-zygote-sandbox --no-sandbox --headless --headless
rgrzeszi 12091  0.0  0.0 167868 26848 pts/0    S+   11:47   0:00 /home/rgrzeszi/venv-py3.7/lib/python3.7/site-packages/kaleido/executable/bin/kaleido --type=zygote --no-sandbox --headless --headless
rgrzeszi 12104  0.0  0.0 216124 37884 pts/0    Sl+  11:47   0:00 /home/rgrzeszi/venv-py3.7/lib/python3.7/site-packages/kaleido/executable/bin/kaleido --type=gpu-process --field-trial-handle=11814550139442659559,15247937476695373268,131072 --no-sandbox --disable-breakpad --headless --ozone-platform=headless --headless --gpu-preferences=OAAAAAAAAAAgAAAgAAAAAAAAAAAAAAAAAABgAAAAAAAYAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAIAAAAAAAAAA== --use-gl=swiftshader-webgl --override-use-software-gl-for-tests --shared-files
rgrzeszi 12105  0.2  0.0 257008 42640 pts/0    Sl+  11:47   0:00 /home/rgrzeszi/venv-py3.7/lib/python3.7/site-packages/kaleido/executable/bin/kaleido --type=utility --field-trial-handle=11814550139442659559,15247937476695373268,131072 --lang=en-US --service-sandbox-type=network --no-sandbox --use-gl=swiftshader-webgl --headless --shared-files
rgrzeszi 12106  7.0  0.0 4665756 97004 pts/0   Sl+  11:47   0:02 /home/rgrzeszi/venv-py3.7/lib/python3.7/site-packages/kaleido/executable/bin/kaleido --type=renderer --no-sandbox --allow-pre-commit-input --disable-breakpad --ozone-platform=headless --field-trial-handle=11814550139442659559,15247937476695373268,131072 --disable-databases --disable-gpu-compositing --lang=en-US --headless --num-raster-threads=4 --enable-main-frame-before-activation --renderer-client-id=3 --shared-files
rgrzeszi 12266  0.0  0.0  12888  3044 pts/0    S+   11:48   0:00 /bin/bash /home/rgrzeszi/venv-py3.7/lib/python3.7/site-packages/kaleido/executable/kaleido plotly --disable-gpu
rgrzeszi 12271  1.3  0.0 330820 57508 pts/0    Sl+  11:48   0:00 /home/rgrzeszi/venv-py3.7/lib/python3.7/site-packages/kaleido/executable/bin/kaleido --no-sandbox --allow-file-access-from-files --disable-breakpad --disable-gpu plotly
rgrzeszi 12273  0.0  0.0 167868 27084 pts/0    S+   11:48   0:00 /home/rgrzeszi/venv-py3.7/lib/python3.7/site-packages/kaleido/executable/bin/kaleido --type=zygote --no-zygote-sandbox --no-sandbox --headless --headless
rgrzeszi 12274  0.0  0.0 167868 27332 pts/0    S+   11:48   0:00 /home/rgrzeszi/venv-py3.7/lib/python3.7/site-packages/kaleido/executable/bin/kaleido --type=zygote --no-sandbox --headless --headless
rgrzeszi 12286  0.1  0.0 216124 36500 pts/0    Sl+  11:48   0:00 /home/rgrzeszi/venv-py3.7/lib/python3.7/site-packages/kaleido/executable/bin/kaleido --type=gpu-process --field-trial-handle=7050013125018487994,7150407316231483164,131072 --no-sandbox --disable-breakpad --headless --ozone-platform=headless --headless --gpu-preferences=OAAAAAAAAAAgAAAgAAAAAAAAAAAAAAAAAABgAAAAAAAYAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAIAAAAAAAAAA== --use-gl=swiftshader-webgl --override-use-software-gl-for-tests --shared-files
rgrzeszi 12287  0.3  0.0 257032 41764 pts/0    Sl+  11:48   0:00 /home/rgrzeszi/venv-py3.7/lib/python3.7/site-packages/kaleido/executable/bin/kaleido --type=utility --field-trial-handle=7050013125018487994,7150407316231483164,131072 --lang=en-US --service-sandbox-type=network --no-sandbox --use-gl=swiftshader-webgl --headless --shared-files
rgrzeszi 12288  9.6  0.0 4661808 96432 pts/0   Sl+  11:48   0:02 /home/rgrzeszi/venv-py3.7/lib/python3.7/site-packages/kaleido/executable/bin/kaleido --type=renderer --no-sandbox --allow-pre-commit-input --disable-breakpad --ozone-platform=headless --field-trial-handle=7050013125018487994,7150407316231483164,131072 --disable-databases --disable-gpu-compositing --lang=en-US --headless --num-raster-threads=4 --enable-main-frame-before-activation --renderer-client-id=4 --shared-files
rgrzeszi 12486  0.0  0.0  12888  3144 pts/0    S+   11:48   0:00 /bin/bash /home/rgrzeszi/venv-py3.7/lib/python3.7/site-packages/kaleido/executable/kaleido plotly --disable-gpu
rgrzeszi 12491  2.4  0.0 331856 57220 pts/0    Sl+  11:48   0:00 /home/rgrzeszi/venv-py3.7/lib/python3.7/site-packages/kaleido/executable/bin/kaleido --no-sandbox --allow-file-access-from-files --disable-breakpad --disable-gpu plotly

Following the workaround here:
#42

I implemented a call which forcefully shuts down kaleido:

from kaleido.scopes.plotly import PlotlyScope

# fig, path and export_format come from the surrounding plotting code
scope = PlotlyScope()
with open(path, 'wb') as f:
    f.write(scope.transform(fig, format=export_format))
# Shut down the kaleido subprocess to free memory; it will
# be started again on the next image export request
# https://github.com/plotly/Kaleido/issues/42
scope._shutdown_kaleido()

This partially solved the issue at hand. However, I can now observe the following behavior: depending on when I shut down kaleido, I sooner or later run into a deadlock in the collection of standard error:

def _collect_standard_error(self):
    """
    Write standard-error of subprocess to the _std_error StringIO buffer.
    Intended to be called once in a background thread
    """
    while True:
        if self._proc is not None:
            val = self._proc.stderr.readline()
            self._std_error.write(val)

My current workaround is to break out of the loop when the process is None:

while True:
    if self._proc is not None:
        val = self._proc.stderr.readline()
        self._std_error.write(val) 
    else:
        break
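
For illustration, here is a minimal, self-contained sketch of a reader loop that ends on its own once the subprocess closes its standard error. It is not the change proposed in #50 and not Kaleido's actual code: the function name `collect_standard_error`, the `child_code` test process, and the `std_error` buffer are all made up for this example. The idea is to read from a local handle and stop when `readline()` returns an empty value, which signals end-of-file.

```python
import io
import subprocess
import sys
import threading


def collect_standard_error(proc, buffer):
    """Copy the child's stderr into buffer until the pipe reaches EOF."""
    # Work on a local handle so that another thread clearing a shared
    # attribute cannot change what this loop reads from.
    stream = proc.stderr
    while True:
        line = stream.readline()
        if not line:  # b"" means EOF: the child closed its stderr
            break
        buffer.write(line.decode("utf-8", errors="replace"))


# Hypothetical child process that writes two lines to stderr and exits.
child_code = "import sys; sys.stderr.write('first\\nsecond\\n')"
proc = subprocess.Popen(
    [sys.executable, "-c", child_code],
    stdout=subprocess.DEVNULL,
    stderr=subprocess.PIPE,
)
std_error = io.StringIO()
reader = threading.Thread(
    target=collect_standard_error, args=(proc, std_error), daemon=True
)
reader.start()
proc.wait()
reader.join(timeout=10)
collected = std_error.getvalue()
```

Because the loop exits at end-of-file, the thread neither spins on a missing process nor blocks forever after the child has gone away.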

Any help / feedback would be appreciated.


jonmmease · 2020-09-29T11:37:57Z

Thanks for the report and the deadlock PR fix in #50.

Regarding the process build up, can you tell whether:

1. Duplicate processes are showing up during the execution of a single Python/kaleido instance.
2. Processes are not being cleaned up when the Python process exits, and then more are created when a new Python/kaleido instance is launched.

The memory leak fix in #43 involves periodically reloading the headless Chromium tab that kaleido uses, and if you're seeing (1) above, it would be helpful to know if this makes any difference for you.

You can install the alpha build of kaleido that has this fix with:

https://github.com/plotly/Kaleido/releases/download/v0.1.0a2/kaleido-0.1.0a2-py2.py3-none-manylinux1_x86_64.whl

If (2), do you know if the Python process that's driving kaleido is always exiting cleanly (without crashing)? The chromium process should be shut down when Python exits and calls the __del__ method on the base scope, but something might be going on that's causing this to not get called.

Thanks!

rgrzeszi · 2020-09-29T12:26:56Z

Hello Jon,

it's (1) a single python instance which runs a data analysis and generates quite a bunch of plots.

A method in a plotting class is called multiple times, and the Kaleido scope is created inside it as shown above. With every write a new instance is spawned, but it seems at least some of them do not terminate correctly. In my understanding the scope should be created within the method, and when leaving the method __del__ would implicitly be called, which should then call _shutdown_kaleido and would avoid the deadlock issue. However, it seems that this is not the case. I would have to run more experiments on this.

I cannot pinpoint it to a single call, but it seems that simpler plots may not cause this issue (i.e. a simple pie plot). I do visualize more complex things like heatmaps on larger background images (3-4 Megapixel). I assume that the process does not terminate correctly in these cases.

jonmmease · 2020-09-29T13:29:14Z

A method in a plotting class is called multiple times, and the Kaleido scope is created inside it as shown above. With every write a new instance is spawned, but it seems at least some of them do not terminate correctly. In my understanding the scope should be created within the method, and when leaving the method __del__ would implicitly be called, which should then call _shutdown_kaleido and would avoid the deadlock issue. However, it seems that this is not the case. I would have to run more experiments on this.

Ok, this does actually make sense. The __del__ method isn't guaranteed to be called when the method exits (https://docs.python.org/3/reference/datamodel.html?highlight=__del__#object.__del__). So it's not too surprising that the chromium subprocesses build up with this workflow. It's possible that that thread watching standard error is preventing the reference count of the scope from dropping to zero, but that would just be a guess.
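
The general Python mechanism behind that guess can be shown with a small sketch. This is not Kaleido code: the `Scope` class below is a hypothetical stand-in that only starts a watcher thread on one of its own methods. On CPython, the running thread keeps a strong reference to the object through the bound method, so `__del__` does not run when the last ordinary reference is dropped, only once the thread has finished.

```python
import gc
import threading

finalized = []


class Scope:
    """Hypothetical stand-in for an object that owns a watcher thread."""

    def __init__(self):
        self._stop = threading.Event()
        # The bound method self._watch keeps a strong reference to self
        # for as long as the thread is running.
        self._thread = threading.Thread(target=self._watch, daemon=True)
        self._thread.start()

    def _watch(self):
        self._stop.wait()

    def __del__(self):
        finalized.append("finalized")


scope = Scope()
stop, thread = scope._stop, scope._thread
del scope  # drop the only ordinary reference
gc.collect()
finalized_while_thread_runs = list(finalized)

stop.set()  # let the watcher thread finish
thread.join()
gc.collect()
finalized_after_thread_exit = list(finalized)
```

In this sketch `finalized_while_thread_runs` stays empty and the entry only appears after the thread has exited. Whether this is what actually happens inside Kaleido is not confirmed here.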

The workflow that the Kaleido scope is designed for, to this point, is to reuse a single scope repeatedly so that the chromium startup time is only required the first time. Is this architecture possible for you?
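
As a rough sketch of that reuse pattern, the scope can be created once and passed to whatever does the exporting, instead of being constructed inside each plotting method. The helper name `export_figures` and the `StubScope` class are made up for this example, and the stub only exists so the snippet runs without Kaleido installed. The `transform(fig, format=...)` call mirrors the one shown earlier in this thread.

```python
import tempfile
from pathlib import Path


def export_figures(scope, figures, out_dir, export_format="png"):
    """Render every figure with the same scope and return the written paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for name, fig in figures.items():
        path = out_dir / f"{name}.{export_format}"
        path.write_bytes(scope.transform(fig, format=export_format))
        written.append(path)
    return written


class StubScope:
    """Stand-in for PlotlyScope so the sketch runs without Kaleido."""

    def transform(self, fig, format="png"):
        return b"fake-" + format.encode()


# With Kaleido installed this would be a single PlotlyScope() instance.
shared_scope = StubScope()
with tempfile.TemporaryDirectory() as tmp:
    paths = export_figures(shared_scope, {"pie": {}, "heatmap": {}}, tmp)
    names = sorted(p.name for p in paths)
```

Passing the scope in as an argument keeps its lifetime in one place, so only one scope object exists for the whole run.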

The alternative is to make sure that the chromium subprocess shuts down when you are finished exporting images with the scope. We should probably create a public shutdown method and document that this should be called to guarantee that the chromium subprocess is shut down, and we could also make the kaleido scope closable so that you could use it in a context manager like this:

with PlotlyScope(...) as scope:
    # Chromium subprocess launched
    scope.transform()

# Chromium subprocess shut down
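
Until something like that exists, a similar effect can be approximated from user code. The sketch below is an assumption-laden illustration, not an official API: `closing_scope` is a made-up helper, `StubScope` stands in for a real scope so the snippet runs without Kaleido, and `_shutdown_kaleido` is the private method used earlier in this thread, so it may change without notice.

```python
from contextlib import contextmanager


@contextmanager
def closing_scope(scope):
    """Yield scope and always shut its subprocess down on exit."""
    try:
        yield scope
    finally:
        # Private method taken from the workaround in this thread; a
        # public equivalent is only a proposal at this point.
        scope._shutdown_kaleido()


class StubScope:
    """Stand-in so the sketch runs without Kaleido installed."""

    def __init__(self):
        self.running = True

    def transform(self, fig, format="png"):
        return b"image-bytes"

    def _shutdown_kaleido(self):
        self.running = False


stub = StubScope()
with closing_scope(stub) as scope:
    data = scope.transform({"data": []}, format="png")
```

The `try`/`finally` ensures the shutdown call also runs when an export raises, which is the main advantage over relying on `__del__`.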

rgrzeszi · 2020-09-29T13:33:09Z

I believe I tried creating a single scope and it had the same issue, but I will confirm this.

rgrzeszi · 2020-09-29T13:50:48Z

You were absolutely right: __del__ was not being called, and the subprocesses built up because of it. Strangely enough, this does not happen on all machines. I had to do some rewriting to really create a single scope, but that seems to solve the issue and thus also avoids the infinite loop in the error handler (as I no longer call _shutdown_kaleido manually). Thanks!

jonmmease · 2020-09-29T15:39:48Z

Thanks for reporting back @rgrzeszi. Glad it's working for you now! I'll still get your PR in, and consider where to document this potential pitfall.

jonmmease · 2021-01-05T11:00:27Z

Alright to close this @rgrzeszi?


rgrzeszi mentioned this issue Sep 29, 2020

[fix] concurrency issue in standard error handling #50

Closed

jonmmease added the documentation and feature labels Sep 29, 2020

jonmmease closed this as completed May 1, 2021

Haydeni0 added a commit to Haydeni0/pi-humidity that referenced this issue Mar 26, 2023

Add explicit shutdown of kaleido scope

15114ad

In hope that it stops kaleido's memory leak problem as discussed in: plotly/Kaleido#49 plotly/Kaleido#42
