Cocalc project restarts. How to investigate? #21
Description
We are experimenting with Cocalc (a slightly slimmed image with fewer kernels and with increased memory defaults) for remote teaching/pair programming. (It works pretty well!) I am currently seeing three different types of crashes and would like a hint on how to find out why each crash occurred, where to see the logs, and how to fix it.
- Python kernel crashes. Seems to occur when I allocate too much memory in a numpy array for example. The relevant cell gets a red tag with the kernel killed message. All understandable, I can live with that. (Although I wouldn’t mind seeing this somewhere in some project admin/server admin logs.)
- Project Pod sometimes gets killed. All I see is a Killed event in `kubectl get events`. Doesn’t happen super often, so it is not too bad, but I’d still like to get an idea why.
- Project restarts without notice. Sometimes this happens every 10 minutes while people are working on a project, so it doesn’t seem to be some idle timeout. (I figured it’s not the worst thing that can happen for teaching, as it clears all hidden variables and gives the student a clean state. ;) ) This is the nastiest problem as the reason is very unclear to me and I wouldn’t know where to look (and which limit to increase).
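For the first case, here is a minimal sketch of how a notebook cell could compare a planned allocation against the container's memory limit before running it. It assumes the project container exposes one of the standard Linux cgroup limit files (I have not verified this for the Cocalc image), and the helper names `parse_limit`, `read_memory_limit`, `array_nbytes`, and `fits` are made up for illustration.

```python
import math
from pathlib import Path

# Standard cgroup locations for the container memory limit (v2 first, then v1).
# Assumption: the project container mounts one of these files.
CGROUP_LIMIT_FILES = (
    Path("/sys/fs/cgroup/memory.max"),
    Path("/sys/fs/cgroup/memory/memory.limit_in_bytes"),
)


def parse_limit(text):
    """Parse the content of a cgroup limit file; return bytes, or None if unlimited."""
    value = text.strip()
    if value == "max":  # cgroup v2 spelling of "no limit"
        return None
    return int(value)


def read_memory_limit():
    """Return the container memory limit in bytes, or None if unknown or unlimited."""
    for path in CGROUP_LIMIT_FILES:
        try:
            return parse_limit(path.read_text())
        except (OSError, ValueError):
            continue  # file missing or unreadable: try the next location
    return None


def array_nbytes(shape, itemsize=8):
    """Bytes needed for a dense array; itemsize=8 corresponds to float64."""
    return math.prod(shape) * itemsize


def fits(shape, itemsize=8, limit=None):
    """True if the array alone fits under the limit (ignores memory already in use)."""
    if limit is None:
        limit = read_memory_limit()
    return limit is None or array_nbytes(shape, itemsize) < limit
```

Calling `fits((50_000, 50_000))` before something like `np.zeros((50_000, 50_000))` would show whether that single array already exceeds the limit. It is only a lower bound, since the kernel and other processes in the project use memory as well.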
Any hints?