Right now, we are using db-operator with the ClusterAdmin role, which is overkill.
We need to go through the code, work out exactly which roles are actually required, and update binding.yaml accordingly.
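For illustration, a minimal sketch of what a narrower binding.yaml could look like, assuming the operator only needs to manage Pods, ConfigMaps, PVCs, Secrets and batch Jobs in its own namespace; the resource list, namespace and service account names are assumptions, not the result of the code audit this issue asks for:

```yaml
# Sketch only: resources/verbs below are assumptions and must be confirmed
# by auditing the operator code before replacing ClusterAdmin.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: db-operator
  namespace: ocean-compute            # assumed namespace
rules:
  - apiGroups: [""]
    resources: ["pods", "configmaps", "persistentvolumeclaims", "secrets"]
    verbs: ["get", "list", "watch", "create", "delete"]
  - apiGroups: ["batch"]
    resources: ["jobs"]
    verbs: ["get", "list", "watch", "create", "delete"]
---
apiVersion: rbac.authorization.k8s.io/v1
kind: RoleBinding
metadata:
  name: db-operator
  namespace: ocean-compute            # assumed namespace
subjects:
  - kind: ServiceAccount
    name: db-operator                 # assumed service account name
    namespace: ocean-compute
roleRef:
  kind: Role
  name: db-operator
  apiGroup: rbac.authorization.k8s.io
```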
Enforce validUntil if it does not exist
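A minimal sketch of the intended behaviour, assuming validUntil arrives as a Unix timestamp in the compute job request; the field access and the DEFAULT_JOB_TTL_SECONDS setting are illustrative assumptions, not the actual operator-engine code:

```python
import time

# Assumed default TTL for illustration; the real name/value would come from config.
DEFAULT_JOB_TTL_SECONDS = 24 * 60 * 60


def enforce_valid_until(job_request: dict) -> dict:
    """Fill in validUntil with a default expiry when the request omits it."""
    if not job_request.get("validUntil"):
        job_request["validUntil"] = int(time.time()) + DEFAULT_JOB_TTL_SECONDS
    return job_request
```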
An OOMKilled algorithm job is stranded forever and blocks further new jobs from the same wallet
Describe the bug
When an algorithm job is OOMKilled because the allocated memory was not enough for the whole process, the algorithm job is left stranded in the namespace.
The associated ConfigMap and other Kubernetes objects are not removed either.
In the database, the jobs table's status column stays at 40: Running algorithm forever.
As a result, whenever operator-engine calls the db function announce_and_get_sql_pending_jobs, it keeps returning the OOMKilled algorithm job, and no new job from this wallet can be started.
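For illustration, a minimal sketch of how the stuck state could be detected, assuming the kubernetes Python client is in use and that a hypothetical mark_job_failed helper updates the jobs table; neither is taken from the operator-engine code:

```python
from kubernetes import client


def algorithm_pod_was_oom_killed(namespace: str, job_name: str) -> bool:
    """Return True if any container of the job's pods terminated with reason OOMKilled."""
    v1 = client.CoreV1Api()
    pods = v1.list_namespaced_pod(namespace, label_selector=f"job-name={job_name}")
    for pod in pods.items:
        for status in pod.status.container_statuses or []:
            terminated = status.state.terminated or (
                status.last_state.terminated if status.last_state else None
            )
            if terminated and terminated.reason == "OOMKilled":
                return True
    return False


# Hypothetical usage in the reconciliation loop:
# if algorithm_pod_was_oom_killed(namespace, job_name):
#     mark_job_failed(job_id, reason="OOMKilled")  # assumed helper; unblocks the wallet's queue
```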
To Reproduce
Steps to reproduce the behavior:
Set up operator-engine with the env vars configured as nCPU: 1 and ramGB: 1
Publish an algorithm and dataset whose job runs for more than 10 minutes and progressively uses extra memory (a sketch of such an algorithm follows this list)
Order and start the compute job
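For step 2, a minimal sketch of an algorithm that reproduces the condition by gradually allocating memory until the 1 GB limit is exceeded; the allocation rate and duration are illustrative assumptions:

```python
import time

# Gradually grow an in-memory buffer so the container eventually exceeds
# its 1 GB limit and gets OOMKilled partway through the run.
hog = []
for minute in range(15):                       # run longer than 10 minutes
    hog.append(bytearray(100 * 1024 * 1024))   # hold ~100 MB more each minute
    print(f"minute {minute}: holding ~{len(hog) * 100} MB", flush=True)
    time.sleep(60)
```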
Expected behavior
The job's pod is killed gracefully and the next subsequent job is able to run.
Taints selector for pods
Zip outputs folder
Right now, pod-publishing can upload files in /data/outputs/, but not files in sub-folders (e.g. /data/outputs/results/1.jpg).
Also, if you have a lot of files, it's cumbersome to download them one by one.
Let's zip the entire /data/outputs/ folder and upload it to storage, so the user only has to download one file.
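A minimal sketch of the zip step, assuming the outputs live under /data/outputs/ as described; upload_to_storage is a placeholder for whatever upload call pod-publishing already uses, not an existing function:

```python
import shutil

OUTPUTS_DIR = "/data/outputs"

# Recursively archive /data/outputs/ (including sub-folders like results/) into a single file.
archive_path = shutil.make_archive("/tmp/outputs", "zip", root_dir=OUTPUTS_DIR)

# upload_to_storage(archive_path)  # placeholder for the existing upload mechanism
```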