Suggestion: alternatively, we can change IfNotPresent to Always. Kubernetes will then compare image digests (hashes): if an image with the same digest is cached locally, it uses the local copy; if nothing is cached or the digests differ, it pulls the new image from the registry (see the Kubernetes docs).
If Always is used, it will probably add some overhead, since the k8s nodes have to query the registry to check whether the cached image is the same as the one in the registry (one HTTP request, I guess). I am not sure how much this would affect pod start-up time.
As for adding the image tag and digest to the logs, I think it is a good idea overall. I am not so sure about exposing the node names, as that could potentially be a security issue (?).
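To make the difference between the two pull policies discussed above concrete, here is a rough sketch of the decision as a plain function. It is a simplified model of the documented kubelet behaviour, not its actual code; the function name `needs_pull` and its arguments are hypothetical.

```python
def needs_pull(policy, cached_digest, registry_digest=None):
    """Return True if the image would be pulled, under this simplified model.

    cached_digest: digest of the image cached on the node for this tag, or None.
    registry_digest: digest the tag currently resolves to in the registry;
    only consulted for the "Always" policy.
    """
    if policy == "Never":
        return False
    if policy == "IfNotPresent":
        # Any cached image with the requested name is accepted, however stale.
        return cached_digest is None
    if policy == "Always":
        # The registry is asked for the current digest on every container start;
        # image data is downloaded only when that digest is not cached locally.
        return cached_digest != registry_digest
    raise ValueError(f"unknown pull policy: {policy}")
```

With a stale cache, `needs_pull("IfNotPresent", "sha256:old", "sha256:new")` is `False`, which is exactly the situation described in this issue, whereas `needs_pull("Always", "sha256:old", "sha256:new")` is `True`.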
Always will bring some overhead, which may be considerable in the case of multi-GiB particle physics images... Hence we opted for IfNotPresent as the default, together with promoting semantic versioning of Docker images, which is best for ensuring reproducibility anyway! The reana-client validate command also checks for the most commonly used latest tag, but it doesn't catch everything.
So yes, hopefully we can stay on IfNotPresent... But switching to Always via Helm values is always an option.
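A stricter tag check along the lines mentioned above could look roughly like the sketch below. This is not how reana-client validate is implemented; the function names, the list of "mutable" tag names and the loose semantic-version pattern are all assumptions made for illustration.

```python
import re

# Hypothetical list of tag names that are commonly re-pushed; not taken from REANA.
MUTABLE_TAGS = {"latest", "master", "main", "dev", "devel"}

# Loose semantic-version pattern: optional "v", MAJOR.MINOR.PATCH, optional suffix.
SEMVER_RE = re.compile(r"^v?\d+\.\d+\.\d+([-+.][0-9A-Za-z.+-]+)?$")


def split_image(image):
    """Split an image reference into (name, tag, digest); missing parts are None."""
    name, _, digest = image.partition("@")
    # A colon counts as a tag separator only if it comes after the last slash,
    # so that "registry:5000/img" is not mistaken for a tagged image.
    last_slash = name.rfind("/")
    colon = name.rfind(":")
    if colon > last_slash:
        return name[:colon], name[colon + 1:], digest or None
    return name, None, digest or None


def tag_warning(image):
    """Return a warning string for risky tags, or None if the reference looks pinned."""
    _, tag, digest = split_image(image)
    if digest:
        return None  # pinned by digest, so the content cannot silently change
    if tag is None:
        return f"{image}: no tag given, which resolves to 'latest'"
    if tag in MUTABLE_TAGS:
        return f"{image}: tag '{tag}' is usually overwritten on each push"
    if not SEMVER_RE.match(tag):
        return f"{image}: tag '{tag}' does not look like a semantic version"
    return None
```

For example, `tag_warning("myenvironment:master")` returns a warning, while `tag_warning("myenvironment:1.2.3")` returns `None`. A check like this can only warn: nothing stops a user from re-pushing a semantically versioned tag as well.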
Current behaviour
It happens that when users use non-semantically-versioned environment images such as myenvironment:latest or myenvironment:master, and they update the image under the same tag, the cluster nodes won't pull the new version because of the usual IfNotPresent image pull policy.
It can then happen that some cluster nodes have the "old" version of the image while other cluster nodes have the "new" version, leading to seemingly random workflow run failures.
Currently, it is not easy for users to detect these situations, because REANA does not expose in the job logs exactly which image digest (hash) was used for the job. The cluster administrators can check and rectify this easily by removing the image from the nodes, which forces a re-pull of the image on the next run. For example, by running the following one-liner:
$ for node in $(kubectl get nodes -l reana.io/system=runtimejobs --no-headers | awk '{print $1;}'); do ssh -q -i ~/.ssh/myaccount.pem -o StrictHostKeyChecking=no core@$node 'sudo crictl rmi myenvironment:latest'; done
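Before removing anything, an administrator might first want to see whether the nodes really disagree. The sketch below groups nodes by the digest they report for a given image, based on the `.status.images` field of `kubectl get nodes -o json`. The function names are hypothetical, and the result may be incomplete because the kubelet reports only a limited number of images per node.

```python
import json
import subprocess
from collections import defaultdict


def digests_by_node(node_list, image):
    """Map each digest of `image` to the nodes that report it in .status.images."""
    found = defaultdict(list)
    for node in node_list.get("items", []):
        node_name = node["metadata"]["name"]
        for entry in node.get("status", {}).get("images", []):
            names = entry.get("names") or []
            # Cached names are usually fully qualified, e.g.
            # "docker.io/library/myenvironment:latest", so match on the suffix.
            if not any(n == image or n.endswith("/" + image) for n in names):
                continue
            for n in names:
                if "@" in n:
                    found[n.split("@", 1)[1]].append(node_name)
    return dict(found)


def fetch_nodes(selector="reana.io/system=runtimejobs"):
    """Fetch node objects as JSON via kubectl (needs cluster access)."""
    out = subprocess.run(
        ["kubectl", "get", "nodes", "-l", selector, "-o", "json"],
        check=True, capture_output=True, text=True,
    ).stdout
    return json.loads(out)
```

Calling `digests_by_node(fetch_nodes(), "myenvironment:latest")` on a machine with cluster access returns a dictionary keyed by digest; more than one key means the nodes hold different versions of the image.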
However, we can perhaps do something better to help the users.
Expected behaviour
Ideally, the job logs should show that the job was run using image myenvironment:latest with a digest of such and such value.
We could perhaps even consider exposing the name of the node where the job runs, which could be useful in forensics, for example when CephFS CSI plugins are down on some nodes.
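As a rough illustration of where this information could come from: a Kubernetes Pod object carries the resolved image in `status.containerStatuses[].imageID` and the node in `spec.nodeName`. The sketch below builds log lines from a pod represented as a plain dictionary. The function name, the log format and the `show_node` switch are assumptions, not existing REANA code; the node name is off by default because of the security concern raised in the comments.

```python
def describe_job_pod(pod, show_node=False):
    """Build one log line per container with its image and resolved image ID."""
    node = pod.get("spec", {}).get("nodeName", "<unscheduled>")
    lines = []
    for status in pod.get("status", {}).get("containerStatuses", []):
        # imageID stays empty until the container runtime has resolved the image.
        image_id = status.get("imageID") or "<not yet resolved>"
        line = (
            f"container={status['name']} image={status['image']} "
            f"image_id={image_id}"
        )
        if show_node:
            line += f" node={node}"
        lines.append(line)
    return lines
```

The input dictionary has the same shape as the output of `kubectl get pod <name> -o json`, so the function can be tried on a real job pod without any client library.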