This issue occurs when a checkpoint image is created on one machine and restored on another with a different nvidia-driver version.
Checkpoint/restore works well on the same host.
When trying on a different host:
$ sudo podman container restore --print-stats --tcp-established --ignore-volumes gcr.io/ltx-research/test-comfy-checkpoint:0.0.2
Error: runc: create criu restore mount for /usr/lib64/libEGL_nvidia.so.560.35.03 mount: bind mount source stat: stat /usr/lib64/libEGL_nvidia.so.560.35.03: no such file or directory: OCI runtime attempted to invoke a command that was not found
The new host (restore) has a slightly higher nvidia-driver version, 560.35.05, while the previous host (checkpoint) has 560.35.03,
so the filename on the new host is /usr/lib64/libEGL_nvidia.so.560.35.05
The expected behaviour is that this information would be taken from /etc/cdi/nvidia.yaml on the restore host, which describes the current host.
This coupling poses a strict limitation: nvidia driver upgrades will require recreating the checkpoint images.
Trying my luck at working around it with symbolic links (🙈 ..), I made some progress, but eventually I had to remove the driver and install the exact same version to make it work.
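To make the mismatch concrete, here is a minimal sketch of a pre-flight check that compares the driver-versioned library paths recorded at checkpoint time with those present on the restore host. It is only an illustration: `split_version` and `find_mismatches` are hypothetical helper names, not part of podman, runc, or criu, and the path lists are assumed to have been collected by hand (for example from the checkpoint host's and restore host's CDI specs). It reports mismatches only and does not make a restore across driver versions work.

```python
import re

# Matches a driver-versioned library name such as libEGL_nvidia.so.560.35.03.
# At least two dot-separated numeric components are required, so plain
# sonames like libcuda.so.1 are deliberately not treated as driver-versioned.
_VERSIONED = re.compile(r"^(?P<stem>.+\.so)\.(?P<version>\d+(?:\.\d+)+)$")


def split_version(path):
    """Return (stem, version) for a versioned library path, else (path, None)."""
    m = _VERSIONED.match(path)
    if m is None:
        return path, None
    return m.group("stem"), m.group("version")


def find_mismatches(checkpoint_paths, host_paths):
    """List (checkpoint_path, host_path) pairs that differ only by version."""
    host_by_stem = {}
    for p in host_paths:
        stem, version = split_version(p)
        if version is not None:
            host_by_stem[stem] = p

    mismatches = []
    for p in checkpoint_paths:
        stem, version = split_version(p)
        if version is None:
            continue
        counterpart = host_by_stem.get(stem)
        if counterpart is not None and counterpart != p:
            mismatches.append((p, counterpart))
    return mismatches


if __name__ == "__main__":
    # Paths taken from the error message and description in this issue.
    checkpoint = ["/usr/lib64/libEGL_nvidia.so.560.35.03"]
    host = ["/usr/lib64/libEGL_nvidia.so.560.35.05"]
    for old, new in find_mismatches(checkpoint, host):
        print(f"checkpoint expects {old}, restore host has {new}")
```

Running such a check before `podman container restore` would at least surface the version coupling up front instead of failing inside the runc bind mount.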
If this is not the correct place, please advise on the right location for this issue (it might belong in criu or podman).
version details
$ podman --version
podman version 5.2.3
$ runc --version
runc version 1.1.15
spec: 1.0.2-dev
go: go1.22.5 (Red Hat 1.22.5-2.el9)
libseccomp: 2.5.2
# note - using compiled criu version
$ criu --version
Version: 4.0
GitID: v4.0-23-gf6baf8143