podman container restore - Error: runc: create criu restore mount for /usr/lib64/libEGL_nvidia.so.560.35.03 mount #21

Closed
ezerk opened this issue Nov 27, 2024 · 2 comments

ezerk commented Nov 27, 2024

This issue occurs when the checkpoint image is created on one machine and restored on another with a different nvidia-driver version.

Checkpoint/restore works well on the same host.

When trying on a different host:

$ sudo podman container restore --print-stats --tcp-established  --ignore-volumes gcr.io/ltx-research/test-comfy-checkpoint:0.0.2

Error: runc: create criu restore mount for /usr/lib64/libEGL_nvidia.so.560.35.03 mount: bind mount source stat: stat /usr/lib64/libEGL_nvidia.so.560.35.03: no such file or directory: OCI runtime attempted to invoke a command that was not found

On the new (restore) host the nvidia-driver version is slightly higher, 560.35.05, while on the previous (checkpoint) host it is 560.35.03, so the file name on the restore host is /usr/lib64/libEGL_nvidia.so.560.35.05.

The expected behaviour would be for this information to be taken from /etc/cdi/nvidia.yaml on the restore host, which describes the current host.

This coupling poses a strict limitation: every nvidia driver upgrade will require recreating the checkpoint images.

Trying my luck at working around it with symbolic links (🙈 ..), I made some progress, but eventually I had to remove the driver and install the exact same version to make it work.

If this is not the correct place, please advise on the right location for this issue (the issue might be in criu or podman).

version details
$ podman --version
podman version 5.2.3

$ runc --version
runc version 1.1.15
spec: 1.0.2-dev
go: go1.22.5 (Red Hat 1.22.5-2.el9)
libseccomp: 2.5.2

# note - using compiled criu version 
$ criu --version
Version: 4.0
GitID: v4.0-23-gf6baf8143

rst0git commented Nov 27, 2024

@ezerk A similar question was asked in the following issue: #18 (comment)

The driver version to restore to for CUDA has to be the same.
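
A check along these lines could catch the mismatch before `podman container restore` fails. This is only a minimal sketch, not part of podman, runc, or criu: the helper names `driver_version_from_lib` and `check_driver_match` are hypothetical, and it assumes the checkpoint host's driver version is known (here it is read from the library path in the error message above). On a real host, the current driver version can be queried with `nvidia-smi --query-gpu=driver_version --format=csv,noheader`.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not part of podman, runc, or criu.

# Print the driver version encoded in an NVIDIA library file name,
# e.g. /usr/lib64/libEGL_nvidia.so.560.35.03 -> 560.35.03
driver_version_from_lib() {
  # Strip everything up to and including the last ".so."
  printf '%s\n' "${1##*.so.}"
}

# Succeed only if the checkpoint-time and restore-time driver versions
# are exactly the same string.
# $1 = version on the checkpoint host, $2 = version on the restore host
check_driver_match() {
  if [ "$1" = "$2" ]; then
    echo "driver versions match: $2"
  else
    echo "driver mismatch: checkpoint=$1 restore=$2" >&2
    return 1
  fi
}

# Example with the two versions reported in this issue.
checkpoint_version="$(driver_version_from_lib /usr/lib64/libEGL_nvidia.so.560.35.03)"
check_driver_match "$checkpoint_version" "560.35.05" \
  || echo "restore is expected to fail; install driver $checkpoint_version first"
```

The comparison is an exact string match on purpose, since even a patch-level difference (560.35.03 vs 560.35.05) changes the library file names that the restore tries to bind-mount.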

ezerk commented Nov 27, 2024

Thanks for the fast reply!
Closing this one.

ezerk closed this as not planned on Nov 27, 2024