Problems caused by launching multiple pods at the same time #25
cc @mYmNeo
It seems vcuda copies the library into the container after the container has started. When multiple GPU-resource pods start simultaneously, this copy does not finish fast enough. You can try to modify your command
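A workaround in this spirit is to have the container wait until the library file exists before starting the real workload. The sketch below is not part of gpu-manager or vcuda. `LIB_PATH` is a hypothetical placeholder (check where your gpu-manager version actually places the library), and the timeout and polling interval are assumed values.

```python
import os
import sys
import time

# Hypothetical path: a placeholder for wherever gpu-manager places the vcuda
# library in your setup, not a documented location.
LIB_PATH = "/usr/local/nvidia/lib64/libcuda.so.1"


def wait_for_file(path, timeout=60.0, interval=0.5):
    """Poll until `path` exists; return True if it appeared, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        if os.path.exists(path):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


def main(argv):
    """Wait for LIB_PATH, then replace this process with argv[1:]."""
    if not wait_for_file(LIB_PATH):
        print(f"timed out waiting for {LIB_PATH}", file=sys.stderr)
        return 1
    # exec replaces the wrapper, so the workload receives signals directly.
    os.execvp(argv[1], argv[1:])


# Only act as a wrapper when a command was passed, e.g.
#   python wait_for_lib.py python train.py
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv))
```

Checking for existence does not prove the copy has finished, because a partially written file would pass the check. It narrows the race without removing it, which is why it remains a workaround rather than a fix.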
Oh, thanks rainfd. I knew about this solution, but it feels like a hack.
Which version of gpu-manager are you using? I've fixed a problem in the master branch but haven't released an image yet.
@mYmNeo My version is v1.0.4. Which commit contains the fix?
Why do I get an error when I start multiple GPU-resource pods simultaneously (concurrently) using vcuda?
In vcuda's loader.c, I added `ferror` to print the related errno message, and it does report an error. But when I start the pods sequentially, I don't have this problem. So I suspect a race: the kubelet starts the container before gpu-manager has finished placing the libcuda.so file.
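The suspected race can be illustrated with a small, self-contained simulation. It only models the hypothesis and is not vcuda's actual code. A hypothetical "copier" thread stands in for gpu-manager placing libcuda.so after a short assumed delay, and the "loader" opens the file right away, printing the errno message much as an added error print in loader.c would.

```python
import errno
import os
import tempfile
import threading
import time


def place_library(path, delay):
    """Stand-in for gpu-manager: write the file after `delay` seconds."""
    time.sleep(delay)
    with open(path, "wb") as f:
        f.write(b"\x7fELF")  # dummy bytes, not a real shared object


def try_load(path):
    """Stand-in for the loader: open immediately, report errno on failure."""
    try:
        with open(path, "rb"):
            return 0
    except OSError as e:
        # Roughly what printing strerror(errno) would show in C.
        print(f"open {path}: [errno {e.errno}] {os.strerror(e.errno)}")
        if e.errno == errno.ENOENT:
            print("  -> the library has not been placed yet")
        return e.errno


def simulate(concurrent):
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "libcuda.so")
        copier = threading.Thread(target=place_library, args=(path, 0.2))
        copier.start()
        if not concurrent:
            copier.join()  # sequential: wait until the file is in place
        result = try_load(path)
        copier.join()  # let the copy finish before the directory is removed
        return result


print("sequential:", simulate(concurrent=False))
print("concurrent:", simulate(concurrent=True))
```

In the sequential run the open succeeds, while in the concurrent run it fails with "No such file or directory", matching the pattern described above.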