Nvidia node mismatch for pod, pick up:/dev/nvidia6 predicate: /dev/nvidia1, which is unexpected. #37
Comments
It's a defensive mechanism in gpu-manager. gpu-admission tries to assign a pod to a single card to avoid fragmentation, but for some reason (pod terminated, pod failed, etc.) gpu-admission's scheduling information may not be as up to date as what gpu-manager knows. gpu-manager therefore validates whether the card it picks is the same one gpu-admission predicated; if not, gpu-manager rejects the pod to keep a consistent allocation view.
Besides, your situation may be another scenario. We're working on a fix.
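The check behind the error message in the issue title can be pictured roughly as follows. This is only a hypothetical Go sketch of the defensive mechanism described above, not the actual gpu-manager source; the function name validatePick and its arguments are made up for illustration.

package main

import "fmt"

// validatePick illustrates the defensive check: gpu-manager compares the
// device it picked with the devices gpu-admission recorded for the pod and
// rejects the allocation on any mismatch.
func validatePick(picked string, predicated []string) error {
	for _, dev := range predicated {
		if dev == picked {
			return nil // allocation views agree
		}
	}
	return fmt.Errorf("nvidia node mismatch for pod, pick up:%s predicate: %v, which is unexpected",
		picked, predicated)
}

func main() {
	// Mirrors the situation in the title: the manager picked /dev/nvidia6
	// while the admission predicate recorded /dev/nvidia1.
	if err := validatePick("/dev/nvidia6", []string{"/dev/nvidia1"}); err != nil {
		fmt.Println(err)
	}
}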
Today I tried to reproduce the problem. First I created 7 NVIDIA GPU pods, each occupying 1 GPU.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nvidia-gpu-test-app-time-cost
  namespace: danlu-efficiency
spec:
  replicas: 7
  selector:
    matchLabels:
      app: nvidia-gpu-test-app-time-cost
  template:
    metadata:
      labels:
        app: nvidia-gpu-test-app-time-cost
    spec:
      schedulerName: gpu-admission
      restartPolicy: Always
      containers:
        - name: nvidia-gpu-test-app-time-cost
          image: xxx:gpu-test-app-time-cost
          resources:
            #requests:
            #  tencent.com/vcuda-core: "20"
            #  tencent.com/vcuda-memory: "10"
            limits:
              nvidia.com/gpu: 1
              #tencent.com/vcuda-core: "20"
              #tencent.com/vcuda-memory: "10"
      imagePullSecrets:
        - name: gpu
Then I created 1 Tencent GPU pod, occupying 1/5 of a GPU and 1/4 of the GPU memory. I got the problem again, and this pod kept cycling between the Pending and UnexpectedAdmissionError states.
So I am confused why GPUManager chose GPU#4. Shouldn't it choose GPU#0 in terms of resource utilization? Is GPU topology considered here? But why consider topology? This latter test program has nothing to do with the other programs.
GPUManager only considers the pod with the resources in its specification, even if your GPU card is already occupied by other programs. Topology is considered because some programs may do P2P data transfer between GPU cards. PS: can you provide the log of the chosen result in your situation?
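To make the topology point concrete: when a pod asks for more than one card, a topology-aware allocator prefers cards that are close to each other so P2P transfers stay fast. The Go sketch below is purely illustrative and is not the actual link.go/share.go algorithm; the gpu struct and SwitchID field are invented, and real topology data would come from NVML.

package main

import "fmt"

// gpu is a hypothetical device record; SwitchID stands in for whatever
// topology information (PCIe switch, NVLink domain) the real allocator tracks.
type gpu struct {
	Dev      string
	SwitchID int
}

// pickPair prefers two free GPUs under the same switch and falls back to any
// two free GPUs when no such pair exists.
func pickPair(free []gpu) (gpu, gpu, bool) {
	for i := 0; i < len(free); i++ {
		for j := i + 1; j < len(free); j++ {
			if free[i].SwitchID == free[j].SwitchID {
				return free[i], free[j], true
			}
		}
	}
	if len(free) >= 2 {
		return free[0], free[1], true
	}
	return gpu{}, gpu{}, false
}

func main() {
	free := []gpu{
		{"/dev/nvidia0", 0},
		{"/dev/nvidia4", 1},
		{"/dev/nvidia5", 1},
	}
	a, b, ok := pickPair(free)
	fmt.Println(a.Dev, b.Dev, ok) // prints /dev/nvidia4 /dev/nvidia5 true
}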
Hi @mYmNeo, thanks for your quick answer. Actually, in order not to affect the current k8s environment, we created a new scheduler and ran gpu-admission as its scheduler extender. Here is its description.

apiVersion: v1
kind: ServiceAccount
metadata:
  name: gpu-admission
  namespace: danlu-efficiency
---
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRoleBinding
metadata:
  name: gpu-admission-cluster-admin
  namespace: danlu-efficiency
roleRef:
  apiGroup: rbac.authorization.k8s.io
  kind: ClusterRole
  name: cluster-admin
subjects:
  - kind: ServiceAccount
    namespace: danlu-efficiency
    name: gpu-admission
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: gpu-admission-config
  namespace: danlu-efficiency
data:
  config.yaml: |
    apiVersion: kubescheduler.config.k8s.io/v1alpha1
    kind: KubeSchedulerConfiguration
    schedulerName: gpu-admission
    algorithmSource:
      policy:
        configMap:
          namespace: danlu-efficiency
          name: gpu-admission-policy
    leaderElection:
      leaderElect: true
      lockObjectName: gpu-admission
      lockObjectNamespace: danlu-efficiency
---
apiVersion: v1
kind: ConfigMap
metadata:
  name: gpu-admission-policy
  namespace: danlu-efficiency
data:
  policy.cfg: |
    {
      "kind" : "Policy",
      "apiVersion" : "v1",
      "predicates" : [
        {"name" : "CheckNodeUnschedulable"},
        {"name" : "GeneralPredicates"},
        {"name" : "HostName"},
        {"name" : "PodFitsHostPorts"},
        {"name" : "MatchNodeSelector"},
        {"name" : "PodFitsResources"},
        {"name" : "NoDiskConflict"},
        {"name" : "PodToleratesNodeTaints"},
        {"name" : "MaxEBSVolumeCount"},
        {"name" : "MaxGCEPDVolumeCount"},
        {"name" : "MaxAzureDiskVolumeCount"},
        {"name" : "CheckVolumeBinding"},
        {"name" : "NoVolumeZoneConflict"},
        {"name" : "MatchInterPodAffinity"}
      ],
      "priorities" : [
        {"name" : "EqualPriority", "weight" : 1},
        {"name" : "MostRequestedPriority", "weight" : 1},
        {"name" : "RequestedToCapacityRatioPriority", "weight" : 1},
        {"name" : "SelectorSpreadPriority", "weight" : 1},
        {"name" : "ServiceSpreadingPriority", "weight" : 1},
        {"name" : "InterPodAffinityPriority", "weight" : 1},
        {"name" : "LeastRequestedPriority", "weight" : 1},
        {"name" : "BalancedResourceAllocation", "weight" : 1},
        {"name" : "NodePreferAvoidPodsPriority", "weight" : 1},
        {"name" : "NodeAffinityPriority", "weight" : 1},
        {"name" : "TaintTolerationPriority", "weight" : 1},
        {"name" : "ImageLocalityPriority", "weight" : 1}
      ],
      "extenders" : [
        {
          "urlPrefix": "http://localhost:3456/scheduler",
          "filterVerb": "predicates",
          "enableHttps": false,
          "nodeCacheCapable": false
        }
      ],
      "hardPodAffinitySymmetricWeight" : 10
    }
---
apiVersion: apps/v1
kind: Deployment
metadata:
  name: gpu-admission
  namespace: danlu-efficiency
  labels:
    app: gpu-admission
spec:
  replicas: 1
  selector:
    matchLabels:
      app: gpu-admission
  template:
    metadata:
      labels:
        app: gpu-admission
    spec:
      serviceAccountName: gpu-admission
      volumes:
        - name: gpu-admission-config
          configMap:
            name: gpu-admission-config
      containers:
        - name: gpu-admission-ctr
          image: gcr.io/google_containers/hyperkube:v1.13.4
          imagePullPolicy: IfNotPresent
          args:
            - kube-scheduler
            - --config=/gpu-admission/config.yaml
            - -v=4
          volumeMounts:
            - name: gpu-admission-config
              mountPath: /gpu-admission
        - name: gpu-admission-extender-ctr
          image: xxx:gpu-admission-v0.1
          imagePullPolicy: Always
          livenessProbe:
            httpGet:
              path: /version
              port: 3456
          readinessProbe:
            httpGet:
              path: /version
              port: 3456
          ports:
            - containerPort: 3456
      imagePullSecrets:
        - name: regcred
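For reference, the extenders stanza in policy.cfg above makes kube-scheduler POST filter requests to http://localhost:3456/scheduler/predicates. The Go sketch below shows what such an extender endpoint minimally looks like; it is not the actual gpu-admission implementation, and the filterArgs/filterResult structs are simplified stand-ins for the upstream extender request/response types.

package main

import (
	"encoding/json"
	"log"
	"net/http"
)

// filterArgs and filterResult are simplified stand-ins for the kube-scheduler
// extender wire types; only the node-name form is modeled here.
type filterArgs struct {
	NodeNames []string `json:"nodenames,omitempty"`
}

type filterResult struct {
	NodeNames   []string          `json:"nodenames,omitempty"`
	FailedNodes map[string]string `json:"failedNodes,omitempty"`
	Error       string            `json:"error,omitempty"`
}

func main() {
	// The policy points the scheduler at urlPrefix + "/" + filterVerb.
	http.HandleFunc("/scheduler/predicates", func(w http.ResponseWriter, r *http.Request) {
		var args filterArgs
		if err := json.NewDecoder(r.Body).Decode(&args); err != nil {
			http.Error(w, err.Error(), http.StatusBadRequest)
			return
		}
		// A real extender would check per-node GPU availability here; this
		// sketch passes every candidate node through unchanged.
		res := filterResult{NodeNames: args.NodeNames, FailedNodes: map[string]string{}}
		w.Header().Set("Content-Type", "application/json")
		json.NewEncoder(w).Encode(res)
	})
	log.Println("extender sketch listening on :3456")
	log.Fatal(http.ListenAndServe(":3456", nil))
}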
I am eager to know whether this problem is related to the mixed use of NVIDIA GPU pods, because it will affect whether we can use gpu-manager in our production environment. What puzzles me is that some mixed-use scenarios run correctly. Looking forward to your reply.
The gpu-admission doesn't have a view of your nvidia.com/gpu pods.
I don't have NVIDIA/k8s-device-plugin installed, but I also got this error.
I met this issue too. I added some debug logs as below:

I0122 04:41:31.480076 13774 tree.go:119] Update device information

I wonder why node[0] is used in particular when there are many cards?
After I removed the four lines (gpu-manager/pkg/services/allocator/nvidia/allocator.go, lines 452 to 455 in 808ff8c), it works normally!
@qifengz, it is not recommended to delete that error-checking code. It may look like it works, but you can check this by …
Hi @qifengz, @zwpaper has pointed out the reason. If you read the code, you may find the relevant logic in gpu-manager/pkg/algorithm/nvidia/link.go (line 42 in 1d0955c) and gpu-manager/pkg/algorithm/nvidia/share.go (line 47 in 1d0955c). Therefore, my approach is to delete these two algorithms so as to be consistent with the …
After deleting them, does it have any other problems?
Performance is not very satisfactory in my test.
@qifengz Your case is the same as mine.
If you look carefully, you will find that the memory of GPU1 (…). The code that caused this issue is in gpu-manager/pkg/device/nvidia/sort.go, lines 59 to 61 in 808ff8c.
#74 fixed it.
@fighterhit @HeroBcat Got it, helpful!
I fixed the problem by deleting that code, and it works fine. Because gpu-admission gets the usage information from Kubernetes, even without this check it still gets the final scheduling info from Kubernetes every 30s. https://github.com/lynnfi/gpu-manager
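To illustrate the 30s point above: a scheduling component can keep its pod usage view loosely in sync with the API server by using a client-go informer with a 30-second resync period. This is only a generic sketch of that idea, not the actual gpu-admission code; the handler body is a placeholder.

package main

import (
	"time"

	corev1 "k8s.io/api/core/v1"
	"k8s.io/client-go/informers"
	"k8s.io/client-go/kubernetes"
	"k8s.io/client-go/tools/cache"
	"k8s.io/client-go/tools/clientcmd"
)

func main() {
	// Build a client from the local kubeconfig (in-cluster config works too).
	cfg, err := clientcmd.BuildConfigFromFlags("", clientcmd.RecommendedHomeFile)
	if err != nil {
		panic(err)
	}
	client := kubernetes.NewForConfigOrDie(cfg)

	// Resync every 30 seconds so the local cache is periodically refreshed
	// with the scheduler's final placement decisions.
	factory := informers.NewSharedInformerFactory(client, 30*time.Second)
	podInformer := factory.Core().V1().Pods().Informer()
	podInformer.AddEventHandler(cache.ResourceEventHandlerFuncs{
		UpdateFunc: func(oldObj, newObj interface{}) {
			pod := newObj.(*corev1.Pod)
			// A real admission component would refresh its per-GPU usage
			// bookkeeping here.
			_ = pod.Spec.NodeName
		},
	})

	stop := make(chan struct{})
	factory.Start(stop)
	factory.WaitForCacheSync(stop)
	<-stop
}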
I got a similar problem when I created a pod like the one in issue 18. Please help analyze it.