
Admission webhook can not map to correct AFU ID #373

Closed
@xxinran


Hi guys, I am trying the Intel FPGA device plugin with an A10 (Arria 10) FPGA.
The FPGA resource is reported correctly. I checked it with kubectl describe nodes 127.0.0.1 | grep -A4 Alloc, and the output is:

Allocatable:
  cpu:                                                 32
  ephemeral-storage:                                   3188670572180
  fpga.intel.com/af-f7df405cbd7acf7222f144b0b93acd18:  1
  hugepages-1Gi:                                       4Gi

I successfully ran a pod with this YAML file:

apiVersion: v1
kind: Pod
metadata:
  name: test-nlb3
spec:
  containers:
  - name: test-nlb3
    image: intel/opae-nlb-demo:devel
    imagePullPolicy: IfNotPresent
    command: ['nlb3']
    securityContext:
      capabilities:
        add:
          [IPC_LOCK]
    resources:
      limits:
        fpga.intel.com/af-f7df405cbd7acf7222f144b0b93acd18: 1
        cpu: 1
        hugepages-2Mi: 20Mi
  restartPolicy: Never

But if I change the resources section to:

    resources:
      limits:
        fpga.intel.com/arria10.dcp1.0-nlb3: 1
        cpu: 1
        hugepages-2Mi: 20Mi

the pod stays Pending with a scheduling error.
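For context, the rewrite the admission webhook is expected to perform here is to replace the fpga.intel.com/arria10.dcp1.0-nlb3 alias with the af- resource the node actually advertises, before the pod reaches the scheduler. The Go sketch below illustrates that idea only; it is not the plugin's code. Limits are modelled as plain strings, and the aliasToAF table and rewriteLimits function are hypothetical. The single table entry assumes, as this report expects, that the nlb3 alias should resolve to af-f7df405cbd7acf7222f144b0b93acd18.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

const fpgaPrefix = "fpga.intel.com/"

// aliasToAF is a HYPOTHETICAL lookup table. Its single entry assumes the
// "arria10.dcp1.0-nlb3" alias should map to the af- resource reported by
// the node in this issue; the real webhook's data source may differ.
var aliasToAF = map[string]string{
	"arria10.dcp1.0-nlb3": "af-f7df405cbd7acf7222f144b0b93acd18",
}

// rewriteLimits returns a copy of limits in which known fpga.intel.com
// aliases are replaced by their af- resource names. Non-FPGA resources and
// names that already start with "af-" are copied unchanged. Unknown aliases
// and two entries mapping to the same resource are reported as errors.
func rewriteLimits(limits map[string]string) (map[string]string, error) {
	out := make(map[string]string, len(limits))
	for name, qty := range limits {
		target := name
		short := strings.TrimPrefix(name, fpgaPrefix)
		if short != name && !strings.HasPrefix(short, "af-") {
			af, known := aliasToAF[short]
			if !known {
				return nil, fmt.Errorf("no AFU mapping for %q", name)
			}
			target = fpgaPrefix + af
		}
		if _, dup := out[target]; dup {
			return nil, fmt.Errorf("resource %q requested more than once", target)
		}
		out[target] = qty
	}
	return out, nil
}

func main() {
	limits := map[string]string{
		"fpga.intel.com/arria10.dcp1.0-nlb3": "1",
		"cpu":                                "1",
		"hugepages-2Mi":                      "20Mi",
	}
	out, err := rewriteLimits(limits)
	if err != nil {
		panic(err)
	}
	// Print in sorted order so the output is deterministic.
	names := make([]string, 0, len(out))
	for name := range out {
		names = append(names, name)
	}
	sort.Strings(names)
	for _, name := range names {
		fmt.Printf("%s: %s\n", name, out[name])
	}
}
```

If this rewrite does not happen, the scheduler sees the alias as an ordinary extended resource that no node advertises, which is consistent with the "Insufficient fpga.intel.com/arria10.dcp1.0-nlb3" events.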
The related logs are the following:

Run kubectl logs intel-fpga-webhook-deployment-7f6b8474c-nd64g

2020/04/29 12:15:55 http: TLS handshake error from 172.17.0.1:23786: remote error: tls: bad certificate

Run tail -f kube-apiserver.log:

failed calling webhook "fpga.mutator.webhooks.intel.com": Post "https://intel-fpga-webhook-svc.default.svc:443/pods?timeout=30s": x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "ca")

Run kubectl describe pod test-nlb3:


Events:
  Type     Reason            Age   From               Message
  ----     ------            ----  ----               -------
  Warning  FailedScheduling  6s    default-scheduler  0/1 nodes are available: 1 Insufficient fpga.intel.com/arria10.dcp1.0-nlb3.
  Warning  FailedScheduling  6s    default-scheduler  0/1 nodes are available: 1 Insufficient fpga.intel.com/arria10.dcp1.0-nlb3.

Also, I am unsure about the notion of DCP. I noticed there are three versions of DCP; I tried each of them and always got the same error. What exactly does DCP mean?
