Eliminate FPGA admission webhook's mode #301

rojkov · 2020-02-24T16:22:53Z

Problem:

It is possible to run the FPGA device plugin in two different modes on different nodes of the same cluster. Yet the admission webhook can be configured to work with FPGA device plugins in only one mode at a time, either preprogrammed or orchestrated. The webhook needs to be redesigned to be agnostic about the device plugins' modes. Also, when operating in preprogrammed mode it is impossible to differentiate nodes providing the same accelerated function on different hardware: e.g. a user's request to dispatch a task onto stratix10-dcp1.0-nlb0 may well be dispatched to a node running the nlb0 accelerated function on an Arria10.

Solution:

Modify the FPGA plugin to expose both AF and interface IDs in resource names in "preprogrammed" mode (currently only the AF ID is exposed).
Modify AcceleratorFunction CRDs to contain info on the hardware the accelerated function is intended to run on (interface ID) and the required mode of the FPGA plugin, so that a cluster admin can configure the webhook to operate with plugins in both modes.

apiVersion: apiextensions.k8s.io/v1beta1
kind: CustomResourceDefinition
metadata:
  name: acceleratorfunctions.fpga.intel.com
spec:
  group: fpga.intel.com
  version: v1
  scope: Namespaced
  names:
    plural: acceleratorfunctions
    singular: acceleratorfunction
    kind: AcceleratorFunction
    shortNames:
    - af
  validation:
    openAPIV3Schema:
      properties:
        spec:
          properties:
            afuId:
              type: string
              pattern: '^[0-9a-f]{8,128}$'
            interfaceId:
              type: string
              pattern: '^[0-9a-f]{8,128}$'
            mode:
              type: string
              pattern: '^(preprogrammed|orchestrated)$'

Modify the webhook to no longer accept the -mode option and to translate requested resources using only AcceleratorFunction CRDs in the new format.

The format of resource names visible to a user is not changed. Basically the format can be anything, but it's expected to be in the form <hardware>-<firmware_release>-<accelerated_function>, e.g. arria10-dcp1.1-nlb0.
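A minimal sketch of what mode-less translation could look like, assuming the webhook has already loaded the CRD specs into a map keyed by the user-visible name. The `afSpec` type, the `translate` function, the short placeholder IDs, and the `region-<interface_id>` name used for orchestrated mode are all assumptions made for illustration, not the webhook's actual code.

```go
package main

import (
	"fmt"
	"strings"
)

// afSpec mirrors the spec fields of the CRD above (hypothetical type).
type afSpec struct {
	AfuID       string
	InterfaceID string
	Mode        string // "preprogrammed" or "orchestrated"
}

const namespace = "fpga.intel.com"

// translate maps a user-visible resource name to the name a device plugin
// is expected to advertise. The mode comes from the CRD entry itself, so
// the webhook needs no global mode option.
func translate(afs map[string]afSpec, resource string) (string, error) {
	name := strings.TrimPrefix(resource, namespace+"/")
	if name == resource {
		return "", fmt.Errorf("%q is not in the %s namespace", resource, namespace)
	}
	af, ok := afs[name]
	if !ok {
		return "", fmt.Errorf("no AcceleratorFunction named %q", name)
	}
	switch af.Mode {
	case "preprogrammed":
		// Both IDs in the name, so the same AF on different hardware differs.
		return namespace + "/" + af.InterfaceID + af.AfuID, nil
	case "orchestrated":
		// Assumption: regions are advertised by interface ID alone.
		return namespace + "/region-" + af.InterfaceID, nil
	default:
		return "", fmt.Errorf("unknown mode %q for %q", af.Mode, name)
	}
}

func main() {
	// Placeholder IDs; real ones are much longer hex strings.
	afs := map[string]afSpec{
		"arria10-dcp1.1-nlb0":   {AfuID: "aaaa1111", InterfaceID: "bbbb2222", Mode: "preprogrammed"},
		"stratix10-dcp1.0-nlb0": {AfuID: "aaaa1111", InterfaceID: "cccc3333", Mode: "orchestrated"},
	}
	for _, r := range []string{
		"fpga.intel.com/arria10-dcp1.1-nlb0",
		"fpga.intel.com/stratix10-dcp1.0-nlb0",
	} {
		out, err := translate(afs, r)
		fmt.Println(r, "->", out, err)
	}
}
```

Because the mode is looked up per accelerated function, the same webhook instance can serve nodes whose plugins run in either mode.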

rojkov · 2020-04-06T09:10:11Z

So, in this patch I made the FPGA plugin expose AFs as fpga.intel.com/<interface_id><afu_id> to make AFUs provided by different HW distinguishable.

The problem, though, is that such a resource name is 64 bytes long (32 + 32), whereas the max resource name length without the namespace is 63:

• Failure [6.023 seconds]
FPGA Admission Webhook
/home/rojkov/work/intel-device-plugins-for-kubernetes/test/e2e/fpgaadmissionwebhook/fpgaadmissionwebhook.go:36
  mutates created pods to reference resolved AFs [It]
  /home/rojkov/work/intel-device-plugins-for-kubernetes/test/e2e/fpgaadmissionwebhook/fpgaadmissionwebhook.go:51

  pod Create API error
  Unexpected error:
      <*errors.StatusError | 0xc0004a4320>: {
          ErrStatus: {
              TypeMeta: {Kind: "", APIVersion: ""},
              ListMeta: {
                  SelfLink: "",
                  ResourceVersion: "",
                  Continue: "",
                  RemainingItemCount: nil,
              },
              Status: "Failure",
              Message: "Pod \"webhook-tester\" is invalid: [spec.containers[0].resources.limits[fpga.intel.com/bfac4d851ee856fe8c95865ce1bbaa2df7df405cbd7acf7222f144b0b93acd18]: Invalid value: \"fpga.intel.com/bfac4d851ee856fe8c95865ce1bbaa2df7df405cbd7acf7222f144b0b93acd18\": name part must be no more than 63 characters, spec.containers[0].resources.limits[fpga.intel.com/bfac4d851ee856fe8c95865ce1bbaa2df7df405cbd7acf7222f144b0b93acd18]: Invalid value: \"fpga.intel.com/bfac4d851ee856fe8c95865ce1bbaa2df7df405cbd7acf7222f144b0b93acd18\": doesn't follow extended resource name standard, spec.containers[0].resources.requests[fpga.intel.com/bfac4d851ee856fe8c95865ce1bbaa2df7df405cbd7acf7222f144b0b93acd18]: Invalid value: \"fpga.intel.com/bfac4d851ee856fe8c95865ce1bbaa2df7df405cbd7acf7222f144b0b93acd18\": name part must be no more than 63 characters, spec.containers[0].resources.requests[fpga.intel.com/bfac4d851ee856fe8c95865ce1bbaa2df7df405cbd7acf7222f144b0b93acd18]: Invalid value: \"fpga.intel.com/bfac4d851ee856fe8c95865ce1bbaa2df7df405cbd7acf7222f144b0b93acd18\": doesn't follow extended resource name standard]",
              Reason: "Invalid",
              Details: {
                  Name: "webhook-tester",
                  Group: "",
                  Kind: "Pod",
                  UID: "",
                  Causes: [
                      {
                          Type: "FieldValueInvalid",
                          Message: "Invalid value: \"fpga.intel.com/bfac4d851ee856fe8c95865ce1bbaa2df7df405cbd7acf7222f144b0b93acd18\": name part must be no more than 63 characters",
                          Field: "spec.containers[0].resources.limits[fpga.intel.com/bfac4d851ee856fe8c95865ce1bbaa2df7df405cbd7acf7222f144b0b93acd18]",
                      },
                      {
                          Type: "FieldValueInvalid",
                          Message: "Invalid value: \"fpga.intel.com/bfac4d851ee856fe8c95865ce1bbaa2df7df405cbd7acf7222f144b0b93acd18\": doesn't follow extended resource name standard",
                          Field: "spec.containers[0].resources.limits[fpga.intel.com/bfac4d851ee856fe8c95865ce1bbaa2df7df405cbd7acf7222f144b0b93acd18]",
                      },
                      {
                          Type: "FieldValueInvalid",
                          Message: "Invalid value: \"fpga.intel.com/bfac4d851ee856fe8c95865ce1bbaa2df7df405cbd7acf7222f144b0b93acd18\": name part must be no more than 63 characters",
                          Field: "spec.containers[0].resources.requests[fpga.intel.com/bfac4d851ee856fe8c95865ce1bbaa2df7df405cbd7acf7222f144b0b93acd18]",
                      },
                      {
                          Type: "FieldValueInvalid",
                          Message: "Invalid value: \"fpga.intel.com/bfac4d851ee856fe8c95865ce1bbaa2df7df405cbd7acf7222f144b0b93acd18\": doesn't follow extended resource name standard",
                          Field: "spec.containers[0].resources.requests[fpga.intel.com/bfac4d851ee856fe8c95865ce1bbaa2df7df405cbd7acf7222f144b0b93acd18]",
                      },
                  ],
                  RetryAfterSeconds: 0,
              },
              Code: 422,
          },
      }
      Pod "webhook-tester" is invalid: [spec.containers[0].resources.limits[fpga.intel.com/bfac4d851ee856fe8c95865ce1bbaa2df7df405cbd7acf7222f144b0b93acd18]: Invalid value: "fpga.intel.com/bfac4d851ee856fe8c95865ce1bbaa2df7df405cbd7acf7222f144b0b93acd18": name part must be no more than 63 characters, spec.containers[0].resources.limits[fpga.intel.com/bfac4d851ee856fe8c95865ce1bbaa2df7df405cbd7acf7222f144b0b93acd18]: Invalid value: "fpga.intel.com/bfac4d851ee856fe8c95865ce1bbaa2df7df405cbd7acf7222f144b0b93acd18": doesn't follow extended resource name standard, spec.containers[0].resources.requests[fpga.intel.com/bfac4d851ee856fe8c95865ce1bbaa2df7df405cbd7acf7222f144b0b93acd18]: Invalid value: "fpga.intel.com/bfac4d851ee856fe8c95865ce1bbaa2df7df405cbd7acf7222f144b0b93acd18": name part must be no more than 63 characters, spec.containers[0].resources.requests[fpga.intel.com/bfac4d851ee856fe8c95865ce1bbaa2df7df405cbd7acf7222f144b0b93acd18]: Invalid value: "fpga.intel.com/bfac4d851ee856fe8c95865ce1bbaa2df7df405cbd7acf7222f144b0b93acd18": doesn't follow extended resource name standard]
  occurred

  /home/rojkov/work/intel-device-plugins-for-kubernetes/test/e2e/fpgaadmissionwebhook/fpgaadmissionwebhook.go:95

/cc @kad @bart0sh Do you mind if I remove the last character of FPGA interface IDs when exposing AF resources?

bart0sh · 2020-04-06T09:13:26Z

I don't. At least for the POC version.
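The truncation discussed above can be sketched as follows, using the two 32-character IDs from the failing test. The function name `afResourceName` and the constant names are hypothetical; the 63-character limit is the one reported in the validation error.

```go
package main

import "fmt"

const (
	namespace  = "fpga.intel.com"
	maxNameLen = 63 // limit on the name part, per the validation error
)

// afResourceName builds <interface_id><afu_id> and, when the result would
// exceed maxNameLen, drops characters from the end of the interface ID.
func afResourceName(interfaceID, afuID string) (string, error) {
	if len(afuID) >= maxNameLen {
		return "", fmt.Errorf("AFU ID %q leaves no room for an interface ID", afuID)
	}
	if excess := len(interfaceID) + len(afuID) - maxNameLen; excess > 0 {
		// For two 32-character IDs, excess is 1: only the last character goes.
		interfaceID = interfaceID[:len(interfaceID)-excess]
	}
	return namespace + "/" + interfaceID + afuID, nil
}

func main() {
	name, err := afResourceName(
		"bfac4d851ee856fe8c95865ce1bbaa2d", // interface ID from the log
		"f7df405cbd7acf7222f144b0b93acd18", // AFU ID from the log
	)
	if err != nil {
		panic(err)
	}
	fmt.Println(name, len(name)-len(namespace)-1)
}
```

Dropping a character means two interface IDs that differ only in their last character would map to the same resource name, which is presumably why the change is accepted only for the POC.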

msivosuo mentioned this issue in "Finalize FPGA kustomize #318" (closed, 4 tasks) · Mar 10, 2020

msivosuo assigned rojkov · Mar 11, 2020

rojkov mentioned this issue in "fpga: make admission webhook mode-less #358" (merged) · Apr 6, 2020

bart0sh closed this as completed in #358 · May 5, 2020
