xpumanager sidecar: verify certificate with HTTPS #1816

Merged
merged 1 commit into from
Aug 30, 2024
76 changes: 34 additions & 42 deletions cmd/xpumanager_sidecar/README.md
@@ -23,7 +23,7 @@ Intel GPUs can be interconnected via an XeLink. In some workloads it is benefici
 | -startup-delay | int | 10 | Startup delay before the first topology fetching (seconds, >= 0) |
 | -label-namespace | string | gpu.intel.com | Namespace or prefix for the labels. i.e. **gpu.intel.com**/xe-links |
 | -allow-subdeviceless-links | bool | false | Include xelinks also for devices that do not have subdevices |
-| -use-https | bool | false | Use HTTPS protocol when connecting to XPU Manager |
+| -cert | string | "" | Use HTTPS and verify server's endpoint |

 The sidecar also accepts a number of other arguments. Please use the -h option to see the complete list of options.

@@ -50,7 +50,7 @@ See [the development guide](../../DEVEL.md) for details if you want to deploy a
 Install XPU Manager daemonset with the XeLink sidecar

 ```bash
-$ kubectl apply -k 'https://github.com/intel/intel-device-plugins-for-kubernetes/deployments/xpumanager_sidecar?ref=<RELEASE_VERSION>'
+$ kubectl apply -k 'https://github.com/intel/intel-device-plugins-for-kubernetes/deployments/xpumanager_sidecar/overlays/http?ref=<RELEASE_VERSION>'
 ```

 Please see XPU Manager Kubernetes files for additional info on [installation](https://github.com/intel/xpumanager/tree/master/deployment/kubernetes).
@@ -60,7 +60,7 @@ Please see XPU Manager Kubernetes files for additional info on [installation](ht
 Use patch to add sidecar into the XPU Manager daemonset.

 ```bash
-$ kubectl patch daemonsets.apps intel-xpumanager --patch-file 'https://github.com/intel/intel-device-plugins-for-kubernetes/deployments/xpumanager_sidecar/kustom/kustom_xpumanager.yaml?ref=<RELEASE_VERSION>'
+$ kubectl patch daemonsets.apps intel-xpumanager --patch-file 'https://github.com/intel/intel-device-plugins-for-kubernetes/deployments/xpumanager_sidecar/overlays/http/xpumanager.yaml?ref=<RELEASE_VERSION>'
 ```

 NOTE: The sidecar patch will remove other resources from the XPU Manager container. If your XPU Manager daemonset is using, for example, the smarter device manager resources, those will be removed.
@@ -76,7 +76,25 @@ master,0.0-1.0_0.1-1.1

 ### Use HTTPS with XPU Manager

-XPU Manager can be configured to use HTTPS on the metrics interface. For the gunicorn sidecar, cert and key files have to be added to the command:
+There is an alternative deployment that uses HTTPS instead of HTTP. The reference deployment requires `cert-manager` to provide a certificate for TLS. To deploy:
+
+```bash
+$ kubectl apply -k 'https://github.com/intel/intel-device-plugins-for-kubernetes/deployments/xpumanager_sidecar/overlays/cert-manager?ref=<RELEASE_VERSION>'
+```
+
+The deployment requests a certificate and key from `cert-manager`. They are then provided to the gunicorn container as secrets and are used in the HTTPS interface. The sidecar container uses the same certificate to verify the server.
+
+> *NOTE*: The HTTPS deployment uses self-signed certificates. For production use, the certificates should be properly set up.
+
+<details>
+<summary>Enabling HTTPS manually</summary>
+
+If one doesn't want to use `cert-manager`, the same can be achieved manually by creating certificates with openssl and then adding it to the deployment. The steps are roughly:
+1) Create a certificate with [openssl](https://www.linode.com/docs/guides/create-a-self-signed-tls-certificate/)
+1) Create a secret from the [certificate & key](https://kubernetes.io/docs/reference/kubectl/generated/kubectl_create/kubectl_create_secret_tls/).
+1) Change the deployment:
+
+* Add certificate and key to gunicorn container:
 ```
 - command:
   - gunicorn
@@ -87,8 +105,7 @@ XPU Manager can be configured to use HTTPS on the metrics interface. For the gun
   - xpum_rest_main:main()
 ```

-The gunicorn container will also need the tls.crt and tls.key files within the container. For example:
-
+* Add secret mounting to the Pod:
 ```
 containers:
 - name: python-exporter
@@ -101,44 +118,19 @@ The gunicorn container will also need the tls.crt and tls.key files within the c
 secret:
   defaultMode: 420
   secretName: xpum-server-cert
-```
-
-In this case, the secret providing the certificate and key is called `xpum-server-cert`.
-
-The certificate and key can be [added manually to a secret](https://kubernetes.io/docs/reference/kubectl/generated/kubectl_create/kubectl_create_secret_tls/). Another way to achieve a secret is to leverage [cert-manager](https://cert-manager.io/).
-
-<details>
-<summary>Example for the Cert-manager objects</summary>
-
-Cert-manager will create a self-signed certificate and the private key, and store them into a secret called `xpum-server-cert`.
+```

+* Add use-https and cert to sidecar
 ```
-apiVersion: cert-manager.io/v1
-kind: Issuer
-metadata:
-  name: selfsigned-issuer
-spec:
-  selfSigned: {}
----
-apiVersion: cert-manager.io/v1
-kind: Certificate
-metadata:
-  name: serving-cert
-spec:
-  dnsNames:
-  - xpum.svc
-  - xpum.svc.cluster.local
-  issuerRef:
-    kind: Issuer
-    name: selfsigned-issuer
-  secretName: xpum-server-cert
+name: xelink-sidecar
+volumeMounts:
+- mountPath: /certs
+  name: certs
+  readOnly: true
+args:
+...
+- --cert=/certs/tls.crt
+...
 ```

 </details>
-
-For the XPU Manager sidecar, `use-https` has to be added to the arguments. Then the sidecar will leverage HTTPS with the connection to the metrics interface.
-```
-args:
-- -v=2
-- -use-https
-```
36 changes: 28 additions & 8 deletions cmd/xpumanager_sidecar/main.go
@@ -19,6 +19,7 @@ import (
 	"bytes"
 	"context"
 	"crypto/tls"
+	"crypto/x509"
 	"flag"
 	"fmt"
 	"io"
@@ -61,12 +62,12 @@ type xpuManagerSidecar struct {
 	dstFilePath             string
 	labelNamespace          string
 	url                     string
+	certFile                string
 	interval                uint64
 	startDelay              uint64
 	xpumPort                uint64
 	laneCount               uint64
 	allowSubdevicelessLinks bool
-	useHTTPS                bool
 }

 func (e *invalidEntryErr) Error() string {
@@ -78,12 +79,30 @@ func (xms *xpuManagerSidecar) getMetricsDataFromXPUM() []byte {
 		Timeout: 5 * time.Second,
 	}

-	if xms.useHTTPS {
-		customTransport := http.DefaultTransport.(*http.Transport).Clone()
-		//#nosec
-		customTransport.TLSClientConfig = &tls.Config{InsecureSkipVerify: true}
+	if len(xms.certFile) > 0 {
+		cert, err := os.ReadFile(xms.certFile)
+		if err != nil {
+			klog.Warning("Failed to read cert: ", err)
+
+			return nil
+		}

-		client.Transport = customTransport
+		certPool := x509.NewCertPool()
+		if !certPool.AppendCertsFromPEM(cert) {
+			klog.Warning("Adding server cert to pool failed")
+
+			return nil
+		}
+
+		tr := &http.Transport{
+			TLSClientConfig: &tls.Config{
+				MinVersion: tls.VersionTLS12,
+				RootCAs:    certPool,
+				ServerName: "127.0.0.1",
+			},
+		}
+
+		client.Transport = tr
 	}

 	ctx := context.Background()
@@ -380,7 +399,7 @@ func main() {
 	flag.Uint64Var(&xms.laneCount, "lane-count", 4, "minimum lane count for xelink")
 	flag.StringVar(&xms.labelNamespace, "label-namespace", "gpu.intel.com", "namespace for the labels")
 	flag.BoolVar(&xms.allowSubdevicelessLinks, "allow-subdeviceless-links", false, "allow xelinks that are not tied to subdevices (=1 tile GPUs)")
-	flag.BoolVar(&xms.useHTTPS, "use-https", false, "Use HTTPS protocol to connect to xpumanager")
+	flag.StringVar(&xms.certFile, "cert", "", "Use HTTPS and verify server's endpoint")
 	klog.InitFlags(nil)

 	flag.Parse()
@@ -390,7 +409,8 @@ func main() {
 	}

 	protocol := "http"
-	if xms.useHTTPS {
+
+	if len(xms.certFile) > 0 {
 		protocol = "https"
 	}

2 changes: 2 additions & 0 deletions deployments/xpumanager_sidecar/base/kustomization.yaml
@@ -0,0 +1,2 @@
+resources:
+  - https://github.com/intel/xpumanager/deployment/kubernetes/daemonset/base/?ref=V1.2.38
7 changes: 0 additions & 7 deletions deployments/xpumanager_sidecar/kustomization.yaml

This file was deleted.

20 changes: 20 additions & 0 deletions deployments/xpumanager_sidecar/overlays/cert-manager/certs.yaml
@@ -0,0 +1,20 @@
+apiVersion: cert-manager.io/v1
+kind: Issuer
+metadata:
+  name: selfsigned-issuer
+spec:
+  selfSigned: {}
+---
+apiVersion: cert-manager.io/v1
+kind: Certificate
+metadata:
+  name: serving-cert
+spec:
+  ipAddresses:
+    - "127.0.0.1"
+  privateKey:
+    rotationPolicy: Always
+  issuerRef:
+    kind: Issuer
+    name: selfsigned-issuer
+  secretName: xpum-server-cert
Original file line number Diff line number Diff line change
@@ -0,0 +1,8 @@
+resources:
+  - ../../base
+  - certs.yaml
+namespace: monitoring
+apiVersion: kustomize.config.k8s.io/v1beta1
+kind: Kustomization
+patches:
+  - path: xpumanager.yaml
Original file line number Diff line number Diff line change
@@ -0,0 +1,60 @@
+apiVersion: apps/v1
+kind: DaemonSet
+metadata:
+  labels:
+    app: intel-xpumanager
+  name: intel-xpumanager
+spec:
+  template:
+    spec:
+      volumes:
+        - name: features-d
+          hostPath:
+            path: "/etc/kubernetes/node-feature-discovery/features.d/"
+        - name: xpum-cert
+          secret:
+            secretName: xpum-server-cert
+      containers:
+        - name: python-exporter
+          volumeMounts:
+            - name: xpum-cert
+              mountPath: "/cert"
+          command:
+            - gunicorn
+            - --bind
+            - 0.0.0.0:29999
+            - --worker-connections
+            - "64"
+            - --worker-class
+            - gthread
+            - --workers
+            - "1"
+            - --threads
+            - "4"
+            - --keyfile=/cert/tls.key
+            - --certfile=/cert/tls.crt
+            - xpum_rest_main:main()
+          startupProbe:
+            httpGet:
+              scheme: HTTPS
+          livenessProbe:
+            httpGet:
+              scheme: HTTPS
+        - name: xelink-sidecar
+          image: intel/intel-xpumanager-sidecar:devel
+          imagePullPolicy: IfNotPresent
+          args:
+            - -v=2
+            - --cert=/cert/tls.crt
+          volumeMounts:
+            - name: features-d
+              mountPath: "/etc/kubernetes/node-feature-discovery/features.d/"
+            - name: xpum-cert
+              mountPath: "/cert"
+          securityContext:
+            allowPrivilegeEscalation: false
+            capabilities:
+              drop:
+                - ALL
+            readOnlyRootFilesystem: true
+            runAsUser: 0
Original file line number Diff line number Diff line change
@@ -0,0 +1,7 @@
+resources:
+  - ../../base
+namespace: monitoring
+apiVersion: kustomize.config.k8s.io/v1beta1
+kind: Kustomization
+patches:
+  - path: xpumanager.yaml
Original file line number Diff line number Diff line change
@@ -14,7 +14,7 @@ spec:
       containers:
         - name: xelink-sidecar
           image: intel/intel-xpumanager-sidecar:devel
-          imagePullPolicy: Always
+          imagePullPolicy: IfNotPresent
           args:
             - -v=2
           volumeMounts: