-
Notifications
You must be signed in to change notification settings - Fork 1.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Simplify cni config #13407
Merged
Merged
Simplify cni config #13407
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
This change removes the `policy` entry from the cni config template, which isn't used. That contained a `__SERVICEACCOUNT_TOKEN__` placeholder, which was coupling this config file with the `ZZZ-linkerd-cni-kubeconfig` file generated by linkerd-cni. An upcoming PR will add support for detecting changes in the mounted serviceaccount token file (see #12573), and the current change will facilitate that effort.
Currently blocked by a rustc version issue when doing |
alpeb
added a commit
to linkerd/linkerd2-proxy-init
that referenced
this pull request
Nov 29, 2024
Fixes linkerd/linkerd2#12573 ## Problem When deployed, the linkerd-cni pod gets its service account token mounted automatically by k8s: ```yaml - name: kube-api-access-729gv projected: defaultMode: 420 sources: - serviceAccountToken: expirationSeconds: 3607 path: token - configMap: items: - key: ca.crt path: ca.crt name: kube-root-ca.crt - downwardAPI: items: - fieldRef: apiVersion: v1 fieldPath: metadata.namespace path: namespace ``` According to this, the token is set to expire after an hour. When the linkerd-cni pod starts it deploys the file `ZZZ-linkerd-cni-kubeconfig` in to the **host** file system. That config contains the token sourced from `/var/run/secrets/kubernetes.io/serviceaccount` (mounted by the pod). When the token gets rotated after an hour, that token file is updated but `ZZZ-linkerd-cni-kubeconfig` is not updated. The `linkerd-cni` binary uses that token to connect to the kube-api, so having an outdated token should forbid it from functioning properly, which would manifest as new pods in the data plane not being able to acquire a proper network config. However, that failure isn't usually observed, except for the cases pointed out in linkerd/linkerd2#12573. The reason is that the token's actual lifetime is one year, due to kube-api's `--service-account-extend-token-expiration` [flag](https://kubernetes.io/docs/reference/command-line-tools-reference/kube-apiserver/#options) which is usually set as `true` to avoid breaking too many instances not yet adapted to use tokens with short expirations: > Turns on projected service account expiration extension during token generation, which helps safe transition from legacy token to bound service account token feature. If this flag is enabled, admission injected tokens would be extended up to 1 year to prevent unexpected failure during transition, ignoring value of service-account-max-token-expiration. ## Repro ### AKS The issue currently affects AKS clusters using OIDC keys. To reproduce, create a new cluster in AKS, making sure "Enable OIDC" and "Workload Identity" is ticked in the UI. Then install the linkerd-cni plugin, labelling the linkerd-cni DaemonSet so that its ServiceAccount token is provided via OIDC: ``` linkerd install-cni --set-string "podLabels.azure\.workload\.identity/use"="true" | kubectl apply -f - ``` And install linkerd with cni enabled, and an injected instance of emojivoto. The secret token is rotated after an hour, but the old one remains valid for a 24h. Manually rotating the key as detailed in the [docs](https://learn.microsoft.com/en-us/azure/aks/use-oidc-issuer#rotate-the-oidc-key) should invalidate the old key. After that, bouncing any emojivoto pod will prove unsuccessful with the following event being raised: ``` Warning FailedCreatePodSandBox 15s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "8121291446642b272cea9ee5f083958a37bab0dd7060c4d9c06bb05fecf911d2": plugin type="linkerd-cni" name="linkerd-cni" failed (add): Unauthorized ``` ## Fix This change adds a new function `monitor_service_account_token()` that monitors the rollout of the token file; which is a symlink whose target changes as a new token is deployed. When detecting a new token file, this function calls the new `create_kubeconfig()` function. This change also removes the existing logic around the DELETE event, which is a leftover from previous changes and is now a no-op. Also, as detailed in linkerd/linkerd2#13407, the ServiceAccount token has been removed from the cni config template because it's not used, simplifying things as we can regenerate the kubeconfig file without having to touch the cni config file. Finally, the file `linkerd-cni.conf.default` has been removed as is not used. ## Test Same as with the repro above, but use the cni-plugin image that contains the fix: ``` linkerd install-cni --set-string "podLabels.azure\.workload\.identity/use"="true" --set image.name="ghcr.io/alpeb/cni-plugin" --set image.version="v1.5.3" | kubectl apply -f - ``` After an hour when the token gets rotated you should see the event in the linkerd-cni pod logs.
alpeb
added a commit
to linkerd/linkerd2-proxy-init
that referenced
this pull request
Nov 29, 2024
Fixes linkerd/linkerd2#12573 ## Problem When deployed, the linkerd-cni pod gets its service account token mounted automatically by k8s: ```yaml - name: kube-api-access-729gv projected: defaultMode: 420 sources: - serviceAccountToken: expirationSeconds: 3607 path: token - configMap: items: - key: ca.crt path: ca.crt name: kube-root-ca.crt - downwardAPI: items: - fieldRef: apiVersion: v1 fieldPath: metadata.namespace path: namespace ``` According to this, the token is set to expire after an hour. When the linkerd-cni pod starts it deploys the file `ZZZ-linkerd-cni-kubeconfig` in to the **host** file system. That config contains the token sourced from `/var/run/secrets/kubernetes.io/serviceaccount` (mounted by the pod). When the token gets rotated after an hour, that token file is updated but `ZZZ-linkerd-cni-kubeconfig` is not updated. The `linkerd-cni` binary uses that token to connect to the kube-api, so having an outdated token should forbid it from functioning properly, which would manifest as new pods in the data plane not being able to acquire a proper network config. However, that failure isn't usually observed, except for the cases pointed out in linkerd/linkerd2#12573. The reason is that the token's actual lifetime is one year, due to kube-api's `--service-account-extend-token-expiration` [flag](https://kubernetes.io/docs/reference/command-line-tools-reference/kube-apiserver/#options) which is usually set as `true` to avoid breaking too many instances not yet adapted to use tokens with short expirations: > Turns on projected service account expiration extension during token generation, which helps safe transition from legacy token to bound service account token feature. If this flag is enabled, admission injected tokens would be extended up to 1 year to prevent unexpected failure during transition, ignoring value of service-account-max-token-expiration. ## Repro ### AKS The issue currently affects AKS clusters using OIDC keys. To reproduce, create a new cluster in AKS, making sure "Enable OIDC" and "Workload Identity" is ticked in the UI. Then install the linkerd-cni plugin, labelling the linkerd-cni DaemonSet so that its ServiceAccount token is provided via OIDC: ``` linkerd install-cni --set-string "podLabels.azure\.workload\.identity/use"="true" | kubectl apply -f - ``` And install linkerd with cni enabled, and an injected instance of emojivoto. The secret token is rotated after an hour, but the old one remains valid for a 24h. Manually rotating the key as detailed in the [docs](https://learn.microsoft.com/en-us/azure/aks/use-oidc-issuer#rotate-the-oidc-key) should invalidate the old key. After that, bouncing any emojivoto pod will prove unsuccessful with the following event being raised: ``` Warning FailedCreatePodSandBox 15s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "8121291446642b272cea9ee5f083958a37bab0dd7060c4d9c06bb05fecf911d2": plugin type="linkerd-cni" name="linkerd-cni" failed (add): Unauthorized ``` ## Fix This change adds a new function `monitor_service_account_token()` that monitors the rollout of the token file; which is a symlink whose target changes as a new token is deployed. When detecting a new token file, this function calls the new `create_kubeconfig()` function. This change also removes the existing logic around the DELETE event, which is a leftover from previous changes and is now a no-op. Also, as detailed in linkerd/linkerd2#13407, the ServiceAccount token has been removed from the cni config template because it's not used, simplifying things as we can regenerate the kubeconfig file without having to touch the cni config file. Finally, the file `linkerd-cni.conf.default` has been removed as is not used. ## Test Same as with the repro above, but use the cni-plugin image that contains the fix: ``` linkerd install-cni --set-string "podLabels.azure\.workload\.identity/use"="true" --set image.name="ghcr.io/alpeb/cni-plugin" --set image.version="v1.5.3" | kubectl apply -f - ``` After an hour when the token gets rotated you should see the event in the linkerd-cni pod logs.
|
olix0r
approved these changes
Dec 4, 2024
alpeb
added a commit
to linkerd/linkerd2-proxy-init
that referenced
this pull request
Dec 6, 2024
Fixes linkerd/linkerd2#12573 ## Problem When deployed, the linkerd-cni pod gets its service account token mounted automatically by k8s: ```yaml - name: kube-api-access-729gv projected: defaultMode: 420 sources: - serviceAccountToken: expirationSeconds: 3607 path: token - configMap: items: - key: ca.crt path: ca.crt name: kube-root-ca.crt - downwardAPI: items: - fieldRef: apiVersion: v1 fieldPath: metadata.namespace path: namespace ``` According to this, the token is set to expire after an hour. When the linkerd-cni pod starts it deploys the file `ZZZ-linkerd-cni-kubeconfig` in to the **host** file system. That config contains the token sourced from `/var/run/secrets/kubernetes.io/serviceaccount` (mounted by the pod). When the token gets rotated after an hour, that token file is updated but `ZZZ-linkerd-cni-kubeconfig` is not updated. The `linkerd-cni` binary uses that token to connect to the kube-api, so having an outdated token should forbid it from functioning properly, which would manifest as new pods in the data plane not being able to acquire a proper network config. However, that failure isn't usually observed, except for the cases pointed out in linkerd/linkerd2#12573. The reason is that the token's actual lifetime is one year, due to kube-api's `--service-account-extend-token-expiration` [flag](https://kubernetes.io/docs/reference/command-line-tools-reference/kube-apiserver/#options) which is usually set as `true` to avoid breaking too many instances not yet adapted to use tokens with short expirations: > Turns on projected service account expiration extension during token generation, which helps safe transition from legacy token to bound service account token feature. If this flag is enabled, admission injected tokens would be extended up to 1 year to prevent unexpected failure during transition, ignoring value of service-account-max-token-expiration. ## Repro ### AKS The issue currently affects AKS clusters using OIDC keys. To reproduce, create a new cluster in AKS, making sure "Enable OIDC" and "Workload Identity" is ticked in the UI. Then install the linkerd-cni plugin, labelling the linkerd-cni DaemonSet so that its ServiceAccount token is provided via OIDC: ``` linkerd install-cni --set-string "podLabels.azure\.workload\.identity/use"="true" | kubectl apply -f - ``` And install linkerd with cni enabled, and an injected instance of emojivoto. The secret token is rotated after an hour, but the old one remains valid for a 24h. Manually rotating the key as detailed in the [docs](https://learn.microsoft.com/en-us/azure/aks/use-oidc-issuer#rotate-the-oidc-key) should invalidate the old key. After that, bouncing any emojivoto pod will prove unsuccessful with the following event being raised: ``` Warning FailedCreatePodSandBox 15s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "8121291446642b272cea9ee5f083958a37bab0dd7060c4d9c06bb05fecf911d2": plugin type="linkerd-cni" name="linkerd-cni" failed (add): Unauthorized ``` ## Fix This change adds a new function `monitor_service_account_token()` that monitors the rollout of the token file; which is a symlink whose target changes as a new token is deployed. When detecting a new token file, this function calls the new `create_kubeconfig()` function. This change also removes the existing logic around the DELETE event, which is a leftover from previous changes and is now a no-op. Also, as detailed in linkerd/linkerd2#13407, the ServiceAccount token has been removed from the cni config template because it's not used, simplifying things as we can regenerate the kubeconfig file without having to touch the cni config file. Finally, the file `linkerd-cni.conf.default` has been removed as is not used. ## Test Same as with the repro above, but use the cni-plugin image that contains the fix: ``` linkerd install-cni --set-string "podLabels.azure\.workload\.identity/use"="true" --set image.name="ghcr.io/alpeb/cni-plugin" --set image.version="v1.5.3" | kubectl apply -f - ``` After an hour when the token gets rotated you should see the event in the linkerd-cni pod logs.
alpeb
added a commit
to linkerd/linkerd2-proxy-init
that referenced
this pull request
Dec 6, 2024
Fixes linkerd/linkerd2#12573 ## Problem When deployed, the linkerd-cni pod gets its service account token mounted automatically by k8s: ```yaml - name: kube-api-access-729gv projected: defaultMode: 420 sources: - serviceAccountToken: expirationSeconds: 3607 path: token - configMap: items: - key: ca.crt path: ca.crt name: kube-root-ca.crt - downwardAPI: items: - fieldRef: apiVersion: v1 fieldPath: metadata.namespace path: namespace ``` According to this, the token is set to expire after an hour. When the linkerd-cni pod starts it deploys the file `ZZZ-linkerd-cni-kubeconfig` in to the **host** file system. That config contains the token sourced from `/var/run/secrets/kubernetes.io/serviceaccount` (mounted by the pod). When the token gets rotated after an hour, that token file is updated but `ZZZ-linkerd-cni-kubeconfig` is not updated. The `linkerd-cni` binary uses that token to connect to the kube-api, so having an outdated token should forbid it from functioning properly, which would manifest as new pods in the data plane not being able to acquire a proper network config. However, that failure isn't usually observed, except for the cases pointed out in linkerd/linkerd2#12573. The reason is that the token's actual lifetime is one year, due to kube-api's `--service-account-extend-token-expiration` [flag](https://kubernetes.io/docs/reference/command-line-tools-reference/kube-apiserver/#options) which is usually set as `true` to avoid breaking too many instances not yet adapted to use tokens with short expirations: > Turns on projected service account expiration extension during token generation, which helps safe transition from legacy token to bound service account token feature. If this flag is enabled, admission injected tokens would be extended up to 1 year to prevent unexpected failure during transition, ignoring value of service-account-max-token-expiration. ## Repro The issue currently affects AKS clusters using OIDC keys. To reproduce, create a new cluster in AKS, making sure "Enable OIDC" and "Workload Identity" is ticked in the UI. Then install the linkerd-cni plugin, labelling the linkerd-cni DaemonSet so that its ServiceAccount token is provided via OIDC: ``` linkerd install-cni --set-string "podLabels.azure\.workload\.identity/use"="true" | kubectl apply -f - ``` And install linkerd with cni enabled, and an injected instance of emojivoto. The secret token is rotated after an hour, but the old one remains valid for a 24h. Manually rotating the key as detailed in the [docs](https://learn.microsoft.com/en-us/azure/aks/use-oidc-issuer#rotate-the-oidc-key) should invalidate the old key. After that, bouncing any emojivoto pod will prove unsuccessful with the following event being raised: ``` Warning FailedCreatePodSandBox 15s kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "8121291446642b272cea9ee5f083958a37bab0dd7060c4d9c06bb05fecf911d2": plugin type="linkerd-cni" name="linkerd-cni" failed (add): Unauthorized ``` ## Fix This change adds a new function `monitor_service_account_token()` that monitors the rollout of the token file; which is a symlink whose target changes as a new token is deployed. When detecting a new token file, this function calls the new `create_kubeconfig()` function. Also, as detailed in linkerd/linkerd2#13407, the ServiceAccount token has been removed from the cni config template because it's not used, simplifying things as we can regenerate the kubeconfig file without having to touch the cni config file. ## Test Same as with the repro above, but use the cni-plugin image that contains the fix: ``` linkerd install-cni --set-string "podLabels.azure\.workload\.identity/use"="true" --set image.name="ghcr.io/alpeb/cni-plugin" --set image.version="v1.5.3" | kubectl apply -f - ``` After an hour when the token gets rotated you should see the event in the linkerd-cni pod logs.
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
This change removes the
policy
entry from the cni config template, which isn't used. That contained a__SERVICEACCOUNT_TOKEN__
placeholder, which was coupling this config file with theZZZ-linkerd-cni-kubeconfig
file generated by linkerd-cni. In linkerd/linkerd2-proxy-init#440 we add support for detecting changes in the mounted serviceaccount token file (see #12573), and the current change facilitates that effort.