
Handling exit of kerberos sidecar container for k8s < 1.28 #35748

Closed
wants to merge 7 commits

Conversation

amoghrajesh
Contributor

@amoghrajesh amoghrajesh commented Nov 20, 2023

As a part of the effort to enable kerberos to work e2e with airflow in a kubernetes deployment, some changes related to the pod template and worker deployment have gone in recently in PR: #35146 and #35548.

However, one problem remains: the airflow kerberos sidecar container does not exit gracefully even after the "base" container completes or exits.

To get this working, I was able to do a POC by letting the airflow kerberos container and the airflow base container see each other's processes via shareProcessNamespace, and then using the "args" field of the airflow kerberos sidecar to watch the base container periodically and kill its own process once the airflow base container has exited. This does not add much overhead on the performance side either, because the command run by airflow kerberos is essentially just looking up a pid in /proc/*/.

This solution has been tested out on a secured environment (Red Hat OpenShift) as well as an on-prem kubernetes distribution.
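The watch command from the POC can be sketched roughly as follows; the process pattern and polling interval are assumptions for illustration, not the exact args used in the PR:

```shell
#!/bin/sh
# Sketch of the POC approach: with shareProcessNamespace: true, the
# kerberos sidecar can see the base container's processes under /proc
# and poll until the worker's command line disappears, then exit.
watch_base() {
    pattern="$1"
    # The [c] bracket trick keeps this script's own text from matching
    # itself; grep over /proc/*/cmdline is cheap, sleep keeps it periodic.
    while grep -q "$pattern" /proc/*/cmdline 2>/dev/null; do
        sleep 30
    done
}

watch_base "[c]elery"   # returns once no matching process remains
echo "base container gone, sidecar can exit"
```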

EDIT:
Some approaches were discussed in this PR. The consensus is to handle this with a combination of an EXIT file for k8s versions < 1.28 and native sidecar containers for 1.28+.
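On the sidecar side, the EXIT-file handshake could look roughly like this sketch (the marker path is an assumption; the review threads below mention /opt/kerberos-exit/EXIT, while a /tmp default is used here so the sketch runs anywhere):

```shell
#!/bin/sh
# Sketch (path assumed): poll the shared volume for the EXIT marker that
# the worker writes on shutdown, then stop the kerberos sidecar.
EXIT_FILE="${EXIT_FILE:-/tmp/kerberos-exit/EXIT}"

wait_for_exit() {
    while [ ! -f "$EXIT_FILE" ]; do
        sleep 1
    done
}

mkdir -p "$(dirname "$EXIT_FILE")"
touch "$EXIT_FILE"      # simulate the worker dropping the marker
wait_for_exit           # returns once the file exists
echo "EXIT file seen, sidecar shutting down"
```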



@potiuk
Member

potiuk commented Nov 20, 2023

Shared process namespace in this case kind of violates the whole assumption the separate kerberos container was introduced for. The whole idea is that the "airflow" containers do not see the kerberos keytab used to refresh the short-lived ticket. Only the kerberos container/refreshing process should ever be able to get access to the keytab.

Keytabs are long-lived and provide full access to the kerberos service. They use symmetric encryption, and once you get hold of one, you are able to communicate with the Kerberos server and obtain the short-lived ticket to do the job you are supposed to do, so a keytab should be a very strongly guarded property.

Generally, Airflow components should never be able to see or obtain keytab files (only the airflow kerberos refreshing container should have access to them); instead, all the components should only access the short-lived ticket. This is the basic assumption the whole separation of the containers is based on: the two containers share only the filesystem where the ticket is refreshed and nothing else.

See https://www.fortinet.com/resources/cyberglossary/kerberos-authentication for example.

If airflow components (specifically the worker) get access to the keytab, someone could write a DAG to, for example, send the keytab to a remote system, and once that happens such a keytab can be used by anyone to do anything. When you do the same with a short-lived ticket, you are limited to what the service allows the ticket to do, plus it is time-limited, so the exposure from such a breach is potentially much more limited.

Of course, using shareProcessNamespace does not explicitly violate this assumption. It does not give the airflow component direct access to the keytab. So far so good. But it opens the way to other methods of obtaining the keytab by malicious actors. For example, you could attach to the kerberos process with gdb, dump its memory, and retrieve the keytab from the memory of that process. And this is only one of the ways you could retrieve that information; there are many others. Some of them you might protect against better, but generally speaking, it significantly decreases the isolation that the container mechanism introduces (and the reason the two are run in separate containers).

So while this solution does not directly give access to the keytab, it does decrease isolation and introduce security risks. I am quite sure that if we introduce it, it will be flagged as a security issue and we will have to fix it.

I think there are two options:

  1. We should figure out a different way to communicate with the kerberos refreshing process and stop it. One of the options is to let the running process be stoppable by sending a "shutdown" message via a TCP connection, exposing the sidecar container's port to the main containers. Another option (probably simpler to implement) is to keep some kind of "shutdown" lock file in the same filesystem where the ticket is stored and signal the refreshing process that it should exit. It could be based on the lock mechanism available natively in python.

  2. If we limit the solution to only kubernetes 1.28+, we have the option of using Native Sidecar Containers and marking the kerberos sidecar as such https://kubernetes.io/blog/2023/08/25/native-sidecar-containers/

Likely the best is a combination of these: 1) for k8s < 1.28 and 2) for k8s >= 1.28.
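For reference on option 2, a native sidecar is declared as an init container with restartPolicy: Always; a minimal sketch (pod and container names and the image are illustrative, not the chart's actual template):

```yaml
# Sketch for k8s 1.28+: a native sidecar is an initContainer with
# restartPolicy: Always. It starts before the main container and is
# terminated automatically after the main container exits, so no
# EXIT-file handshake is needed.
apiVersion: v1
kind: Pod
metadata:
  name: worker-with-kerberos    # hypothetical name
spec:
  initContainers:
    - name: kerberos            # the sidecar, marked as native
      image: apache/airflow     # image assumed for illustration
      args: ["kerberos"]
      restartPolicy: Always
  containers:
    - name: base
      image: apache/airflow
      args: ["celery", "worker"]
```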

@amoghrajesh
Contributor Author

amoghrajesh commented Nov 20, 2023

Thank you for the detailed review @potiuk. This is extremely helpful to me.

Another option (probably simpler to implement) is to keep some kind of "shutdown" lock file in the same filesystem where the ticket is stored and signal the refreshing process that it should exit. It could be based on the lock mechanism available natively in python.

I like this idea. It is simple yet effective. I would like to explore more along these lines and get back in case of any doubts.

If we limit the solution to only kubernetes 1.28+, we have the option of using Native Sidecar Containers and marking the kerberos sidecar as such https://kubernetes.io/blog/2023/08/25/native-sidecar-containers/

Came across this too. It is not recommended as of now since it is in its alpha stage. However, a few questions here:

  1. If 1) works for both old and new k8s, is there no need to have 2) at all?
  2. Do we have any utils to help us identify the k8s version at the time of helm install itself?

@amoghrajesh
Contributor Author

amoghrajesh commented Nov 20, 2023

we should figure out a different way how to communicate with the kerberos refreshing process and stop it - one of the options is to let the running process be stoppable by sending a "shutdown" message via TCP connection - and expose the sidecar container's port to the main containers

This is not a bad option either. The main container could send a kill signal over TCP as part of its postRun hook or similar. That way, it would send the signal and exit.

This solution is briefly discussed here: https://medium.com/apache-airflow/enable-kerberos-with-airflow-kubernetesexecutor-6e86621e97a5

I would like to explore the better one among the two and come up with the proposal soon :D

@potiuk
Member

potiuk commented Nov 20, 2023

I think we have ways to generate different resources for 1.28+. I believe helm takes the K8s version into account when generating resources. Look in our chart; I think we have a few cases of that.

I also think native sidecars are the long-term way to go. It's been a long time in the making (the KEP for it has been discussed for at least 3 years), and it handles many edge cases and hides a lot of complexity under the hood of a simple label.

Also, the 1) case is complex to develop and has limitations: needing to open a TCP/IP port, or having write access from airflow components to the shared filesystem, also impacts the security properties of this setup. Currently you just need read access from the worker side to the filesystem and you do not need a TCP/IP connection, so both proposals also weaken the isolation (though not as much as a shared process namespace does). So I'd say we should treat native sidecar containers as THE solution, and the others are merely workarounds for versions of kubernetes that do not have native sidecar containers yet.

@amoghrajesh
Contributor Author

I also think native sidecars are the long-term way to go. It's been a long time in the making (the KEP for it has been discussed for at least 3 years), and it handles many edge cases and hides a lot of complexity under the hood of a simple label.

Thanks for letting me know about this. I was not aware of it. If this is the case, we should consider this as "THE ONLY" solution, along with the lock file you mentioned for earlier versions.

@amoghrajesh
Contributor Author

So I'd say we should treat native sidecar containers as THE solution, and the others are merely workarounds for versions of kubernetes that do not have native sidecar containers yet.

Right. This would be a good way forward and we should go this route. I am going to try this out by deploying a 1.28 k8s cluster; hopefully things will be good there 👍🏽

@potiuk
Member

potiuk commented Nov 21, 2023

Thanks for letting me know about this. I was not aware of it. If this is the case, we should consider this as "THE ONLY" solution, along with the lock file you mentioned for earlier versions.

You can read about it, for example, here: https://buoyant.io/blog/kubernetes-1-28-revenge-of-the-sidecars. It explains the history (it started in 2019, so I was not too far off with 3 years).

@amoghrajesh amoghrajesh marked this pull request as draft November 21, 2023 15:58
@amoghrajesh
Contributor Author

I think we have ways to generate different resources for 1.28+. I believe helm takes the K8s version into account when generating resources. Look in our chart; I think we have a few cases of that.

@potiuk I did check for this. Unfortunately, we do not seem to have such a mechanism. We will eventually need it to get the 1.28 features, and it is also useful in the long run.

I was thinking of introducing an env variable at the worker deployment level, where the main container has an env var that holds the kube version. We can conditionally set the args for containers and do a lot more things too.

For example:

      containers:
      - name: base
        env:
        - name: KUBE_VERSION
          valueFrom:
            configMapKeyRef:
              name: your-scripts
              key: kubeVersionScript.sh
        volumeMounts:
        - name: script-volume
          mountPath: /scripts
        args:
{{- if eq .Env.KUBE_VERSION "1.25" }}
        - /bin/sh
        - -c
        - |
          echo "Kube version is 1.25, do something"
{{- else if eq .Env.KUBE_VERSION "1.26" }}
        - /bin/sh
        - -c
        - |
          echo "Kube version is 1.26, do something else"
{{- else }}
        - /bin/sh
        - -c
        - |
          echo "Kube version is not 1.25 or 1.26, do default"
{{- end }}
      initContainers:
      - name: init-script
        image: your-image
        command: ["/bin/sh", "-c"]
        args:
        - |
          export KUBE_VERSION=$(source /scripts/kubeVersionScript.sh)
        volumeMounts:
        - name: script-volume
          mountPath: /scripts
      volumes:
      - name: script-volume
        configMap:
          name: your-scripts

Thoughts?
cc @jedcunningham looping you in. Pls correct me if I am misinformed here
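For what it's worth, Helm exposes the cluster version to templates at render time through the built-in .Capabilities object, which is one way to branch without an init container; a minimal sketch (the branch comments are illustrative):

```yaml
# Sketch: Helm's built-in .Capabilities.KubeVersion.Version gives the
# cluster version at render time, so templates can branch on it directly
# without init containers or env vars.
{{- $kubeVersion := .Capabilities.KubeVersion.Version }}
{{- if semverCompare "<1.28" $kubeVersion }}
# pre-1.28: plain sidecar container plus an EXIT-file handshake
{{- else }}
# 1.28+: declare the kerberos sidecar as a native sidecar
# (initContainer with restartPolicy: Always)
{{- end }}
```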

@pankajkoti
Member

@amoghrajesh just in case you wonder why Jed is not able to respond anytime soon: Jed is on parental leave 👶

@amoghrajesh amoghrajesh marked this pull request as ready for review November 28, 2023 06:17
@amoghrajesh amoghrajesh changed the title Connecting airflow containers with shareProcessNamespace Handling exit of kerberos sidecar container for k8s < 1.28 Nov 28, 2023
@amoghrajesh
Contributor Author

@potiuk reworked the PR to support it in a better way. Can you take a look when you have some time?

@@ -256,6 +257,10 @@ spec:
- name: worker-logs
containerPort: {{ .Values.ports.workerLogs }}
volumeMounts:
{{- if and (semverCompare ">=2.8.0" .Values.airflowVersion) .Values.workers.kerberosSidecar.enabled (semverCompare "<1.28" $kubeVersion) }}
Contributor Author

Looks like 2.8 will change now, no? I see the RC out for 2.8 already.
cc @potiuk

@amoghrajesh
Contributor Author

Pending:

  1. Unit tests for the changes
  2. Regression testing

@amoghrajesh
Contributor Author

Ok seems like the issue in CI was intermittent. It is green now :D

@amoghrajesh amoghrajesh removed the request for review from ashb November 28, 2023 07:51
@pankajkoti pankajkoti removed their request for review December 3, 2023 07:53
@amoghrajesh
Contributor Author

Yes. No need to add 2.8.0 in this case, as we are not using any new feature added in the kerberos command, as I understand?

You are right, no new feature is used by airflow kerberos. Removing the 2.8.0 check.

chart/templates/_helpers.yaml (outdated review comments, resolved)
chart/templates/workers/worker-deployment.yaml (outdated review comments, resolved)
chart/templates/workers/worker-deployment.yaml (outdated review comments, resolved)
@amoghrajesh
Contributor Author

@potiuk I think some tests fail in the CI. Working on fixing those, but the PR is up to date now. Could you take a look when you have some time?

{{- if .Values.workers.args }}
{{- if and .Values.workers.kerberosSidecar.enabled (semverCompare "<1.28" $kubeVersion) }}
args:
{{ tpl (toYaml .Values.workers.args) . | nindent 12 }} &&
Member

I think this will not work as expected for two reasons:

  1. What you can see in the test: exec airflow celery worker will replace the bash interpreter with the python one (this is how exec works). This means that whatever comes after && will not be executed, because the bash interpreter will no longer be there to execute it.

  2. Even if it did work and we had no exec, this would not work if the worker crashed and exited with an error, because in that case what's after && is not executed either.

Contributor Author

Thanks for that @potiuk
Looking for alternatives, any ideas on this front?

Contributor Author

I tried something like this:

          {{- if and .Values.workers.kerberosSidecar.enabled (semverCompare "<1.28" $kubeVersion) }}
          args:
            - {{ tpl (toYaml .Values.workers.args) . | nindent 12 }}
            - "&&"
            - "sh"
            - "-c"
            - >
              echo "EXIT file content" > /opt/kerberos-exit/EXIT
          {{- else if .Values.workers.args }}
          args: {{ tpl (toYaml .Values.workers.args) . | nindent 12 }}
          {{- end }}

This evaluates to:

          args:
            - 
            - bash
            - -c
            - |-
              exec \
              airflow celery worker
            - "&&"
            - "sh"
            - "-c"
            - >
              echo "EXIT file content" > /opt/kerberos-exit/EXIT

Contributor Author

And adding a trim in helm makes it like this:

          args:
            - - bash
            - -c
            - |-
              exec \
              airflow celery worker
            - "&&"
            - "sh"
            - "-c"
            - >
              echo "EXIT file content" > /opt/kerberos-exit/EXIT

Thinking of some alternatives :)

@amoghrajesh
Contributor Author

@potiuk I made a change and I think it will work because:
In this example, the && is part of the command inside the shell script. This way, the entire script is executed, including the command after &&.

@amoghrajesh
Contributor Author

@potiuk made some changes to the PR. Can you take a look when you have some time?

- bash
- -c
- |-
exec airflow celery worker &&
Member

I do not think it's going to work either. The exec will replace the bash interpreter at the moment airflow celery worker executes, and what is after && will never get executed.

You can check it by just

bash -c "exec ls && echo TEST"

It will only list the current directory, but it will not echo TEST.

You just can't use exec when you are running a bash-interpreted command, because at the moment of exec, the process that you run replaces the bash interpreter.

What you really need to do here, if you want to use a shell, is to run the worker without exec, but you also need to make sure signal propagation happens, so that the parent bash process will propagate the signals. However, maybe you do not need to do that when you are using our image, because we are using

dumb-init as the entrypoint with DUMB_INIT_SETSID set to 1, which means that dumb-init will propagate signals to ALL the children of bash, not only to bash. In that case

bash -c "airflow celery worker && sh -c \"echo 'EXIT file content' > /opt/kerberos-exit/EXIT\""

as the command might just work, but you need to test it: run it in kubernetes, find the process of the worker, send a signal (say TERM), and see if it completes and correctly writes the EXIT file.
https://airflow.apache.org/docs/docker-stack/entrypoint.html#signal-propagation
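A wrapper that also covers the crash case raised earlier (an && chain skips the write when the worker exits non-zero) could look like this sketch; the paths and the helper name are assumptions, not the chart's actual command:

```shell
#!/bin/sh
# Sketch (paths assumed; the review threads use /opt/kerberos-exit, a
# /tmp default is used here so the sketch runs anywhere): run the worker
# command without exec, then write the EXIT marker regardless of whether
# the worker succeeded or crashed.
EXIT_DIR="${EXIT_DIR:-/tmp/kerberos-exit}"

run_and_signal() {
    "$@"                        # the worker command, run without exec
    status=$?
    mkdir -p "$EXIT_DIR"
    echo "worker exited with status $status" > "$EXIT_DIR/EXIT"
    return "$status"
}

# Demonstrate with a failing command: the EXIT file is still written.
run_and_signal false || echo "worker failed, EXIT file written anyway"
```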

Contributor Author

Thanks for the comment. Looking into what is the best way to do this


github-actions bot commented Mar 1, 2024

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added the stale Stale PRs per the .github/workflows/stale.yml policy file label Mar 1, 2024
@github-actions github-actions bot closed this Mar 7, 2024
Labels
area:helm-chart Airflow Helm Chart stale Stale PRs per the .github/workflows/stale.yml policy file