
[AIRFLOW-3126] Add option to specify additional K8s volumes #8150

Closed

Conversation

brandonwillard
Contributor

This PR introduces a new config option, kubernetes.extra_volume_mounts, that allows users to specify multiple Kubernetes volumes to be mounted in each generated worker pod.

This PR replaces #7423 (moved to my personal fork).


Make sure to mark the boxes below before creating the PR:

  • [x] In case of a fundamental code change, an Airflow Improvement Proposal (AIP) is needed.
  • [x] In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
  • [x] In case of backwards-incompatible changes, please leave a note in UPDATING.md.
  • [x] Read the Pull Request Guidelines for more information.

@boring-cyborg boring-cyborg bot added area:Scheduler including HA (high availability) scheduler k8s labels Apr 5, 2020
@codecov-io

codecov-io commented Apr 5, 2020

Codecov Report

Merging #8150 into master will increase coverage by 21.42%.
The diff coverage is 100.00%.

Impacted file tree graph

@@             Coverage Diff             @@
##           master    #8150       +/-   ##
===========================================
+ Coverage   66.39%   87.81%   +21.42%     
===========================================
  Files         935      945       +10     
  Lines       45170    50509     +5339     
===========================================
+ Hits        29990    44357    +14367     
+ Misses      15180     6152     -9028     
Impacted Files Coverage Δ
airflow/executors/kubernetes_executor.py 63.91% <100.00%> (+46.83%) ⬆️
airflow/kubernetes/worker_configuration.py 99.35% <100.00%> (+81.29%) ⬆️
airflow/providers/amazon/aws/sensors/emr_base.py 90.00% <0.00%> (-2.86%) ⬇️
...rflow/providers/amazon/aws/sensors/emr_job_flow.py 94.87% <0.00%> (-1.29%) ⬇️
airflow/hooks/base_hook.py 86.66% <0.00%> (-0.84%) ⬇️
airflow/providers/amazon/aws/hooks/logs.py 95.23% <0.00%> (-0.60%) ⬇️
airflow/providers/grpc/hooks/grpc.py 91.54% <0.00%> (-0.52%) ⬇️
airflow/providers/amazon/aws/hooks/batch_client.py 96.09% <0.00%> (-0.52%) ⬇️
airflow/providers/amazon/aws/hooks/athena.py 64.07% <0.00%> (-0.33%) ⬇️
airflow/utils/trigger_rule.py 100.00% <0.00%> (ø)
... and 560 more

Legend: Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Last update 68d1714...8d1cbf4.

Contributor

@dimberman dimberman left a comment


Just a question about the user-facing UI. Either the documentation needs to change, or we should figure out a cleaner way to add those values to airflow.cfg.

# (required), `mount_path` (required), `read_only` (boolean, default
# null), `sub_path` (default null), `secret_key` (string, default null),
# `secret_mode` (string, default null).
# Example: extra_volume_mounts = "{{{{\"secret_vol\": {{{{\"secret_name\": \"some-secret\", \"mount_path\": \"/dir1\", \"sub_path\": \"subpath1\", \"secret_mode\": \"440\"}}}}, \"pvc\": {{{{\"claim_name\": \"some-pvc\", \"mount_path\": \"/dir2\"}}}}}}}}"
Contributor

Does a user have to do all of these escapes or can they just supply a dict? I'd rather not have people supplying strings this gnarly if possible.

Contributor Author

To the user, the value in the .cfg file needs to be valid JSON/a dict, but, yeah, it's very gnarly here.
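
For illustration, the un-escaped value as a user would actually write it in airflow.cfg might look like this (a sketch derived from the escaped example above; the `{{{{`/`}}}}` sequences in the config template collapse to literal braces after formatting):

```ini
[kubernetes]
; Hypothetical end-user form of the escaped default shown above.
extra_volume_mounts = {"secret_vol": {"secret_name": "some-secret", "mount_path": "/dir1", "sub_path": "subpath1", "secret_mode": "440"}, "pvc": {"claim_name": "some-pvc", "mount_path": "/dir2"}}
```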

Contributor Author

Do you want me to add something specific to the documentation before resolving this?

Contributor

@kaxil what do you think? should we add this to the airflow_local_settings.py so users can define python dicts?

Member

airflow_local_settings.py file is intended only for logs currently. I am not sure if we want to mix it.

Contributor

@dimberman dimberman Apr 13, 2020


@mik-laj we use the airflow_local_settings.py for the pod_mutation_hook

def pod_mutation_hook(pod):
That's why I was thinking it might make sense.
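
As a sketch of that idea (the hook name comes from the thread; the Pod object's attribute names here are assumptions for illustration, and the real worker-pod class may differ), an airflow_local_settings.py hook that appends a volume could look like:

```python
from types import SimpleNamespace

def pod_mutation_hook(pod):
    """Sketch of an airflow_local_settings.py hook that appends an extra
    volume and mount to every generated worker pod. The `volumes` and
    `volume_mounts` attribute names are assumptions, not Airflow's API."""
    pod.volumes.append(
        {"name": "extra-vol",
         "persistentVolumeClaim": {"claimName": "some-pvc"}})
    pod.volume_mounts.append(
        {"name": "extra-vol", "mountPath": "/dir2"})

# Stand-in for the worker Pod object, just to exercise the hook.
pod = SimpleNamespace(volumes=[], volume_mounts=[])
pod_mutation_hook(pod)
```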

Member

The dict in the example looks ugly, but that is a separate issue; I will solve it this week so we won't need to define anything in airflow_local_settings.py. extra_volume_mounts feels like a candidate for airflow.cfg, similar to kubernetes_secrets etc.

@dimberman It might make sense to convert this extra_volume_mounts into a section similar to kubernetes_secrets. Thoughts?

Contributor Author

Any updates?

airflow/executors/kubernetes_executor.py Outdated Show resolved Hide resolved
@brandonwillard brandonwillard force-pushed the k8s-extra-mounts branch 3 times, most recently from 8d1cbf4 to 6e985a1 Compare April 10, 2020 00:43
@cccs-pg

cccs-pg commented Apr 20, 2020

Great work on this feature, would love to see that merged!

@stale

stale bot commented Jun 5, 2020

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

@stale stale bot added the stale Stale PRs per the .github/workflows/stale.yml policy file label Jun 5, 2020
@stale stale bot removed the stale Stale PRs per the .github/workflows/stale.yml policy file label Jun 5, 2020
@kaxil kaxil added the pinned Protect from Stalebot auto closing label Jun 14, 2020
@brandonwillard
Contributor Author

Here's the CI error; not sure how that could be related to these changes.

@kaxil kaxil requested a review from dimberman June 19, 2020 00:02
@cccs-pg

cccs-pg commented Jul 17, 2020

@dimberman did you have a chance to look at this one? Thank you!

@mik-laj
Member

mik-laj commented Jul 20, 2020

@cccs-pg I'm not a Kubernetes expert, but can't this be configured with the pod_template_file option in the kubernetes section?

- name: pod_template_file

It seems to me that we have a problem similar to the one in this change:
#7365 (comment)

Could you please give us more context? Why did you decide to implement a new option when you can use the existing one?

@cccs-pg

cccs-pg commented Jul 20, 2020

I did not implement this; I was just hoping to use it in an upcoming release. I haven't looked at pod_template_file yet; I will check it out now.

@mik-laj
Member

mik-laj commented Jul 20, 2020

@brandonwillard Could you please give us more context? Why did you decide to implement this option?

@brandonwillard
Contributor Author

@mik-laj, I might've already addressed this in the old PR.

@mik-laj
Member

mik-laj commented Jul 23, 2020

I've noticed you've added several PRs recently to add missing functionality to the airflow.cfg for configuring the KubernetesExecutor. There is also a new field which allows you to pass the YAML directly if that helps at all.

@davlum, I've tried it and it did not appear to help—specifically, for default worker pod creation under the K8s Executor. It required too much specificity in the template and seemed to deactivate most other configuration options. Can you give an example of how it can be used to add volumes in exactly this way without sacrificing any of the existing functionality/configurability of worker pod creation?

In my opinion, our mistake was to add a large number of configuration options to Airflow. This makes it very difficult to maintain. I am trying to change the approach and recommend using the pod_template_file option. I know that changing habits can be problematic for many users, and there is even a command that would facilitate migration, but it is not finished yet.
#8009
Would you like to help develop this command? In my opinion, this is a better and more forward-looking solution. Using the Kubernetes API object means we have better documentation, as we can refer to the official documentation, and it also makes it easier to install the KubernetesExecutor, as you can easily copy code snippets from the internet. You always have access to the latest Kubernetes features, too.

If we do not take steps in this direction, we will always be sad because we will be missing new Kubernetes features in Airflow. Project maintainers also get worn down, as we must add/review every new feature a user may need.

Just for a simple git sync we have 17 different configuration options:

  • git_repo
  • git_branch
  • git_sync_depth
  • git_subpath
  • git_sync_rev
  • git_user
  • git_password
  • git_sync_root
  • git_sync_dest
  • git_dags_folder_mount_point
  • git_ssh_key_secret_name
  • git_ssh_known_hosts_configmap_name
  • git_sync_credentials_secret
  • git_sync_container_repository
  • git_sync_container_tag
  • git_sync_init_container_name
  • git_sync_run_as_user

And these options still don't cover all the sync behaviors users expect; e.g., some users need to log in to the repository with gcloud.
The Kubernetes world is too dynamic to rely on a predefined list of configuration options.

In addition to adding new options, we also have to delete options that have already been dropped by the community.
https://lists.apache.org/thread.html/a68ec0a7e1535ec2011a05cf697a97a6b44345a331fa5d4f721c3391%40%3Cdev.airflow.apache.org%3E
https://cwiki.apache.org/confluence/display/AIRFLOW/Managing+Per-task+Credentials+in+KubernetesExecutor
#6768
This option is a good example of why we shouldn't add new Kubernetes-related options to Airflow. Before the option was added to Airflow, FengLu sent a few messages on the mailing list, met with other people, and prepared a PR. After a long time the option didn't even come out of the beta phase, and then it was deprecated. Then another person had to find the option, check whether it was supported, discuss whether it could be deleted, and then prepare a fix and delete it.
It's a lot of work to keep up to date with every new idea that hits the Kubernetes world.

On the other hand, we have a solution that is flexible enough for each user to adapt it to their needs.

I hope you will understand the situation we are in now.

Do you have any other idea to solve this problem? I am open to discussion.

@xEviL

xEviL commented Jul 23, 2020

@mik-laj I think what you are proposing is a good direction; however, pod_template_file at the moment lacks (any) documentation with regard to use with the KubernetesExecutor, which is why people start adding things that are missing.
I now need to do both what is proposed in this PR (mount an extra volume) and mount a configMap and a secret as files.
Perhaps if the docs had examples of how to use pod_template_file for worker configuration with the KubernetesExecutor, then the feature would be used more and PRs like this would go away.
I'm still looking for how to do this and would appreciate any pointers.

@mik-laj
Member

mik-laj commented Jul 23, 2020

@xEviL I think a command that shows the current pod template would be helpful here. One contributor started work on this command but did not finish it. Would you like to continue this work? I think one command that creates a file usable as a pod_template_file is all we need.
#8009
Does that sound like a good solution to you?
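
For reference, a pod_template_file is just a standard Kubernetes Pod manifest, as the dump-and-edit workflow described later in this thread shows. A minimal sketch (image tag and names are placeholders) might be:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: airflow-worker          # placeholder; overridden per task
spec:
  containers:
    - name: base
      image: apache/airflow:1.10.12   # placeholder image tag
      volumeMounts:
        - name: extra-vol
          mountPath: /dir2
  volumes:
    - name: extra-vol
      persistentVolumeClaim:
        claimName: some-pvc
```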

@mik-laj
Member

mik-laj commented Jul 23, 2020

I know some information is missing from the documentation. However, adding more options will not solve this problem. Would you like to write up your documentation expectations in a separate ticket? I found the documentation for KubernetesPodOperator lacking and created a ticket. I do not work with the KubernetesExecutor, and I find it very difficult to write down the expectations a user may have.
#8970
It was the first step for another person to start working on it.

@fengsi

fengsi commented Jul 23, 2020

In my opinion, our mistake was to add a large number of configuration options to Airflow. This makes it very difficult to maintain. I am trying to change the approach and recommend using the pod_template_file option. I know that changing habits can be problematic for many users and there were even command that would facilitate migrations, but it is not finished yet.

I agree that there are too many options. Some of them conflict, and there is also messy (and still incomplete) logic validating them.
There are several issues with pod_template_file, though:

  1. NO documentation at all, not even a simple example.
  2. Is it a fixed/hard-coded manifest file? If so, I'd think its use cases are very limited.

@brandonwillard
Contributor Author

brandonwillard commented Jul 23, 2020

@mik-laj, sorry for the late response, but I don't really have much to add beyond what @fengsi and @xEviL have already said.

It took far too much context-specific setup, effort, and time just to debug errors caused by—what I would consider—natural assumptions about the functionality of the pod_template_file feature, and I can't see myself doing that again, especially not for documentation/example purposes. That documentation and examples work needs to be done upfront by the original developers (or whoever knows this feature best) in a way that reflects its intended use and functionality. In other words, asking the users to write the documentation—albeit somewhat indirectly—is a difficult approach in this case.

Plus, my guess is that this templating approach will ultimately lead to the need for more robust templating features (e.g. such as those offered by Jinja and the like), and some important questions and goals surrounding that should be addressed first.

@xEviL

xEviL commented Jul 24, 2020

@brandonwillard @fengsi @mik-laj

I did some experiments yesterday and I was able to define an extra volume and volumeMount via pod_template_file option. What I did is:

  • run a long running task (sleep 3600 in a BashOperator) in a KubernetesExecutor with normal config via [kubernetes] section options
  • dump YAML from k8s pod via kubectl get pod ... -o yaml
  • clean it up (remove the sections that would obviously be "wrong" to keep: status, metadata.labels, command, args)
  • add my volume (from secret in my case) and volumeMount to the relevant sections
  • make this my_worker_template.yaml file available to Airflow scheduler and webserver (in my case: put it into a configmap and mount it into the scheduler pod via its manifest - we are not using the helm chart)
  • specify the path via pod_template_file in airflow.cfg
  • restart the services

From what I understood from the source code and the behaviour of this:

  • YAML from pod_template_file is loaded into PodGenerator instead of other options and is used as a "pod prototype"
  • Then it still overrides/initialises varying parameters needed for tasks (like some Airflow config options passed via env, also command and args, and metadata.labels)
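
That "pod prototype" behavior can be sketched as follows (assumed semantics based on the description above, not Airflow's actual code; the field choices are illustrative):

```python
import copy

def build_worker_pod(template, command, args, labels):
    """Sketch: the template acts as a prototype; the executor then
    overwrites the per-task fields (command, args, metadata.labels)
    while leaving user additions such as volumeMounts intact."""
    pod = copy.deepcopy(template)
    container = pod["spec"]["containers"][0]
    container["command"] = command          # always overridden per task
    container["args"] = args                # always overridden per task
    pod["metadata"]["labels"] = labels      # always overridden per task
    return pod

template = {
    "metadata": {"name": "airflow-worker", "labels": {}},
    "spec": {"containers": [{"name": "base",
                             "volumeMounts": [{"name": "extra-vol",
                                               "mountPath": "/dir2"}]}]},
}
pod = build_worker_pod(template, ["airflow"], ["run"], {"dag_id": "d"})
```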

Hope this helps!

@fengsi

fengsi commented Jul 24, 2020

Thanks for testing it, @xEviL!

The global pod_template_file setting will likely affect KubernetesPodOperator too, and that's not a good thing – I'm using both KubernetesExecutor and custom KubernetesPodOperator(s).

Logically, KubernetesExecutor is still more on the Airflow side, but I don't think it makes sense to have the same K8s settings globally applied to all K8s-related operators, as they have nothing to do with the executors executing them, and in most cases they are totally independent.

I'm gonna test if global pod_template_file setting would ruin KubernetesPodOperator or not. Hopefully not. 🤞

@fengsi

fengsi commented Jul 24, 2020

And I feel the larger question would be: is it possible to come up with an elegant solution such that Airflow could "bootstrap" itself in the K8s world? If we look at Airflow in K8s, its manifests are usually handled by other tools (e.g., Helm). Now both KubernetesExecutor and KubernetesPodOperator do similar things but rely on Airflow (via the K8s Python client), causing inconsistency between ecosystems.

My current set up:

  1. Airflow scheduler & web servers: managed by Helm, "fixed"
  2. Airflow workers: managed by KubernetesExecutor through tons of AIRFLOW__KUBERNETES__* settings, which are still incomplete and require hacks (like the new pod_template_file)
  3. Airflow operators: executed by KubernetesExecutor, some being KubernetesPodOperator that may require totally different settings, but are partially contaminated by KubernetesExecutor settings

Guess what I'm trying to say is that K8s support in Airflow could certainly be improved. I understand Airflow was not designed for K8s from day one, but it would be great if there were a way to consolidate these use cases.

@xEviL

xEviL commented Jul 24, 2020

I'm gonna test if global pod_template_file setting would ruin KubernetesPodOperator or not. Hopefully not

KubernetesPodOperator should not be affected by Airflow configuration; have a look at:
https://github.com/apache/airflow/blob/master/airflow/providers/cncf/kubernetes/operators/kubernetes_pod.py#L365
and

pod_template_file: Optional[str] = None,

Airflow settings are not imported anywhere, so only constructor arguments are relevant (for the operator).

In our test setup we also ran KubernetesPodOperator, which we have already been using for a few months, but now also from the KubernetesExecutor - it works!

@mik-laj
Member

mik-laj commented Jul 24, 2020

@xEviL I confirm that the pod_template_file option does not affect KubernetesPodOperator. It's only for the KubernetesExecutor. pod_mutation_hook is global and is used by both the operator and the executor.

Would you like to write some documentation explaining how you set up the Kubernetes executor? I know our documentation for this executor is incomplete, but even writing one section is a step forward. I will try to motivate other people to complete the documentation for other sections.

Is it a fixed/hard-coded manifest file? If so, I'd think its use cases are very limited.

@fengsi In most cases, a static configuration is sufficient. If you want a more dynamic configuration, you can extend it with the pod mutation hook or executor_config. If you want to base your logic on task attributes, you can modify executor_config with the cluster policy.
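
As a sketch of the per-task executor_config route mentioned above (the dict shape here is illustrative; check the Airflow docs for the exact schema your version expects):

```python
# Hypothetical per-task override, as one might pass to an operator:
#   PythonOperator(..., executor_config=executor_config)
executor_config = {
    "KubernetesExecutor": {
        "volumes": [
            {"name": "extra-vol",
             "persistentVolumeClaim": {"claimName": "some-pvc"}},
        ],
        "volume_mounts": [
            {"name": "extra-vol", "mountPath": "/dir2"},
        ],
    }
}
```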

If we take a look at Airflow in K8s, its manifests are usually handled by other tools (e.g., Helm). Now both KubernetesExecutor and KubernetesPodOperator do similar things but rely on Airflow (K8s Python client), causing inconsistency between ecosystems.

Yes. I realize this is not natural, but I think the YAML file brings us closer to the expected result. In the future, we might think about creating a CRD to make setup easier, but the CRD would still contain a YAML file with a Pod definition. If we don't have to maintain a lot of configuration options and the code is less complex, I suspect it will happen soon.

@mik-laj
Member

mik-laj commented Jul 24, 2020

@fengsi I created a ticket about worker template as a Kubernetes Resource. #9981

@fengsi

fengsi commented Jul 27, 2020

I followed the YAML-dumping hack @xEviL mentioned and it worked. The Helm template (based on the stable/airflow chart) I ended up using is:

apiVersion: v1
kind: ConfigMap
metadata:
  name: {{ include "airflow.fullname" . }}-files
  labels:
    app: {{ include "airflow.labels.app" . }}
    chart: {{ include "airflow.labels.chart" . }}
    release: {{ .Release.Name }}
    heritage: {{ .Release.Service }}
data:
  kubernetes-executor-template.yaml: |
    apiVersion: v1
    kind: Pod
    metadata:
      name: {{ include "airflow.fullname" . }}-worker
    spec:
      containers:
        - name: {{ .Chart.Name }}-worker
          image: {{ .Values.airflow.airflow.image.repository }}:{{ .Values.airflow.airflow.image.tag }}
          imagePullPolicy: {{ .Values.airflow.airflow.image.pullPolicy}}
          envFrom:
            - configMapRef:
                name: "{{ include "airflow.fullname" . }}-env"
          env:
            - name: AIRFLOW__CORE__EXECUTOR
              value: LocalExecutor
            {{- if .Values.airflow.airflow.extraEnv }}
            {{- toYaml .Values.airflow.airflow.extraEnv | nindent 12 }}
            {{- end }}
          {{- if .Values.airflow.airflow.extraVolumeMounts }}
          volumeMounts:
          {{- toYaml .Values.airflow.airflow.extraVolumeMounts | nindent 12 }}
          {{- end }}
      {{- if .Values.airflow.airflow.extraVolumes }}
      volumes:
      {{- toYaml .Values.airflow.airflow.extraVolumes | nindent 8 }}
      {{- end }}

The .Values.airflow.airflow (rather than .Values.airflow) is because I used stable/airflow as a dependency of my own chart.
The catches here are:

  1. I needed to specify both metadata.name and spec.containers[*].name. Though they will be overridden by other logic later, I cannot create the "reference Pod" with them omitted;
  2. I needed to hard-code AIRFLOW__CORE__EXECUTOR=LocalExecutor in the template for the KubernetesExecutor Pod, otherwise it would pick up AIRFLOW__CORE__EXECUTOR=KubernetesExecutor from the global settings and fail – that's what I mean by "partially contaminated by KubernetesExecutor settings";
  3. As you can see, I also let it pick up extraEnv, extraVolumeMounts and extraVolumes from global settings. This allows mounting correct stuff.

The approach I took was partially due to the way the stable/airflow chart is implemented. I had to avoid defining certain sections and use extra* instead in order to make both the chart and this template work, which means that besides the Airflow code, I needed to read through the stable/airflow code and become an expert there too. 🤣

What I learned about pod_template_file so far:

Well, it worked, at least.

BUT, unfortunately, it's more of a hack than a solution. It's like reverse-engineering a Pod file from scratch: create an object based on the template (some parameters are expected but are really just placeholders), hack it on the fly in the Airflow KubernetesExecutor logic to make the real Pod object, and convert it back to config with modified values.

The problems I can see:

  1. There is really no way for a user to know exactly what the final outcome will be without diving into every single line of the source code and hacking the hell out of it;
  2. This "two-way" binding approach is too dynamic, sometimes scary;
  3. With pod_template_file, tons of AIRFLOW__KUBERNETES__* settings are now split into two groups: those "meant for the scheduler when launching Pods but not really applicable to any individual Pod", and those that "should be passed down to Pods individually but are now ignored". I don't see a clear boundary between these two sets, and of course for the latter a fixed template cannot do much. It may still require more hacks like the pod_mutation_hook or executor_config mentioned by @mik-laj. I guess this is an example of the "two-way" binding pattern we should probably avoid in the future.

As for the right direction for next steps: my use case is somewhat covered by the above hacks, but rather than rushing into any particular approach now, I think we may still need more time to gather K8s use cases from more users before making a decision (plus the K8s ecosystem is evolving fast, too).

And it may make more sense to start with a new executor when the right time comes, if we are lucky enough to find an elegant, brand-new solution. We can then keep the current KubernetesExecutor as is and not make it more complicated than it should be.

Just my 2 cents. 🙂

@dimberman
Contributor

dimberman commented Jul 31, 2020

@fengsi Thank you for that breakdown, there's a lot of really valuable feedback there.

One thing I've been pushing for is the CeleryExecutor with the KEDA autoscaler. With this setup you can define your Celery workers as deployments, and since they all autoscale to 0, you can create as many "queues" as you'd like.

I'm sad to hear the pod_template_file still leads to confusion. I'd love to see if we can figure out a cleaner interface for it such that it's more transparent.

Ideally for Airflow 2.0 I'd like to make the pod_template_file the ONLY option, and remove all other configs. We can offer a "default" value and users can modify as needed.

@brandonwillard
Contributor Author

@mik-laj @dimberman, you folks made good points, and I agree that this type of configuration approach isn't the right direction—at least not without some as-yet-unstated reason. I would've closed this PR a while ago if it weren't for the ongoing—and quite elucidating—discussion (thanks to @xEviL and @fengsi), but, now that other, more relevant issues are being referenced/created, I'm going to take the opportunity to close this down.

Labels
area:Scheduler including HA (high availability) scheduler pinned Protect from Stalebot auto closing
8 participants