-
Notifications
You must be signed in to change notification settings - Fork 14.6k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[AIRFLOW-3126] Add option to specify additional K8s volumes #8150
Conversation
Codecov Report
@@ Coverage Diff @@
## master #8150 +/- ##
===========================================
+ Coverage 66.39% 87.81% +21.42%
===========================================
Files 935 945 +10
Lines 45170 50509 +5339
===========================================
+ Hits 29990 44357 +14367
+ Misses 15180 6152 -9028
Continue to review full report at Codecov.
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Just a question about the user facing UI. Either the documentation needs to change or we should figure out a cleaner way to add those values into the airflow.cfg.
# (required), `mount_path` (required), `read_only` (boolean, default | ||
# null), `sub_path` (default null), `secret_key` (string, default null), | ||
# `secret_mode` (string, default null). | ||
# Example: extra_volume_mounts = "{{{{\"secret_vol\": {{{{\"secret_name\": \"some-secret\", \"mount_path\": \"/dir1\", \"sub_path\": \"subpath1\", \"secret_mode\": \"440\"}}}}, \"pvc\": {{{{\"claim_name\": \"some-pvc\", \"mount_path\": \"/dir2\"}}}}}}}}" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does a user have to do all of these escapes or can they just supply a dict? I'd rather not have people supplying strings this gnarly if possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To the user, the value in the .cfg
file needs to be valid JSON/a dict
, but, yeah, it's very gnarly here.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you want me to add something specific to the documentation before resolving this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@kaxil what do you think? should we add this to the airflow_local_settings.py so users can define python dicts?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
airflow_local_settings.py file is intended only for logs currently. I am not sure if we want to mix it.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The dict in Example looks ugly but that is a separate issue, I would solve it this week so we won't need to define anything in airflow_local_settings.py . extra_volume_mounts
feels like a candidate for airflow.cfg similar to kubernetes_secrets etc.
@dimberman It might make sense to convert this extra_volume_mounts
to a section similar to kubernetes_secrets, thoughts ?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any updates?
8d1cbf4
to
6e985a1
Compare
Great work on this feature, would love to see that merged! |
This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions. |
6e985a1
to
48d7c5a
Compare
48d7c5a
to
37fa206
Compare
Here's the CI error; not sure how that could be related to these changes. |
@dimberman did you have a chance to look at this one? Thank you! |
@cccs-pg I'm not a Kubernetes expert, but can't this option be configured with airflow/airflow/config_templates/config.yml Line 1750 in 37fa206
It seems to me that we have a similar problem as in this change. #7365 (comment) Could you please give us more context? Why did you decide to implement a new option when you can use the existing one? |
I did not implement this. I was just looking at hopefully using it in an upcoming release... I haven't look at pod_template_file yet, I will check it out now... |
@brandonwillard Could you please give us more context? Why did you decide to implement this option? |
In my opinion, our mistake was to add a large number of configuration options to Airflow. This makes it very difficult to maintain. I am trying to change the approach and recommend using the If we do not take steps in this direction, we will be sad all the time because we will be missing new Kubernetes features in Airfllow. Project maintainers are also bored as we add/review every new feature the user may need. Just for a simple git sync we have 17 different configuration options:
And we don't address all the sync options that users expect with these options e.g. some users need to login to the repository with gcloud In addition to adding new options, we also have to delete options that have already been dropped by the community. On the other hand, we have a solution that is flexible enough for each user to adapt it to their needs. I hope you will understand the situation we are in now. Do you have any other idea to solve this problem? I am open to discussion. |
@mik-laj I think what you are proposing is a good direction, however |
@xEviL I think a command that shows the current pod template would be helpful here. One contributor started work on this command but did not finish it. Wouldn't you like to continue this work? I think one command that will create a file that can be used by |
I know some information is missing in the documentation. However, adding more options will not solve this problem. Wouldn't you like to write your documentation expectations in a separate ticket? I found the documentation for KubernetesPodOperator missing and created a ticket. I do not deal with KubernetesExecutor and we find it very difficult to write down the expectations a user may have. |
I agree that there are too many options. Some of them are conflicting and there is also messy logic validating (but still not complete) them.
|
@mik-laj, sorry for the late response, but I don't really have much to add beyond what @fengsi and @xEviL have already said. It took far too much context-specific setup, effort, and time just to debug errors caused by—what I would consider—natural assumptions about the functionality of the Plus, my guess is that this templating approach will ultimately lead to the need for more robust templating features (e.g. such as those offered by Jinja and the like), and some important questions and goals surrounding that should be addressed first. |
@brandonwillard @fengsi @mik-laj I did some experiments yesterday and I was able to define an extra
From what I understood from the source code and the behaviour of this:
Hope this helps! |
Thanks for testing it, @xEviL! The global Logically, I'm gonna test if global |
And I feel the larger question would be: is it possible to come up with an elegant solution such that Airflow could "bootstrap" itself in K8s world? If we take a look at Airflow in K8s, its manifests are usually handled by other tools (e.g., Helm). Now both My current set up:
Guess what I'm trying to say is K8s support in Airflow could be improved for sure. I understand Airflow was not designed for K8s since day one, but it'll be great if there is a way to consolidate these use cases. |
airflow/airflow/kubernetes/pod_generator.py Line 190 in 458d7d4
Airflow settings are not imported anywhere, so only constructor arguments are relevant (for the operator).
In our test setup we also ran |
@xEviL I confirm that pod_template_file option does not affect kubernetesPodOperaator. It's only for KubernetesExecutor.. pod_mutation_hook is global and is used by both the operator and the executor. Wouldn't you like to write some documentation that explains how you set up Kubernetes executor? I know our documentation for an executor is incomplete, but even writing one section is a step forward. I will try to motivate other people to complete documentation for other sections.
@fengsi In most cases, a static configuration is sufficient. If you want more dynamic configuration, you can extend this with the pod mutation hook or executor_config. If you want to base your logic based on task attributes, you can modify executor_config with the cluster policy.
Yes. I realize this is not natural, but I think the YAML file brings us closer to the expected result. In the future, we might think about creating a CRD to make setup easier, but the CRD still contained a YAML file with a Pod definition. If we don't have to maintain a lot of configuration options and the code is less complex, I suspect it will happen soon. |
I followed the dumping YAML hack @xEviL mentioned and it worked. The Helm template (based on
The
This approach I took was partially due to the way the What I learned about
|
@fengsi Thank you for that breakdown, there's a lot of really valuable feedback there. One thing I've been pushing has been the CeleryExecutor with the KEDA autoscaler. With this set-up you can define your celery workers as deployments and since they all autoscale to 0, you can create as many "queues" as you'd like. I'm sad to hear the pod_template_file still leads to confusion. I'd love to see if we can figure out a cleaner interface for it s.t. it's more transparent. Ideally for Airflow 2.0 I'd like to make the pod_template_file the ONLY option, and remove all other configs. We can offer a "default" value and users can modify as needed. |
@mik-laj @dimberman, you folks made good points, and I agree that this type of configuration approach isn't the right direction—at least not without some yet unstated reason. I would've closed this PR a while ago if it weren't for the ongoing—and quite elucidating—discussion (thanks to @xEviL and @fengsi), but, now that other, more relevant issues are being referred/created, I'm going to take the opportunity to close this down. |
This PR introduces a new config option,
kubernetes.extra_volume_mounts
, that allows users to specify multiple Kubernetes volumes to be mounted in each generated worker pod.This PR replaces #7423 (moved to my personal fork).
Make sure to mark the boxes below before creating PR: [x]
In case of fundamental code change, Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in UPDATING.md.
Read the Pull Request Guidelines for more information.