This issue was moved to a discussion.


Has anyone been able to use this chart when running in a cluster with Istio? #510

Closed
2 tasks done
aodj opened this issue Jan 27, 2022 · 4 comments
Labels
kind/question kind - user questions

Comments

@aodj
Contributor

aodj commented Jan 27, 2022

Checks

Question

I'm trying to make use of the chart in a cluster that uses Istio. Due to the initContainer design of the jobs (dbMigrations, sync, etc) it doesn't look like it would work since the initContainers all attempt to connect to the database prior to the istio-proxy being up and running.

Has anyone managed to do it with this chart or will I have to allow Airflow to run in PERMISSIVE mutual-TLS to work around this?
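
For reference, the PERMISSIVE workaround mentioned above would typically be expressed as an Istio PeerAuthentication resource. This is only a minimal sketch, not something taken from the chart: the resource name and the `airflow` namespace are assumptions, and the policy only affects inbound traffic to workloads in that namespace, so it would help only if the database the initContainers connect to runs there.

```yaml
# Sketch: relax mutual TLS for one namespace only.
# "airflow-permissive" and the "airflow" namespace are hypothetical names.
apiVersion: security.istio.io/v1beta1
kind: PeerAuthentication
metadata:
  name: airflow-permissive
  namespace: airflow
spec:
  mtls:
    # PERMISSIVE accepts both plaintext and mTLS connections,
    # so clients without a running sidecar can still connect.
    mode: PERMISSIVE
```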

@aodj aodj added the kind/question kind - user questions label Jan 27, 2022
@thesuperzapper
Member

Related PR: #493

@thesuperzapper
Member

@aodj It's not clear why you would want to run airflow within an istio mesh, as airflow does not require istio to secure communications between its components.

Can you explain what you are trying to achieve with an istio integration?


However, I think the chart should be usable in STRICT mutual TLS mode given a LOT of work, assuming you:

  1. use an externalDatabase.* (not inside the istio mesh) to ensure the init-containers start successfully
  2. annotate the Pods with proxy.istio.io/config: '{ "holdApplicationUntilProxyStarts": true }' to ensure the non-init containers start AFTER the istio-proxy is ready
  3. either configure the embedded redis.* to be in the mesh, or use an externalRedis.*
  4. don't REQUIRE all traffic to use istio egress gateways, as the init-containers will naturally be outside the mesh

NOTE: all the init-containers require network access (git-clone, check-db, install-pip-package, wait-for-db-migrations), and so this traffic will be OUTSIDE the istio mesh, meaning things like egress gateways will not be used, which may or may not cause issues, depending on how strict your cluster setup is.
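
Steps 1 and 2 above could look roughly like the following Helm values. Treat this as a sketch under stated assumptions rather than a tested configuration: the key names `airflow.podAnnotations`, `postgresql.enabled`, and the sub-keys under `externalDatabase` are assumptions about the chart's values layout, and the hostname and Secret name are placeholders.

```yaml
# values.yaml (sketch; verify key names against your chart version)
airflow:
  podAnnotations:
    # Step 2: hold the non-init containers until the istio-proxy is ready.
    proxy.istio.io/config: '{ "holdApplicationUntilProxyStarts": true }'

# Step 1: use a database outside the mesh instead of the embedded one.
postgresql:
  enabled: false            # assumed switch for the embedded database
externalDatabase:
  type: postgres
  host: postgres.example.com        # placeholder host outside the mesh
  port: 5432
  database: airflow
  user: airflow
  passwordSecret: airflow-db-credentials   # hypothetical Secret name
  passwordSecretKey: password
```

The annotation has no effect on init-containers, which is why the database still needs to be reachable without the sidecar.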


Istio is an incredibly complex and cumbersome system (trust me, I work on Kubeflow). The variation in how istio can be configured, along with incomprehensible changes between versions, makes it very painful to integrate with apps that are not designed for it.

My recommendation is to not attempt to run airflow within an istio mesh unless you absolutely have to.

@aodj
Contributor Author

aodj commented Feb 18, 2022

The cluster that I'm running Airflow within has Istio configured to run with mtls in STRICT mode. Additionally, a number of the tasks that Airflow runs interact with services that are within the mesh requiring mtls connections. As such it's easier to try and work with the mesh rather than against it.

When I was configuring the chart I didn't run into any specific issues that required the holdApplicationUntilProxyStarts annotation; the git-sync and db-check initContainers required some work to get running through, and I played with a mix of PERMISSIVE mtls for a while, but settled on excluding the ports from the mesh, whilst waiting for an answer here.
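
One way to exclude ports from the mesh is Istio's `traffic.sidecar.istio.io/excludeOutboundPorts` pod annotation. The comment above does not say which ports or which mechanism were actually used, so the sketch below is an assumption throughout: port 5432 stands in for a PostgreSQL database, and `airflow.podAnnotations` is an assumed chart value.

```yaml
# values.yaml (sketch; port number and values key are assumptions)
airflow:
  podAnnotations:
    # Outbound traffic to the listed ports bypasses the istio-proxy
    # sidecar, so it is not wrapped in mesh mTLS and does not depend
    # on the proxy being up when the init-containers run.
    traffic.sidecar.istio.io/excludeOutboundPorts: "5432"
```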

The one piece I've not yet worked out is allowing the service to be accessible via a VirtualService; this was the reason why I commented on #493 asking when it might be merged.

@thesuperzapper
Member

> The cluster that I'm running Airflow within has Istio configured to run with mtls in STRICT mode. Additionally, a number of the tasks that Airflow runs interact with services that are within the mesh requiring mtls connections. As such it's easier to try and work with the mesh rather than against it.

@aodj airflow only connects to its own services (and the external database/redis), so I still don't see the need to include it in your istio mesh, with the possible exception of exposing the webserver/flower web interfaces with an istio VirtualService (but this only requires that traffic to be inside the mesh, not all other airflow traffic).

> When I was configuring the chart I didn't run into any specific issues that required the holdApplicationUntilProxyStarts annotation; the git-sync and db-check initContainers required some work to get running through, and I played with a mix of PERMISSIVE mtls for a while, but settled on excluding the ports from the mesh, whilst waiting for an answer here.

Be careful that you don't change MTLS mode after already deploying, as you may only encounter errors once pods restart, and the init-containers run again.

> The one piece I've not yet worked out is allowing the service to be accessible via a VirtualService; this was the reason why I commented on #493 asking when it might be merged.

I understand that istio infers protocol types from port names, so in the case of names like web it should just assume plain TCP traffic. This is suboptimal for istio VirtualServices, as it will prevent your spec.http.* configs from taking effect; instead, you would have to use spec.tcp.*.
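
To make the difference concrete, here is a sketch of a VirtualService that relies on `spec.http` routing, which is the part that would not take effect if istio treats the port as plain TCP. All names are hypothetical: the host `airflow.example.com`, the Gateway `istio-system/ingress-gateway`, the Service name `airflow-web`, the `airflow` namespace, and port 8080 are assumptions, not values confirmed by the chart.

```yaml
apiVersion: networking.istio.io/v1beta1
kind: VirtualService
metadata:
  name: airflow-web                 # hypothetical
  namespace: airflow                # hypothetical
spec:
  hosts:
    - airflow.example.com           # placeholder hostname
  gateways:
    - istio-system/ingress-gateway  # hypothetical Gateway
  http:
    # HTTP-level matching only applies when the destination port
    # is recognised as HTTP by istio.
    - match:
        - uri:
            prefix: /
      route:
        - destination:
            host: airflow-web.airflow.svc.cluster.local  # assumed Service
            port:
              number: 8080          # assumed webserver port
```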

Rather than allowing users to rename ports (as proposed in #493), and potentially specify the wrong protocol, we can automatically set the appProtocol field of ServicePort v1 (added in K8S 1.18) by detecting if AIRFLOW__WEBSERVER__WEB_SERVER_SSL_CERT and AIRFLOW__WEBSERVER__WEB_SERVER_SSL_KEY are set, with detection similar to our define "airflow.web.scheme", which is used in the webserver probe.
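
Istio can read an explicit protocol from a Service port, either from a name prefix such as `http-` or from the port's `appProtocol` field. A sketch of what the rendered Service might look like with the port name `web` left untouched follows; the Service name, namespace, selector labels, and port number are assumptions for illustration, not the chart's actual template output.

```yaml
apiVersion: v1
kind: Service
metadata:
  name: airflow-web        # hypothetical
  namespace: airflow       # hypothetical
spec:
  selector:
    app: airflow           # assumed labels
    component: web
  ports:
    - name: web            # name unchanged, so no renaming is needed
      port: 8080           # assumed webserver port
      targetPort: 8080
      protocol: TCP        # transport protocol (TCP/UDP/SCTP)
      # Application protocol hint read by istio; this would be "https"
      # when the webserver SSL cert and key are configured.
      appProtocol: http
```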

NOTE: I now realize we ALSO need to update our define "airflow.web.scheme" to check inside airflow.extraEnv for these environment variables (which would likely be set from Secrets). Luckily, our NOTES.txt has an example for checking both airflow.config and airflow.extraEnv at the same time.
