Add a way to skip the secret_backend #19251
Comments
Yep. I also thought this might be a good idea. I often found that things would be simpler this way, and there is huge potential for optimizing the traffic to secret backends. I think the ratio of "secret" variables to "non-secret" ones is something like 1 to 100, and it seems that we have to pay the price of a roundtrip to the secret backend every time we access a variable. Of course, we then have to go to the DB instead, but if the secret backend does not contain the variable we do that anyway, so we can save a lot by adding a context saying "this variable is not supposed to be looked up in the secret backend". And yeah, the cost of accessing secrets also matters. @kaxil - WDYT? I think you were the mastermind behind the secrets implementation, do you see any obvious problems with this approach? I cannot see any "bad scenario" here. |
Yeah I was thinking of this sort of feature, I think I described in one of the issues too. Instead of cc @fhoda |
I implemented the skip option in the PR. I could also implement the select option, but what should the argument string given by the user be? The name of the class of the secret backend to look in exclusively? Like Variable.get("toto",look_in="MetastoreBackend") |
I personally think I think the most "common" use cases are where secrets are kept in the "secret" backend and all other variables are kept in the metastore, so having just "skip_external_backend" makes sense (however, I'd rename the parameter to 'skip_secret_backend' to be more precise). |
ok , current implementation is only with the (but never skipping EnvironmentVariablesBackend, LocalFilesystemBackend and MetastoreBackend ) |
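To make the proposal concrete, here is a minimal, self-contained sketch of a backend chain with a skip flag. It does not use Airflow's real classes: `DictBackend`, the `external` attribute and the module-level `get_variable` helper are all hypothetical stand-ins, and `skip_secret_backend` is only the parameter name suggested in this thread, not an existing Airflow argument.

```python
from typing import Dict, List, Optional


class DictBackend:
    """Hypothetical stand-in for a secrets backend; counts how often it is queried."""

    def __init__(self, name: str, values: Dict[str, str], external: bool) -> None:
        self.name = name
        self.values = values
        self.external = external  # True only for the remote (billed) backend
        self.calls = 0

    def get_variable(self, key: str) -> Optional[str]:
        self.calls += 1
        return self.values.get(key)


def get_variable(
    key: str,
    backends: List[DictBackend],
    skip_secret_backend: bool = False,
) -> Optional[str]:
    """Try each backend in order; optionally skip the external ones."""
    for backend in backends:
        # Built-in backends (env vars, metastore) are never skipped.
        if skip_secret_backend and backend.external:
            continue
        value = backend.get_variable(key)
        if value is not None:
            return value
    return None


secret_manager = DictBackend("secret_manager", {"api_token": "s3cr3t"}, external=True)
env_vars = DictBackend("env", {}, external=False)
metastore = DictBackend("metastore", {"batch_size": "500"}, external=False)
chain = [secret_manager, env_vars, metastore]

# Default lookup: the external backend is queried even though it has no match.
default_value = get_variable("batch_size", chain)
calls_after_default = secret_manager.calls

# With the flag set, the external backend is not touched at all.
skipped_value = get_variable("batch_size", chain, skip_secret_backend=True)
calls_after_skip = secret_manager.calls
```

In this sketch the first lookup costs one roundtrip to the external backend and the second costs none, which is the saving the issue is asking for.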
Do you want to skip the secrets backend for all variables, or just a specific one? |
This is already possible at the secrets backend configuration airflow/airflow/providers/google/cloud/secrets/secret_manager.py Lines 60 to 62 in ae04488
|
You want to use variables from the secret backend for some variables but not others? |
I would not close this, @ashb, but rather discuss how we can implement it; I think there are varying opinions here. At the very least, if others also have an opinion as strong as yours, we should document how to handle this (very common, it seems) use case. I think we can still discuss how to do it, and maybe we should do it differently, but I think the use case and rationale are very sound and, more than that, pretty common. It has been raised quite a few times by different people. Since we have connections, where most people keep their secret passwords, I think the ratio of secret variables to non-secret ones is small and this use case is very common. I'd say more often than not people will have connections kept in secrets, but they will have no secret variables. We are basically telling people to pay more for their secret backends and we do not even tell them how to avoid this. And I think making people write their own secret backend to handle such a case is kind of over the top, especially since it seems to be a common pattern for all the different secret backends. And our users often do not realise that they could do their own implementation very easily. I actually see @ashb's point about the "different access" pattern, and I agree it is not perfect. I think an even bigger problem is that it's the DAG writers who decide how each variable should be retrieved, which (I quite agree) is very prone to different kinds of problems. But maybe we can find a good solution that serves the use case but does not introduce the "different access" pattern. Maybe a good solution would be to give them a ready-to-use, simple implementation of such a custom backend that you could "mix in" with any other secret backend, where, for example, choosing whether to use the secret backend or not would be done based on a regexp match of the variable name?
Maybe simply have a possibility to define an optional callable configurable in If the installation has this "callable" defined however, that could be much better, because we could check whether the variable should be read from secret and if so - we would just fail such an attempt. I think that would be a really nice and consistent approach. WDYT everyone? |
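As a rough illustration of the "optional callable" idea combined with the regexp match on the variable name, a sketch could look like the following. Everything here is an assumption made for illustration: the `secret_` naming rule, the `is_secret_variable` predicate and the `lookup` helper are hypothetical and are not part of Airflow's configuration or API.

```python
import re
from typing import Callable, Dict, Optional

# Hypothetical naming rule: only names starting with "secret_" live in the
# external secret backend.
SECRET_NAME_PATTERN = re.compile(r"^secret_")


def is_secret_variable(key: str) -> bool:
    """The user-supplied callable: decide where a variable should be read from."""
    return SECRET_NAME_PATTERN.match(key) is not None


def lookup(
    key: str,
    secret_lookup: Callable[[str], Optional[str]],
    metastore_lookup: Callable[[str], Optional[str]],
    should_use_secret_backend: Callable[[str], bool],
) -> Optional[str]:
    """Route the lookup to exactly one store, chosen by the callable."""
    if should_use_secret_backend(key):
        return secret_lookup(key)
    return metastore_lookup(key)


# Plain dicts stand in for the two stores.
secrets: Dict[str, str] = {"secret_api_token": "s3cr3t"}
metastore: Dict[str, str] = {"batch_size": "500"}

token = lookup("secret_api_token", secrets.get, metastore.get, is_secret_variable)
batch_size = lookup("batch_size", secrets.get, metastore.get, is_secret_variable)
```

The point of this design is that the routing decision sits in one deployment-level function instead of in every DAG, so DAG authors keep calling the same accessor for all variables.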
I closed the PR as the implementation is not close to something I am happy with, so any new solution will share none of the current changes. But we can edit this issue, yes, so sorry for closing the issue as well. That was hasty of me. Sorry. If a variable is secret/sensitive, why not store it in a connection? We could add a new "generic" type of connection and then you can access it as What do you think of this idea @raphaelauv ? |
Just a comment from my side (as I was involved in a closely related discussion with our users). One problem with that is that some users do not want to (or can't, because of their policies, tool limitations or shared-secret approach) store their secrets in the "connection URL form", and that would force them to create an Airflow-specific format for secrets when they are using the same secret across different services, not only Airflow. We have a very good recent example here (this comes from a big, enterprise user), #19164, where a corporate user already has their secret service account encrypted in their secret backend and rotated frequently and automatically (and used by other services). This is a perfect case for "secret variables" but would not work if we use connections. |
Another possible solution I can think of is to build another Secret Backend that actually allows "skipping" the "external" secret backend entirely under some conditions ("prefix", "patterns"). That way users don't need to update all their DAGs when they need to change something, and there is a single source of truth for where the values of a variable/connection come from. For observability, we can probably create a page under "Admin" showing where each secret comes from (similar to our airflow configuration page that has Running Configurations). |
Yeah - I thought about it too, but it would have to be "mix-in" type - we should be able to take any of the backends out there and "mix-in" the selection logic. That was my initial idea as well :). But I think having a callable is more Pythonic way - doing pretty much the same without the overhead of creating a "backend class" that would take "other backend class" as the "real backend to use". |
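For comparison, the "mix-in" variant could be sketched as below. This is only an illustration under stated assumptions: `FakeExternalBackend` is a hypothetical stand-in for any real backend class, and `PatternSelectionMixin` with its `lookup_pattern` attribute is an invented name, not something shipped with Airflow.

```python
import re
from typing import Dict, Optional


class FakeExternalBackend:
    """Hypothetical stand-in for any existing secrets backend class."""

    def __init__(self, values: Dict[str, str]) -> None:
        self.values = values
        self.calls = 0

    def get_variable(self, key: str) -> Optional[str]:
        self.calls += 1
        return self.values.get(key)


class PatternSelectionMixin:
    """Return None (fall through to the next backend) unless the key matches."""

    lookup_pattern = re.compile(r"^secret_")

    def get_variable(self, key: str) -> Optional[str]:
        if not self.lookup_pattern.search(key):
            return None
        # Delegates to the next class in the MRO, i.e. the real backend.
        return super().get_variable(key)


class SelectiveBackend(PatternSelectionMixin, FakeExternalBackend):
    """The mix-in must come first so its get_variable wraps the backend's."""


backend = SelectiveBackend({"secret_api_token": "s3cr3t", "batch_size": "500"})
non_secret = backend.get_variable("batch_size")
secret = backend.get_variable("secret_api_token")
```

Compared with the callable, the selection logic here travels with the backend class, at the price of defining one small subclass per backend you want to wrap.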
A possible third option (because what we need is more options): Create a new (Conceptually it's a simple idea, but the path forward from where Airflow is right now to that isn't so clear) |
my quick and dirty solution ->
AIRFLOW__SECRETS__BACKEND=toto.CustomCloudSecretManagerBackend
AIRFLOW__SECRETS__BACKEND_KWARGS={"connections_prefix":"airflow-XXX-connection","variables_prefix":"airflow-XXX-variable","sep":"-","secret_lookup_prefix":"secret_"}

from typing import Optional

from airflow.providers.google.cloud.secrets.secret_manager import CloudSecretManagerBackend


class CustomCloudSecretManagerBackend(CloudSecretManagerBackend):
    def __init__(
        self,
        secret_lookup_prefix: Optional[str] = None,
        **kwargs,
    ):
        super().__init__(**kwargs)
        self.secret_lookup_prefix = secret_lookup_prefix

    def get_variable(self, key: str) -> Optional[str]:
        if self.variables_prefix is None:
            return None
        if self.secret_lookup_prefix is not None:
            if not key.startswith(self.secret_lookup_prefix):
                return None
        return self._get_secret(self.variables_prefix, key)
|
Yep. Very much this, it is indeed EASY to write your own secret backend like that - but it's not easy to discover by the users that they can do it. Some way of either documenting this pattern, or (IMHO a bit better) supporting the "mix-in" should solve the problem nicely. |
Hi everyone, may I ask where things stand on this, please? It looks like @raphaelauv's PR was never merged? In fact, I am experiencing the same problem, so I would like to know how I could avoid it. Many thanks! |
@nxhuy-github customize your secret backend the same way I did -> #19251 (comment) |
Yes, many thanks @raphaelauv |
We have published a package here with @raphaelauv's fix: https://pypi.org/project/custom-cloud-secret-manager-backend-kasna/ It can be used like this:
|
I had to extend the functionality to make it work with airflow connections as well, adding the following function to the code mentioned above:
@maxexcloo probably you would like to add it to your package. I've tried yours first, but then I had to create a custom one to support connections in the secret manager. |
The code above forces us to use a specific prefix for secret names, and the secret manager will be ignored completely if the prefix is not present. That is a problem for existing secrets that do not follow this naming convention. There's already a commit in the main branch with a fix that uses a debug log instead of an error, #27856. Meanwhile, for those waiting for a new Airflow version or those who can't update to a newer version yet, I found a different and simpler approach to mute the log directly and keep using exactly the same functionality as before.
|
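One generic way to quiet a single noisy logger with the standard library, not necessarily the approach this commenter used, is a `logging.Filter`. In the sketch below the logger name `example.secret_manager_client` and the "not found" message fragment are placeholders chosen for illustration; the real logger name and message text depend on the provider version and would have to be checked in your own logs.

```python
import logging

# Placeholder: replace with the logger that actually emits the noisy record.
NOISY_LOGGER_NAME = "example.secret_manager_client"


class DropMessagesContaining(logging.Filter):
    """Drop records whose formatted message contains a given fragment."""

    def __init__(self, fragment: str) -> None:
        super().__init__()
        self.fragment = fragment

    def filter(self, record: logging.LogRecord) -> bool:
        # Returning False discards the record; everything else passes through.
        return self.fragment not in record.getMessage()


# A filter attached to a logger only applies to records logged directly on
# that logger, not to records propagated up from its child loggers.
logging.getLogger(NOISY_LOGGER_NAME).addFilter(DropMessagesContaining("not found"))
```

Unlike raising the logger's level, a filter of this kind leaves other errors from the same logger (for example permission problems) visible.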
the full solution

from typing import Optional

from airflow.providers.google.cloud.secrets.secret_manager import CloudSecretManagerBackend


class CustomCloudSecretManagerBackend(CloudSecretManagerBackend):
    """
    https://github.com/apache/airflow/issues/19251
    Add the option secret_lookup_prefix to the GCP CloudSecretManagerBackend
    when set this option will only look inside GCP secret manager the variables or connections prefixed
    with the same value
    example:
        with secret_lookup_prefix=None
            Variable.get("TOTO") will call the GCP secret provider
            Connection.get("TOTO") will call the GCP secret provider
        with secret_lookup_prefix="secret_"
            Variable.get("TOTO") will NOT call the GCP secret provider
            Variable.get("secret_TOTO") will call the GCP secret provider with TOTO ( without the prefix secret_ )
            Connection.get("TOTO") will NOT call the GCP secret provider
            Connection.get("secret_TOTO") will call the GCP secret provider with TOTO ( without the prefix secret_ )
    """

    def __init__(
        self,
        secret_lookup_prefix: Optional[str] = None,
        **kwargs,
    ):
        super().__init__(**kwargs)
        self.secret_lookup_prefix = secret_lookup_prefix

    def get_variable(self, key: str) -> Optional[str]:
        if self.variables_prefix is None:
            return None
        if self.secret_lookup_prefix is not None:
            if not key.startswith(self.secret_lookup_prefix):
                return None
            else:
                key = key[len(self.secret_lookup_prefix):]
        return self._get_secret(self.variables_prefix, key)

    def get_conn_uri(self, conn_id: str) -> Optional[str]:
        if self.connections_prefix is None:
            return None
        if self.secret_lookup_prefix is not None:
            if not conn_id.startswith(self.secret_lookup_prefix):
                return None
            else:
                conn_id = conn_id[len(self.secret_lookup_prefix):]
        return self._get_secret(self.connections_prefix, conn_id)
|
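The prefix rule that `get_variable` and `get_conn_uri` share can be pulled out and checked without Airflow installed. The helper below is a hypothetical extraction written for illustration (the name `resolve_lookup_key` does not appear in the code above); it only mirrors the three cases listed in the docstring.

```python
from typing import Optional


def resolve_lookup_key(name: str, secret_lookup_prefix: Optional[str]) -> Optional[str]:
    """Return the name to look up in the secret manager, or None to skip it."""
    if secret_lookup_prefix is None:
        # No prefix configured: every name is looked up unchanged.
        return name
    if not name.startswith(secret_lookup_prefix):
        # Prefix configured but absent: do not call the secret manager.
        return None
    # Prefix present: strip it before the lookup.
    return name[len(secret_lookup_prefix):]


no_prefix_configured = resolve_lookup_key("TOTO", None)
prefix_missing = resolve_lookup_key("TOTO", "secret_")
prefix_present = resolve_lookup_key("secret_TOTO", "secret_")
```

A `None` result corresponds to the backend returning `None`, which lets the lookup continue with the next backend in the chain.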
Thanks for the full details. |
currently |
is it mandatory to install it as a package? I'm having problems referencing my custom backend class in airflow.cfg. |
No need, just give the path of the file relative to airflow_home. |
Description
I use the gcp secret_manager as a secret_backend
2 problems :
something like
so change variable.py
and also change the macro
Replacing the log level ERROR with WARNING would be better, since we don't know whether the secret does not exist or whether it's really an access permission problem.
Use case/motivation
Every Variable.get makes a call to the secret_backend; it would be great to make this configurable (first of all to control the cost and the load on the secret_backend).
Related issues
No response
Are you willing to submit a PR?
Code of Conduct