Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Unclear documentation for the delegate_to parameter #9461

Closed
mik-laj opened this issue Jun 21, 2020 · 15 comments · Fixed by #29088
Closed

Unclear documentation for the delegate_to parameter #9461

mik-laj opened this issue Jun 21, 2020 · 15 comments · Fixed by #29088
Assignees
Labels
good first issue kind:bug This is a clearly a bug kind:documentation provider:google Google (including GCP) related issues

Comments

@mik-laj
Copy link
Member

mik-laj commented Jun 21, 2020

Hello,

Most GCP operators accept the delegate_to parameter. It is not super well documented and it can be confusing to users exactly when they would use this as opposed to another conn id.

A little more documentation on the following page would be helpful.
https://airflow.readthedocs.io/en/latest/howto/connection/gcp.html

You can get extra points for adding information about Impersonated credentials
. This feature is not supported by Airflow, but it is related (ticket).

Are you wondering how to start contributing to this project? Start by reading our contributor guide

Best regards,
Kamil

CC: @jaketf

@mik-laj mik-laj added kind:bug This is a clearly a bug provider:google Google (including GCP) related issues area:docs good first issue labels Jun 21, 2020
@mik-laj
Copy link
Member Author

mik-laj commented Jul 16, 2020

@olchas] Do you want to work on it? It looks like this is a task for you.

@olchas
Copy link
Contributor

olchas commented Jul 16, 2020

@mik-laj sure. Could you assign me to this issue, please?

@mik-laj
Copy link
Member Author

mik-laj commented Jul 17, 2020

@olchas We first need to ask a basic question. Shouldn't we abandon support for this feature? In what cases does it work? In what cases does it not work?

@olchas
Copy link
Contributor

olchas commented Jul 21, 2020

@mik-laj I am still looking into it. The name suggests that domain-wide delegation only makes sense for G Suite applications (so in terms of Airflow it should only be applied to GoogleDriveHook and GSheetsHook), but this article calls it a legacy branding and tells that it applies to Cloud Identity as well.

I am also still uncertain about how the two impersonation mechanisms can/should work together. As far as I can tell, domain-wide delegation is supposed to be used to impersonate user account using service account, while direct impersonation can be used to impersonate service account using either another service account or user account.

So, I can see two scenarios:

  1. You start with a service/user account that you use to directly impersonate some service account, that is then used to perform domain-wide delegation on some user.
  2. You start by performing domain-wide delegation on some user, and then use this user to impersonate some service account.

However, Credentials class from google.auth.impersonated_credentials module does not have with_subject method, so apparently it is impossible to use directly impersonated account to perform domain-wide delegation of authority, which renders first scenario impossible. On the other hand, it seems you can specify the delegate for source credentials and then use these credentials for direct impersonation as in scenario 2, but I did not have a chance to test it.

@jaketf, @amithmathew, do you perhaps have more insight on the topic?

@jaketf
Copy link
Contributor

jaketf commented Aug 6, 2020

Taking a look at the code now it seems we have this common GoogleBaseHook used by hooks for gsuite and cloud. This delegate_to seems not really not useful for cloud, and I don't think the scenario 2 of delegating to a human user to impersonate a service account is an advisable pattern / one worth supporting in airflow core. I think delegate_to should be removed / deprecated from the Google Cloud Hooks / Operators to avoid confusion.

To play devil's advocate: There may be use cases where users expect delegate_to to attribute API calls (e.g. a BQ Query) to the delegated human user. Again, I don't think I'd recommend this as an auditing posture as anyone could throw jake@foo.com into the delegate_to and bootstrap my IAM permissions. IMO This seems like something we shouldn't support.

@amithmathew
Copy link

delegate_to would be used to impersonate a user account using a (specific) service account - It does look like delegate_to may be used on the GSuite side of things.

@olchas
Copy link
Contributor

olchas commented Aug 7, 2020

@jaketf, @amithmathew so it would seem that the approach could be to:

  1. disallow simultaneous use of delegate_to and impersonation_chain (new argument used for direct impersonation) in GoogleBaseHook by raising an exception if both arguments are provided
  2. remove delegate_to from most Google operators and hooks, leaving it only in G Suite operators and hooks

WDYT?
CC: @turbaszek

@potiuk
Copy link
Member

potiuk commented Aug 7, 2020

Sounds like great plan.

@potiuk
Copy link
Member

potiuk commented Aug 7, 2020

One small problem though - we will have to deprecate the delegate_to parameter rather than remove them initially. In 2.1 we might be able to remove them,

@turbaszek
Copy link
Member

@olchas I'm in favour of 2nd approach. Regarding the deprecation - first I would check if current implementation works. If it does not then I see no reason for deprecation - let us just remove it.

@potiuk
Copy link
Member

potiuk commented Aug 7, 2020

@olchas I'm in favour of 2nd approach. Regarding the deprecation - first I would check if current implementation works. If it does not then I see no reason for deprecation - let us just remove it.

Agree with @turbaszek. I thought about deprecation in the sense that it will not fail when someone actually adds the "delegate_to" parameter. I am afraid there might be cases that someone already added this parameter and we do not want their DAGs to fail (even if this parameter was superfluous).

I believe we have **kwargs in most such operators but it's worth checking.

@jaketf
Copy link
Contributor

jaketf commented Aug 7, 2020

I agree remove delegate_to, make sure it is an ignored kwarg rather than breaking peoples DAGs if passed.

@shahar1
Copy link
Contributor

shahar1 commented Jan 9, 2023

@eladkal Please assign me :)

@shahar1
Copy link
Contributor

shahar1 commented Jan 21, 2023

I created a PR to resolve this issue. While tackling it, I came up with some questions:

  1. Should we create a clearer distinction between Google Cloud and Google Workspace (formerly gsuite) base hooks? That is to avoid misusing delegate_to inherently. In my PR, I had to add a deprecation warning for all of the hundreds of usages; it would have been easier if gsuite operators would use their base hook.
  2. After declaring delegate_to as deprecated, when do we remove it from the codebase? In what version it will be removed?

@eladkal
Copy link
Contributor

eladkal commented Jan 21, 2023

  1. Should we create a clearer distinction between Google Cloud and Google Workspace (formerly gsuite) base hooks? That is to avoid misusing delegate_to inherently. In my PR, I had to add a deprecation warning for all of the hundreds of usages; it would have been easier if gsuite operators would use their base hook.
  2. After declaring delegate_to as deprecated, when do we remove it from the codebase? In what version it will be removed?
  1. We do want to split the provider Consider splitting Google Provider  #15933
    If we can decouple to make it easier for future separation I think we should

  2. We can do a major release as soon as after the deprecated code has been released. Basically 1 release with warning and release after removal. Personally I prefer to give users some time to adjust and migrate to new code but if there is a reason to do it with a followup release (like parameter that causes high confusion) then we can do it.

austinletson added a commit to austinletson/airflow that referenced this issue Nov 18, 2023
Restore delegate_to param to GoogleDiscoveryApiHook to allow to
specification of a service account for domain-wide delegation when using
GoogleDiscoveryApiHook to access Google APIs which support domain-wide
delegation but do not have a dedicated hook (such as Workspaces Admin
API).

Note that based on the discussion in apache#9461, the delegate_to param was
deprecated in apache#29088 (and removed in apache#30748) from many hooks
including GoogleDiscoveryApiHook. However, the delegate_to param was
not removed from the docstring for the GoogleDiscoveryApiHook
constructor.

Update GCP connection docs to reflect delegate_to param in
GoogleDiscoveryApiHook usage only when using Google APIs that support
domain-wide delegation.
Taragolis pushed a commit that referenced this issue Nov 18, 2023
Restore delegate_to param to GoogleDiscoveryApiHook to allow to
specification of a service account for domain-wide delegation when using
GoogleDiscoveryApiHook to access Google APIs which support domain-wide
delegation but do not have a dedicated hook (such as Workspaces Admin
API).

Note that based on the discussion in #9461, the delegate_to param was
deprecated in #29088 (and removed in #30748) from many hooks
including GoogleDiscoveryApiHook. However, the delegate_to param was
not removed from the docstring for the GoogleDiscoveryApiHook
constructor.

Update GCP connection docs to reflect delegate_to param in
GoogleDiscoveryApiHook usage only when using Google APIs that support
domain-wide delegation.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
good first issue kind:bug This is a clearly a bug kind:documentation provider:google Google (including GCP) related issues
Projects
None yet
Development

Successfully merging a pull request may close this issue.

9 participants