Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implemented Basic EKS Integration #16571

Merged
merged 23 commits into from
Aug 19, 2021
Merged

Implemented Basic EKS Integration #16571

merged 23 commits into from
Aug 19, 2021

Conversation

ferruzzi
Copy link
Contributor

@ferruzzi ferruzzi commented Jun 21, 2021

Added hooks and operators for initial EKS support

Includes Hooks and operators for the following EKS endpoints, along with tests, and documentation:

  • Create Cluster
  • Create Nodegroup
  • Delete Cluster
  • Delete Nodegroup
  • Run Pod
  • Describe Cluster
  • Describe Nodegroup
  • List Cluster
  • List Nodegroup

Increases the required version for the moto package from 2.0.8 to 2.0.10 in order to ingest the new EKS support.

closes: #8544


^ Add meaningful description above

Read the Pull Request Guidelines for more information.
In case of fundamental code change, Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in UPDATING.md.

@ferruzzi
Copy link
Contributor Author

ferruzzi commented Jun 21, 2021

Not done looking at all the test results yet, but looks like most/all were caused by either a single misnamed param that slipped through in an example dag, or whitespace issues in the docs. Correcting.

@ashb
Copy link
Member

ashb commented Jun 22, 2021

❤️

Can you explain why you had to change the label-when-reviewed action in this PR?

Copy link
Member

@ashb ashb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Just a first pass -- haven't reviewed in depth at all yet.

kwargs["client_type"] = self.client_type
super().__init__(*args, **kwargs)

def create_cluster(self, name: str, roleArn: str, resourcesVpcConfig: Dict, **kwargs) -> str:
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Interesting conundrum here. The Python naming style says these args should be role_arn and resources_vpc_config -- but maybe it makes sense to mirror the Boto/AWS API names here.

What do other think, and what do we do in other AWS hooks?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My thinking, for what it is worth, was that the Hooks should mirror the API as closely as possible. That would improve the learning curve for those coming to Airflow with some boto experience, and also make it easier for those new to the boto API and looking at their docs to see what is going on since the names match up.

Existing AWS hooks are a mixed bag, but do lean towards snake_case.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it is worth following the conventions used by boto to avoid any inaccuracies or mistake.

@ferruzzi

This comment has been minimized.

@dimberman dimberman self-requested a review June 22, 2021 19:16
@ferruzzi ferruzzi force-pushed the eksP0 branch 4 times, most recently from 4c89c12 to ab5e1c0 Compare June 25, 2021 03:28
@potiuk potiuk closed this Jun 25, 2021
@potiuk potiuk reopened this Jun 25, 2021
@ferruzzi ferruzzi closed this Jun 25, 2021
@ferruzzi ferruzzi reopened this Jun 25, 2021
@ferruzzi ferruzzi closed this Jun 26, 2021
@ferruzzi ferruzzi reopened this Jun 26, 2021
@ferruzzi
Copy link
Contributor Author

@dimberman @mik-laj @vikramkoka - All CI tests are currently passing, ready for a proper review at your convenience.

eks_cluster_name,
]
if role_arn:
cli_args.extend(["--role-arn", role_arn])
Copy link
Member

@mik-laj mik-laj Jun 27, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why we need to use external tool?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it impossible to generate this token using STS?

import base64
import boto3
import re
from botocore.signers import RequestSigner

STS_TOKEN_EXPIRES_IN = 60

def get_bearer_token(session, cluster_id, region):
    client = session.client('sts', region_name=region)
    service_id = client.meta.service_model.service_id

    signer = RequestSigner(
        service_id,
        region,
        'sts',
        'v4',
        session.get_credentials(),
        session.events
    )

    params = {
        'method': 'GET',
        'url': 'https://sts.{}.amazonaws.com/?Action=GetCallerIdentity&Version=2011-06-15'.format(region),
        'body': {},
        'headers': {
            'x-k8s-aws-id': cluster_id
        },
        'context': {}
    }

    signed_url = signer.generate_presigned_url(
        params,
        region_name=region,
        expires_in=STS_TOKEN_EXPIRES_IN,
        operation_name=''
    )

    base64_url = base64.urlsafe_b64encode(signed_url.encode('utf-8')).decode('utf-8')

    # remove any base64 encoding padding:
    return 'k8s-aws-v1.' + re.sub(r'=*', '', base64_url)

# If making a HTTP request you would create the authorization headers as follows:

session = boto3.session.Session()
headers = {'Authorization': 'Bearer ' + get_bearer_token(session, 'my_cluster', 'us-east-1')}

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In the case of Google, we should be using normal Google tokens sometime.

hook.get_credentials().token should be enough to generate a valid access token, but it is not clearly documented in public docs.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using the AWS CLI was not our preference either, but had to settle for this due to other options not working correctly. The EKS service does not support providing the token directly at all. Searching online came up with a few workarounds, but the lesser evil was assuming that someone using an AWS service would be able to install an AWS tool. There is also precedent for it in Google's Kubernetes Operators using the gcloud local script to accomplish the same thing.

I'll test using your provided example to see if that works and report back. Dropping that dependency would be great.

Copy link
Member

@mik-laj mik-laj Jun 28, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

that is asking for a problem, later IMHO.

I agree that this could bring unexpected problems in the future, but having to install AWS CLI by users and integrating Airflow with AWS CLI itself can be very difficult in edge cases too. We do not have a perfect solution here. The authorization method rarely changes compared to the product API, so I believe it will be enough stable approach.

Especially since it is described in the AWS documentation.

Amazon EKS uses IAM to provide authentication to your Kubernetes cluster through the AWS IAM authenticator for Kubernetes.

https://docs.aws.amazon.com/eks/latest/userguide/install-aws-iam-authenticator.html

or

This can be used as an alternative to the aws-iam-authenticator.

https://docs.aws.amazon.com/cli/latest/reference/eks/get-token.html

In the aws-iam-authenticator documentation, we have the code snippet I cited above.
https://github.com/kubernetes-sigs/aws-iam-authenticator#api-authorization-from-outside-a-cluster

Copy link
Member

@mik-laj mik-laj Jun 28, 2021

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you are concerned about maintaining this code, why not add it to AWS SDK for Python? It could have been helpful for other people as well. Have you thought to create an internal ticket on this topic? We may not solve this problem now, but we may hope to improve it in the future.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Early tests with the proposed get_bearer_token (#16571 (comment)) are promising. I'm going to play with some edge cases and see how it holds up, but I like the direction that is taking.

In general, are we happy with the security practices used there? Any concerns we need to pay attention to or address with that approach?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This should be addressed in a6ee2c3

Copy link
Member

@ashb ashb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code/features looks good now, the integration logo appears to be empty though?

@ferruzzi
Copy link
Contributor Author

ferruzzi commented Aug 18, 2021

Very strange. It looks fine locally, but you are right, it appears to be empty here on the PR. Just overwrote it with a new file, let's see if that fixes it. It is a 300x300 png file, same as the others AFAICT.

[EDIT: Looks like no change on the PR side with the new file. Was there something specific I am supposed to do to process/include the image?]

[EDIT 2: It displays fine on my fork, Are we sure they normally display in a PR? Seems odd that they wouldn't, but before I go chasing smoke I'd like to verify there is actually an issue.]

[EDIT 3: If you click "view file" in the PR here it does show correctly (now?), it just doesn't display in the file diff view.]

@ashb
Copy link
Member

ashb commented Aug 19, 2021

Yup, image looks good now.

Previously it was a 98B file, now it's 11KiB :)

@github-actions
Copy link

The PR most likely needs to run full matrix of tests because it modifies parts of the core of Airflow. However, committers might decide to merge it quickly and take the risk. If they don't merge it quickly - please rebase it to the latest main at your convenience, or amend the last commit of the PR, and push it with --force-with-lease.

@github-actions github-actions bot added the full tests needed We need to run full set of tests for this PR to merge label Aug 19, 2021
@ashb ashb merged commit bee48f3 into apache:main Aug 19, 2021
@ferruzzi ferruzzi deleted the eksP0 branch January 24, 2022 19:57
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:dev-tools area:providers full tests needed We need to run full set of tests for this PR to merge kind:documentation provider:amazon AWS/Amazon - related issues
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Create EKS operators
8 participants