Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add IAM authentication to Amazon Redshift Connection by AWS Connection #28187

Merged
merged 3 commits into from
May 2, 2023

Conversation

IAL32
Copy link
Contributor

@IAL32 IAL32 commented Dec 7, 2022

closes: #28000

Using same technique as already implemented into PostgreSQL Hook - manual obtain credentials by call GetClusterCredentials thought Redshift API.

if conn.extra_dejson.get("redshift", False):
port = conn.port or 5439
# Pull the custer-identifier from the beginning of the Redshift URL
# ex. my-cluster.ccdre4hpd39h.us-east-1.redshift.amazonaws.com returns my-cluster
cluster_identifier = conn.extra_dejson.get("cluster-identifier", conn.host.split(".")[0])
redshift_client = AwsBaseHook(aws_conn_id=aws_conn_id, client_type="redshift").conn
# https://boto3.amazonaws.com/v1/documentation/api/latest/reference/services/redshift.html#Redshift.Client.get_cluster_credentials
cluster_creds = redshift_client.get_cluster_credentials(
DbUser=login,
DbName=self.database or conn.schema,
ClusterIdentifier=cluster_identifier,
AutoCreate=False,
)
token = cluster_creds["DbPassword"]
login = cluster_creds["DbUser"]

@IAL32 IAL32 requested a review from eladkal as a code owner December 7, 2022 09:54
@boring-cyborg boring-cyborg bot added area:providers provider:amazon-aws AWS/Amazon - related issues labels Dec 7, 2022
@Taragolis
Copy link
Contributor

Could you also update documentation, you could have a look in BREEZE.rst how to build and check documentation locally

Copy link
Contributor

@Taragolis Taragolis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add some findings. Personally I'm do not work with Redshift and redshift-connector for almost 1.5 year.
So might someone who work on daily basis could review more detailed

@@ -44,6 +58,10 @@ class RedshiftSQLHook(DbApiHook):
hook_name = "Amazon Redshift"
supports_autocommit = True

def __init__(self, *args, aws_conn_id: str = "aws_default", **kwargs) -> None:
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

aws_conn_id might be None, in this case default boto3 strategy will use (usual EC2 Instance Profile or ECS Execution Role)

@@ -62,6 +80,9 @@ def _get_conn_params(self) -> dict[str, str | int]:

conn_params: dict[str, str | int] = {}

if conn.extra_dejson.get("iam", False):
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm worry that redshift-connector also might use same connection parameter, see list available parameters: https://github.com/aws/amazon-redshift-python-driver#connection-parameters

As result it might broke someone connections

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@IAL32 thoughts on this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I am not sure if this would break any connections. @o-nikolas I will test this week and come back with findings.

@IAL32
Copy link
Contributor Author

IAL32 commented Dec 8, 2022

I would also like some feedback from someone with more experience with RedShift. I don't use it myself very often either

@Taragolis
Copy link
Contributor

@shubham22 a chance that you know who might help with review extending integration Redshift Connection and redshift-connector?

@eladkal
Copy link
Contributor

eladkal commented Jan 9, 2023

@vincbeck @ferruzzi @o-nikolas @shubham22 can you please take a look?

@vincbeck
Copy link
Contributor

vincbeck commented Jan 9, 2023

I am not a Redshift expert myself so I dont really want to approve blindly. It is probably worth trying to find someone who's familiar with Redshift. Maybe @shubham22 can help us?

@ferruzzi
Copy link
Contributor

ferruzzi commented Jan 9, 2023

I'm also definitely not a Redshift expert, but nothing there jumps out as a problem to me.

@o-nikolas o-nikolas self-requested a review as a code owner January 25, 2023 06:02
@shubham22
Copy link

Apologies for the delayed response -- I subscribed to all Airflow GitHub discussions in the hope that I will be able to follow them, but ended up drowning my direct mentions.

I will see if I can find a Redshift expert to chime on this one by early next week.

@github-actions
Copy link

This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions.

@github-actions github-actions bot added stale Stale PRs per the .github/workflows/stale.yml policy file needs-triage label for new issues that we didn't triage yet and removed stale Stale PRs per the .github/workflows/stale.yml policy file labels Mar 25, 2023
@eladkal eladkal removed the needs-triage label for new issues that we didn't triage yet label Mar 27, 2023
@eladkal eladkal mentioned this pull request Mar 27, 2023
@potiuk
Copy link
Member

potiuk commented Apr 29, 2023

I rebased this one, it looks cool, but maybe someone from teh AWS team can comment on it ?

@cjames23
Copy link
Contributor

@o-nikolas @shubham22 @eladkal so far testing against a redshift cluster with other Redshift connections not using IAM I am not seeing anything break connections wise. It is just setting the iam to false when it is not passed in as extra args which would then indicate the connection expects credentials in the connection object in airflow rather than getting them through IAM. It might be worth having a unit test that involves both a non IAM and IAM based connection created and checking to ensure the arg is not overwritten to test that case. Let me know if you want me to test further and see if I can find any edge cases where it does break.

@potiuk I was doing some testing for the AWS folks as someone with a bit deeper knowledge of Redshift. I work at Amazon and have a working relationship with those tagged above.

Copy link
Member

@potiuk potiuk left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - but would love someone with AWS hat on to confirm it looks cool :)

@eladkal
Copy link
Contributor

eladkal commented May 2, 2023

LGTM

Copy link
Contributor

@vincbeck vincbeck left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks good to me even though this is not an area I am an expert in. If testing has been done and it works, the code looks good to me

@potiuk potiuk merged commit 2f247a2 into apache:main May 2, 2023
@IAL32 IAL32 deleted the ac/28000-iam-redshift branch May 2, 2023 13:59
tatiana pushed a commit to astronomer/astro-sdk that referenced this pull request May 26, 2023
Since there was a recent change introduced on OSS -
apache/airflow#28187 we are calling the
`get_iam_token` method even for the default redshift connection which
doesn't have a host by default, therefore failing in CI. In this PR we
are overriding the default connection - `redshift_conn` with `iam:
false` so that the `get_iam_token` is not called.

Once this issue if fixed in the OSS Airflow we can remove this change.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
area:providers provider:amazon-aws AWS/Amazon - related issues
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Add IAM authentication to Amazon Redshift Connection by AWS Connection
9 participants