-
Notifications
You must be signed in to change notification settings - Fork 14.3k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Add IAM authentication to Amazon Redshift Connection by AWS Connection #28187
Conversation
Could you also update documentation, you could have a look in BREEZE.rst how to build and check documentation locally |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Add some findings. Personally I'm do not work with Redshift and redshift-connector
for almost 1.5 year.
So might someone who work on daily basis could review more detailed
@@ -44,6 +58,10 @@ class RedshiftSQLHook(DbApiHook): | |||
hook_name = "Amazon Redshift" | |||
supports_autocommit = True | |||
|
|||
def __init__(self, *args, aws_conn_id: str = "aws_default", **kwargs) -> None: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
aws_conn_id
might be None, in this case default boto3
strategy will use (usual EC2 Instance Profile or ECS Execution Role)
@@ -62,6 +80,9 @@ def _get_conn_params(self) -> dict[str, str | int]: | |||
|
|||
conn_params: dict[str, str | int] = {} | |||
|
|||
if conn.extra_dejson.get("iam", False): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'm worry that redshift-connector
also might use same connection parameter, see list available parameters: https://github.com/aws/amazon-redshift-python-driver#connection-parameters
As result it might broke someone connections
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@IAL32 thoughts on this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure if this would break any connections. @o-nikolas I will test this week and come back with findings.
I would also like some feedback from someone with more experience with RedShift. I don't use it myself very often either |
@shubham22 a chance that you know who might help with review extending integration Redshift Connection and |
@vincbeck @ferruzzi @o-nikolas @shubham22 can you please take a look? |
I am not a Redshift expert myself so I dont really want to approve blindly. It is probably worth trying to find someone who's familiar with Redshift. Maybe @shubham22 can help us? |
I'm also definitely not a Redshift expert, but nothing there jumps out as a problem to me. |
Apologies for the delayed response -- I subscribed to all Airflow GitHub discussions in the hope that I will be able to follow them, but ended up drowning my direct mentions. I will see if I can find a Redshift expert to chime on this one by early next week. |
This pull request has been automatically marked as stale because it has not had recent activity. It will be closed in 5 days if no further activity occurs. Thank you for your contributions. |
0568cc0
to
a238355
Compare
I rebased this one, it looks cool, but maybe someone from teh AWS team can comment on it ? |
@o-nikolas @shubham22 @eladkal so far testing against a redshift cluster with other Redshift connections not using IAM I am not seeing anything break connections wise. It is just setting the iam to false when it is not passed in as extra args which would then indicate the connection expects credentials in the connection object in airflow rather than getting them through IAM. It might be worth having a unit test that involves both a non IAM and IAM based connection created and checking to ensure the arg is not overwritten to test that case. Let me know if you want me to test further and see if I can find any edge cases where it does break. @potiuk I was doing some testing for the AWS folks as someone with a bit deeper knowledge of Redshift. I work at Amazon and have a working relationship with those tagged above. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM - but would love someone with AWS hat on to confirm it looks cool :)
LGTM |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It looks good to me even though this is not an area I am an expert in. If testing has been done and it works, the code looks good to me
Since there was a recent change introduced on OSS - apache/airflow#28187 we are calling the `get_iam_token` method even for the default redshift connection which doesn't have a host by default, therefore failing in CI. In this PR we are overriding the default connection - `redshift_conn` with `iam: false` so that the `get_iam_token` is not called. Once this issue if fixed in the OSS Airflow we can remove this change.
closes: #28000
Using same technique as already implemented into PostgreSQL Hook - manual obtain credentials by call GetClusterCredentials thought Redshift API.
airflow/airflow/providers/postgres/hooks/postgres.py
Lines 221 to 235 in 56b5f3f