Glue Hook not using aws_conn_id for logs access resulting in Error #37976

Closed
2 tasks done
VincentChantreau opened this issue Mar 7, 2024 · 2 comments · Fixed by #38010
Labels
area:providers good first issue kind:bug This is a clearly a bug provider:amazon AWS/Amazon - related issues

Comments

@VincentChantreau
Contributor

VincentChantreau commented Mar 7, 2024

Apache Airflow Provider(s)

amazon

Versions of Apache Airflow Providers

8.16.0

Apache Airflow version

2.8.1

Operating System

Apache Airflow Official Docker Image

Deployment

Official Apache Airflow Helm Chart

Deployment details

Deployed on AWS EKS

What happened

When using the Glue Operator and enabling the verbose parameter, we get the following error.

It seems that the GlueHook is not using a logs hook, but rather a newly instantiated boto3 client, here:

log_client = boto3.client("logs")

This new boto3 client uses the pod's ServiceAccount IAM credentials rather than the credentials of the connection passed via aws_conn_id, and the two are different.

So when we are using the Operator with the verbose parameter set to True, we get this stack trace:

  File "/home/airflow/.local/lib/python3.10/site-packages/airflow/providers/amazon/aws/operators/glue.py", line 207, in execute
    glue_job_run = self.glue_job_hook.job_completion(self.job_name, self._job_run_id, self.verbose)
  File "/home/airflow/.local/lib/python3.10/site-packages/airflow/providers/amazon/aws/hooks/glue.py", line 288, in job_completion
    ret = self._handle_state(job_run_state, job_name, run_id, verbose, next_log_tokens)
  File "/home/airflow/.local/lib/python3.10/site-packages/airflow/providers/amazon/aws/hooks/glue.py", line 325, in _handle_state
    self.print_job_logs(
  File "/home/airflow/.local/lib/python3.10/site-packages/airflow/providers/amazon/aws/hooks/glue.py", line 269, in print_job_logs
    continuation_tokens.output_stream_continuation = display_logs_from(
  File "/home/airflow/.local/lib/python3.10/site-packages/airflow/providers/amazon/aws/hooks/glue.py", line 236, in display_logs_from
    for response in paginator.paginate(
  File "/home/airflow/.local/lib/python3.10/site-packages/botocore/paginate.py", line 269, in __iter__
    response = self._make_request(current_kwargs)
  File "/home/airflow/.local/lib/python3.10/site-packages/botocore/paginate.py", line 357, in _make_request
    return self._method(**current_kwargs)
  File "/home/airflow/.local/lib/python3.10/site-packages/botocore/client.py", line 553, in _api_call
    return self._make_api_call(operation_name, kwargs)
  File "/home/airflow/.local/lib/python3.10/site-packages/botocore/client.py", line 1009, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.errorfactory.AccessDeniedException: An error occurred (AccessDeniedException) when calling the FilterLogEvents operation: User: arn:aws:sts::***redacted*** is not authorized to perform: logs:FilterLogEvents on resource: arn:aws:logs:***redacted***: because no identity-based policy allows the logs:FilterLogEvents action
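
One way to confirm that the two sets of credentials really differ is to compare the caller identity seen by each client. The following is a minimal sketch, not part of the provider: `caller_arn`, `identities_differ`, and `FakeSts` are hypothetical names, and `FakeSts` only stands in for an STS client so the snippet runs without AWS access. In a real check, the first client would be `boto3.client("sts")` and the second would be built from the Airflow connection.

```python
from typing import Any


def caller_arn(sts_client: Any) -> str:
    """Return the ARN of the identity that the given STS client authenticates as."""
    # get_caller_identity() is the standard STS call; its response includes "Arn".
    return sts_client.get_caller_identity()["Arn"]


def identities_differ(default_sts: Any, connection_sts: Any) -> bool:
    """True when the ambient credentials and the connection's credentials differ."""
    return caller_arn(default_sts) != caller_arn(connection_sts)


class FakeSts:
    """Hypothetical stand-in for an STS client, used only for this demo."""

    def __init__(self, arn: str) -> None:
        self._arn = arn

    def get_caller_identity(self) -> dict:
        return {"Arn": self._arn}


# Illustrative ARNs only; the account IDs and role names are made up.
worker = FakeSts("arn:aws:sts::111111111111:assumed-role/worker-role/session")
glue = FakeSts("arn:aws:sts::222222222222:assumed-role/glue-role/session")
print(identities_differ(worker, glue))
```

If the two ARNs differ, any client created with a bare `boto3.client(...)` call inside the hook will act as the worker role, which matches the AccessDeniedException above.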

What you think should happen instead

The GlueHook should create a logs hook, as the SageMaker hook does:

self.logs_hook = AwsLogsHook(aws_conn_id=self.aws_conn_id)

so that log access uses the connection passed to the Operator via aws_conn_id.
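
As a rough sketch of that direction (an illustration under stated assumptions, not the change that was eventually merged), the log-reading code can take its CloudWatch Logs client as an argument instead of calling `boto3.client("logs")` itself. `fetch_log_messages`, `FakeLogsClient`, and `_FakePaginator` are hypothetical names; the fake client only mimics the `filter_log_events` paginator so the snippet runs without AWS access. In the provider, the injected client would presumably be the one exposed by an `AwsLogsHook` created with the hook's `aws_conn_id`.

```python
from __future__ import annotations

from typing import Any, Iterator


def fetch_log_messages(logs_client: Any, log_group: str, run_id: str) -> list[str]:
    """Collect log messages for one job run using the *injected* logs client."""
    paginator = logs_client.get_paginator("filter_log_events")
    messages: list[str] = []
    # The run ID is used as the log stream name here (an assumption for this sketch).
    for page in paginator.paginate(logGroupName=log_group, logStreamNames=[run_id]):
        messages.extend(event["message"] for event in page["events"])
    return messages


class _FakePaginator:
    """Hypothetical stand-in that yields pre-built response pages."""

    def __init__(self, pages: list[dict]) -> None:
        self._pages = pages

    def paginate(self, **kwargs: Any) -> Iterator[dict]:
        yield from self._pages


class FakeLogsClient:
    """Hypothetical stand-in for a CloudWatch Logs client."""

    def __init__(self, pages: list[dict]) -> None:
        self._pages = pages

    def get_paginator(self, operation_name: str) -> _FakePaginator:
        if operation_name != "filter_log_events":
            raise ValueError(f"unexpected operation: {operation_name}")
        return _FakePaginator(self._pages)


pages = [
    {"events": [{"message": "starting job"}]},
    {"events": [{"message": "job finished"}]},
]
# The log group name and run ID below are placeholders.
print(fetch_log_messages(FakeLogsClient(pages), "example-log-group", "jr_example"))
```

Because the client is passed in, FilterLogEvents is issued with whatever credentials the caller chose, so the Glue calls and the log calls can share the same role.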

How to reproduce

  1. Create an Airflow AWS Connection with a role ARN different from the one used by the Workers.
  2. Execute a GlueJobOperator task with the verbose parameter set to True:
    GlueJobOperator(
        aws_conn_id=connection_id,
        task_id=task_id,
        job_name=job_name,
        verbose=True,
        *args,
        **kwargs,
    )

Anything else

No response

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

@VincentChantreau VincentChantreau added area:providers kind:bug This is a clearly a bug needs-triage label for new issues that we didn't triage yet labels Mar 7, 2024

boring-cyborg bot commented Mar 7, 2024

Thanks for opening your first issue here! Be sure to follow the issue template! If you are willing to raise PR to address this issue please do so, no need to wait for approval.

@VincentChantreau VincentChantreau changed the title Glue Hook not using aws_conn_id for logs access Glue Hook not using aws_conn_id for logs access resulting in Error Mar 8, 2024
@eladkal eladkal added provider:amazon AWS/Amazon - related issues good first issue and removed needs-triage label for new issues that we didn't triage yet labels Mar 8, 2024
@VincentChantreau
Contributor Author

@eladkal I've submitted a PR to address the issue :)
