Unhandled botocore.exceptions.NoCredentialsError in async_wait #45622

@nrobinson-intelycare

Description

Apache Airflow Provider(s)

amazon

Versions of Apache Airflow Providers

apache-airflow-providers-amazon==9.2.0
apache-airflow-providers-common-compat==1.3.0
apache-airflow-providers-common-io==1.5.0
apache-airflow-providers-common-sql==1.21.0
apache-airflow-providers-fab==1.5.2
apache-airflow-providers-ftp==3.12.0
apache-airflow-providers-http==5.0.0
apache-airflow-providers-imap==3.8.0
apache-airflow-providers-postgres==6.0.0
apache-airflow-providers-sendgrid==4.0.0
apache-airflow-providers-smtp==1.9.0
apache-airflow-providers-snowflake==6.0.0
apache-airflow-providers-sqlite==4.0.0

Apache Airflow version

2.10.4

Operating System

Amazon Linux 2023.6.20241212

Deployment

Virtualenv installation

Deployment details

Custom CDK stack with:

  • EC2 instance running Airflow, managed by systemd
  • IAM role granting permissions to AWS services
  • RDS instance running Postgres

The Airflow virtualenv is managed by uv.

What happened

When a DAG runs a deferrable BatchOperator without a connection ID, so that the hook falls back on the boto3 credential strategy ({base_aws.py:180} INFO - No connection ID provided. Fallback on boto3 credential strategy (region_name='us-east-1')), the task's trigger can fail immediately after the batch job is submitted.

Although the trigger fails immediately, the batch job has already launched successfully and keeps running until it exits successfully, without Airflow knowing about it.

Because of how the DAG is scheduled, the batch job left behind by a failed task has not yet overlapped with a subsequent task run, but overlapping runs would be undesirable.

This error happens about once a week. I believe it has something to do with amazon-ssm-agent not rotating the credentials quickly enough.

What you think should happen instead

async_wait() should catch the NoCredentialsError and continue to the next waiter attempt.

https://github.com/apache/airflow/blob/main/providers/amazon/src/airflow/providers/amazon/aws/utils/waiter_with_logging.py#L133
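
The requested behavior could look roughly like the sketch below. It is not the provider's code: wait_tolerating_missing_credentials and poll_once are hypothetical names, with poll_once standing in for the single-attempt call seen in the traceback (waiter.wait(**args, WaiterConfig={"MaxAttempts": 1})). The import fallback exists only so the sketch runs where botocore is not installed.

```python
import asyncio

try:
    from botocore.exceptions import NoCredentialsError
except ImportError:
    # Stand-in so the sketch still runs without botocore installed.
    class NoCredentialsError(Exception):
        pass


async def wait_tolerating_missing_credentials(poll_once, max_attempts, delay_seconds):
    """Await poll_once() until it succeeds, retrying when credentials are missing.

    poll_once is a hypothetical zero-argument coroutine function that performs
    one waiter attempt. Returns the 1-based attempt number that succeeded.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            await poll_once()
            return attempt
        except NoCredentialsError:
            # Treat a credential gap as a missed attempt, not a trigger failure,
            # unless every attempt has been used up.
            if attempt == max_attempts:
                raise
            await asyncio.sleep(delay_seconds)
```

With this shape, a short gap in credential availability costs one polling interval instead of failing the trigger, while a persistent credential problem still surfaces once the attempts run out.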

How to reproduce

Hard to reproduce, but invalidating AWS credentials right before the trigger initializes would likely produce a similar traceback.
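
Without touching real credentials, the failure mode can be approximated with a fake waiter. This is only a simulation under assumptions: CredentialGapWaiter, single_attempt, and the job ID are hypothetical, and only the shape of the wait(...) call is taken from the traceback. The import fallback is there so the sketch runs without botocore installed.

```python
import asyncio

try:
    from botocore.exceptions import NoCredentialsError
except ImportError:
    # Stand-in so the sketch still runs without botocore installed.
    class NoCredentialsError(Exception):
        pass


class CredentialGapWaiter:
    """Hypothetical fake waiter whose first `failures` calls find no credentials."""

    def __init__(self, failures=1):
        self.failures = failures
        self.calls = 0

    async def wait(self, **kwargs):
        self.calls += 1
        if self.calls <= self.failures:
            raise NoCredentialsError()


async def single_attempt(waiter):
    # Same call shape as in the traceback; nothing here catches NoCredentialsError.
    await waiter.wait(jobs=["example-job-id"], WaiterConfig={"MaxAttempts": 1})


def reproduce():
    """Return the name of the escaping exception and how many calls were made."""
    waiter = CredentialGapWaiter(failures=1)
    try:
        asyncio.run(single_attempt(waiter))
    except NoCredentialsError as exc:
        return type(exc).__name__, waiter.calls
    return "no error", waiter.calls
```

Calling reproduce() shows the exception escaping on the very first attempt, which matches the "Trigger failed" entry in the task log below.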

Anything else

Traceback from task log:

[2025-01-10, 20:00:19 EST] {baseoperator.py:1806} ERROR - Trigger failed:
Traceback (most recent call last):
  File "/opt/airflow/git/.venv/lib/python3.12/site-packages/airflow/jobs/triggerer_job_runner.py", line 558, in cleanup_finished_triggers
    result = details["task"].result()
             ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/airflow/git/.venv/lib/python3.12/site-packages/airflow/jobs/triggerer_job_runner.py", line 630, in run_trigger
    async for event in trigger.run():
  File "/opt/airflow/git/.venv/lib/python3.12/site-packages/airflow/providers/amazon/aws/triggers/base.py", line 143, in run
    await async_wait(
  File "/opt/airflow/git/.venv/lib/python3.12/site-packages/airflow/providers/amazon/aws/utils/waiter_with_logging.py", line 133, in async_wait
    await waiter.wait(**args, WaiterConfig={"MaxAttempts": 1})
  File "/opt/airflow/git/.venv/lib/python3.12/site-packages/aiobotocore/waiter.py", line 49, in wait
    return await AIOWaiter.wait(self, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/airflow/git/.venv/lib/python3.12/site-packages/aiobotocore/waiter.py", line 95, in wait
    response = await self._operation_method(**kwargs)
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/airflow/git/.venv/lib/python3.12/site-packages/aiobotocore/waiter.py", line 78, in __call__
    return await self._client_method(**kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/airflow/git/.venv/lib/python3.12/site-packages/aiobotocore/client.py", line 394, in _make_api_call
    http, parsed_response = await self._make_request(
                            ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/airflow/git/.venv/lib/python3.12/site-packages/aiobotocore/client.py", line 420, in _make_request
    return await self._endpoint.make_request(
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/airflow/git/.venv/lib/python3.12/site-packages/aiobotocore/endpoint.py", line 96, in _send_request
    request = await self.create_request(request_dict, operation_model)
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/airflow/git/.venv/lib/python3.12/site-packages/aiobotocore/endpoint.py", line 84, in create_request
    await self._event_emitter.emit(
  File "/opt/airflow/git/.venv/lib/python3.12/site-packages/aiobotocore/hooks.py", line 68, in _emit
    response = await resolve_awaitable(handler(**kwargs))
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/airflow/git/.venv/lib/python3.12/site-packages/aiobotocore/_helpers.py", line 6, in resolve_awaitable
    return await obj
           ^^^^^^^^^
  File "/opt/airflow/git/.venv/lib/python3.12/site-packages/aiobotocore/signers.py", line 24, in handler
    return await self.sign(operation_name, request)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/airflow/git/.venv/lib/python3.12/site-packages/aiobotocore/signers.py", line 90, in sign
    auth.add_auth(request)
  File "/opt/airflow/git/.venv/lib/python3.12/site-packages/botocore/auth.py", line 423, in add_auth
    raise NoCredentialsError()
botocore.exceptions.NoCredentialsError: Unable to locate credentials
[2025-01-10, 20:00:19 EST] {taskinstance.py:3311} ERROR - Task failed with exception
Traceback (most recent call last):
  File "/opt/airflow/git/.venv/lib/python3.12/site-packages/airflow/models/taskinstance.py", line 767, in _execute_task
    result = _execute_callable(context=context, **execute_callable_kwargs)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/airflow/git/.venv/lib/python3.12/site-packages/airflow/models/taskinstance.py", line 733, in _execute_callable
    return ExecutionCallableRunner(
           ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/airflow/git/.venv/lib/python3.12/site-packages/airflow/utils/operator_helpers.py", line 252, in run
    return self.func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/airflow/git/.venv/lib/python3.12/site-packages/airflow/models/baseoperator.py", line 1807, in resume_execution
    raise TaskDeferralError(next_kwargs.get("error", "Unknown"))
airflow.exceptions.TaskDeferralError: Trigger failure

Are you willing to submit PR?

  • Yes I am willing to submit a PR!
