Skip to content

HttpHook.run() alters the extra_options dict passed to it #51686

@brki

Description

@brki

Apache Airflow Provider(s)

http

Versions of Apache Airflow Providers

apache-airflow-providers-http==5.3.0

Apache Airflow version

2.11.0

Operating System

debian bookworm

Deployment

Official Apache Airflow Helm Chart

Deployment details

not relevant

What happened

A HttpHook was instantiated: hook = HttpHook(...)
A value for extra_options was created dynamically, based on some condition.
In this case, it looked like:

extra_options_val = {"proxies": {"http": "redacted", "https": "redacted"}}

hook.run(..., extra_options=extra_options_val))

This runs OK.

A second call is made (to get a second page of results):

hook.run(..., extra_options=extra_options_val))

This fails (because the "proxies" value was popped out of the extra_options_val dict on the first call to hook.run().

What should happen:
The extra_options dict passed in to run() should not be modified.

HttpHook.run() should make a deepcopy of extra_options, or HttpHook._configure_session_from_extra() should not remove values from the extra_options parameter.

What you think should happen instead

The extra_options dict passed in to run() should not be modified.

HttpHook.run() should make a deepcopy of extra_options, or HttpHook._configure_session_from_extra() should not remove values from the extra_options parameter.

How to reproduce

from copy import deepcopy
import airflow
from airflow.providers.standard.operators.python import PythonOperator
from airflow.providers.http.hooks.http import HttpHook


def http_hook_test():
    extra_options = {"verify_ssl": False}
    original_extra_options = deepcopy(extra_options)

    # jsonplaceholder Connection has
    # * Host = https://jsonplaceholder.typicode.com
    # * Extra = {"foo": "bar"}
    # Note: some Extra value is necessary, otherwise the extra_options passed to run are not used, see https://github.com/apache/airflow/issues/51685
    http_hook = HttpHook(method='GET', http_conn_id='jsonplaceholder')

    http_hook.run(
        endpoint=f"/posts/1",
        extra_options=extra_options,
    )
    assert extra_options == original_extra_options, f"extra_options: {extra_options}, original_extra_options: {original_extra_options}"


with airflow.DAG(
    dag_id='http_hook_test_dag',
    schedule_interval=None,
    start_date=airflow.utils.dates.days_ago(1),
    catchup=False,
) as dag:
    test_http_hook = PythonOperator(
        task_id='test_http_hook',
        python_callable=http_hook_test,
    )

Anything else

Modifying a mutable value passed in should, imho, be avoided in framework libraries. Sometimes it can make sense, for example in update_these_values(values_to_update). But in general, the caller of a framework library function will not expect it to modified, unless the function documentation explicitly states that it will be modified.

This also seems to apply to the Airflow 3.0.2 HttpHook.

The provided reproducible example uses verify_ssl instead of proxies in extra_options because it's easier to clearly reproduce, but the same behaviour is seen with proxies.

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

Metadata

Metadata

Assignees

Type

No type

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions